Linux-CIFS Archive on lore.kernel.org
* list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
@ 2019-10-16 19:27 David Wysochanski
  2019-10-17  0:17 ` Ronnie Sahlberg
  0 siblings, 1 reply; 31+ messages in thread
From: David Wysochanski @ 2019-10-16 19:27 UTC (permalink / raw)
  To: linux-cifs; +Cc: Sorenson, Frank

I think this bug has been there for a long time: we first saw it on a
4.18.0-based kernel, but I only noticed it recently.
I just retested on 5.4-rc3 and it's still there.  It is easy to
reproduce with a fairly simple but invasive server restart test, which
takes only a couple of minutes on my VM.


From Frank Sorenson:

mount off a samba server:

    # mount //vm1/share /mnt/vm1 -o vers=2.1,hard,sec=ntlmssp,credentials=/root/.smb_creds


on the client, start 10 'find' loops:

    # export test_path=/mnt/vm1
    # do_find() { while true ; do find $test_path >/dev/null 2>&1 ; done }

    # for i in {1..10} ; do do_find & done


optional:  also start something to monitor for when the hang occurs:

    # while true ; do count=$(grep smb2_reconnect /proc/*/stack -A3 | grep -c open_shroot) ; [[ $count -gt 0 ]] && { echo "$(date): reproduced bug" ; break ; } ; echo "$(date): stayin' alive" ; sleep 2 ; done



On the samba server:  restart smb.service (loop it in case it requires
more than one restart):

    # while true ; do echo "$(date): restarting" ; systemctl restart smb.service ; sleep 5 ; done | tee /var/tmp/smb_restart_log.out




[  430.454897] list_del corruption. prev->next should be ffff98d3a8f316c0, but was 2e885cb266355469
[  430.464668] ------------[ cut here ]------------
[  430.466569] kernel BUG at lib/list_debug.c:51!
[  430.468476] invalid opcode: 0000 [#1] SMP PTI
[  430.470286] CPU: 0 PID: 13267 Comm: cifsd Kdump: loaded Not tainted 5.4.0-rc3+ #19
[  430.473472] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[  430.475872] RIP: 0010:__list_del_entry_valid.cold+0x31/0x55
[  430.478129] Code: 5e 15 8e e8 54 a3 c5 ff 0f 0b 48 c7 c7 78 5f 15 8e e8 46 a3 c5 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 38 5f 15 8e e8 32 a3 c5 ff <0f> 0b 48 89 fe 4c 89 c2 48 c7 c7 00 5f 15 8e e8 1e a3 c5 ff 0f 0b
[  430.485563] RSP: 0018:ffffb4db0042fd38 EFLAGS: 00010246
[  430.487665] RAX: 0000000000000054 RBX: ffff98d3aabb8800 RCX: 0000000000000000
[  430.490513] RDX: 0000000000000000 RSI: ffff98d3b7a17908 RDI: ffff98d3b7a17908
[  430.493383] RBP: ffff98d3a8f316c0 R08: ffff98d3b7a17908 R09: 0000000000000285
[  430.496258] R10: ffffb4db0042fbf0 R11: ffffb4db0042fbf5 R12: ffff98d3aabb89c0
[  430.499113] R13: ffffb4db0042fd48 R14: 2e885cb266355469 R15: ffff98d3b24c4480
[  430.501981] FS:  0000000000000000(0000) GS:ffff98d3b7a00000(0000) knlGS:0000000000000000
[  430.505232] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  430.507546] CR2: 00007f08cd17b9c0 CR3: 000000023484a000 CR4: 00000000000406f0
[  430.510426] Call Trace:
[  430.511500]  cifs_reconnect+0x25e/0x610 [cifs]
[  430.513350]  cifs_readv_from_socket+0x220/0x250 [cifs]
[  430.515464]  cifs_read_from_socket+0x4a/0x70 [cifs]
[  430.517452]  ? try_to_wake_up+0x212/0x650
[  430.519122]  ? cifs_small_buf_get+0x16/0x30 [cifs]
[  430.521086]  ? allocate_buffers+0x66/0x120 [cifs]
[  430.523019]  cifs_demultiplex_thread+0xdc/0xc30 [cifs]
[  430.525116]  kthread+0xfb/0x130
[  430.526421]  ? cifs_handle_standard+0x190/0x190 [cifs]
[  430.528514]  ? kthread_park+0x90/0x90
[  430.530019]  ret_from_fork+0x35/0x40
[  430.531487] Modules linked in: cifs libdes libarc4 ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables crct10dif_pclmul crc32_pclmul joydev virtio_balloon ghash_clmulni_intel i2c_piix4 nfsd nfs_acl lockd auth_rpcgss grace sunrpc xfs libcrc32c virtio_net net_failover crc32c_intel virtio_console serio_raw virtio_blk ata_generic failover pata_acpi qemu_fw_cfg
[  430.552782] ---[ end trace c91d4468f8689482 ]---
[  430.554948] RIP: 0010:__list_del_entry_valid.cold+0x31/0x55
[  430.557251] Code: 5e 15 8e e8 54 a3 c5 ff 0f 0b 48 c7 c7 78 5f 15 8e e8 46 a3 c5 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 38 5f 15 8e e8 32 a3 c5 ff <0f> 0b 48 89 fe 4c 89 c2 48 c7 c7 00 5f 15 8e e8 1e a3 c5 ff 0f 0b
[  430.565019] RSP: 0018:ffffb4db0042fd38 EFLAGS: 00010246
[  430.567181] RAX: 0000000000000054 RBX: ffff98d3aabb8800 RCX: 0000000000000000
[  430.570073] RDX: 0000000000000000 RSI: ffff98d3b7a17908 RDI: ffff98d3b7a17908
[  430.572955] RBP: ffff98d3a8f316c0 R08: ffff98d3b7a17908 R09: 0000000000000285
[  430.575854] R10: ffffb4db0042fbf0 R11: ffffb4db0042fbf5 R12: ffff98d3aabb89c0
[  430.578745] R13: ffffb4db0042fd48 R14: 2e885cb266355469 R15: ffff98d3b24c4480
[  430.581624] FS:  0000000000000000(0000) GS:ffff98d3b7a00000(0000) knlGS:0000000000000000
[  430.584881] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  430.587230] CR2: 00007f08cd17b9c0 CR3: 000000023484a000 CR4: 00000000000406f0


crash> dis -lr cifs_reconnect+0x25e | tail --lines=20
0xffffffffc062dc26 <cifs_reconnect+0x226>:      movb   $0x0,0xbb36b(%rip)        # 0xffffffffc06e8f98 <GlobalMid_Lock>
/mnt/build/kernel/fs/cifs/connect.c: 572
0xffffffffc062dc2d <cifs_reconnect+0x22d>:      mov    %r12,%rdi
0xffffffffc062dc30 <cifs_reconnect+0x230>:      callq  0xffffffff8d9d5a20 <mutex_unlock>
/mnt/build/kernel/fs/cifs/connect.c: 574
0xffffffffc062dc35 <cifs_reconnect+0x235>:      testb  $0x1,0xbb300(%rip)        # 0xffffffffc06e8f3c <cifsFYI>
0xffffffffc062dc3c <cifs_reconnect+0x23c>:      je     0xffffffffc062dc43 <cifs_reconnect+0x243>
/mnt/build/kernel/./arch/x86/include/asm/jump_label.h: 25
0xffffffffc062dc3e <cifs_reconnect+0x23e>:      data32 data32 data32 xchg %ax,%ax
/mnt/build/kernel/fs/cifs/connect.c: 575
0xffffffffc062dc43 <cifs_reconnect+0x243>:      mov    0x8(%rsp),%rbp
0xffffffffc062dc48 <cifs_reconnect+0x248>:      mov    0x0(%rbp),%r14
0xffffffffc062dc4c <cifs_reconnect+0x24c>:      cmp    %r13,%rbp
0xffffffffc062dc4f <cifs_reconnect+0x24f>:      jne    0xffffffffc062dc56 <cifs_reconnect+0x256>
0xffffffffc062dc51 <cifs_reconnect+0x251>:      jmp    0xffffffffc062dc90 <cifs_reconnect+0x290>
0xffffffffc062dc53 <cifs_reconnect+0x253>:      mov    %rax,%r14
/mnt/build/kernel/./include/linux/list.h: 190
0xffffffffc062dc56 <cifs_reconnect+0x256>:      mov    %rbp,%rdi
0xffffffffc062dc59 <cifs_reconnect+0x259>:      callq  0xffffffff8d4e6b00 <__list_del_entry_valid>
0xffffffffc062dc5e <cifs_reconnect+0x25e>:      test   %al,%al


fs/cifs/connect.c
566         mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
567         if (mid_entry->mid_state == MID_REQUEST_SUBMITTED)
568             mid_entry->mid_state = MID_RETRY_NEEDED;
569         list_move(&mid_entry->qhead, &retry_list);
570     }
571     spin_unlock(&GlobalMid_Lock);
572     mutex_unlock(&server->srv_mutex);
573
574     cifs_dbg(FYI, "%s: issuing mid callbacks\n", __func__);
575-->    list_for_each_safe(tmp, tmp2, &retry_list) {
576         mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
577         list_del_init(&mid_entry->qhead);
578         mid_entry->callback(mid_entry);
579     }
580
581     if (cifs_rdma_enabled(server)) {
582         mutex_lock(&server->srv_mutex);
583         smbd_destroy(server);
584         mutex_unlock(&server->srv_mutex);
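
For readers unfamiliar with the "list_del corruption" message in the oops
above: it comes from the list debugging check (__list_del_entry_valid in
lib/list_debug.c, per the trace), which refuses to unlink an entry whose
neighbours no longer point back at it.  The following is only a minimal
userspace sketch of that idea, not the kernel source.  The struct and the
helpers are re-implemented here for illustration, and demo_list_del_check()
is a hypothetical name.

```c
#include <stdbool.h>
#include <stdio.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry right after head, as the kernel's list_add() does. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/*
 * Simplified form of the debug check: an entry may only be unlinked if
 * both of its neighbours still point back at it.
 */
static bool list_del_entry_valid(struct list_head *entry)
{
	if (entry->prev->next != entry) {
		fprintf(stderr,
			"list_del corruption. prev->next should be %p, but was %p\n",
			(void *)entry, (void *)entry->prev->next);
		return false;
	}
	if (entry->next->prev != entry) {
		fprintf(stderr,
			"list_del corruption. next->prev should be %p, but was %p\n",
			(void *)entry, (void *)entry->next->prev);
		return false;
	}
	return true;
}

/* Returns 1 if the check passes on an intact list and trips after damage. */
int demo_list_del_check(void)
{
	struct list_head head, a, b;
	bool ok_before, ok_after;

	init_list_head(&head);
	list_add(&a, &head);	/* head -> a */
	list_add(&b, &head);	/* head -> b -> a */

	ok_before = list_del_entry_valid(&a);

	/* Simulate another actor overwriting b, which is a's predecessor. */
	b.next = &head;
	ok_after = list_del_entry_valid(&a);

	return ok_before && !ok_after;
}
```

In the sketch the damage is a stale but valid pointer.  In the oops the
value found was 2e885cb266355469, which does not look like a kernel
address at all.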


* Re: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
  2019-10-16 19:27 list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3 David Wysochanski
@ 2019-10-17  0:17 ` Ronnie Sahlberg
  2019-10-17  9:05   ` Ronnie Sahlberg
  0 siblings, 1 reply; 31+ messages in thread
From: Ronnie Sahlberg @ 2019-10-17  0:17 UTC (permalink / raw)
  To: David Wysochanski; +Cc: linux-cifs, Frank Sorenson

I can not reproduce this :-(

I have run it for a few hours, restarting samba in a loop with up to 30 threads.


Can you check:
1. Does this reproduce only for the root of the share, or also for a
   subdirectory?
2. Does it also reproduce if you use the "nohandlecache" mount option?
   This disables the cached open of the root handle, i.e. open_shroot().
3. When this happens, can you check the content of the mid entry, in
   particular these fields:
   mid->mid_flags, mid->handle (this is a function pointer; what does it
   point to?), and mid->command.  Maybe print the whole structure.

regards
ronnie sahlberg





* Re: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
  2019-10-17  0:17 ` Ronnie Sahlberg
@ 2019-10-17  9:05   ` Ronnie Sahlberg
  2019-10-17 11:42     ` David Wysochanski
  0 siblings, 1 reply; 31+ messages in thread
From: Ronnie Sahlberg @ 2019-10-17  9:05 UTC (permalink / raw)
  To: linux-cifs; +Cc: Frank Sorenson, David Wysochanski



> > 575-->    list_for_each_safe(tmp, tmp2, &retry_list) {
> > 576         mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> > 577         list_del_init(&mid_entry->qhead);
> > 578         mid_entry->callback(mid_entry);
> > 579     }

This part (and a similar loop that runs when the demultiplex thread shuts
down) is the only place where we add to or remove from the ->qhead list
without holding GlobalMid_Lock.

I wonder if it is racing against a different thread that also modifies
qhead for the same mid, cifs_delete_mid() for example.
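
To make the locking concern concrete, here is a minimal userspace sketch
of one way such a loop could touch the list links only while holding a
lock and still run the callbacks unlocked.  It is not a proposed patch.
It assumes every other path that touches an entry's qhead takes the same
lock.  struct fake_mid, mid_lock (a pthread mutex standing in for
GlobalMid_Lock), drain_retry_list() and demo_drain() are all hypothetical
names.

```c
#include <pthread.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

static void list_del_init(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	init_list_head(entry);
}

static int list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Hypothetical stand-in for struct mid_q_entry; only what the sketch needs. */
struct fake_mid {
	struct list_head qhead;
	int id;
	void (*callback)(struct fake_mid *mid);
};

/* Stand-in for GlobalMid_Lock (a spinlock in the kernel, a mutex here). */
static pthread_mutex_t mid_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Unlink one entry at a time while holding the lock, then run its
 * callback with the lock dropped.  Returns the number of callbacks issued.
 */
static int drain_retry_list(struct list_head *retry_list)
{
	int issued = 0;

	for (;;) {
		struct fake_mid *mid;

		pthread_mutex_lock(&mid_lock);
		if (list_empty(retry_list)) {
			pthread_mutex_unlock(&mid_lock);
			break;
		}
		mid = container_of(retry_list->next, struct fake_mid, qhead);
		list_del_init(&mid->qhead);
		pthread_mutex_unlock(&mid_lock);

		mid->callback(mid);
		issued++;
	}
	return issued;
}

static int id_sum;

static void record_id(struct fake_mid *mid)
{
	id_sum += mid->id;
}

/* Queues three entries, drains them, and returns issued * 100 + id sum. */
int demo_drain(void)
{
	struct list_head retry_list;
	struct fake_mid mids[3];
	int i;

	id_sum = 0;
	init_list_head(&retry_list);
	for (i = 0; i < 3; i++) {
		mids[i].id = i + 1;
		mids[i].callback = record_id;
		list_add_tail(&mids[i].qhead, &retry_list);
	}
	return drain_retry_list(&retry_list) * 100 + id_sum;
}
```

The lock in this sketch protects only the list links.  It does nothing to
keep an entry alive if another thread frees it between the unlock and the
callback, so entry lifetime would need separate handling.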





* Re: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
  2019-10-17  9:05   ` Ronnie Sahlberg
@ 2019-10-17 11:42     ` David Wysochanski
  2019-10-17 14:08       ` Ronnie Sahlberg
  0 siblings, 1 reply; 31+ messages in thread
From: David Wysochanski @ 2019-10-17 11:42 UTC (permalink / raw)
  To: Ronnie Sahlberg; +Cc: linux-cifs, Frank Sorenson

On Thu, Oct 17, 2019 at 5:05 AM Ronnie Sahlberg <lsahlber@redhat.com> wrote:
>
>
>
> > > 575-->    list_for_each_safe(tmp, tmp2, &retry_list) {
> > > 576         mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> > > 577         list_del_init(&mid_entry->qhead);
> > > 578         mid_entry->callback(mid_entry);
> > > 579     }
>
> This part (and a similar loop that runs when the demultiplex thread shuts
> down) is the only place where we add to or remove from the ->qhead list
> without holding GlobalMid_Lock.
>
> I wonder if it is racing against a different thread that also modifies
> qhead for the same mid, cifs_delete_mid() for example.
>
>

Yes, I agree; I was thinking along the same lines as I read the code.
I put the latest findings from the investigation in the bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1654538#c15

Just before the crash, when we start iterating retry_list, something
has already gone wrong: the very first mid_entry on retry_list is a
garbage address, when normally it should be the address of the last
entry that the loop above passed to list_move.
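
The expectation about the first entry follows from how list_move works:
it unlinks the entry and re-inserts it right after the destination head,
so the most recently moved entry always sits at the front.  A minimal
userspace sketch of that behaviour follows.  The list helpers are
re-implemented here, and struct fake_mid, the list named pending and
first_id_after_moves() are hypothetical names used only for illustration.

```c
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry right after head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Insert entry right before head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Unlink entry from its current list, then add it after the new head. */
static void list_move(struct list_head *entry, struct list_head *head)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	list_add(entry, head);
}

/* Hypothetical stand-in for struct mid_q_entry. */
struct fake_mid {
	struct list_head qhead;
	int id;
};

/*
 * Moves ids 1, 2, 3 from a pending list to a retry list, in that order,
 * and returns the id now at the front of the retry list.
 */
int first_id_after_moves(void)
{
	struct list_head pending, retry_list, *pos, *next;
	struct fake_mid mids[3];
	int i;

	init_list_head(&pending);
	init_list_head(&retry_list);
	for (i = 0; i < 3; i++) {
		mids[i].id = i + 1;
		list_add_tail(&mids[i].qhead, &pending);
	}

	/* Same shape as list_for_each_safe(): cache next before moving pos. */
	for (pos = pending.next, next = pos->next; pos != &pending;
	     pos = next, next = pos->next)
		list_move(pos, &retry_list);

	return container_of(retry_list.next, struct fake_mid, qhead)->id;
}
```

With three entries moved in the order 1, 2, 3, the front of the retry
list is entry 3, the last one moved.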


> > ----- Original Message -----
> > From: "Ronnie Sahlberg" <lsahlber@redhat.com>
> > To: "David Wysochanski" <dwysocha@redhat.com>
> > Cc: "linux-cifs" <linux-cifs@vger.kernel.org>, "Frank Sorenson" <sorenson@redhat.com>
> > Sent: Thursday, 17 October, 2019 10:17:18 AM
> > Subject: Re: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
> >
> > I can not reproduce this :-(
> >
> > I have run it for a few hours, restarting samba in a loop with up to 30
> > threads.
> >

I am not sure if it helps but I have 8 CPUs on my VM.
I also have server signing and client signing mandatory.
I can send you the smb.conf offline.


> >
> > Can you check:
> > 1. Does this reproduce only for the root of the share, or also for a
> >    subdirectory?
> > 2. Does it also reproduce if you use the "nohandlecache" mount option?
> >    This disables the cached open of the root handle, i.e.
> >    open_shroot().
> > 3. When this happens, can you check the content of the mid entry, in
> >    particular these fields:
> >    mid->mid_flags, mid->handle (this is a function pointer; what does
> >    it point to?), and mid->command.  Maybe print the whole structure.
> >

Ok I'll see what I can find out.  So far I am not sure I have
identified what else is touching the mid in between the two loops.

> > regards
> > ronnie sahlberg
> >
> >
> >
> >
> > > [  430.554948] RIP: 0010:__list_del_entry_valid.cold+0x31/0x55
> > > [  430.557251] Code: 5e 15 8e e8 54 a3 c5 ff 0f 0b 48 c7 c7 78 5f 15
> > > 8e e8 46 a3 c5 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 38 5f 15 8e e8 32
> > > a3 c5 ff <0f> 0b 48 89 fe 4c 89 c2 48 c7 c7 00 5f 15 8e e8 1e a3 c5 ff
> > > 0f 0b
> > > [  430.565019] RSP: 0018:ffffb4db0042fd38 EFLAGS: 00010246
> > > [  430.567181] RAX: 0000000000000054 RBX: ffff98d3aabb8800 RCX:
> > > 0000000000000000
> > > [  430.570073] RDX: 0000000000000000 RSI: ffff98d3b7a17908 RDI:
> > > ffff98d3b7a17908
> > > [  430.572955] RBP: ffff98d3a8f316c0 R08: ffff98d3b7a17908 R09:
> > > 0000000000000285
> > > [  430.575854] R10: ffffb4db0042fbf0 R11: ffffb4db0042fbf5 R12:
> > > ffff98d3aabb89c0
> > > [  430.578745] R13: ffffb4db0042fd48 R14: 2e885cb266355469 R15:
> > > ffff98d3b24c4480
> > > [  430.581624] FS:  0000000000000000(0000) GS:ffff98d3b7a00000(0000)
> > > knlGS:0000000000000000
> > > [  430.584881] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [  430.587230] CR2: 00007f08cd17b9c0 CR3: 000000023484a000 CR4:
> > > 00000000000406f0
> > >
> > >
> > > crash> dis -lr cifs_reconnect+0x25e | tail --lines=20
> > > 0xffffffffc062dc26 <cifs_reconnect+0x226>:      movb
> > > $0x0,0xbb36b(%rip)        # 0xffffffffc06e8f98 <GlobalMid_Lock>
> > > /mnt/build/kernel/fs/cifs/connect.c: 572
> > > 0xffffffffc062dc2d <cifs_reconnect+0x22d>:      mov    %r12,%rdi
> > > 0xffffffffc062dc30 <cifs_reconnect+0x230>:      callq
> > > 0xffffffff8d9d5a20 <mutex_unlock>
> > > /mnt/build/kernel/fs/cifs/connect.c: 574
> > > 0xffffffffc062dc35 <cifs_reconnect+0x235>:      testb
> > > $0x1,0xbb300(%rip)        # 0xffffffffc06e8f3c <cifsFYI>
> > > 0xffffffffc062dc3c <cifs_reconnect+0x23c>:      je
> > > 0xffffffffc062dc43 <cifs_reconnect+0x243>
> > > /mnt/build/kernel/./arch/x86/include/asm/jump_label.h: 25
> > > 0xffffffffc062dc3e <cifs_reconnect+0x23e>:      data32 data32 data32
> > > xchg %ax,%ax
> > > /mnt/build/kernel/fs/cifs/connect.c: 575
> > > 0xffffffffc062dc43 <cifs_reconnect+0x243>:      mov    0x8(%rsp),%rbp
> > > 0xffffffffc062dc48 <cifs_reconnect+0x248>:      mov    0x0(%rbp),%r14
> > > 0xffffffffc062dc4c <cifs_reconnect+0x24c>:      cmp    %r13,%rbp
> > > 0xffffffffc062dc4f <cifs_reconnect+0x24f>:      jne
> > > 0xffffffffc062dc56 <cifs_reconnect+0x256>
> > > 0xffffffffc062dc51 <cifs_reconnect+0x251>:      jmp
> > > 0xffffffffc062dc90 <cifs_reconnect+0x290>
> > > 0xffffffffc062dc53 <cifs_reconnect+0x253>:      mov    %rax,%r14
> > > /mnt/build/kernel/./include/linux/list.h: 190
> > > 0xffffffffc062dc56 <cifs_reconnect+0x256>:      mov    %rbp,%rdi
> > > 0xffffffffc062dc59 <cifs_reconnect+0x259>:      callq
> > > 0xffffffff8d4e6b00 <__list_del_entry_valid>
> > > 0xffffffffc062dc5e <cifs_reconnect+0x25e>:      test   %al,%al
> > >
> > >
> > > fs/cifs/connect.c
> > > 566         mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> > > 567         if (mid_entry->mid_state == MID_REQUEST_SUBMITTED)
> > > 568             mid_entry->mid_state = MID_RETRY_NEEDED;
> > > 569         list_move(&mid_entry->qhead, &retry_list);
> > > 570     }
> > > 571     spin_unlock(&GlobalMid_Lock);
> > > 572     mutex_unlock(&server->srv_mutex);
> > > 573
> > > 574     cifs_dbg(FYI, "%s: issuing mid callbacks\n", __func__);
> > > 575-->    list_for_each_safe(tmp, tmp2, &retry_list) {
> > > 576         mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> > > 577         list_del_init(&mid_entry->qhead);
> > > 578         mid_entry->callback(mid_entry);
> > > 579     }
> > > 580
> > > 581     if (cifs_rdma_enabled(server)) {
> > > 582         mutex_lock(&server->srv_mutex);
> > > 583         smbd_destroy(server);
> > > 584         mutex_unlock(&server->srv_mutex);
> > >
> >



--
Dave Wysochanski
Principal Software Maintenance Engineer
T: 919-754-4024


* Re: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
  2019-10-17 11:42     ` David Wysochanski
@ 2019-10-17 14:08       ` Ronnie Sahlberg
  2019-10-17 15:29         ` David Wysochanski
  0 siblings, 1 reply; 31+ messages in thread
From: Ronnie Sahlberg @ 2019-10-17 14:08 UTC (permalink / raw)
  To: Pavel Shilovsky, linux-cifs; +Cc: Frank Sorenson, David Wysochanski

List, Pavel,

So I think there are two bugs we need to fix in this small block to make it safe against races with other threads calling cifs_delete_mid() and similar.

We need to protect the list-mutating functions by wrapping them inside the GlobalMid_Lock spinlock,
but we cannot hold this lock across the callback call.

But we still need to protect the mid_entry dereference and the ->callback call against the mid structure being freed
by DeleteMidQEntry().  We can do that by taking an extra reference to the mid while holding the GlobalMid_Lock
and then dropping the reference again after the callback completes.


I think something like this might work:



diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index bdea4b3e8005..3a1a9b63bd9b 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -572,11 +572,19 @@ cifs_reconnect(struct TCP_Server_Info *server)
        mutex_unlock(&server->srv_mutex);
 
        cifs_dbg(FYI, "%s: issuing mid callbacks\n", __func__);
+       spin_lock(&GlobalMid_Lock);
        list_for_each_safe(tmp, tmp2, &retry_list) {
                mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
                list_del_init(&mid_entry->qhead);
+               kref_get(&mid_entry->refcount);
+               spin_unlock(&GlobalMid_Lock);
+
                mid_entry->callback(mid_entry);
+               cifs_mid_q_entry_release(mid_entry);
+
+               spin_lock(&GlobalMid_Lock);
        }
+       spin_unlock(&GlobalMid_Lock);
 
        if (cifs_rdma_enabled(server)) {
                mutex_lock(&server->srv_mutex);
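
For anyone who wants to experiment with the idea outside the kernel, the pattern in the diff above (unlink and pin the entry under the lock, run the callback unlocked, then drop the pin) can be sketched in userspace C. This is only an illustration under stated assumptions: struct entry, entry_put(), drain() and drain_demo() are hypothetical names, a pthread mutex stands in for GlobalMid_Lock, and a plain counter stands in for the kref. In the demo the callback itself drops the submitter's reference, standing in for another thread releasing the mid while the callback runs.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct mid_q_entry. */
struct entry {
	struct entry *next;                /* singly linked, for brevity */
	int refcount;                      /* protected by list_lock in this sketch */
	void (*callback)(struct entry *);
	int *counter;                      /* where the callback records that it ran */
};

/* Stand-in for GlobalMid_Lock (a mutex here, a spinlock in the kernel). */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Drop one reference; free the entry when the last reference goes away. */
static void entry_put(struct entry *e)
{
	int last;

	pthread_mutex_lock(&list_lock);
	last = (--e->refcount == 0);
	pthread_mutex_unlock(&list_lock);
	if (last)
		free(e);
}

/* Demo callback: record the call, then drop the submitter's reference. */
static void count_cb(struct entry *e)
{
	(*e->counter)++;
	entry_put(e);
}

/* Unlink and pin each entry under the lock, call back without the lock. */
static void drain(struct entry **head)
{
	pthread_mutex_lock(&list_lock);
	while (*head) {
		struct entry *e = *head;

		*head = e->next;           /* list mutation happens under the lock */
		e->next = NULL;
		e->refcount++;             /* pin: e stays valid across the callback */
		pthread_mutex_unlock(&list_lock);

		e->callback(e);            /* may sleep or take other locks */
		entry_put(e);              /* drop the pin; may free e */

		pthread_mutex_lock(&list_lock);
	}
	pthread_mutex_unlock(&list_lock);
}

/* Build n entries, drain them, and return how many callbacks ran. */
int drain_demo(int n)
{
	struct entry *head = NULL;
	int ran = 0;

	for (int i = 0; i < n; i++) {
		struct entry *e = calloc(1, sizeof(*e));

		assert(e != NULL);
		e->refcount = 1;           /* reference owned by the submitter */
		e->callback = count_cb;
		e->counter = &ran;
		e->next = head;
		head = e;
	}
	drain(&head);
	assert(head == NULL);
	return ran;
}
```

Calling drain_demo(3) runs three callbacks and returns 3. One design choice in the sketch: it re-reads the list head each time the lock is re-taken instead of relying on a next pointer saved before the lock was dropped, since a saved pointer could go stale while the list is unlocked.
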


Pavel, can you have a look at this and comment?  It is very delicate code so it needs careful review.


regards
ronnie sahlberg



----- Original Message -----
From: "David Wysochanski" <dwysocha@redhat.com>
To: "Ronnie Sahlberg" <lsahlber@redhat.com>
Cc: "linux-cifs" <linux-cifs@vger.kernel.org>, "Frank Sorenson" <sorenson@redhat.com>
Sent: Thursday, 17 October, 2019 9:42:08 PM
Subject: Re: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3

On Thu, Oct 17, 2019 at 5:05 AM Ronnie Sahlberg <lsahlber@redhat.com> wrote:
>
>
>
> > > 575-->    list_for_each_safe(tmp, tmp2, &retry_list) {
> > > 576         mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> > > 577         list_del_init(&mid_entry->qhead);
> > > 578         mid_entry->callback(mid_entry);
> > > 579     }
>
> This part (and a similar loop during shutting down the demultiplex thread) is the only place
> where we add/remove to the ->qhead list without holding the GlobalMid_Lock.
>
> I wonder if it is racing against a different thread also modifying qhead for the same mid,
> like cifs_delete_mid() for example.
>

Yes, I agree. I was thinking along the same lines as I
read the code.  I put the latest findings from the investigation into the bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1654538#c15

Just before the crash, when we hit the iteration of retry_list
something has gone wrong - the very first mid_entry on retry_list is a
garbage address, when normally it should be the last address from the
previous call to list_move in the loop above it.
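
To make the expectation above concrete: list_move() unlinks an entry and re-adds it at the front of the destination list, so after the first loop the head of retry_list should point at whichever mid was moved last. A minimal userspace sketch, assuming a hand-rolled circular list modelled on the kernel's list_head (all names here are hypothetical, not the kernel implementation):

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *h)
{
	h->prev = h->next = h;
}

/* Insert n directly after pos (pos == head means "at the front"). */
static void list_add_after(struct list_node *n, struct list_node *pos)
{
	n->next = pos->next;
	n->prev = pos;
	pos->next->prev = n;
	pos->next = n;
}

static void list_unlink(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* Same effect as the kernel's list_move(): unlink, then add at the front. */
static void list_move_front(struct list_node *n, struct list_node *head)
{
	list_unlink(n);
	list_add_after(n, head);
}

/*
 * Queue three entries in order 0, 1, 2, move them one by one to a retry
 * list, and return the index of the entry that ends up first on it.
 */
int first_after_move_demo(void)
{
	struct list_node pending, retry, mids[3];

	list_init(&pending);
	list_init(&retry);
	for (int i = 0; i < 3; i++)
		list_add_after(&mids[i], pending.prev);   /* append at the tail */

	while (pending.next != &pending)
		list_move_front(pending.next, &retry);

	return (int)(retry.next - mids);
}
```

first_after_move_demo() returns 2: the last entry moved is the first one on the retry list, which is why a garbage first entry points at something modifying the list after the first loop finished.
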


 ----- Original Message -----
> > From: "Ronnie Sahlberg" <lsahlber@redhat.com>
> > To: "David Wysochanski" <dwysocha@redhat.com>
> > Cc: "linux-cifs" <linux-cifs@vger.kernel.org>, "Frank Sorenson" <sorenson@redhat.com>
> > Sent: Thursday, 17 October, 2019 10:17:18 AM
> > Subject: Re: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
> >
> > I can not reproduce this :-(
> >
> > I have run it for a few hours, restarting samba in a loop with up to 30
> > threads.
> >

I am not sure if it helps but I have 8 CPUs on my VM.
I also have server signing and client signing mandatory.
I can send you the smb.conf offline.


> >
> > Can you check:
> > 1. Does this reproduce only for the root of the share, or does it also
> > reproduce for a subdirectory?
> > 2. Does it also reproduce if you use the "nohandlecache" mount option?
> >    This disables the use of the cached open of the root handle, i.e.
> >    open_shroot()
> > 3. When this happens, can you check the contents of the mid entry and what
> > these fields are:
> >    mid->mid_flags, mid->handle (this is a function pointer; what does it
> >    point to?),
> >    mid->command.   Maybe print the whole structure.
> >

Ok I'll see what I can find out.  So far I am not sure I have
identified what else is touching the mid in between the two loops.

> > regards
> > ronnie sahlberg
> >
> >
> >
> >



--
Dave Wysochanski
Principal Software Maintenance Engineer
T: 919-754-4024


* Re: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
  2019-10-17 14:08       ` Ronnie Sahlberg
@ 2019-10-17 15:29         ` David Wysochanski
  2019-10-17 18:29           ` Pavel Shilovskiy
  0 siblings, 1 reply; 31+ messages in thread
From: David Wysochanski @ 2019-10-17 15:29 UTC (permalink / raw)
  To: Ronnie Sahlberg; +Cc: Pavel Shilovsky, linux-cifs, Frank Sorenson

On Thu, Oct 17, 2019 at 10:08 AM Ronnie Sahlberg <lsahlber@redhat.com> wrote:
>
> List, Pavel,
>
> So I think there are two bugs we need to fix in this small block to make it safe against races with other threads calling cifs_delete_mid() and similar.
>
> We need to protect the list-mutating functions by wrapping them inside the GlobalMid_Lock spinlock,
> but we cannot hold this lock across the callback call.
>
> But we still need to protect the mid_entry dereference and the ->callback call against the mid structure being freed
> by DeleteMidQEntry().  We can do that by taking an extra reference to the mid while holding the GlobalMid_Lock
> and then dropping the reference again after the callback completes.
>
>
> I think something like this might work:
>
>
>
> diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> index bdea4b3e8005..3a1a9b63bd9b 100644
> --- a/fs/cifs/connect.c
> +++ b/fs/cifs/connect.c
> @@ -572,11 +572,19 @@ cifs_reconnect(struct TCP_Server_Info *server)
>         mutex_unlock(&server->srv_mutex);
>
>         cifs_dbg(FYI, "%s: issuing mid callbacks\n", __func__);
> +       spin_lock(&GlobalMid_Lock);
>         list_for_each_safe(tmp, tmp2, &retry_list) {
>                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);

I think you need to take a reference before this, plus something like
if (mid->mid_flags ...)  /* check for someone else already deleting it */
        ;
else
>                 list_del_init(&mid_entry->qhead);

I am still tracing and I do not see the root of the problem yet.
Unsurprisingly, it looks like a use after free though.
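
A rough userspace sketch of the "check whether someone else already deleted it" idea, in case it helps the discussion. It is not the cifs code: NODE_DELETED and node_unlink_once() are hypothetical names, the real flag test is left open above, and the caller is assumed to hold the lock that protects the list (omitted here because the demo is single-threaded).

```c
#include <assert.h>
#include <stdbool.h>

#define NODE_DELETED 0x1u   /* hypothetical flag: set once the node is unlinked */

struct node {
	struct node *prev, *next;
	unsigned int flags;
};

static void node_init(struct node *n)
{
	n->prev = n->next = n;
	n->flags = 0;
}

/* Insert n at the front of the circular list rooted at head. */
static void node_add(struct node *n, struct node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/*
 * Unlink n unless somebody already did. Returns true only for the caller
 * that actually touched the list. The caller must hold the list lock.
 */
static bool node_unlink_once(struct node *n)
{
	if (n->flags & NODE_DELETED)
		return false;              /* already dequeued by another path */
	n->flags |= NODE_DELETED;
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;         /* like list_del_init(): leave n self-linked */
	return true;
}

/* Two competing unlink attempts on the same node; returns how many succeeded. */
int unlink_once_demo(void)
{
	struct node head, a, b;
	int unlinked = 0;

	node_init(&head);
	node_init(&a);
	node_init(&b);
	node_add(&a, &head);
	node_add(&b, &head);

	unlinked += node_unlink_once(&a);   /* e.g. the waiter cleaning up */
	unlinked += node_unlink_once(&a);   /* e.g. the reconnect loop arriving late */

	/* The list must still be intact and contain only b. */
	assert(head.next == &b && head.prev == &b);
	assert(b.next == &head && b.prev == &head);
	return unlinked;
}
```

unlink_once_demo() returns 1: the second attempt sees the flag and leaves the list alone. The flag only prevents a double unlink; it does not keep the memory alive, so a reference is still needed before dereferencing the entry.
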


> +               kref_get(&mid_entry->refcount);
> +               spin_unlock(&GlobalMid_Lock);
> +
>                 mid_entry->callback(mid_entry);
> +               cifs_mid_q_entry_release(mid_entry);
> +
> +               spin_lock(&GlobalMid_Lock);
>         }
> +       spin_unlock(&GlobalMid_Lock);
>
>         if (cifs_rdma_enabled(server)) {
>                 mutex_lock(&server->srv_mutex);
>
>
> Pavel, can you have a look at this and comment?  It is very delicate code so it needs careful review.
>
>
> regards
> ronnie sahlberg
>
>
>



-- 
Dave Wysochanski
Principal Software Maintenance Engineer
T: 919-754-4024

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
  2019-10-17 15:29         ` David Wysochanski
@ 2019-10-17 18:29           ` Pavel Shilovskiy
  2019-10-17 19:23             ` David Wysochanski
  0 siblings, 1 reply; 31+ messages in thread
From: Pavel Shilovskiy @ 2019-10-17 18:29 UTC (permalink / raw)
  To: David Wysochanski, Ronnie Sahlberg; +Cc: linux-cifs, Frank Sorenson

Hi Ronnie, David,

Thanks for looking into this. This actually reminds me of the commit 696e420bb2a66:

--------------------------------------
commit 696e420bb2a6624478105651d5368d45b502b324
Author: Lars Persson <lars.persson@axis.com>
Date:   Mon Jun 25 14:05:25 2018 +0200

    cifs: Fix use after free of a mid_q_entry

    With protocol version 2.0 mounts we have seen crashes with corrupt mid
    entries. Either the server->pending_mid_q list becomes corrupt with a
    cyclic reference in one element or a mid object fetched by the
    demultiplexer thread becomes overwritten during use.

    Code review identified a race between the demultiplexer thread and the
    request issuing thread. The demultiplexer thread seems to be written
    with the assumption that it is the sole user of the mid object until
    it calls the mid callback which either wakes the issuer task or
    deletes the mid.

    This assumption is not true because the issuer task can be woken up
    earlier by a signal. If the demultiplexer thread has proceeded as far
    as setting the mid_state to MID_RESPONSE_RECEIVED then the issuer
    thread will happily end up calling cifs_delete_mid while the
    demultiplexer thread still is using the mid object.

    Inserting a delay in the cifs demultiplexer thread widens the race
    window and makes reproduction of the race very easy:

                    if (server->large_buf)
                            buf = server->bigbuf;

    +               usleep_range(500, 4000);

                    server->lstrp = jiffies;

    To resolve this I think the proper solution involves putting a
    reference count on the mid object. This patch makes sure that the
    demultiplexer thread holds a reference until it has finished
    processing the transaction.

    Cc: stable@vger.kernel.org
    Signed-off-by: Lars Persson <larper@axis.com>
    Acked-by: Paulo Alcantara <palcantara@suse.de>
    Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
    Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
    Signed-off-by: Steve French <stfrench@microsoft.com>

--------------------------------------

A similar solution, taking an extra reference, should apply to the reconnect case as well. The reference should be taken while moving mid entries to the private list. Once a callback completes, that reference should be put back, which frees the mid once no other reference remains.
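
To make the lifetime rule concrete, here is a minimal userspace sketch of that reference handling. It is a model only, not the cifs code: struct mid, mid_get(), mid_put(), wake_issuer(), mid_alloc() and reconnect_one() are hypothetical names, and a plain int stands in for the kernel's kref, so it is single-threaded by assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct mid_q_entry; not the kernel definition. */
struct mid {
	int refcount;                    /* plain int: single-threaded model only */
	void (*callback)(struct mid *);
};

static int freed_count;                  /* how many mids have been released */

static void mid_get(struct mid *m)
{
	m->refcount++;
}

/* Drop one reference; free the object when the last one goes away. */
static void mid_put(struct mid *m)
{
	assert(m->refcount > 0);
	if (--m->refcount == 0) {
		freed_count++;
		free(m);
	}
}

/* Models the issuer being woken by the callback and deleting its mid. */
static void wake_issuer(struct mid *m)
{
	mid_put(m);                      /* issuer drops its own reference */
}

static struct mid *mid_alloc(void)
{
	struct mid *m = malloc(sizeof(*m));

	if (!m)
		abort();
	m->refcount = 1;                 /* reference owned by the issuer */
	m->callback = wake_issuer;
	return m;
}

/* Returns the refcount observed right after the callback returned. */
static int reconnect_one(struct mid *m)
{
	int seen;

	mid_get(m);                      /* taken while moving to the private list */
	m->callback(m);                  /* issuer may drop its reference here */
	seen = m->refcount;              /* still valid: we hold a reference */
	mid_put(m);                      /* final put releases the mid */
	return seen;
}
```

With one mid from mid_alloc(), reconnect_one() returns 1 (the reference still held after the callback) and freed_count ends at 1, so in this model the free happens on the final put rather than inside the callback.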

--
Best regards,
Pavel Shilovsky

-----Original Message-----
From: David Wysochanski <dwysocha@redhat.com> 
Sent: Thursday, October 17, 2019 8:30 AM
To: Ronnie Sahlberg <lsahlber@redhat.com>
Cc: Pavel Shilovskiy <pshilov@microsoft.com>; linux-cifs <linux-cifs@vger.kernel.org>; Frank Sorenson <sorenson@redhat.com>
Subject: Re: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3

On Thu, Oct 17, 2019 at 10:08 AM Ronnie Sahlberg <lsahlber@redhat.com> wrote:
>
> List, Pavel,
>
> So I think there are two bugs we need to fix in this small block to make it safe against a race against other threads calling cifs_delete_mid() and similar.
>
> We need to protect the list-mutating functions by wrapping them inside
> the GlobalMid_Lock spinlock, but we cannot hold this lock across the callback call.
>
> But we still need to protect the mid_entry dereference and the 
> ->callback call against the mid structure being freed by 
> DeleteMidQEntry().  We can do that by taking out an extra reference to the mid while holding the GlobalMid_Lock and then dropping the reference again after the callback completes.
>
>
> I think something like this might work :
>
>
>
> diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> index bdea4b3e8005..3a1a9b63bd9b 100644
> --- a/fs/cifs/connect.c
> +++ b/fs/cifs/connect.c
> @@ -572,11 +572,19 @@ cifs_reconnect(struct TCP_Server_Info *server)
>         mutex_unlock(&server->srv_mutex);
>
>         cifs_dbg(FYI, "%s: issuing mid callbacks\n", __func__);
> +       spin_lock(&GlobalMid_Lock);
>         list_for_each_safe(tmp, tmp2, &retry_list) {
>                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);

I think you need a reference before this and something like if (mid->mid_flags ...)  /* check for someone else already deleting it */ ; else
>                 list_del_init(&mid_entry->qhead);

I am still tracing and I do not see the root of the problem yet.
Unsurprisingly, it looks like a use after free though.


> +               kref_get(&mid_entry->refcount);
> +               spin_unlock(&GlobalMid_Lock);
> +
>                 mid_entry->callback(mid_entry);
> +               cifs_mid_q_entry_release(mid_entry);
> +
> +               spin_lock(&GlobalMid_Lock);
>         }
> +       spin_unlock(&GlobalMid_Lock);
>
>         if (cifs_rdma_enabled(server)) {
>                 mutex_lock(&server->srv_mutex);
>
>
> Pavel, can you have a look at this and comment?  It is very delicate code so it needs careful review.
>
>
> regards
> ronnie sahlberg
>
>
>
> ----- Original Message -----
> From: "David Wysochanski" <dwysocha@redhat.com>
> To: "Ronnie Sahlberg" <lsahlber@redhat.com>
> Cc: "linux-cifs" <linux-cifs@vger.kernel.org>, "Frank Sorenson" 
> <sorenson@redhat.com>
> Sent: Thursday, 17 October, 2019 9:42:08 PM
> Subject: Re: list_del corruption while iterating retry_list in 
> cifs_reconnect still seen on 5.4-rc3
>
> On Thu, Oct 17, 2019 at 5:05 AM Ronnie Sahlberg <lsahlber@redhat.com> wrote:
> >
> >
> >
> > > > 575-->    list_for_each_safe(tmp, tmp2, &retry_list) {
> > > > 576         mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> > > > 577         list_del_init(&mid_entry->qhead);
> > > > 578         mid_entry->callback(mid_entry);
> > > > 579     }
> >
> > This part (and a similar loop during shutting down the demultiplex 
> > thread) is the only place where we add/remove to the ->qhead list without holding the GlobalMid_Lock.
> >
> > I wonder if it is racing against a different thread also modifying 
> > qhead for the same mid, like cifs_delete_mid() for example.
> >
>
> Yes I agree, I was thinking along these same lines of reasoning as I 
> read the code.  I put the latest on the investigation into the bug:
> https://bugzilla.redhat.com/show_bug.cgi?id=1654538#c15
>
> Just before the crash, when we hit the iteration of retry_list 
> something has gone wrong - the very first mid_entry on retry_list is a 
> garbage address, when normally it should be the last address from the 
> previous call to list_move in the loop above it.
>
>
>  ----- Original Message -----
> > > From: "Ronnie Sahlberg" <lsahlber@redhat.com>
> > > To: "David Wysochanski" <dwysocha@redhat.com>
> > > Cc: "linux-cifs" <linux-cifs@vger.kernel.org>, "Frank Sorenson" 
> > > <sorenson@redhat.com>
> > > Sent: Thursday, 17 October, 2019 10:17:18 AM
> > > Subject: Re: list_del corruption while iterating retry_list in 
> > > cifs_reconnect still seen on 5.4-rc3
> > >
> > > I can not reproduce this :-(
> > >
> > > I have run it for a few hours, restarting samba in a loop with up 
> > > to 30 threads.
> > >
>
> I am not sure if it helps but I have 8 CPUs on my VM.
> I also have server signing and client signing mandatory.
> I can send you the smb.conf offline.
>
>
> > >
> > > Can you check
> > > 1, If this only reproduce for you for the root of the share or it 
> > > also reproduces for a subdirectory?
> > > 2, Does it reproduce also if you use "nohandlecache" mount option?
> > >    This disables the use of cached open of the root handle, i.e.
> > >    open_shroot()
> > > 3, When this happens, can you check the content of the mid entry 
> > > and what these fields are:
> > >    mid->mid_flags, mid->handle (this is a function pointer, what does it
> > >    point to)
> > >    mid->command.   Maybe print the whole structure.
> > >
>
> Ok I'll see what I can find out.  So far I am not sure I have 
> identified what else is touching the mid in between the two loops.
>
> > > regards
> > > ronnie sahlberg
> > >
> > >
> > >
> > >
> > > ----- Original Message -----
> > > > From: "David Wysochanski" <dwysocha@redhat.com>
> > > > To: "linux-cifs" <linux-cifs@vger.kernel.org>
> > > > Cc: "Frank Sorenson" <sorenson@redhat.com>
> > > > Sent: Thursday, 17 October, 2019 5:27:02 AM
> > > > Subject: list_del corruption while iterating retry_list in 
> > > > cifs_reconnect still seen on 5.4-rc3
> > > >
> > > > I think this has been there for a long time, since we first saw 
> > > > this on a 4.18.0 based kernel but I just noticed the bug recently.
> > > > I just retested on 5.4-rc3 and it's still there.  Easy to repro 
> > > > with a fairly simple but invasive server restart test - takes 
> > > > only maybe a couple minutes on my VM.
> > > >
> > > >
> > > > From Frank Sorenson:
> > > >
> > > > mount off a samba server:
> > > >
> > > >     # mount //vm1/share /mnt/vm1 -o vers=2.1,hard,sec=ntlmssp,credentials=/root/.smb_creds
> > > >
> > > >
> > > > on the client, start 10 'find' loops:
> > > >
> > > >     # export test_path=/mnt/vm1
> > > >     # do_find() { while true ; do find $test_path >/dev/null 2>&1 ; done }
> > > >
> > > >     # for i in {1..10} ; do do_find & done
> > > >
> > > >
> > > > optional:  also start something to monitor for when the hang occurs:
> > > >
> > > >     # while true ; do count=$(grep smb2_reconnect /proc/*/stack -A3 | grep -c open_shroot) ; [[ $count -gt 0 ]] && { echo "$(date): reproduced bug" ; break ; } ; echo "$(date): stayin' alive" ; sleep 2 ; done
> > > >
> > > >
> > > >
> > > > On the samba server:  restart smb.service (loop it in case it 
> > > > requires more than one restart):
> > > >
> > > >     # while true ; do echo "$(date): restarting" ; systemctl restart smb.service ; sleep 5 ; done | tee /var/tmp/smb_restart_log.out
> > > >
> > > >
> > > >
> > > >
> > > > [  430.454897] list_del corruption. prev->next should be ffff98d3a8f316c0, but was 2e885cb266355469
> > > > [  430.464668] ------------[ cut here ]------------
> > > > [  430.466569] kernel BUG at lib/list_debug.c:51!
> > > > [  430.468476] invalid opcode: 0000 [#1] SMP PTI
> > > > [  430.470286] CPU: 0 PID: 13267 Comm: cifsd Kdump: loaded Not tainted 5.4.0-rc3+ #19
> > > > [  430.473472] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> > > > [  430.475872] RIP: 0010:__list_del_entry_valid.cold+0x31/0x55
> > > > [  430.478129] Code: 5e 15 8e e8 54 a3 c5 ff 0f 0b 48 c7 c7 78 5f 15 8e e8 46 a3 c5 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 38 5f 15 8e e8 32 a3 c5 ff <0f> 0b 48 89 fe 4c 89 c2 48 c7 c7 00 5f 15 8e e8 1e a3 c5 ff 0f 0b
> > > > [  430.485563] RSP: 0018:ffffb4db0042fd38 EFLAGS: 00010246
> > > > [  430.487665] RAX: 0000000000000054 RBX: ffff98d3aabb8800 RCX: 0000000000000000
> > > > [  430.490513] RDX: 0000000000000000 RSI: ffff98d3b7a17908 RDI: ffff98d3b7a17908
> > > > [  430.493383] RBP: ffff98d3a8f316c0 R08: ffff98d3b7a17908 R09: 0000000000000285
> > > > [  430.496258] R10: ffffb4db0042fbf0 R11: ffffb4db0042fbf5 R12: ffff98d3aabb89c0
> > > > [  430.499113] R13: ffffb4db0042fd48 R14: 2e885cb266355469 R15: ffff98d3b24c4480
> > > > [  430.501981] FS:  0000000000000000(0000) GS:ffff98d3b7a00000(0000) knlGS:0000000000000000
> > > > [  430.505232] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [  430.507546] CR2: 00007f08cd17b9c0 CR3: 000000023484a000 CR4: 00000000000406f0
> > > > [  430.510426] Call Trace:
> > > > [  430.511500]  cifs_reconnect+0x25e/0x610 [cifs]
> > > > [  430.513350]  cifs_readv_from_socket+0x220/0x250 [cifs]
> > > > [  430.515464]  cifs_read_from_socket+0x4a/0x70 [cifs]
> > > > [  430.517452]  ? try_to_wake_up+0x212/0x650
> > > > [  430.519122]  ? cifs_small_buf_get+0x16/0x30 [cifs]
> > > > [  430.521086]  ? allocate_buffers+0x66/0x120 [cifs]
> > > > [  430.523019]  cifs_demultiplex_thread+0xdc/0xc30 [cifs]
> > > > [  430.525116]  kthread+0xfb/0x130
> > > > [  430.526421]  ? cifs_handle_standard+0x190/0x190 [cifs]
> > > > [  430.528514]  ? kthread_park+0x90/0x90
> > > > [  430.530019]  ret_from_fork+0x35/0x40
> > > > [  430.531487] Modules linked in: cifs libdes libarc4 ip6t_rpfilter
> > > > ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ip6table_nat
> > > > ip6table_mangle ip6table_raw ip6table_security iptable_nat
> > > > nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack
> > > > nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink ebtable_filter
> > > > ebtables ip6table_filter ip6_tables crct10dif_pclmul
> > > > crc32_pclmul joydev virtio_balloon ghash_clmulni_intel i2c_piix4
> > > > nfsd nfs_acl lockd auth_rpcgss grace sunrpc xfs libcrc32c
> > > > virtio_net net_failover crc32c_intel virtio_console serio_raw
> > > > virtio_blk ata_generic failover pata_acpi qemu_fw_cfg
> > > > [  430.552782] ---[ end trace c91d4468f8689482 ]---
> > > > [  430.554948] RIP: 0010:__list_del_entry_valid.cold+0x31/0x55
> > > > [  430.557251] Code: 5e 15 8e e8 54 a3 c5 ff 0f 0b 48 c7 c7 78 5f 15 8e e8 46 a3 c5 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 38 5f 15 8e e8 32 a3 c5 ff <0f> 0b 48 89 fe 4c 89 c2 48 c7 c7 00 5f 15 8e e8 1e a3 c5 ff 0f 0b
> > > > [  430.565019] RSP: 0018:ffffb4db0042fd38 EFLAGS: 00010246
> > > > [  430.567181] RAX: 0000000000000054 RBX: ffff98d3aabb8800 RCX: 0000000000000000
> > > > [  430.570073] RDX: 0000000000000000 RSI: ffff98d3b7a17908 RDI: ffff98d3b7a17908
> > > > [  430.572955] RBP: ffff98d3a8f316c0 R08: ffff98d3b7a17908 R09: 0000000000000285
> > > > [  430.575854] R10: ffffb4db0042fbf0 R11: ffffb4db0042fbf5 R12: ffff98d3aabb89c0
> > > > [  430.578745] R13: ffffb4db0042fd48 R14: 2e885cb266355469 R15: ffff98d3b24c4480
> > > > [  430.581624] FS:  0000000000000000(0000) GS:ffff98d3b7a00000(0000) knlGS:0000000000000000
> > > > [  430.584881] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [  430.587230] CR2: 00007f08cd17b9c0 CR3: 000000023484a000 CR4: 00000000000406f0
> > > >
> > > >
> > > > crash> dis -lr cifs_reconnect+0x25e | tail --lines=20
> > > > 0xffffffffc062dc26 <cifs_reconnect+0x226>:      movb
> > > > $0x0,0xbb36b(%rip)        # 0xffffffffc06e8f98 <GlobalMid_Lock>
> > > > /mnt/build/kernel/fs/cifs/connect.c: 572
> > > > 0xffffffffc062dc2d <cifs_reconnect+0x22d>:      mov    %r12,%rdi
> > > > 0xffffffffc062dc30 <cifs_reconnect+0x230>:      callq
> > > > 0xffffffff8d9d5a20 <mutex_unlock>
> > > > /mnt/build/kernel/fs/cifs/connect.c: 574
> > > > 0xffffffffc062dc35 <cifs_reconnect+0x235>:      testb
> > > > $0x1,0xbb300(%rip)        # 0xffffffffc06e8f3c <cifsFYI>
> > > > 0xffffffffc062dc3c <cifs_reconnect+0x23c>:      je
> > > > 0xffffffffc062dc43 <cifs_reconnect+0x243>
> > > > /mnt/build/kernel/./arch/x86/include/asm/jump_label.h: 25
> > > > 0xffffffffc062dc3e <cifs_reconnect+0x23e>:      data32 data32 data32
> > > > xchg %ax,%ax
> > > > /mnt/build/kernel/fs/cifs/connect.c: 575
> > > > 0xffffffffc062dc43 <cifs_reconnect+0x243>:      mov    0x8(%rsp),%rbp
> > > > 0xffffffffc062dc48 <cifs_reconnect+0x248>:      mov    0x0(%rbp),%r14
> > > > 0xffffffffc062dc4c <cifs_reconnect+0x24c>:      cmp    %r13,%rbp
> > > > 0xffffffffc062dc4f <cifs_reconnect+0x24f>:      jne
> > > > 0xffffffffc062dc56 <cifs_reconnect+0x256>
> > > > 0xffffffffc062dc51 <cifs_reconnect+0x251>:      jmp
> > > > 0xffffffffc062dc90 <cifs_reconnect+0x290>
> > > > 0xffffffffc062dc53 <cifs_reconnect+0x253>:      mov    %rax,%r14
> > > > /mnt/build/kernel/./include/linux/list.h: 190
> > > > 0xffffffffc062dc56 <cifs_reconnect+0x256>:      mov    %rbp,%rdi
> > > > 0xffffffffc062dc59 <cifs_reconnect+0x259>:      callq
> > > > 0xffffffff8d4e6b00 <__list_del_entry_valid>
> > > > 0xffffffffc062dc5e <cifs_reconnect+0x25e>:      test   %al,%al
> > > >
> > > >
> > > > fs/cifs/connect.c
> > > > 566         mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> > > > 567         if (mid_entry->mid_state == MID_REQUEST_SUBMITTED)
> > > > 568             mid_entry->mid_state = MID_RETRY_NEEDED;
> > > > 569         list_move(&mid_entry->qhead, &retry_list);
> > > > 570     }
> > > > 571     spin_unlock(&GlobalMid_Lock);
> > > > 572     mutex_unlock(&server->srv_mutex);
> > > > 573
> > > > 574     cifs_dbg(FYI, "%s: issuing mid callbacks\n", __func__);
> > > > 575-->    list_for_each_safe(tmp, tmp2, &retry_list) {
> > > > 576         mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> > > > 577         list_del_init(&mid_entry->qhead);
> > > > 578         mid_entry->callback(mid_entry);
> > > > 579     }
> > > > 580
> > > > 581     if (cifs_rdma_enabled(server)) {
> > > > 582         mutex_lock(&server->srv_mutex);
> > > > 583         smbd_destroy(server);
> > > > 584         mutex_unlock(&server->srv_mutex);
> > > >
> > >
>
>
>
> --
> Dave Wysochanski
> Principal Software Maintenance Engineer
> T: 919-754-4024



--
Dave Wysochanski
Principal Software Maintenance Engineer
T: 919-754-4024

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
  2019-10-17 18:29           ` Pavel Shilovskiy
@ 2019-10-17 19:23             ` David Wysochanski
  2019-10-17 19:58               ` Pavel Shilovskiy
  0 siblings, 1 reply; 31+ messages in thread
From: David Wysochanski @ 2019-10-17 19:23 UTC (permalink / raw)
  To: Pavel Shilovskiy; +Cc: Ronnie Sahlberg, linux-cifs, Frank Sorenson

On Thu, Oct 17, 2019 at 2:29 PM Pavel Shilovskiy <pshilov@microsoft.com> wrote:
>
> Hi Ronnie, David,
>
> Thanks for looking into this. This actually reminds me of the commit 696e420bb2a66:
>
> --------------------------------------
> commit 696e420bb2a6624478105651d5368d45b502b324
> Author: Lars Persson <lars.persson@axis.com>
> Date:   Mon Jun 25 14:05:25 2018 +0200
>
>     cifs: Fix use after free of a mid_q_entry
>
>     With protocol version 2.0 mounts we have seen crashes with corrupt mid
>     entries. Either the server->pending_mid_q list becomes corrupt with a
>     cyclic reference in one element or a mid object fetched by the
>     demultiplexer thread becomes overwritten during use.
>
>     Code review identified a race between the demultiplexer thread and the
>     request issuing thread. The demultiplexer thread seems to be written
>     with the assumption that it is the sole user of the mid object until
>     it calls the mid callback which either wakes the issuer task or
>     deletes the mid.
>
>     This assumption is not true because the issuer task can be woken up
>     earlier by a signal. If the demultiplexer thread has proceeded as far
>     as setting the mid_state to MID_RESPONSE_RECEIVED then the issuer
>     thread will happily end up calling cifs_delete_mid while the
>     demultiplexer thread still is using the mid object.
>
>     Inserting a delay in the cifs demultiplexer thread widens the race
>     window and makes reproduction of the race very easy:
>
>                     if (server->large_buf)
>                             buf = server->bigbuf;
>
>     +               usleep_range(500, 4000);
>
>                     server->lstrp = jiffies;
>
>     To resolve this I think the proper solution involves putting a
>     reference count on the mid object. This patch makes sure that the
>     demultiplexer thread holds a reference until it has finished
>     processing the transaction.
>
>     Cc: stable@vger.kernel.org
>     Signed-off-by: Lars Persson <larper@axis.com>
>     Acked-by: Paulo Alcantara <palcantara@suse.de>
>     Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
>     Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
>     Signed-off-by: Steve French <stfrench@microsoft.com>
>
> --------------------------------------
>
> A similar solution, taking an extra reference, should apply to the reconnect case as well. The reference should be taken while moving mid entries to the private list. Once a callback completes, that reference should be put back, which frees the mid once no other reference remains.
>

Ah ok very good.  The above seems consistent with the traces I'm
seeing of the race.
I am going to test this patch as it sounds like what you're describing
and similar to what Ronnie suggested earlier:

--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -564,6 +564,7 @@ cifs_reconnect(struct TCP_Server_Info *server)
        spin_lock(&GlobalMid_Lock);
        list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
                mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
+               kref_get(&mid_entry->refcount);
                if (mid_entry->mid_state == MID_REQUEST_SUBMITTED)
                        mid_entry->mid_state = MID_RETRY_NEEDED;
                list_move(&mid_entry->qhead, &retry_list);
@@ -576,6 +577,7 @@ cifs_reconnect(struct TCP_Server_Info *server)
                mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
                list_del_init(&mid_entry->qhead);
                mid_entry->callback(mid_entry);
+               cifs_mid_q_entry_release(mid_entry);
        }

        if (cifs_rdma_enabled(server)) {
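
For anyone following along, the flow of the patch above can be modelled in userspace roughly as below. This is only a sketch under assumptions: the list helpers imitate the kernel ones (list_init() is a made-up name for the initializer), struct mid, queue_mid() and mid_callback() are hypothetical stand-ins, there is no locking at all, and the callback is assumed to drop the issuer's reference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal userspace imitation of the kernel's circular doubly linked list. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add(struct list_head *n, struct list_head *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlink the node and leave it pointing at itself. */
static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

static void list_move(struct list_head *n, struct list_head *head)
{
	list_del_init(n);
	list_add(n, head);
}

/* Hypothetical stand-in for struct mid_q_entry. */
struct mid {
	struct list_head qhead;
	int refcount;                    /* plain int: single-threaded model */
	int retry_needed;
};

#define mid_of(p) ((struct mid *)((char *)(p) - offsetof(struct mid, qhead)))

static int freed;                        /* how many mids have been released */

static void mid_put(struct mid *m)
{
	if (--m->refcount == 0) {
		freed++;
		free(m);
	}
}

/* Assumed callback behaviour: the issuer drops its own reference. */
static void mid_callback(struct mid *m) { mid_put(m); }

static void queue_mid(struct list_head *pending)
{
	struct mid *m = calloc(1, sizeof(*m));

	if (!m)
		abort();
	m->refcount = 1;                 /* reference owned by the issuer */
	list_add(&m->qhead, pending);
}

/* Two phases as in the patch: collect with a reference, then drain. */
static int reconnect(struct list_head *pending)
{
	struct list_head retry, *tmp, *tmp2;
	int callbacks = 0;

	list_init(&retry);
	/* Phase 1: in the kernel this loop runs under GlobalMid_Lock. */
	for (tmp = pending->next, tmp2 = tmp->next; tmp != pending;
	     tmp = tmp2, tmp2 = tmp->next) {
		struct mid *m = mid_of(tmp);

		m->refcount++;           /* extra reference for the retry list */
		m->retry_needed = 1;
		list_move(&m->qhead, &retry);
	}
	/* Phase 2: no lock held; the extra reference keeps each mid alive. */
	for (tmp = retry.next, tmp2 = tmp->next; tmp != &retry;
	     tmp = tmp2, tmp2 = tmp->next) {
		struct mid *m = mid_of(tmp);

		list_del_init(&m->qhead);
		mid_callback(m);
		mid_put(m);              /* drop the reference taken in phase 1 */
		callbacks++;
	}
	return callbacks;
}
```

Queueing three mids and calling reconnect() runs three callbacks and frees three mids in this model, each on the put after its callback. It says nothing about whether the real race is closed; that still needs the restart test.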






> -----Original Message-----
> From: David Wysochanski <dwysocha@redhat.com>
> Sent: Thursday, October 17, 2019 8:30 AM
> To: Ronnie Sahlberg <lsahlber@redhat.com>
> Cc: Pavel Shilovskiy <pshilov@microsoft.com>; linux-cifs <linux-cifs@vger.kernel.org>; Frank Sorenson <sorenson@redhat.com>
> Subject: Re: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
>
> On Thu, Oct 17, 2019 at 10:08 AM Ronnie Sahlberg <lsahlber@redhat.com> wrote:
> >
> > List, Pavel,
> >
> > So I think there are two bugs we need to fix in this small block to make it safe against a race against other threads calling cifs_delete_mid() and similar.
> >
> > We need to protect the list-mutating functions by wrapping them inside
> > the GlobalMid_Lock spinlock, but we cannot hold this lock across the callback call.
> >
> > But we still need to protect the mid_entry dereference and the
> > ->callback call against the mid structure being freed by
> > DeleteMidQEntry().  We can do that by taking out an extra reference to the mid while holding the GlobalMid_Lock and then dropping the reference again after the callback completes.
> >
> >
> > I think something like this might work :
> >
> >
> >
> > diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> > index bdea4b3e8005..3a1a9b63bd9b 100644
> > --- a/fs/cifs/connect.c
> > +++ b/fs/cifs/connect.c
> > @@ -572,11 +572,19 @@ cifs_reconnect(struct TCP_Server_Info *server)
> >         mutex_unlock(&server->srv_mutex);
> >
> >         cifs_dbg(FYI, "%s: issuing mid callbacks\n", __func__);
> > +       spin_lock(&GlobalMid_Lock);
> >         list_for_each_safe(tmp, tmp2, &retry_list) {
> >                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
>
> I think you need a reference before this and something like if (mid->mid_flags ...)  /* check for someone else already deleting it */ ; else
> >                 list_del_init(&mid_entry->qhead);
>
> I am still tracing and I do not see the root of the problem yet.
> Unsurprisingly, it looks like a use after free though.
>
>
> > +               kref_get(&mid_entry->refcount);
> > +               spin_unlock(&GlobalMid_Lock);
> > +
> >                 mid_entry->callback(mid_entry);
> > +               cifs_mid_q_entry_release(mid_entry);
> > +
> > +               spin_lock(&GlobalMid_Lock);
> >         }
> > +       spin_unlock(&GlobalMid_Lock);
> >
> >         if (cifs_rdma_enabled(server)) {
> >                 mutex_lock(&server->srv_mutex);
> >
> >
> > Pavel, can you have a look at this and comment?  It is very delicate code so it needs careful review.
> >
> >
> > regards
> > ronnie sahlberg
> >
> >
> >
> > ----- Original Message -----
> > From: "David Wysochanski" <dwysocha@redhat.com>
> > To: "Ronnie Sahlberg" <lsahlber@redhat.com>
> > Cc: "linux-cifs" <linux-cifs@vger.kernel.org>, "Frank Sorenson"
> > <sorenson@redhat.com>
> > Sent: Thursday, 17 October, 2019 9:42:08 PM
> > Subject: Re: list_del corruption while iterating retry_list in
> > cifs_reconnect still seen on 5.4-rc3
> >
> > On Thu, Oct 17, 2019 at 5:05 AM Ronnie Sahlberg <lsahlber@redhat.com> wrote:
> > >
> > >
> > >
> > > > > 575-->    list_for_each_safe(tmp, tmp2, &retry_list) {
> > > > > 576         mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> > > > > 577         list_del_init(&mid_entry->qhead);
> > > > > 578         mid_entry->callback(mid_entry);
> > > > > 579     }
> > >
> > > This part (and a similar loop during shutting down the demultiplex
> > > thread) is the only place where we add/remove to the ->qhead list without holding the GlobalMid_Lock.
> > >
> > > I wonder if it is racing against a different thread also modifying
> > > qhead for the same mid, like cifs_delete_mid() for example.
> > >
> >
> > Yes I agree, I was thinking along these same lines of reasoning as I
> > read the code.  I put the latest on the investigation into the bug:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1654538#c15
> >
> > Just before the crash, when we hit the iteration of retry_list
> > something has gone wrong - the very first mid_entry on retry_list is a
> > garbage address, when normally it should be the last address from the
> > previous call to list_move in the loop above it.
> >
> >
> >  ----- Original Message -----
> > > > From: "Ronnie Sahlberg" <lsahlber@redhat.com>
> > > > To: "David Wysochanski" <dwysocha@redhat.com>
> > > > Cc: "linux-cifs" <linux-cifs@vger.kernel.org>, "Frank Sorenson"
> > > > <sorenson@redhat.com>
> > > > Sent: Thursday, 17 October, 2019 10:17:18 AM
> > > > Subject: Re: list_del corruption while iterating retry_list in
> > > > cifs_reconnect still seen on 5.4-rc3
> > > >
> > > > I can not reproduce this :-(
> > > >
> > > > I have run it for a few hours, restarting samba in a loop with up
> > > > to 30 threads.
> > > >
> >
> > I am not sure if it helps but I have 8 CPUs on my VM.
> > I also have server signing and client signing mandatory.
> > I can send you the smb.conf offline.
> >
> >
> > > >
> > > > Can you check
> > > > 1, If this only reproduce for you for the root of the share or it
> > > > also reproduces for a subdirectory?
> > > > 2, Does it reproduce also if you use "nohandlecache" mount option?
> > > >    This disables the use of cached open of the root handle, i.e.
> > > >    open_shroot()
> > > > 3, When this happens, can you check the content of the mid entry
> > > > and what these fields are:
> > > >    mid->mid_flags, mid->handle (this is a function pointer, what does it
> > > >    point to)
> > > >    mid->command.   Maybe print the whole structure.
> > > >
> >
> > Ok I'll see what I can find out.  So far I am not sure I have
> > identified what else is touching the mid in between the two loops.
> >
> > > > regards
> > > > ronnie sahlberg
> > > >
> > > >
> > > >
> > > >
> > > > ----- Original Message -----
> > > > > From: "David Wysochanski" <dwysocha@redhat.com>
> > > > > To: "linux-cifs" <linux-cifs@vger.kernel.org>
> > > > > Cc: "Frank Sorenson" <sorenson@redhat.com>
> > > > > Sent: Thursday, 17 October, 2019 5:27:02 AM
> > > > > Subject: list_del corruption while iterating retry_list in
> > > > > cifs_reconnect still seen on 5.4-rc3
> > > > >
> > > > > I think this has been there for a long time, since we first saw
> > > > > this on a 4.18.0 based kernel but I just noticed the bug recently.
> > > > > I just retested on 5.4-rc3 and it's still there.  Easy to repro
> > > > > with a fairly simple but invasive server restart test - takes
> > > > > only maybe a couple minutes on my VM.
> > > > >
> > > > >
> > > > > From Frank Sorenson:
> > > > >
> > > > > mount off a samba server:
> > > > >
> > > > > >     # mount //vm1/share /mnt/vm1 -o vers=2.1,hard,sec=ntlmssp,credentials=/root/.smb_creds
> > > > >
> > > > >
> > > > > on the client, start 10 'find' loops:
> > > > >
> > > > >     # export test_path=/mnt/vm1
> > > > > >     # do_find() { while true ; do find $test_path >/dev/null 2>&1 ; done }
> > > > >
> > > > >     # for i in {1..10} ; do do_find & done
> > > > >
> > > > >
> > > > > optional:  also start something to monitor for when the hang occurs:
> > > > >
> > > > > >     # while true ; do count=$(grep smb2_reconnect /proc/*/stack -A3 | grep -c open_shroot) ; [[ $count -gt 0 ]] && { echo "$(date): reproduced bug" ; break ; } ; echo "$(date): stayin' alive" ; sleep 2 ; done
> > > > >
> > > > >
> > > > >
> > > > > On the samba server:  restart smb.service (loop it in case it
> > > > > requires more than one restart):
> > > > >
> > > > > >     # while true ; do echo "$(date): restarting" ; systemctl restart smb.service ; sleep 5 ; done | tee /var/tmp/smb_restart_log.out
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > [  430.454897] list_del corruption. prev->next should be ffff98d3a8f316c0, but was 2e885cb266355469
> > > > > > [  430.464668] ------------[ cut here ]------------
> > > > > > [  430.466569] kernel BUG at lib/list_debug.c:51!
> > > > > > [  430.468476] invalid opcode: 0000 [#1] SMP PTI
> > > > > > [  430.470286] CPU: 0 PID: 13267 Comm: cifsd Kdump: loaded Not tainted 5.4.0-rc3+ #19
> > > > > > [  430.473472] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> > > > > > [  430.475872] RIP: 0010:__list_del_entry_valid.cold+0x31/0x55
> > > > > > [  430.478129] Code: 5e 15 8e e8 54 a3 c5 ff 0f 0b 48 c7 c7 78 5f 15 8e e8 46 a3 c5 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 38 5f 15 8e e8 32 a3 c5 ff <0f> 0b 48 89 fe 4c 89 c2 48 c7 c7 00 5f 15 8e e8 1e a3 c5 ff 0f 0b
> > > > > > [  430.485563] RSP: 0018:ffffb4db0042fd38 EFLAGS: 00010246
> > > > > > [  430.487665] RAX: 0000000000000054 RBX: ffff98d3aabb8800 RCX: 0000000000000000
> > > > > > [  430.490513] RDX: 0000000000000000 RSI: ffff98d3b7a17908 RDI: ffff98d3b7a17908
> > > > > > [  430.493383] RBP: ffff98d3a8f316c0 R08: ffff98d3b7a17908 R09: 0000000000000285
> > > > > > [  430.496258] R10: ffffb4db0042fbf0 R11: ffffb4db0042fbf5 R12: ffff98d3aabb89c0
> > > > > > [  430.499113] R13: ffffb4db0042fd48 R14: 2e885cb266355469 R15: ffff98d3b24c4480
> > > > > > [  430.501981] FS:  0000000000000000(0000) GS:ffff98d3b7a00000(0000) knlGS:0000000000000000
> > > > > > [  430.505232] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > [  430.507546] CR2: 00007f08cd17b9c0 CR3: 000000023484a000 CR4: 00000000000406f0
> > > > > [  430.510426] Call Trace:
> > > > > [  430.511500]  cifs_reconnect+0x25e/0x610 [cifs] [  430.513350]
> > > > > cifs_readv_from_socket+0x220/0x250 [cifs] [  430.515464]
> > > > > cifs_read_from_socket+0x4a/0x70 [cifs] [  430.517452]  ?
> > > > > try_to_wake_up+0x212/0x650 [  430.519122]  ?
> > > > > cifs_small_buf_get+0x16/0x30 [cifs] [  430.521086]  ?
> > > > > allocate_buffers+0x66/0x120 [cifs] [  430.523019]
> > > > > cifs_demultiplex_thread+0xdc/0xc30 [cifs] [  430.525116]
> > > > > kthread+0xfb/0x130 [  430.526421]  ?
> > > > > cifs_handle_standard+0x190/0x190 [cifs] [  430.528514]  ?
> > > > > kthread_park+0x90/0x90 [  430.530019]  ret_from_fork+0x35/0x40 [
> > > > > 430.531487] Modules linked in: cifs libdes libarc4 ip6t_rpfilter
> > > > > ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ip6table_nat
> > > > > ip6table_mangle ip6table_raw ip6table_security iptable_nat
> > > > > nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack
> > > > > nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink ebtable_filter
> > > > > ebtables ip6table_filter ip6_tables crct10dif_pclmul
> > > > > crc32_pclmul joydev virtio_balloon ghash_clmulni_intel i2c_piix4
> > > > > nfsd nfs_acl lockd auth_rpcgss grace sunrpc xfs libcrc32c
> > > > > virtio_net net_failover crc32c_intel virtio_console serio_raw
> > > > > virtio_blk ata_generic failover pata_acpi qemu_fw_cfg [
> > > > > 430.552782] ---[ end trace c91d4468f8689482 ]--- [  430.554948]
> > > > > RIP: 0010:__list_del_entry_valid.cold+0x31/0x55
> > > > > [  430.557251] Code: 5e 15 8e e8 54 a3 c5 ff 0f 0b 48 c7 c7 78
> > > > > 5f 15 8e e8 46 a3 c5 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 38 5f
> > > > > 15 8e e8 32
> > > > > a3 c5 ff <0f> 0b 48 89 fe 4c 89 c2 48 c7 c7 00 5f 15 8e e8 1e a3
> > > > > c5 ff 0f 0b [  430.565019] RSP: 0018:ffffb4db0042fd38 EFLAGS:
> > > > > 00010246 [  430.567181] RAX: 0000000000000054 RBX:
> > > > > ffff98d3aabb8800 RCX:
> > > > > 0000000000000000
> > > > > [  430.570073] RDX: 0000000000000000 RSI: ffff98d3b7a17908 RDI:
> > > > > ffff98d3b7a17908
> > > > > [  430.572955] RBP: ffff98d3a8f316c0 R08: ffff98d3b7a17908 R09:
> > > > > 0000000000000285
> > > > > [  430.575854] R10: ffffb4db0042fbf0 R11: ffffb4db0042fbf5 R12:
> > > > > ffff98d3aabb89c0
> > > > > [  430.578745] R13: ffffb4db0042fd48 R14: 2e885cb266355469 R15:
> > > > > ffff98d3b24c4480
> > > > > [  430.581624] FS:  0000000000000000(0000)
> > > > > GS:ffff98d3b7a00000(0000)
> > > > > knlGS:0000000000000000
> > > > > [  430.584881] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > [  430.587230] CR2: 00007f08cd17b9c0 CR3: 000000023484a000 CR4:
> > > > > 00000000000406f0
> > > > >
> > > > >
> > > > > crash> dis -lr cifs_reconnect+0x25e | tail --lines=20
> > > > > 0xffffffffc062dc26 <cifs_reconnect+0x226>:      movb
> > > > > $0x0,0xbb36b(%rip)        # 0xffffffffc06e8f98 <GlobalMid_Lock>
> > > > > /mnt/build/kernel/fs/cifs/connect.c: 572
> > > > > 0xffffffffc062dc2d <cifs_reconnect+0x22d>:      mov    %r12,%rdi
> > > > > 0xffffffffc062dc30 <cifs_reconnect+0x230>:      callq
> > > > > 0xffffffff8d9d5a20 <mutex_unlock>
> > > > > /mnt/build/kernel/fs/cifs/connect.c: 574
> > > > > 0xffffffffc062dc35 <cifs_reconnect+0x235>:      testb
> > > > > $0x1,0xbb300(%rip)        # 0xffffffffc06e8f3c <cifsFYI>
> > > > > 0xffffffffc062dc3c <cifs_reconnect+0x23c>:      je
> > > > > 0xffffffffc062dc43 <cifs_reconnect+0x243>
> > > > > /mnt/build/kernel/./arch/x86/include/asm/jump_label.h: 25
> > > > > 0xffffffffc062dc3e <cifs_reconnect+0x23e>:      data32 data32 data32
> > > > > xchg %ax,%ax
> > > > > /mnt/build/kernel/fs/cifs/connect.c: 575
> > > > > 0xffffffffc062dc43 <cifs_reconnect+0x243>:      mov    0x8(%rsp),%rbp
> > > > > 0xffffffffc062dc48 <cifs_reconnect+0x248>:      mov    0x0(%rbp),%r14
> > > > > 0xffffffffc062dc4c <cifs_reconnect+0x24c>:      cmp    %r13,%rbp
> > > > > 0xffffffffc062dc4f <cifs_reconnect+0x24f>:      jne
> > > > > 0xffffffffc062dc56 <cifs_reconnect+0x256>
> > > > > 0xffffffffc062dc51 <cifs_reconnect+0x251>:      jmp
> > > > > 0xffffffffc062dc90 <cifs_reconnect+0x290>
> > > > > 0xffffffffc062dc53 <cifs_reconnect+0x253>:      mov    %rax,%r14
> > > > > /mnt/build/kernel/./include/linux/list.h: 190
> > > > > 0xffffffffc062dc56 <cifs_reconnect+0x256>:      mov    %rbp,%rdi
> > > > > 0xffffffffc062dc59 <cifs_reconnect+0x259>:      callq
> > > > > 0xffffffff8d4e6b00 <__list_del_entry_valid>
> > > > > 0xffffffffc062dc5e <cifs_reconnect+0x25e>:      test   %al,%al
> > > > >
> > > > >
> > > > > fs/cifs/connect.c
> > > > > 566         mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> > > > > 567         if (mid_entry->mid_state == MID_REQUEST_SUBMITTED)
> > > > > 568             mid_entry->mid_state = MID_RETRY_NEEDED;
> > > > > 569         list_move(&mid_entry->qhead, &retry_list);
> > > > > 570     }
> > > > > 571     spin_unlock(&GlobalMid_Lock);
> > > > > 572     mutex_unlock(&server->srv_mutex);
> > > > > 573
> > > > > 574     cifs_dbg(FYI, "%s: issuing mid callbacks\n", __func__);
> > > > > 575-->    list_for_each_safe(tmp, tmp2, &retry_list) {
> > > > > 576         mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> > > > > 577         list_del_init(&mid_entry->qhead);
> > > > > 578         mid_entry->callback(mid_entry);
> > > > > 579     }
> > > > > 580
> > > > > 581     if (cifs_rdma_enabled(server)) {
> > > > > 582         mutex_lock(&server->srv_mutex);
> > > > > 583         smbd_destroy(server);
> > > > > 584         mutex_unlock(&server->srv_mutex);
> > > > >
> > > >
> >
> >
> >
> > --
> > Dave Wysochanski
> > Principal Software Maintenance Engineer
> > T: 919-754-4024
>
>
>
> --
> Dave Wysochanski
> Principal Software Maintenance Engineer
> T: 919-754-4024

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
  2019-10-17 19:23             ` David Wysochanski
@ 2019-10-17 19:58               ` Pavel Shilovskiy
  2019-10-17 20:34                 ` David Wysochanski
  0 siblings, 1 reply; 31+ messages in thread
From: Pavel Shilovskiy @ 2019-10-17 19:58 UTC (permalink / raw)
  To: David Wysochanski; +Cc: Ronnie Sahlberg, linux-cifs, Frank Sorenson


The patch looks good. Let's see if it fixes the issue in your setup.

--
Best regards,
Pavel Shilovsky

-----Original Message-----
From: David Wysochanski <dwysocha@redhat.com> 
Sent: Thursday, October 17, 2019 12:23 PM
To: Pavel Shilovskiy <pshilov@microsoft.com>
Cc: Ronnie Sahlberg <lsahlber@redhat.com>; linux-cifs <linux-cifs@vger.kernel.org>; Frank Sorenson <sorenson@redhat.com>
Subject: Re: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3

On Thu, Oct 17, 2019 at 2:29 PM Pavel Shilovskiy <pshilov@microsoft.com> wrote:
>
> A similar solution of taking an extra reference should apply to the reconnect case as well. The reference should be taken while the mid entries are moved to the private list. Once a callback completes, that reference should be put back, thus freeing the mid.
>

Ah ok very good.  The above seems consistent with the traces I'm seeing of the race.
I am going to test this patch as it sounds like what you're describing and similar to what Ronnie suggested earlier:

--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -564,6 +564,7 @@ cifs_reconnect(struct TCP_Server_Info *server)
        spin_lock(&GlobalMid_Lock);
        list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
                mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
+               kref_get(&mid_entry->refcount);
                if (mid_entry->mid_state == MID_REQUEST_SUBMITTED)
                        mid_entry->mid_state = MID_RETRY_NEEDED;
                list_move(&mid_entry->qhead, &retry_list);
@@ -576,6 +577,7 @@ cifs_reconnect(struct TCP_Server_Info *server)
                mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
                list_del_init(&mid_entry->qhead);
                mid_entry->callback(mid_entry);
+               cifs_mid_q_entry_release(mid_entry);
        }

        if (cifs_rdma_enabled(server)) {


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
  2019-10-17 19:58               ` Pavel Shilovskiy
@ 2019-10-17 20:34                 ` David Wysochanski
  2019-10-17 21:44                   ` Ronnie Sahlberg
  0 siblings, 1 reply; 31+ messages in thread
From: David Wysochanski @ 2019-10-17 20:34 UTC (permalink / raw)
  To: Pavel Shilovskiy; +Cc: Ronnie Sahlberg, linux-cifs, Frank Sorenson

Unfortunately that did not fix the list_del corruption.
It did seem to run longer but I'm not sure runtime is meaningful.

[ 1424.215537] list_del corruption. prev->next should be
ffff8d9b74c84d80, but was a6787a60550c54a9
[ 1424.232688] ------------[ cut here ]------------
[ 1424.234535] kernel BUG at lib/list_debug.c:51!
[ 1424.236502] invalid opcode: 0000 [#1] SMP PTI
[ 1424.238334] CPU: 5 PID: 10212 Comm: cifsd Kdump: loaded Not tainted
5.4.0-rc3-fix1+ #33
[ 1424.241489] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 1424.243770] RIP: 0010:__list_del_entry_valid.cold+0x31/0x55
[ 1424.245972] Code: 5e 15 b5 e8 54 a3 c5 ff 0f 0b 48 c7 c7 70 5f 15
b5 e8 46 a3 c5 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 30 5f 15 b5 e8 32
a3 c5 ff <0f> 0b 48 89 fe 4c 89 c2 48 c7 c7 f8 5e 15 b5 e8 1e a3 c5 ff
0f 0b
[ 1424.253409] RSP: 0018:ffff9a12404b3d38 EFLAGS: 00010246
[ 1424.255576] RAX: 0000000000000054 RBX: ffff8d9b6ece1000 RCX: 0000000000000000
[ 1424.258504] RDX: 0000000000000000 RSI: ffff8d9b77b57908 RDI: ffff8d9b77b57908
[ 1424.261404] RBP: ffff8d9b74c84d80 R08: ffff8d9b77b57908 R09: 0000000000000280
[ 1424.264336] R10: ffff9a12404b3bf0 R11: ffff9a12404b3bf5 R12: ffff8d9b6ece11c0
[ 1424.267285] R13: ffff9a12404b3d48 R14: a6787a60550c54a9 R15: ffff8d9b6fcec300
[ 1424.270191] FS:  0000000000000000(0000) GS:ffff8d9b77b40000(0000)
knlGS:0000000000000000
[ 1424.273491] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1424.275831] CR2: 0000562cdf4a2000 CR3: 000000023340c000 CR4: 00000000000406e0
[ 1424.278733] Call Trace:
[ 1424.279844]  cifs_reconnect+0x268/0x620 [cifs]
[ 1424.281723]  cifs_readv_from_socket+0x220/0x250 [cifs]
[ 1424.283876]  cifs_read_from_socket+0x4a/0x70 [cifs]
[ 1424.285922]  ? try_to_wake_up+0x212/0x650
[ 1424.287595]  ? cifs_small_buf_get+0x16/0x30 [cifs]
[ 1424.289520]  ? allocate_buffers+0x66/0x120 [cifs]
[ 1424.291421]  cifs_demultiplex_thread+0xdc/0xc30 [cifs]
[ 1424.293506]  kthread+0xfb/0x130
[ 1424.294789]  ? cifs_handle_standard+0x190/0x190 [cifs]
[ 1424.296833]  ? kthread_park+0x90/0x90
[ 1424.298295]  ret_from_fork+0x35/0x40
[ 1424.299717] Modules linked in: cifs libdes libarc4 ip6t_rpfilter
ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ip6table_nat
ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat
iptable_mangle iptable_raw iptable_security nf_conntrack
nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink ebtable_filter ebtables
ip6table_filter ip6_tables crct10dif_pclmul crc32_pclmul
ghash_clmulni_intel virtio_balloon joydev i2c_piix4 nfsd nfs_acl lockd
auth_rpcgss grace sunrpc xfs libcrc32c crc32c_intel virtio_net
net_failover ata_generic serio_raw virtio_console virtio_blk failover
pata_acpi qemu_fw_cfg
[ 1424.322374] ---[ end trace 214af7e68b58e94b ]---
[ 1424.324305] RIP: 0010:__list_del_entry_valid.cold+0x31/0x55
[ 1424.326551] Code: 5e 15 b5 e8 54 a3 c5 ff 0f 0b 48 c7 c7 70 5f 15
b5 e8 46 a3 c5 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 30 5f 15 b5 e8 32
a3 c5 ff <0f> 0b 48 89 fe 4c 89 c2 48 c7 c7 f8 5e 15 b5 e8 1e a3 c5 ff
0f 0b
[ 1424.333874] RSP: 0018:ffff9a12404b3d38 EFLAGS: 00010246
[ 1424.335976] RAX: 0000000000000054 RBX: ffff8d9b6ece1000 RCX: 0000000000000000
[ 1424.338842] RDX: 0000000000000000 RSI: ffff8d9b77b57908 RDI: ffff8d9b77b57908
[ 1424.341668] RBP: ffff8d9b74c84d80 R08: ffff8d9b77b57908 R09: 0000000000000280
[ 1424.344511] R10: ffff9a12404b3bf0 R11: ffff9a12404b3bf5 R12: ffff8d9b6ece11c0
[ 1424.347343] R13: ffff9a12404b3d48 R14: a6787a60550c54a9 R15: ffff8d9b6fcec300
[ 1424.350184] FS:  0000000000000000(0000) GS:ffff8d9b77b40000(0000)
knlGS:0000000000000000
[ 1424.353394] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1424.355699] CR2: 0000562cdf4a2000 CR3: 000000023340c000 CR4: 00000000000406e0



-- 
Dave Wysochanski
Principal Software Maintenance Engineer
T: 919-754-4024

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
  2019-10-17 20:34                 ` David Wysochanski
@ 2019-10-17 21:44                   ` Ronnie Sahlberg
  2019-10-17 22:02                     ` Pavel Shilovskiy
  0 siblings, 1 reply; 31+ messages in thread
From: Ronnie Sahlberg @ 2019-10-17 21:44 UTC (permalink / raw)
  To: David Wysochanski; +Cc: Pavel Shilovskiy, linux-cifs, Frank Sorenson

Dave, Pavel

If it takes longer to trigger, that might indicate we are on the right path but there are additional places to fix.

I still think you also need to protect the list-mutating functions with the global spinlock (GlobalMid_Lock), so something like this:

diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index bdea4b3e8005..16705a855818 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -564,6 +564,7 @@ cifs_reconnect(struct TCP_Server_Info *server)
        spin_lock(&GlobalMid_Lock);
        list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
                mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
+               kref_get(&mid_entry->refcount);
                if (mid_entry->mid_state == MID_REQUEST_SUBMITTED)
                        mid_entry->mid_state = MID_RETRY_NEEDED;
                list_move(&mid_entry->qhead, &retry_list);
@@ -572,11 +573,18 @@ cifs_reconnect(struct TCP_Server_Info *server)
        mutex_unlock(&server->srv_mutex);
 
        cifs_dbg(FYI, "%s: issuing mid callbacks\n", __func__);
+       spin_lock(&GlobalMid_Lock);
        list_for_each_safe(tmp, tmp2, &retry_list) {
                mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
                list_del_init(&mid_entry->qhead);
+               spin_unlock(&GlobalMid_Lock);
+
                mid_entry->callback(mid_entry);
+               cifs_mid_q_entry_release(mid_entry);
+
+               spin_lock(&GlobalMid_Lock);
        }
+       spin_unlock(&GlobalMid_Lock);
 
        if (cifs_rdma_enabled(server)) {
                mutex_lock(&server->srv_mutex);



^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
  2019-10-17 21:44                   ` Ronnie Sahlberg
@ 2019-10-17 22:02                     ` Pavel Shilovskiy
  2019-10-17 22:53                       ` Ronnie Sahlberg
  0 siblings, 1 reply; 31+ messages in thread
From: Pavel Shilovskiy @ 2019-10-17 22:02 UTC (permalink / raw)
  To: Ronnie Sahlberg, David Wysochanski; +Cc: linux-cifs, Frank Sorenson

Ok, looking at cifs_delete_mid():

void
cifs_delete_mid(struct mid_q_entry *mid)
{
        spin_lock(&GlobalMid_Lock);
        list_del_init(&mid->qhead);
        mid->mid_flags |= MID_DELETED;
        spin_unlock(&GlobalMid_Lock);

        DeleteMidQEntry(mid);
}

So, regardless of whether we take references on the mid itself, the mid might still be removed from the list. I also don't think taking GlobalMid_Lock would help much, because the next mid in the list might be deleted from the list by another process while cifs_reconnect is calling the callback for the current mid.
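
To make the concern about the next entry concrete, here is a small user-space model, not the kernel code. It assumes a hand-rolled struct list_head that mimics the kernel helpers; struct mid, victim, issue_callback(), walk_retry_list() and run_demo() are names made up for illustration. The callback for the first entry unlinks the second one, standing in for another thread calling cifs_delete_mid() while the callback runs.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-rolled stand-ins for the kernel list helpers (user space only). */
struct list_head { struct list_head *next, *prev; };

#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_for_each_safe(pos, n, head) \
	for (pos = (head)->next, n = pos->next; pos != (head); \
	     pos = n, n = pos->next)

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h; h->prev = h; }

static void list_add_tail(struct list_head *e, struct list_head *head)
{
	e->prev = head->prev;
	e->next = head;
	head->prev->next = e;
	head->prev = e;
}

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);
}

/* Hypothetical model of a mid; "victim" is unlinked from inside the callback. */
struct mid {
	struct list_head qhead;
	struct mid *victim;
	int visits;
};

/* Stand-in for mid_entry->callback(): simulates a racing delete of "victim". */
static void issue_callback(struct mid *m)
{
	m->visits++;
	if (m->victim) {
		list_del_init(&m->victim->qhead);
		m->victim = NULL;
	}
}

/* Same loop shape as in cifs_reconnect(); max_steps only bounds the demo. */
static int walk_retry_list(struct list_head *retry_list, int max_steps)
{
	struct list_head *tmp, *tmp2;
	int steps = 0;

	assert(max_steps > 0);
	list_for_each_safe(tmp, tmp2, retry_list) {
		struct mid *m = list_entry(tmp, struct mid, qhead);

		if (steps >= max_steps)
			break;	/* demo cap: a stale tmp2 may never reach the head */
		steps++;
		list_del_init(&m->qhead);
		issue_callback(m);
	}
	return steps;
}

/* Builds A -> B -> C, lets A's callback unlink B, and reports visit counts. */
static int run_demo(int visits[3])
{
	struct list_head retry_list;
	struct mid a = { .victim = NULL }, b = { .victim = NULL }, c = { .victim = NULL };
	int steps;

	INIT_LIST_HEAD(&retry_list);
	list_add_tail(&a.qhead, &retry_list);
	list_add_tail(&b.qhead, &retry_list);
	list_add_tail(&c.qhead, &retry_list);
	a.victim = &b;

	steps = walk_retry_list(&retry_list, 5);
	visits[0] = a.visits;
	visits[1] = b.visits;
	visits[2] = c.visits;
	return steps;
}
```

With the cap of 5 steps used in run_demo(), the first entry is visited once, the unlinked second entry four times, and the third never, because the saved tmp2 still points at the entry that was unlinked behind the loop's back. In the kernel the unlinked mid may also have been freed by then, which would be consistent with the garbage prev->next values in the traces.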

Instead, shouldn't we try marking the mid as being reconnected? Once we have taken a reference, let's set a new flag, MID_RECONNECT, in mid->mid_flags under GlobalMid_Lock. Then modify cifs_delete_mid() to check for this flag and not remove the mid from the list if the flag is set.
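
A rough, untested sketch of that idea as a user-space model, not a kernel patch. MID_RECONNECT is the proposed (not yet existing) flag; struct mid_model, model_collect_for_retry(), model_delete_mid() and the atomic_flag spinlock standing in for GlobalMid_Lock are assumptions made for illustration, as are the flag values.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hand-rolled stand-ins for the kernel list helpers (user space only). */
struct list_head { struct list_head *next, *prev; };

#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h; h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *e, struct list_head *head)
{
	e->prev = head->prev;
	e->next = head;
	head->prev->next = e;
	head->prev = e;
}

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);
}

#define MID_DELETED	0x2	/* value chosen for the model */
#define MID_RECONNECT	0x4	/* hypothetical new flag from the proposal */

struct mid_model {
	struct list_head qhead;
	unsigned int mid_flags;
};

/* Tiny spinlock standing in for GlobalMid_Lock. */
static atomic_flag global_mid_lock = ATOMIC_FLAG_INIT;

static void mid_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&global_mid_lock,
						 memory_order_acquire))
		;	/* spin */
}

static void mid_unlock(void)
{
	atomic_flag_clear_explicit(&global_mid_lock, memory_order_release);
}

/* Reconnect side: flag each pending mid and move it to the private list. */
static void model_collect_for_retry(struct list_head *pending,
				    struct list_head *retry_list)
{
	mid_lock();
	while (!list_empty(pending)) {
		struct mid_model *m =
			list_entry(pending->next, struct mid_model, qhead);

		m->mid_flags |= MID_RECONNECT;
		list_del_init(&m->qhead);
		list_add_tail(&m->qhead, retry_list);
	}
	mid_unlock();
}

/*
 * Delete side: a flagged mid stays linked so the reconnect walk owns it.
 * Still setting MID_DELETED in that case is an assumption of this sketch.
 * Returns true if this call unlinked the mid.
 */
static bool model_delete_mid(struct mid_model *m)
{
	bool unlinked = false;

	mid_lock();
	if (!(m->mid_flags & MID_RECONNECT)) {
		list_del_init(&m->qhead);
		unlinked = true;
	}
	m->mid_flags |= MID_DELETED;
	mid_unlock();
	return unlinked;
}
```

In this model a delete that races with the reconnect walk leaves the private list intact, so the saved next pointer stays valid. What the sketch does not cover is how the reconnect path would later clear the flag and handle a mid that was marked MID_DELETED in the meantime; that would still have to be worked out in the real code.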

--
Best regards,
Pavel Shilovsky

-----Original Message-----
From: Ronnie Sahlberg <lsahlber@redhat.com> 
Sent: Thursday, October 17, 2019 2:45 PM
To: David Wysochanski <dwysocha@redhat.com>
Cc: Pavel Shilovskiy <pshilov@microsoft.com>; linux-cifs <linux-cifs@vger.kernel.org>; Frank Sorenson <sorenson@redhat.com>
Subject: Re: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3

Dave, Pavel

If it takes longer to trigger it might indicate we are on the right path but there are additional places to fix.

I still think you also need to protect the list mutate functions as well using the global mutex, so something like this :

diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c index bdea4b3e8005..16705a855818 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -564,6 +564,7 @@ cifs_reconnect(struct TCP_Server_Info *server)
        spin_lock(&GlobalMid_Lock);
        list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
                mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
+               kref_get(&mid_entry->refcount);
                if (mid_entry->mid_state == MID_REQUEST_SUBMITTED)
                        mid_entry->mid_state = MID_RETRY_NEEDED;
                list_move(&mid_entry->qhead, &retry_list); @@ -572,11 +573,18 @@ cifs_reconnect(struct TCP_Server_Info *server)
        mutex_unlock(&server->srv_mutex);
 
        cifs_dbg(FYI, "%s: issuing mid callbacks\n", __func__);
+       spin_lock(&GlobalMid_Lock);
        list_for_each_safe(tmp, tmp2, &retry_list) {
                mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
                list_del_init(&mid_entry->qhead);
+               spin_unlock(&GlobalMid_Lock);
+
                mid_entry->callback(mid_entry);
+               cifs_mid_q_entry_release(mid_entry);
+
+               spin_lock(&GlobalMid_Lock);
        }
+       spin_unlock(&GlobalMid_Lock);
 
        if (cifs_rdma_enabled(server)) {
                mutex_lock(&server->srv_mutex);


----- Original Message -----
From: "David Wysochanski" <dwysocha@redhat.com>
To: "Pavel Shilovskiy" <pshilov@microsoft.com>
Cc: "Ronnie Sahlberg" <lsahlber@redhat.com>, "linux-cifs" <linux-cifs@vger.kernel.org>, "Frank Sorenson" <sorenson@redhat.com>
Sent: Friday, 18 October, 2019 6:34:53 AM
Subject: Re: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3

Unfortunately that did not fix the list_del corruption.
It did seem to run longer but I'm not sure runtime is meaningful.

[ 1424.215537] list_del corruption. prev->next should be ffff8d9b74c84d80, but was a6787a60550c54a9 [ 1424.232688] ------------[ cut here ]------------ [ 1424.234535] kernel BUG at lib/list_debug.c:51!
[ 1424.236502] invalid opcode: 0000 [#1] SMP PTI [ 1424.238334] CPU: 5 PID: 10212 Comm: cifsd Kdump: loaded Not tainted 5.4.0-rc3-fix1+ #33 [ 1424.241489] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 1424.243770] RIP: 0010:__list_del_entry_valid.cold+0x31/0x55
[ 1424.245972] Code: 5e 15 b5 e8 54 a3 c5 ff 0f 0b 48 c7 c7 70 5f 15
b5 e8 46 a3 c5 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 30 5f 15 b5 e8 32
a3 c5 ff <0f> 0b 48 89 fe 4c 89 c2 48 c7 c7 f8 5e 15 b5 e8 1e a3 c5 ff 0f 0b [ 1424.253409] RSP: 0018:ffff9a12404b3d38 EFLAGS: 00010246 [ 1424.255576] RAX: 0000000000000054 RBX: ffff8d9b6ece1000 RCX: 0000000000000000 [ 1424.258504] RDX: 0000000000000000 RSI: ffff8d9b77b57908 RDI: ffff8d9b77b57908 [ 1424.261404] RBP: ffff8d9b74c84d80 R08: ffff8d9b77b57908 R09: 0000000000000280 [ 1424.264336] R10: ffff9a12404b3bf0 R11: ffff9a12404b3bf5 R12: ffff8d9b6ece11c0 [ 1424.267285] R13: ffff9a12404b3d48 R14: a6787a60550c54a9 R15: ffff8d9b6fcec300 [ 1424.270191] FS:  0000000000000000(0000) GS:ffff8d9b77b40000(0000)
knlGS:0000000000000000
[ 1424.273491] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1424.275831] CR2: 0000562cdf4a2000 CR3: 000000023340c000 CR4: 00000000000406e0 [ 1424.278733] Call Trace:
[ 1424.279844]  cifs_reconnect+0x268/0x620 [cifs] [ 1424.281723]  cifs_readv_from_socket+0x220/0x250 [cifs] [ 1424.283876]  cifs_read_from_socket+0x4a/0x70 [cifs] [ 1424.285922]  ? try_to_wake_up+0x212/0x650 [ 1424.287595]  ? cifs_small_buf_get+0x16/0x30 [cifs] [ 1424.289520]  ? allocate_buffers+0x66/0x120 [cifs] [ 1424.291421]  cifs_demultiplex_thread+0xdc/0xc30 [cifs] [ 1424.293506]  kthread+0xfb/0x130 [ 1424.294789]  ? cifs_handle_standard+0x190/0x190 [cifs] [ 1424.296833]  ? kthread_park+0x90/0x90 [ 1424.298295]  ret_from_fork+0x35/0x40 [ 1424.299717] Modules linked in: cifs libdes libarc4 ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack
nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables crct10dif_pclmul crc32_pclmul ghash_clmulni_intel virtio_balloon joydev i2c_piix4 nfsd nfs_acl lockd auth_rpcgss grace sunrpc xfs libcrc32c crc32c_intel virtio_net net_failover ata_generic serio_raw virtio_console virtio_blk failover pata_acpi qemu_fw_cfg [ 1424.322374] ---[ end trace 214af7e68b58e94b ]--- [ 1424.324305] RIP: 0010:__list_del_entry_valid.cold+0x31/0x55
[ 1424.326551] Code: 5e 15 b5 e8 54 a3 c5 ff 0f 0b 48 c7 c7 70 5f 15
b5 e8 46 a3 c5 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 30 5f 15 b5 e8 32
a3 c5 ff <0f> 0b 48 89 fe 4c 89 c2 48 c7 c7 f8 5e 15 b5 e8 1e a3 c5 ff 0f 0b [ 1424.333874] RSP: 0018:ffff9a12404b3d38 EFLAGS: 00010246 [ 1424.335976] RAX: 0000000000000054 RBX: ffff8d9b6ece1000 RCX: 0000000000000000 [ 1424.338842] RDX: 0000000000000000 RSI: ffff8d9b77b57908 RDI: ffff8d9b77b57908 [ 1424.341668] RBP: ffff8d9b74c84d80 R08: ffff8d9b77b57908 R09: 0000000000000280 [ 1424.344511] R10: ffff9a12404b3bf0 R11: ffff9a12404b3bf5 R12: ffff8d9b6ece11c0 [ 1424.347343] R13: ffff9a12404b3d48 R14: a6787a60550c54a9 R15: ffff8d9b6fcec300 [ 1424.350184] FS:  0000000000000000(0000) GS:ffff8d9b77b40000(0000)
knlGS:0000000000000000
[ 1424.353394] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1424.355699] CR2: 0000562cdf4a2000 CR3: 000000023340c000 CR4: 00000000000406e0

On Thu, Oct 17, 2019 at 3:58 PM Pavel Shilovskiy <pshilov@microsoft.com> wrote:
>
>
> The patch looks good. Let's see if it fixes the issue in your setup.
>
> --
> Best regards,
> Pavel Shilovsky
>
> -----Original Message-----
> From: David Wysochanski <dwysocha@redhat.com>
> Sent: Thursday, October 17, 2019 12:23 PM
> To: Pavel Shilovskiy <pshilov@microsoft.com>
> Cc: Ronnie Sahlberg <lsahlber@redhat.com>; linux-cifs 
> <linux-cifs@vger.kernel.org>; Frank Sorenson <sorenson@redhat.com>
> Subject: Re: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
>
> On Thu, Oct 17, 2019 at 2:29 PM Pavel Shilovskiy <pshilov@microsoft.com> wrote:
> >
> > The similar solution of taking an extra reference should apply to the case of reconnect as well. The reference should be taken during the process of moving mid entries to the private list. Once a callback completes, such a reference should be put back thus freeing the mid.
> >
>
> Ah ok very good.  The above seems consistent with the traces I'm seeing of the race.
> I am going to test this patch as it sounds like what you're describing and similar to what Ronnie suggested earlier:
>
> --- a/fs/cifs/connect.c
> +++ b/fs/cifs/connect.c
> @@ -564,6 +564,7 @@ cifs_reconnect(struct TCP_Server_Info *server)
>         spin_lock(&GlobalMid_Lock);
>         list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
>                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> +               kref_get(&mid_entry->refcount);
>                 if (mid_entry->mid_state == MID_REQUEST_SUBMITTED)
>                         mid_entry->mid_state = MID_RETRY_NEEDED;
>                 list_move(&mid_entry->qhead, &retry_list);
> @@ -576,6 +577,7 @@ cifs_reconnect(struct TCP_Server_Info *server)
>                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
>                 list_del_init(&mid_entry->qhead);
>                 mid_entry->callback(mid_entry);
> +               cifs_mid_q_entry_release(mid_entry);
>         }
>
>         if (cifs_rdma_enabled(server)) {
>


--
Dave Wysochanski
Principal Software Maintenance Engineer
T: 919-754-4024


* Re: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
  2019-10-17 22:02                     ` Pavel Shilovskiy
@ 2019-10-17 22:53                       ` Ronnie Sahlberg
  2019-10-17 23:20                         ` Pavel Shilovskiy
  2019-10-18  8:16                         ` David Wysochanski
  0 siblings, 2 replies; 31+ messages in thread
From: Ronnie Sahlberg @ 2019-10-17 22:53 UTC (permalink / raw)
  To: Pavel Shilovskiy; +Cc: David Wysochanski, linux-cifs, Frank Sorenson





----- Original Message -----
> From: "Pavel Shilovskiy" <pshilov@microsoft.com>
> To: "Ronnie Sahlberg" <lsahlber@redhat.com>, "David Wysochanski" <dwysocha@redhat.com>
> Cc: "linux-cifs" <linux-cifs@vger.kernel.org>, "Frank Sorenson" <sorenson@redhat.com>
> Sent: Friday, 18 October, 2019 8:02:23 AM
> Subject: RE: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
> 
> Ok, looking at cifs_delete_mid():
> 
> void
> cifs_delete_mid(struct mid_q_entry *mid)
> {
>         spin_lock(&GlobalMid_Lock);
>         list_del_init(&mid->qhead);
>         mid->mid_flags |= MID_DELETED;
>         spin_unlock(&GlobalMid_Lock);
>
>         DeleteMidQEntry(mid);
> }
> 
> So, regardless of us taking references on the mid itself or not, the mid
> might be removed from the list. I also don't think taking GlobalMid_Lock
> would help much because the next mid in the list might be deleted from the
> list by another process while cifs_reconnect is calling callback for the
> current mid.
> 
> Instead, shouldn't we try marking the mid as being reconnected? Once we took
> a reference, let's mark mid->mid_flags with a new flag MID_RECONNECT under
> the GlobalMid_Lock. Then modify cifs_delete_mid() to check for this flag and
> do not remove the mid from the list if the flag exists.

That could work. But then we should also use that flag to suppress the other places where we do a list_del*, so something like this?

diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index 50dfd9049370..b324fff33e53 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -1702,6 +1702,7 @@ static inline bool is_retryable_error(int error)
 /* Flags */
 #define   MID_WAIT_CANCELLED    1 /* Cancelled while waiting for response */
 #define   MID_DELETED            2 /* Mid has been dequeued/deleted */
+#define   MID_RECONNECT          4 /* Mid is being used during reconnect */
 
 /* Types of response buffer returned from SendReceive2 */
 #define   CIFS_NO_BUFFER        0    /* Response buffer not returned */
diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index bdea4b3e8005..b142bd2a3ef5 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -564,6 +564,8 @@ cifs_reconnect(struct TCP_Server_Info *server)
        spin_lock(&GlobalMid_Lock);
        list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
                mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
+               kref_get(&mid_entry->refcount);
+               mid_entry->mid_flags |= MID_RECONNECT;
                if (mid_entry->mid_state == MID_REQUEST_SUBMITTED)
                        mid_entry->mid_state = MID_RETRY_NEEDED;
                list_move(&mid_entry->qhead, &retry_list);
@@ -575,7 +577,9 @@ cifs_reconnect(struct TCP_Server_Info *server)
        list_for_each_safe(tmp, tmp2, &retry_list) {
                mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
                list_del_init(&mid_entry->qhead);
+
                mid_entry->callback(mid_entry);
+               cifs_mid_q_entry_release(mid_entry);
        }
 
        if (cifs_rdma_enabled(server)) {
@@ -895,7 +899,7 @@ dequeue_mid(struct mid_q_entry *mid, bool malformed)
        if (mid->mid_flags & MID_DELETED)
                printk_once(KERN_WARNING
                            "trying to dequeue a deleted mid\n");
-       else
+       else if (!(mid->mid_flags & MID_RECONNECT))
                list_del_init(&mid->qhead);
        spin_unlock(&GlobalMid_Lock);
 }
diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
index 308ad0f495e1..ba4b5ab9cf35 100644
--- a/fs/cifs/transport.c
+++ b/fs/cifs/transport.c
@@ -173,7 +173,8 @@ void
 cifs_delete_mid(struct mid_q_entry *mid)
 {
        spin_lock(&GlobalMid_Lock);
-       list_del_init(&mid->qhead);
+       if (!(mid->mid_flags & MID_RECONNECT))
+               list_del_init(&mid->qhead);
        mid->mid_flags |= MID_DELETED;
        spin_unlock(&GlobalMid_Lock);
 
@@ -872,7 +873,8 @@ cifs_sync_mid_result(struct mid_q_entry *mid, struct TCP_Server_Info *server)
                rc = -EHOSTDOWN;
                break;
        default:
-               list_del_init(&mid->qhead);
+               if (!(mid->mid_flags & MID_RECONNECT))
+                       list_del_init(&mid->qhead);
                cifs_server_dbg(VFS, "%s: invalid mid state mid=%llu state=%d\n",
                         __func__, mid->mid, mid->mid_state);
                rc = -EIO;
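
For readers following along, the effect of the MID_RECONNECT check in cifs_delete_mid() can be sketched in userspace C. This is only an illustration under stated assumptions: struct mid_sketch, delete_mid_sketch() and still_queued_after_delete() are hypothetical names, the list helpers are simplified re-implementations of the kernel ones, and locking (GlobalMid_Lock) is left out, so the caller is assumed to hold it.

```c
#include <assert.h>

/* Flag values as defined in the cifsglob.h hunk of the patch. */
#define MID_DELETED   2
#define MID_RECONNECT 4

/* Simplified userspace re-implementation of the kernel list helpers. */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);
}

/* Hypothetical stand-in for struct mid_q_entry; only the fields used here. */
struct mid_sketch {
	struct list_head qhead;
	unsigned int mid_flags;
};

/* Mirrors the patched cifs_delete_mid(); the caller is assumed to hold the lock. */
static void delete_mid_sketch(struct mid_sketch *mid)
{
	if (!(mid->mid_flags & MID_RECONNECT))
		list_del_init(&mid->qhead);	/* normal path: unlink */
	mid->mid_flags |= MID_DELETED;		/* always record the delete */
}

/* Queue one entry with the given flags, delete it, report whether it is still linked. */
int still_queued_after_delete(unsigned int flags)
{
	struct list_head retry_list;
	struct mid_sketch mid;

	INIT_LIST_HEAD(&retry_list);
	mid.mid_flags = flags;
	list_add_tail(&mid.qhead, &retry_list);
	delete_mid_sketch(&mid);
	assert(mid.mid_flags & MID_DELETED);
	return retry_list.next == &mid.qhead;
}
```

With the flag clear the entry is unlinked as before; with MID_RECONNECT set it stays on retry_list, so only the reconnect loop removes it.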


> 
> --
> Best regards,
> Pavel Shilovsky
> 
> -----Original Message-----
> From: Ronnie Sahlberg <lsahlber@redhat.com>
> Sent: Thursday, October 17, 2019 2:45 PM
> To: David Wysochanski <dwysocha@redhat.com>
> Cc: Pavel Shilovskiy <pshilov@microsoft.com>; linux-cifs
> <linux-cifs@vger.kernel.org>; Frank Sorenson <sorenson@redhat.com>
> Subject: Re: list_del corruption while iterating retry_list in cifs_reconnect
> still seen on 5.4-rc3
> 
> Dave, Pavel
> 
> If it takes longer to trigger it might indicate we are on the right path but
> there are additional places to fix.
> 
> I still think you also need to protect the list mutate functions as well
> using the global mutex, so something like this :
> 
> diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> index bdea4b3e8005..16705a855818 100644
> --- a/fs/cifs/connect.c
> +++ b/fs/cifs/connect.c
> @@ -564,6 +564,7 @@ cifs_reconnect(struct TCP_Server_Info *server)
>         spin_lock(&GlobalMid_Lock);
>         list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
>                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> +               kref_get(&mid_entry->refcount);
>                 if (mid_entry->mid_state == MID_REQUEST_SUBMITTED)
>                         mid_entry->mid_state = MID_RETRY_NEEDED;
>                 list_move(&mid_entry->qhead, &retry_list);
> @@ -572,11 +573,18 @@ cifs_reconnect(struct TCP_Server_Info *server)
>         mutex_unlock(&server->srv_mutex);
>  
>         cifs_dbg(FYI, "%s: issuing mid callbacks\n", __func__);
> +       spin_lock(&GlobalMid_Lock);
>         list_for_each_safe(tmp, tmp2, &retry_list) {
>                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
>                 list_del_init(&mid_entry->qhead);
> +               spin_unlock(&GlobalMid_Lock);
> +
>                 mid_entry->callback(mid_entry);
> +               cifs_mid_q_entry_release(mid_entry);
> +
> +               spin_lock(&GlobalMid_Lock);
>         }
> +       spin_unlock(&GlobalMid_Lock);
>  
>         if (cifs_rdma_enabled(server)) {
>                 mutex_lock(&server->srv_mutex);
> 
> 
> ----- Original Message -----
> From: "David Wysochanski" <dwysocha@redhat.com>
> To: "Pavel Shilovskiy" <pshilov@microsoft.com>
> Cc: "Ronnie Sahlberg" <lsahlber@redhat.com>, "linux-cifs"
> <linux-cifs@vger.kernel.org>, "Frank Sorenson" <sorenson@redhat.com>
> Sent: Friday, 18 October, 2019 6:34:53 AM
> Subject: Re: list_del corruption while iterating retry_list in cifs_reconnect
> still seen on 5.4-rc3
> 
> Unfortunately that did not fix the list_del corruption.
> It did seem to run longer but I'm not sure runtime is meaningful.
> 
> [ 1424.215537] list_del corruption. prev->next should be ffff8d9b74c84d80, but was a6787a60550c54a9
> [ 1424.232688] ------------[ cut here ]------------
> [ 1424.234535] kernel BUG at lib/list_debug.c:51!
> [ 1424.236502] invalid opcode: 0000 [#1] SMP PTI
> [ 1424.238334] CPU: 5 PID: 10212 Comm: cifsd Kdump: loaded Not tainted 5.4.0-rc3-fix1+ #33
> [ 1424.241489] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> [ 1424.243770] RIP: 0010:__list_del_entry_valid.cold+0x31/0x55
> [ 1424.245972] Code: 5e 15 b5 e8 54 a3 c5 ff 0f 0b 48 c7 c7 70 5f 15 b5 e8 46 a3 c5 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 30 5f 15 b5 e8 32 a3 c5 ff <0f> 0b 48 89 fe 4c 89 c2 48 c7 c7 f8 5e 15 b5 e8 1e a3 c5 ff 0f 0b
> [ 1424.253409] RSP: 0018:ffff9a12404b3d38 EFLAGS: 00010246
> [ 1424.255576] RAX: 0000000000000054 RBX: ffff8d9b6ece1000 RCX: 0000000000000000
> [ 1424.258504] RDX: 0000000000000000 RSI: ffff8d9b77b57908 RDI: ffff8d9b77b57908
> [ 1424.261404] RBP: ffff8d9b74c84d80 R08: ffff8d9b77b57908 R09: 0000000000000280
> [ 1424.264336] R10: ffff9a12404b3bf0 R11: ffff9a12404b3bf5 R12: ffff8d9b6ece11c0
> [ 1424.267285] R13: ffff9a12404b3d48 R14: a6787a60550c54a9 R15: ffff8d9b6fcec300
> [ 1424.270191] FS:  0000000000000000(0000) GS:ffff8d9b77b40000(0000) knlGS:0000000000000000
> [ 1424.273491] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1424.275831] CR2: 0000562cdf4a2000 CR3: 000000023340c000 CR4: 00000000000406e0
> [ 1424.278733] Call Trace:
> [ 1424.279844]  cifs_reconnect+0x268/0x620 [cifs]
> [ 1424.281723]  cifs_readv_from_socket+0x220/0x250 [cifs]
> [ 1424.283876]  cifs_read_from_socket+0x4a/0x70 [cifs]
> [ 1424.285922]  ? try_to_wake_up+0x212/0x650
> [ 1424.287595]  ? cifs_small_buf_get+0x16/0x30 [cifs]
> [ 1424.289520]  ? allocate_buffers+0x66/0x120 [cifs]
> [ 1424.291421]  cifs_demultiplex_thread+0xdc/0xc30 [cifs]
> [ 1424.293506]  kthread+0xfb/0x130
> [ 1424.294789]  ? cifs_handle_standard+0x190/0x190 [cifs]
> [ 1424.296833]  ? kthread_park+0x90/0x90
> [ 1424.298295]  ret_from_fork+0x35/0x40
> [ 1424.299717] Modules linked in: cifs libdes libarc4 ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables crct10dif_pclmul crc32_pclmul ghash_clmulni_intel virtio_balloon joydev i2c_piix4 nfsd nfs_acl lockd auth_rpcgss grace sunrpc xfs libcrc32c crc32c_intel virtio_net net_failover ata_generic serio_raw virtio_console virtio_blk failover pata_acpi qemu_fw_cfg
> [ 1424.322374] ---[ end trace 214af7e68b58e94b ]---
> [ 1424.324305] RIP: 0010:__list_del_entry_valid.cold+0x31/0x55
> [ 1424.326551] Code: 5e 15 b5 e8 54 a3 c5 ff 0f 0b 48 c7 c7 70 5f 15 b5 e8 46 a3 c5 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 30 5f 15 b5 e8 32 a3 c5 ff <0f> 0b 48 89 fe 4c 89 c2 48 c7 c7 f8 5e 15 b5 e8 1e a3 c5 ff 0f 0b
> [ 1424.333874] RSP: 0018:ffff9a12404b3d38 EFLAGS: 00010246
> [ 1424.335976] RAX: 0000000000000054 RBX: ffff8d9b6ece1000 RCX: 0000000000000000
> [ 1424.338842] RDX: 0000000000000000 RSI: ffff8d9b77b57908 RDI: ffff8d9b77b57908
> [ 1424.341668] RBP: ffff8d9b74c84d80 R08: ffff8d9b77b57908 R09: 0000000000000280
> [ 1424.344511] R10: ffff9a12404b3bf0 R11: ffff9a12404b3bf5 R12: ffff8d9b6ece11c0
> [ 1424.347343] R13: ffff9a12404b3d48 R14: a6787a60550c54a9 R15: ffff8d9b6fcec300
> [ 1424.350184] FS:  0000000000000000(0000) GS:ffff8d9b77b40000(0000) knlGS:0000000000000000
> [ 1424.353394] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1424.355699] CR2: 0000562cdf4a2000 CR3: 000000023340c000 CR4: 00000000000406e0
> 
> On Thu, Oct 17, 2019 at 3:58 PM Pavel Shilovskiy <pshilov@microsoft.com>
> wrote:
> >
> >
> > The patch looks good. Let's see if it fixes the issue in your setup.
> >
> > --
> > Best regards,
> > Pavel Shilovsky
> >
> > -----Original Message-----
> > From: David Wysochanski <dwysocha@redhat.com>
> > Sent: Thursday, October 17, 2019 12:23 PM
> > To: Pavel Shilovskiy <pshilov@microsoft.com>
> > Cc: Ronnie Sahlberg <lsahlber@redhat.com>; linux-cifs
> > <linux-cifs@vger.kernel.org>; Frank Sorenson <sorenson@redhat.com>
> > Subject: Re: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
> >
> > On Thu, Oct 17, 2019 at 2:29 PM Pavel Shilovskiy <pshilov@microsoft.com> wrote:
> > >
> > > The similar solution of taking an extra reference should apply to the
> > > case of reconnect as well. The reference should be taken during the
> > > process of moving mid entries to the private list. Once a callback
> > > completes, such a reference should be put back thus freeing the mid.
> > >
> >
> > Ah ok very good.  The above seems consistent with the traces I'm seeing of
> > the race.
> > I am going to test this patch as it sounds like what you're describing and
> > similar to what Ronnie suggested earlier:
> >
> > --- a/fs/cifs/connect.c
> > +++ b/fs/cifs/connect.c
> > @@ -564,6 +564,7 @@ cifs_reconnect(struct TCP_Server_Info *server)
> >         spin_lock(&GlobalMid_Lock);
> >         list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
> >                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> > +               kref_get(&mid_entry->refcount);
> >                 if (mid_entry->mid_state == MID_REQUEST_SUBMITTED)
> >                         mid_entry->mid_state = MID_RETRY_NEEDED;
> >                 list_move(&mid_entry->qhead, &retry_list);
> > @@ -576,6 +577,7 @@ cifs_reconnect(struct TCP_Server_Info *server)
> >                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> >                 list_del_init(&mid_entry->qhead);
> >                 mid_entry->callback(mid_entry);
> > +               cifs_mid_q_entry_release(mid_entry);
> >         }
> >
> >         if (cifs_rdma_enabled(server)) {
> >
> 
> 
> --
> Dave Wysochanski
> Principal Software Maintenance Engineer
> T: 919-754-4024
> 


* RE: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
  2019-10-17 22:53                       ` Ronnie Sahlberg
@ 2019-10-17 23:20                         ` Pavel Shilovskiy
  2019-10-17 23:41                           ` Ronnie Sahlberg
  2019-10-18  8:16                         ` David Wysochanski
  1 sibling, 1 reply; 31+ messages in thread
From: Pavel Shilovskiy @ 2019-10-17 23:20 UTC (permalink / raw)
  To: Ronnie Sahlberg
  Cc: David Wysochanski, linux-cifs, Frank Sorenson, Steven French

Agree.

The change in dequeue_mid() is probably not needed, but at least it won't hurt: right now dequeue_mid() is called only from the demultiplex thread, as is cifs_reconnect(). I am wondering how your patch behaves with the repro.

More generally, I am starting to think that we should remove a MID from the pending list as soon as we parse the MessageId from the response and find the entry in the list. With the parallel decryption capability that Steve is working on, we would need to break the above assumption (that only the demultiplex thread processes a mid) and process the mid entry in another thread. There are some cases where we don't end up removing the MID, but for those we could simply add the entry back. Anyway, it needs much more thought and is out of scope for the bugfix being discussed.
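
The "dequeue as soon as the MessageId is matched" idea can be sketched in userspace C. This is a rough illustration, not a proposed patch: struct mid_sketch, find_and_dequeue() and pending_after_dequeue() are hypothetical names, the list helpers are simplified re-implementations of the kernel ones, and the caller is assumed to hold whatever lock protects the pending list.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace re-implementation of the kernel list helpers. */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);
}

/* Hypothetical stand-in for struct mid_q_entry; only the fields used here. */
struct mid_sketch {
	struct list_head qhead;
	unsigned long long mid;		/* MessageId */
};

/* Look up message_id and unlink the entry in the same locked step. */
static struct mid_sketch *find_and_dequeue(struct list_head *pending,
					   unsigned long long message_id)
{
	struct list_head *pos;

	for (pos = pending->next; pos != pending; pos = pos->next) {
		struct mid_sketch *m = (struct mid_sketch *)
			((char *)pos - offsetof(struct mid_sketch, qhead));

		if (m->mid == message_id) {
			list_del_init(pos);
			return m;
		}
	}
	return NULL;
}

static int list_len(const struct list_head *head)
{
	const struct list_head *pos;
	int n = 0;

	for (pos = head->next; pos != head; pos = pos->next)
		n++;
	return n;
}

/* Queue mids 1..3, dequeue message_id, optionally add it back; return the queue length. */
int pending_after_dequeue(unsigned long long message_id, int add_back)
{
	struct list_head pending;
	struct mid_sketch mids[3];
	struct mid_sketch *found;
	int i;

	INIT_LIST_HEAD(&pending);
	for (i = 0; i < 3; i++) {
		mids[i].mid = (unsigned long long)i + 1;
		list_add_tail(&mids[i].qhead, &pending);
	}

	found = find_and_dequeue(&pending, message_id);
	if (found && add_back)		/* the "simply add the entry back" case */
		list_add_tail(&found->qhead, &pending);
	return list_len(&pending);
}
```

Once the entry is off the list at lookup time, later processing (possibly in another thread) no longer touches list linkage that the reconnect path may be walking.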

--
Best regards,
Pavel Shilovsky

-----Original Message-----
From: Ronnie Sahlberg <lsahlber@redhat.com> 
Sent: Thursday, October 17, 2019 3:54 PM
To: Pavel Shilovskiy <pshilov@microsoft.com>
Cc: David Wysochanski <dwysocha@redhat.com>; linux-cifs <linux-cifs@vger.kernel.org>; Frank Sorenson <sorenson@redhat.com>
Subject: Re: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3

That could work. But then we should also use that flag to suppress the other places where we do a list_del*, so something like this ?

diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index 50dfd9049370..b324fff33e53 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -1702,6 +1702,7 @@ static inline bool is_retryable_error(int error)
 /* Flags */
 #define   MID_WAIT_CANCELLED    1 /* Cancelled while waiting for response */
 #define   MID_DELETED            2 /* Mid has been dequeued/deleted */
+#define   MID_RECONNECT          4 /* Mid is being used during reconnect */
 
 /* Types of response buffer returned from SendReceive2 */
 #define   CIFS_NO_BUFFER        0    /* Response buffer not returned */
diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index bdea4b3e8005..b142bd2a3ef5 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -564,6 +564,8 @@ cifs_reconnect(struct TCP_Server_Info *server)
        spin_lock(&GlobalMid_Lock);
        list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
                mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
+               kref_get(&mid_entry->refcount);
+               mid_entry->mid_flags |= MID_RECONNECT;
                if (mid_entry->mid_state == MID_REQUEST_SUBMITTED)
                        mid_entry->mid_state = MID_RETRY_NEEDED;
                list_move(&mid_entry->qhead, &retry_list);
@@ -575,7 +577,9 @@ cifs_reconnect(struct TCP_Server_Info *server)
        list_for_each_safe(tmp, tmp2, &retry_list) {
                mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
                list_del_init(&mid_entry->qhead);
+
                mid_entry->callback(mid_entry);
+               cifs_mid_q_entry_release(mid_entry);
        }
 
        if (cifs_rdma_enabled(server)) {
@@ -895,7 +899,7 @@ dequeue_mid(struct mid_q_entry *mid, bool malformed)
        if (mid->mid_flags & MID_DELETED)
                printk_once(KERN_WARNING
                            "trying to dequeue a deleted mid\n");
-       else
+       else if (!(mid->mid_flags & MID_RECONNECT))
                list_del_init(&mid->qhead);
        spin_unlock(&GlobalMid_Lock);
 }
diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
index 308ad0f495e1..ba4b5ab9cf35 100644
--- a/fs/cifs/transport.c
+++ b/fs/cifs/transport.c
@@ -173,7 +173,8 @@ void
 cifs_delete_mid(struct mid_q_entry *mid)
 {
        spin_lock(&GlobalMid_Lock);
-       list_del_init(&mid->qhead);
+       if (!(mid->mid_flags & MID_RECONNECT))
+               list_del_init(&mid->qhead);
        mid->mid_flags |= MID_DELETED;
        spin_unlock(&GlobalMid_Lock);
 
@@ -872,7 +873,8 @@ cifs_sync_mid_result(struct mid_q_entry *mid, struct TCP_Server_Info *server)
                rc = -EHOSTDOWN;
                break;
        default:
-               list_del_init(&mid->qhead);
+               if (!(mid->mid_flags & MID_RECONNECT))
+                       list_del_init(&mid->qhead);
                cifs_server_dbg(VFS, "%s: invalid mid state mid=%llu state=%d\n",
                         __func__, mid->mid, mid->mid_state);
                rc = -EIO;


* Re: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
  2019-10-17 23:20                         ` Pavel Shilovskiy
@ 2019-10-17 23:41                           ` Ronnie Sahlberg
  0 siblings, 0 replies; 31+ messages in thread
From: Ronnie Sahlberg @ 2019-10-17 23:41 UTC (permalink / raw)
  To: Pavel Shilovskiy
  Cc: David Wysochanski, linux-cifs, Frank Sorenson, Steven French





----- Original Message -----
> From: "Pavel Shilovskiy" <pshilov@microsoft.com>
> To: "Ronnie Sahlberg" <lsahlber@redhat.com>
> Cc: "David Wysochanski" <dwysocha@redhat.com>, "linux-cifs" <linux-cifs@vger.kernel.org>, "Frank Sorenson"
> <sorenson@redhat.com>, "Steven French" <Steven.French@microsoft.com>
> Sent: Friday, 18 October, 2019 9:20:51 AM
> Subject: RE: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
> 
> Agree.
> 
> Probably the change in dequeue_mid() is not needed but won't hurt at least -
> right now dequeue_mid() is being called from the demultiplex thread only, so
> as cifs_reconnect(). I am wondering how your patch behaves with the repro.

Thanks.
Dave, can you test with your reproducer if this makes things better?


> 
> In general, I am starting to think more that we should probably remove a MID
> immediately from the pending list once we parse MessageId from the response
> and find the entry in the list. Especially with the recent parallel
> decryption capability that Steve is working on, we would need to break the
> above assumption and process the mid entry in another thread. There are some
> cases where we don't end up removing the MID but for those cases we may
> simply add the entry back. Anyway, it needs much more thinking and out of
> the scope of the bugfix being discussed.

I agree.

Btw, I think we have had bugs in this code since at least the 3.x kernels
so we need a simple fix that is easy to backport to stable and beyond.


> 
> --
> Best regards,
> Pavel Shilovsky
> 
> -----Original Message-----
> From: Ronnie Sahlberg <lsahlber@redhat.com>
> Sent: Thursday, October 17, 2019 3:54 PM
> To: Pavel Shilovskiy <pshilov@microsoft.com>
> Cc: David Wysochanski <dwysocha@redhat.com>; linux-cifs
> <linux-cifs@vger.kernel.org>; Frank Sorenson <sorenson@redhat.com>
> Subject: Re: list_del corruption while iterating retry_list in cifs_reconnect
> still seen on 5.4-rc3
> 
> That could work. But then we should also use that flag to suppress the other
> places where we do a list_del*, so something like this ?
> 
> diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
> index 50dfd9049370..b324fff33e53 100644
> --- a/fs/cifs/cifsglob.h
> +++ b/fs/cifs/cifsglob.h
> @@ -1702,6 +1702,7 @@ static inline bool is_retryable_error(int error)
>  /* Flags */
>  #define   MID_WAIT_CANCELLED    1 /* Cancelled while waiting for response */
>  #define   MID_DELETED            2 /* Mid has been dequeued/deleted */
> +#define   MID_RECONNECT          4 /* Mid is being used during reconnect */
>  
>  /* Types of response buffer returned from SendReceive2 */
>  #define   CIFS_NO_BUFFER        0    /* Response buffer not returned */
> diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> index bdea4b3e8005..b142bd2a3ef5 100644
> --- a/fs/cifs/connect.c
> +++ b/fs/cifs/connect.c
> @@ -564,6 +564,8 @@ cifs_reconnect(struct TCP_Server_Info *server)
>         spin_lock(&GlobalMid_Lock);
>         list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
>                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> +               kref_get(&mid_entry->refcount);
> +               mid_entry->mid_flags |= MID_RECONNECT;
>                 if (mid_entry->mid_state == MID_REQUEST_SUBMITTED)
>                         mid_entry->mid_state = MID_RETRY_NEEDED;
>                 list_move(&mid_entry->qhead, &retry_list);
> @@ -575,7 +577,9 @@ cifs_reconnect(struct TCP_Server_Info *server)
>         list_for_each_safe(tmp, tmp2, &retry_list) {
>                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
>                 list_del_init(&mid_entry->qhead);
> +
>                 mid_entry->callback(mid_entry);
> +               cifs_mid_q_entry_release(mid_entry);
>         }
>  
>         if (cifs_rdma_enabled(server)) {
> @@ -895,7 +899,7 @@ dequeue_mid(struct mid_q_entry *mid, bool malformed)
>         if (mid->mid_flags & MID_DELETED)
>                 printk_once(KERN_WARNING
>                             "trying to dequeue a deleted mid\n");
> -       else
> +       else if (!(mid->mid_flags & MID_RECONNECT))
>                 list_del_init(&mid->qhead);
>         spin_unlock(&GlobalMid_Lock);
>  }
> diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
> index 308ad0f495e1..ba4b5ab9cf35 100644
> --- a/fs/cifs/transport.c
> +++ b/fs/cifs/transport.c
> @@ -173,7 +173,8 @@ void
>  cifs_delete_mid(struct mid_q_entry *mid)
>  {
>         spin_lock(&GlobalMid_Lock);
> -       list_del_init(&mid->qhead);
> +       if (!(mid->mid_flags & MID_RECONNECT))
> +               list_del_init(&mid->qhead);
>         mid->mid_flags |= MID_DELETED;
>         spin_unlock(&GlobalMid_Lock);
>  
> @@ -872,7 +873,8 @@ cifs_sync_mid_result(struct mid_q_entry *mid, struct TCP_Server_Info *server)
>                 rc = -EHOSTDOWN;
>                 break;
>         default:
> -               list_del_init(&mid->qhead);
> +               if (!(mid->mid_flags & MID_RECONNECT))
> +                       list_del_init(&mid->qhead);
>                 cifs_server_dbg(VFS, "%s: invalid mid state mid=%llu state=%d\n",
>                          __func__, mid->mid, mid->mid_state);
>                 rc = -EIO;
> 


* Re: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
  2019-10-17 22:53                       ` Ronnie Sahlberg
  2019-10-17 23:20                         ` Pavel Shilovskiy
@ 2019-10-18  8:16                         ` David Wysochanski
  2019-10-18  9:27                           ` Ronnie Sahlberg
  2019-10-19  9:44                           ` list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3 Ronnie Sahlberg
  1 sibling, 2 replies; 31+ messages in thread
From: David Wysochanski @ 2019-10-18  8:16 UTC (permalink / raw)
  To: Ronnie Sahlberg; +Cc: Pavel Shilovskiy, linux-cifs, Frank Sorenson

On Thu, Oct 17, 2019 at 6:53 PM Ronnie Sahlberg <lsahlber@redhat.com> wrote:
>
>
>
>
>
> ----- Original Message -----
> > From: "Pavel Shilovskiy" <pshilov@microsoft.com>
> > To: "Ronnie Sahlberg" <lsahlber@redhat.com>, "David Wysochanski" <dwysocha@redhat.com>
> > Cc: "linux-cifs" <linux-cifs@vger.kernel.org>, "Frank Sorenson" <sorenson@redhat.com>
> > Sent: Friday, 18 October, 2019 8:02:23 AM
> > Subject: RE: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
> >
> > Ok, looking at cifs_delete_mid():
> >
> > void
> > cifs_delete_mid(struct mid_q_entry *mid)
> > {
> >         spin_lock(&GlobalMid_Lock);
> >         list_del_init(&mid->qhead);
> >         mid->mid_flags |= MID_DELETED;
> >         spin_unlock(&GlobalMid_Lock);
> >
> >         DeleteMidQEntry(mid);
> > }
> >
> > So, regardless of us taking references on the mid itself or not, the mid
> > might be removed from the list. I also don't think taking GlobalMid_Lock
> > would help much because the next mid in the list might be deleted from the
> > list by another process while cifs_reconnect is calling callback for the
> > current mid.
> >

Yes, the above is consistent with my tracing of the crash after the
first refcount patch was applied.
After the simple refcount patch, when iterating the retry_list, it was
processing an orphaned list with a single item over and over and
eventually ran itself down to refcount == 0 and crashed like before.
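
One way such a single-item loop can arise is sketched below in userspace C. It is an illustration under assumptions, not a trace of the real code path: the list helpers are simplified re-implementations of the kernel ones, the "racing" unlink is simulated inline on the same thread, and walk_with_concurrent_unlink() is a hypothetical name. The point is that list_del_init() leaves a node pointing at itself, so if the node saved as "next" by list_for_each_safe() is unlinked by someone else, the walk never gets back to the list head.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace re-implementation of the kernel list helpers. */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);	/* node now points at itself */
}

/*
 * Walk a two-entry list the way list_for_each_safe() does, while a
 * simulated "other thread" unlinks the saved next entry during the first
 * iteration. Returns how many times the loop body ran; max_iters is only
 * there to stop the otherwise endless walk.
 */
int walk_with_concurrent_unlink(int max_iters)
{
	struct list_head retry_list, a, b, *pos, *n;
	int iters = 0;

	INIT_LIST_HEAD(&retry_list);
	list_add_tail(&a, &retry_list);
	list_add_tail(&b, &retry_list);

	for (pos = retry_list.next, n = pos->next;
	     pos != &retry_list && iters < max_iters;
	     pos = n, n = pos->next) {
		list_del_init(pos);
		if (iters == 0)
			list_del_init(&b);	/* simulated racing unlink of 'n' */
		iters++;
	}
	return iters;
}
```

With only two entries on the list the body keeps running until the cap is hit, because after the simulated race both pos and n stay on the self-linked entry. In the kernel each pass would also run the callback and drop a reference, which fits the refcount running down to zero.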


> > Instead, shouldn't we try marking the mid as being reconnected? Once we took
> > a reference, let's mark mid->mid_flags with a new flag MID_RECONNECT under
> > the GlobalMid_Lock. Then modify cifs_delete_mid() to check for this flag and
> > do not remove the mid from the list if the flag exists.
>
> That could work. But then we should also use that flag to suppress the other places where we do a list_del*, so something like this ?
>
> diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
> index 50dfd9049370..b324fff33e53 100644
> --- a/fs/cifs/cifsglob.h
> +++ b/fs/cifs/cifsglob.h
> @@ -1702,6 +1702,7 @@ static inline bool is_retryable_error(int error)
>  /* Flags */
>  #define   MID_WAIT_CANCELLED    1 /* Cancelled while waiting for response */
>  #define   MID_DELETED            2 /* Mid has been dequeued/deleted */
> +#define   MID_RECONNECT          4 /* Mid is being used during reconnect */
>
Do we need this extra flag?  Can we just use mid_state ==
MID_RETRY_NEEDED in the necessary places?


>  /* Types of response buffer returned from SendReceive2 */
>  #define   CIFS_NO_BUFFER        0    /* Response buffer not returned */
> diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> index bdea4b3e8005..b142bd2a3ef5 100644
> --- a/fs/cifs/connect.c
> +++ b/fs/cifs/connect.c
> @@ -564,6 +564,8 @@ cifs_reconnect(struct TCP_Server_Info *server)
>         spin_lock(&GlobalMid_Lock);
>         list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
>                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> +               kref_get(&mid_entry->refcount);
> +               mid_entry->mid_flags |= MID_RECONNECT;
>                 if (mid_entry->mid_state == MID_REQUEST_SUBMITTED)
>                         mid_entry->mid_state = MID_RETRY_NEEDED;

What happens if the state is wrong going in there, and it is not set
to MID_RETRY_NEEDED, but yet we queue up the retry_list and run it
below?
Should the above 'if' check for MID_REQUEST_SUBMITTED be a WARN_ON
followed by unconditionally setting the state?

WARN_ON(mid_entry->mid_state != MID_REQUEST_SUBMITTED);
/* Unconditionally set MID_RETRY_NEEDED */
mid_entry->mid_state = MID_RETRY_NEEDED;


>                 list_move(&mid_entry->qhead, &retry_list);
> @@ -575,7 +577,9 @@ cifs_reconnect(struct TCP_Server_Info *server)
>         list_for_each_safe(tmp, tmp2, &retry_list) {
>                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
>                 list_del_init(&mid_entry->qhead);
> +
>                 mid_entry->callback(mid_entry);
> +               cifs_mid_q_entry_release(mid_entry);
>         }
>
>         if (cifs_rdma_enabled(server)) {
> @@ -895,7 +899,7 @@ dequeue_mid(struct mid_q_entry *mid, bool malformed)
>         if (mid->mid_flags & MID_DELETED)
>                 printk_once(KERN_WARNING
>                             "trying to dequeue a deleted mid\n");
> -       else
> +       else if (!(mid->mid_flags & MID_RECONNECT))

Instead of the above,

 -       else
+          else if (mid_entry->mid_state == MID_RETRY_NEEDED)
                  list_del_init(&mid->qhead);


>         spin_unlock(&GlobalMid_Lock);
>  }
> diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
> index 308ad0f495e1..ba4b5ab9cf35 100644
> --- a/fs/cifs/transport.c
> +++ b/fs/cifs/transport.c
> @@ -173,7 +173,8 @@ void
>  cifs_delete_mid(struct mid_q_entry *mid)
>  {
>         spin_lock(&GlobalMid_Lock);
> -       list_del_init(&mid->qhead);
> +       if (!(mid->mid_flags & MID_RECONNECT))
> +               list_del_init(&mid->qhead);

Same check as above.


>         mid->mid_flags |= MID_DELETED;
>         spin_unlock(&GlobalMid_Lock);
>
> @@ -872,7 +873,8 @@ cifs_sync_mid_result(struct mid_q_entry *mid, struct TCP_Server_Info *server)
>                 rc = -EHOSTDOWN;
>                 break;
>         default:
> -               list_del_init(&mid->qhead);
> +               if (!(mid->mid_flags & MID_RECONNECT))
> +                       list_del_init(&mid->qhead);

Same check as above.

>                 cifs_server_dbg(VFS, "%s: invalid mid state mid=%llu state=%d\n",
>                          __func__, mid->mid, mid->mid_state);
>                 rc = -EIO;
>
>
> >
> > --
> > Best regards,
> > Pavel Shilovsky
> >
> > -----Original Message-----
> > From: Ronnie Sahlberg <lsahlber@redhat.com>
> > Sent: Thursday, October 17, 2019 2:45 PM
> > To: David Wysochanski <dwysocha@redhat.com>
> > Cc: Pavel Shilovskiy <pshilov@microsoft.com>; linux-cifs
> > <linux-cifs@vger.kernel.org>; Frank Sorenson <sorenson@redhat.com>
> > Subject: Re: list_del corruption while iterating retry_list in cifs_reconnect
> > still seen on 5.4-rc3
> >
> > Dave, Pavel
> >
> > If it takes longer to trigger it might indicate we are on the right path but
> > there are additional places to fix.
> >
> > I still think you also need to protect the list-mutating functions with
> > the global lock (GlobalMid_Lock), so something like this:
> >
> > diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> > index bdea4b3e8005..16705a855818 100644
> > --- a/fs/cifs/connect.c
> > +++ b/fs/cifs/connect.c
> > @@ -564,6 +564,7 @@ cifs_reconnect(struct TCP_Server_Info *server)
> >         spin_lock(&GlobalMid_Lock);
> >         list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
> >                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> > +               kref_get(&mid_entry->refcount);
> >                 if (mid_entry->mid_state == MID_REQUEST_SUBMITTED)
> >                         mid_entry->mid_state = MID_RETRY_NEEDED;
> >                 list_move(&mid_entry->qhead, &retry_list);
> > @@ -572,11 +573,18 @@ cifs_reconnect(struct TCP_Server_Info *server)
> >         mutex_unlock(&server->srv_mutex);
> >
> >         cifs_dbg(FYI, "%s: issuing mid callbacks\n", __func__);
> > +       spin_lock(&GlobalMid_Lock);
> >         list_for_each_safe(tmp, tmp2, &retry_list) {
> >                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> >                 list_del_init(&mid_entry->qhead);
> > +               spin_unlock(&GlobalMid_Lock);
> > +
> >                 mid_entry->callback(mid_entry);
> > +               cifs_mid_q_entry_release(mid_entry);
> > +
> > +               spin_lock(&GlobalMid_Lock);
> >         }
> > +       spin_unlock(&GlobalMid_Lock);
> >
> >         if (cifs_rdma_enabled(server)) {
> >                 mutex_lock(&server->srv_mutex);
> >
> >
> > ----- Original Message -----
> > From: "David Wysochanski" <dwysocha@redhat.com>
> > To: "Pavel Shilovskiy" <pshilov@microsoft.com>
> > Cc: "Ronnie Sahlberg" <lsahlber@redhat.com>, "linux-cifs"
> > <linux-cifs@vger.kernel.org>, "Frank Sorenson" <sorenson@redhat.com>
> > Sent: Friday, 18 October, 2019 6:34:53 AM
> > Subject: Re: list_del corruption while iterating retry_list in cifs_reconnect
> > still seen on 5.4-rc3
> >
> > Unfortunately that did not fix the list_del corruption.
> > It did seem to run longer but I'm not sure runtime is meaningful.
> >
> > [ 1424.215537] list_del corruption. prev->next should be ffff8d9b74c84d80, but was a6787a60550c54a9
> > [ 1424.232688] ------------[ cut here ]------------
> > [ 1424.234535] kernel BUG at lib/list_debug.c:51!
> > [ 1424.236502] invalid opcode: 0000 [#1] SMP PTI
> > [ 1424.238334] CPU: 5 PID: 10212 Comm: cifsd Kdump: loaded Not tainted 5.4.0-rc3-fix1+ #33
> > [ 1424.241489] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> > [ 1424.243770] RIP: 0010:__list_del_entry_valid.cold+0x31/0x55
> > [ 1424.245972] Code: 5e 15 b5 e8 54 a3 c5 ff 0f 0b 48 c7 c7 70 5f 15 b5 e8 46 a3 c5 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 30 5f 15 b5 e8 32 a3 c5 ff <0f> 0b 48 89 fe 4c 89 c2 48 c7 c7 f8 5e 15 b5 e8 1e a3 c5 ff 0f 0b
> > [ 1424.253409] RSP: 0018:ffff9a12404b3d38 EFLAGS: 00010246
> > [ 1424.255576] RAX: 0000000000000054 RBX: ffff8d9b6ece1000 RCX: 0000000000000000
> > [ 1424.258504] RDX: 0000000000000000 RSI: ffff8d9b77b57908 RDI: ffff8d9b77b57908
> > [ 1424.261404] RBP: ffff8d9b74c84d80 R08: ffff8d9b77b57908 R09: 0000000000000280
> > [ 1424.264336] R10: ffff9a12404b3bf0 R11: ffff9a12404b3bf5 R12: ffff8d9b6ece11c0
> > [ 1424.267285] R13: ffff9a12404b3d48 R14: a6787a60550c54a9 R15: ffff8d9b6fcec300
> > [ 1424.270191] FS: 0000000000000000(0000) GS:ffff8d9b77b40000(0000) knlGS:0000000000000000
> > [ 1424.273491] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 1424.275831] CR2: 0000562cdf4a2000 CR3: 000000023340c000 CR4: 00000000000406e0
> > [ 1424.278733] Call Trace:
> > [ 1424.279844]  cifs_reconnect+0x268/0x620 [cifs]
> > [ 1424.281723]  cifs_readv_from_socket+0x220/0x250 [cifs]
> > [ 1424.283876]  cifs_read_from_socket+0x4a/0x70 [cifs]
> > [ 1424.285922]  ? try_to_wake_up+0x212/0x650
> > [ 1424.287595]  ? cifs_small_buf_get+0x16/0x30 [cifs]
> > [ 1424.289520]  ? allocate_buffers+0x66/0x120 [cifs]
> > [ 1424.291421]  cifs_demultiplex_thread+0xdc/0xc30 [cifs]
> > [ 1424.293506]  kthread+0xfb/0x130
> > [ 1424.294789]  ? cifs_handle_standard+0x190/0x190 [cifs]
> > [ 1424.296833]  ? kthread_park+0x90/0x90
> > [ 1424.298295]  ret_from_fork+0x35/0x40
> > [ 1424.299717] Modules linked in: cifs libdes libarc4 ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables crct10dif_pclmul crc32_pclmul ghash_clmulni_intel virtio_balloon joydev i2c_piix4 nfsd nfs_acl lockd auth_rpcgss grace sunrpc xfs libcrc32c crc32c_intel virtio_net net_failover ata_generic serio_raw virtio_console virtio_blk failover pata_acpi qemu_fw_cfg
> > [ 1424.322374] ---[ end trace 214af7e68b58e94b ]---
> > [ 1424.324305] RIP: 0010:__list_del_entry_valid.cold+0x31/0x55
> > [ 1424.326551] Code: 5e 15 b5 e8 54 a3 c5 ff 0f 0b 48 c7 c7 70 5f 15 b5 e8 46 a3 c5 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 30 5f 15 b5 e8 32 a3 c5 ff <0f> 0b 48 89 fe 4c 89 c2 48 c7 c7 f8 5e 15 b5 e8 1e a3 c5 ff 0f 0b
> > [ 1424.333874] RSP: 0018:ffff9a12404b3d38 EFLAGS: 00010246
> > [ 1424.335976] RAX: 0000000000000054 RBX: ffff8d9b6ece1000 RCX: 0000000000000000
> > [ 1424.338842] RDX: 0000000000000000 RSI: ffff8d9b77b57908 RDI: ffff8d9b77b57908
> > [ 1424.341668] RBP: ffff8d9b74c84d80 R08: ffff8d9b77b57908 R09: 0000000000000280
> > [ 1424.344511] R10: ffff9a12404b3bf0 R11: ffff9a12404b3bf5 R12: ffff8d9b6ece11c0
> > [ 1424.347343] R13: ffff9a12404b3d48 R14: a6787a60550c54a9 R15: ffff8d9b6fcec300
> > [ 1424.350184] FS: 0000000000000000(0000) GS:ffff8d9b77b40000(0000) knlGS:0000000000000000
> > [ 1424.353394] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 1424.355699] CR2: 0000562cdf4a2000 CR3: 000000023340c000 CR4: 00000000000406e0
> >
> > On Thu, Oct 17, 2019 at 3:58 PM Pavel Shilovskiy <pshilov@microsoft.com>
> > wrote:
> > >
> > >
> > > The patch looks good. Let's see if it fixes the issue in your setup.
> > >
> > > --
> > > Best regards,
> > > Pavel Shilovsky
> > >
> > > -----Original Message-----
> > > From: David Wysochanski <dwysocha@redhat.com>
> > > Sent: Thursday, October 17, 2019 12:23 PM
> > > To: Pavel Shilovskiy <pshilov@microsoft.com>
> > > Cc: Ronnie Sahlberg <lsahlber@redhat.com>; linux-cifs
> > > <linux-cifs@vger.kernel.org>; Frank Sorenson <sorenson@redhat.com>
> > > Subject: Re: list_del corruption while iterating retry_list in
> > > cifs_reconnect still seen on 5.4-rc3
> > >
> > > On Thu, Oct 17, 2019 at 2:29 PM Pavel Shilovskiy <pshilov@microsoft.com> wrote:
> > > >
> > > > The similar solution of taking an extra reference should apply to the
> > > > case of reconnect as well. The reference should be taken during the
> > > > process of moving mid entries to the private list. Once a callback
> > > > completes, such a reference should be put back thus freeing the mid.
> > > >
> > >
> > > Ah ok very good.  The above seems consistent with the traces I'm seeing of
> > > the race.
> > > I am going to test this patch as it sounds like what you're describing and
> > > similar to what Ronnie suggested earlier:
> > >
> > > --- a/fs/cifs/connect.c
> > > +++ b/fs/cifs/connect.c
> > > @@ -564,6 +564,7 @@ cifs_reconnect(struct TCP_Server_Info *server)
> > >         spin_lock(&GlobalMid_Lock);
> > >         list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
> > >                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> > > +               kref_get(&mid_entry->refcount);
> > >                 if (mid_entry->mid_state == MID_REQUEST_SUBMITTED)
> > >                         mid_entry->mid_state = MID_RETRY_NEEDED;
> > >                 list_move(&mid_entry->qhead, &retry_list);
> > > @@ -576,6 +577,7 @@ cifs_reconnect(struct TCP_Server_Info *server)
> > >                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> > >                 list_del_init(&mid_entry->qhead);
> > >                 mid_entry->callback(mid_entry);
> > > +               cifs_mid_q_entry_release(mid_entry);
> > >         }
> > >
> > >         if (cifs_rdma_enabled(server)) {
> > >
> >
> >
> > --
> > Dave Wysochanski
> > Principal Software Maintenance Engineer
> > T: 919-754-4024
> >


* Re: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
  2019-10-18  8:16                         ` David Wysochanski
@ 2019-10-18  9:27                           ` Ronnie Sahlberg
  2019-10-18 10:12                             ` David Wysochanski
  2019-10-19  9:44                           ` list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3 Ronnie Sahlberg
  1 sibling, 1 reply; 31+ messages in thread
From: Ronnie Sahlberg @ 2019-10-18  9:27 UTC (permalink / raw)
  To: David Wysochanski; +Cc: Pavel Shilovskiy, linux-cifs, Frank Sorenson


----- Original Message -----
> From: "David Wysochanski" <dwysocha@redhat.com>
> To: "Ronnie Sahlberg" <lsahlber@redhat.com>
> Cc: "Pavel Shilovskiy" <pshilov@microsoft.com>, "linux-cifs" <linux-cifs@vger.kernel.org>, "Frank Sorenson"
> <sorenson@redhat.com>
> Sent: Friday, 18 October, 2019 6:16:45 PM
> Subject: Re: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
> 
> On Thu, Oct 17, 2019 at 6:53 PM Ronnie Sahlberg <lsahlber@redhat.com> wrote:
> >
> >
> >
> >

Good comments.
New version of the patch, please test and see comments inline below

diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index bdea4b3e8005..8a78358693a5 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -564,8 +564,13 @@ cifs_reconnect(struct TCP_Server_Info *server)
        spin_lock(&GlobalMid_Lock);
        list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
                mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
-               if (mid_entry->mid_state == MID_REQUEST_SUBMITTED)
-                       mid_entry->mid_state = MID_RETRY_NEEDED;
+               kref_get(&mid_entry->refcount);
+               WARN_ON(mid_entry->mid_state != MID_REQUEST_SUBMITTED);
+               /*
+                * Set MID_RETRY_NEEDED to prevent the demultiplex loop from
+                * removing us, or our neighbours, from the linked list.
+                */
+               mid_entry->mid_state = MID_RETRY_NEEDED;
                list_move(&mid_entry->qhead, &retry_list);
        }
        spin_unlock(&GlobalMid_Lock);
@@ -575,7 +580,9 @@ cifs_reconnect(struct TCP_Server_Info *server)
        list_for_each_safe(tmp, tmp2, &retry_list) {
                mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
                list_del_init(&mid_entry->qhead);
+
                mid_entry->callback(mid_entry);
+               cifs_mid_q_entry_release(mid_entry);
        }
 
        if (cifs_rdma_enabled(server)) {
@@ -895,7 +902,7 @@ dequeue_mid(struct mid_q_entry *mid, bool malformed)
        if (mid->mid_flags & MID_DELETED)
                printk_once(KERN_WARNING
                            "trying to dequeue a deleted mid\n");
-       else
+       else if (mid->mid_state != MID_RETRY_NEEDED)
                list_del_init(&mid->qhead);
        spin_unlock(&GlobalMid_Lock);
 }
diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
index 308ad0f495e1..17a430b58673 100644
--- a/fs/cifs/transport.c
+++ b/fs/cifs/transport.c
@@ -173,7 +173,8 @@ void
 cifs_delete_mid(struct mid_q_entry *mid)
 {
        spin_lock(&GlobalMid_Lock);
-       list_del_init(&mid->qhead);
+       if (mid->mid_state != MID_RETRY_NEEDED)
+               list_del_init(&mid->qhead);
        mid->mid_flags |= MID_DELETED;
        spin_unlock(&GlobalMid_Lock);
 
@@ -872,7 +873,8 @@ cifs_sync_mid_result(struct mid_q_entry *mid, struct TCP_Server_Info *server)
                rc = -EHOSTDOWN;
                break;
        default:
-               list_del_init(&mid->qhead);
+               if (mid->mid_state != MID_RETRY_NEEDED)
+                       list_del_init(&mid->qhead);
                cifs_server_dbg(VFS, "%s: invalid mid state mid=%llu state=%d\n",
                         __func__, mid->mid, mid->mid_state);
                rc = -EIO;






> >
> > ----- Original Message -----
> > > From: "Pavel Shilovskiy" <pshilov@microsoft.com>
> > > To: "Ronnie Sahlberg" <lsahlber@redhat.com>, "David Wysochanski"
> > > <dwysocha@redhat.com>
> > > Cc: "linux-cifs" <linux-cifs@vger.kernel.org>, "Frank Sorenson"
> > > <sorenson@redhat.com>
> > > Sent: Friday, 18 October, 2019 8:02:23 AM
> > > Subject: RE: list_del corruption while iterating retry_list in
> > > cifs_reconnect still seen on 5.4-rc3
> > >
> > > Ok, looking at cifs_delete_mid():
> > >
> > >  172 void
> > >  173 cifs_delete_mid(struct mid_q_entry *mid)
> > >  174 {
> > >  175 >-------spin_lock(&GlobalMid_Lock);
> > >  176 >-------list_del_init(&mid->qhead);
> > >  177 >-------mid->mid_flags |= MID_DELETED;
> > >  178 >-------spin_unlock(&GlobalMid_Lock);
> > >  179
> > >  180 >-------DeleteMidQEntry(mid);
> > >  181 }
> > >
> > > So, regardless of us taking references on the mid itself or not, the mid
> > > might be removed from the list. I also don't think taking GlobalMid_Lock
> > > would help much because the next mid in the list might be deleted from
> > > the
> > > list by another process while cifs_reconnect is calling callback for the
> > > current mid.
> > >
> 
> Yes, the above is consistent with my tracing of the crash after the
> initial refcount patch was applied.
> After the simple refcount patch, when iterating the retry_list, it was
> processing an orphaned list with a single item over and over, and
> eventually ran the refcount down to 0 and crashed like before.
> 
> 
> > > Instead, shouldn't we try marking the mid as being reconnected? Once we
> > > took
> > > a reference, let's mark mid->mid_flags with a new flag MID_RECONNECT
> > > under
> > > the GlobalMid_Lock. Then modify cifs_delete_mid() to check for this flag
> > > and
> > > do not remove the mid from the list if the flag exists.
> >
> > That could work. But then we should also use that flag to suppress the
> > other places where we do a list_del*, so something like this ?
> >
> > diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
> > index 50dfd9049370..b324fff33e53 100644
> > --- a/fs/cifs/cifsglob.h
> > +++ b/fs/cifs/cifsglob.h
> > @@ -1702,6 +1702,7 @@ static inline bool is_retryable_error(int error)
> >  /* Flags */
> >  #define   MID_WAIT_CANCELLED    1 /* Cancelled while waiting for response */
> >  #define   MID_DELETED            2 /* Mid has been dequeued/deleted */
> > +#define   MID_RECONNECT          4 /* Mid is being used during reconnect */
> >
> > Do we need this extra flag?  Can we just use mid_state ==
> > MID_RETRY_NEEDED in the necessary places?

That is a good point.
It saves us a redundant flag.

> 
> 
> >  /* Types of response buffer returned from SendReceive2 */
> >  #define   CIFS_NO_BUFFER        0    /* Response buffer not returned */
> > diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> > index bdea4b3e8005..b142bd2a3ef5 100644
> > --- a/fs/cifs/connect.c
> > +++ b/fs/cifs/connect.c
> > @@ -564,6 +564,8 @@ cifs_reconnect(struct TCP_Server_Info *server)
> >         spin_lock(&GlobalMid_Lock);
> >         list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
> >                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> > +               kref_get(&mid_entry->refcount);
> > +               mid_entry->mid_flags |= MID_RECONNECT;
> >                 if (mid_entry->mid_state == MID_REQUEST_SUBMITTED)
> >                         mid_entry->mid_state = MID_RETRY_NEEDED;
> 
> What happens if the state is wrong going in there, and it is not set
> to MID_RETRY_NEEDED, but yet we queue up the retry_list and run it
> below?
> Should the above 'if' check for MID_REQUEST_SUBMITTED be a WARN_ON
> followed by unconditionally setting the state?
> 
> WARN_ON(mid_entry->mid_state != MID_REQUEST_SUBMITTED);
> /* Unconditionally set MID_RETRY_NEEDED */
> mid_entry->mid_state = MID_RETRY_NEEDED;

Yepp.
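
For illustration only, here is a minimal userspace sketch (not kernel code) of the two points agreed on above: pin the entry with an extra reference and set MID_RETRY_NEEDED unconditionally before running the callback. The struct, the plain int refcount standing in for struct kref, and the helper names mid_get(), mid_put() and reconnect_one() are all hypothetical, and locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

enum mid_state { MID_REQUEST_SUBMITTED, MID_RETRY_NEEDED };

/* Hypothetical, simplified stand-in for struct mid_q_entry. */
struct mid {
	int refcount;           /* plain int instead of struct kref */
	enum mid_state state;
	bool freed;             /* recorded instead of calling free() */
};

static void mid_get(struct mid *m) { m->refcount++; }

static void mid_put(struct mid *m)
{
	if (--m->refcount == 0)
		m->freed = true;
}

/* Models a callback whose completion makes the submitter drop its reference. */
static void callback_drops_submitter_ref(struct mid *m)
{
	mid_put(m);
}

/* Reconnect-side handling of one entry: pin it, mark it, call back, unpin. */
static bool reconnect_one(struct mid *m)
{
	bool alive_after_callback;

	mid_get(m);                    /* extra reference held by reconnect */
	m->state = MID_RETRY_NEEDED;   /* set unconditionally */
	callback_drops_submitter_ref(m);
	alive_after_callback = !m->freed;
	mid_put(m);                    /* last reference: entry is released here */
	return alive_after_callback;
}
```

With a starting refcount of 1, the entry is still alive when the callback returns and is only released by the final mid_put(). In the real code the submitter's reference is dropped from another thread, so this only models the ordering, not the race.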

> 
> 
> >                 list_move(&mid_entry->qhead, &retry_list);
> > @@ -575,7 +577,9 @@ cifs_reconnect(struct TCP_Server_Info *server)
> >         list_for_each_safe(tmp, tmp2, &retry_list) {
> >                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> >                 list_del_init(&mid_entry->qhead);
> > +
> >                 mid_entry->callback(mid_entry);
> > +               cifs_mid_q_entry_release(mid_entry);
> >         }
> >
> >         if (cifs_rdma_enabled(server)) {
> > @@ -895,7 +899,7 @@ dequeue_mid(struct mid_q_entry *mid, bool malformed)
> >         if (mid->mid_flags & MID_DELETED)
> >                 printk_once(KERN_WARNING
> >                             "trying to dequeue a deleted mid\n");
> > -       else
> > +       else if (!(mid->mid_flags & MID_RECONNECT))
> 
> Instead of the above,
> 
>  -       else
> +          else if (mid_entry->mid_state == MID_RETRY_NEEDED)

Yes, but mid_state != MID_RETRY_NEEDED
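
To make the polarity of the check concrete, here is a small userspace sketch, assuming a hand-rolled list in place of the kernel's struct list_head. The names list_node, delete_mid() and list_len() are made up for the example, and GlobalMid_Lock is omitted. The point is only that an entry in MID_RETRY_NEEDED stays linked, while any other state is unlinked as before.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_node { struct list_node *next, *prev; };

static void list_init(struct list_node *n) { n->next = n->prev = n; }

static void list_add_tail_node(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del_init_node(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

static int list_len(const struct list_node *head)
{
	int n = 0;

	for (const struct list_node *p = head->next; p != head; p = p->next)
		n++;
	return n;
}

enum mid_state { MID_REQUEST_SUBMITTED, MID_RETRY_NEEDED };

struct mid { struct list_node qhead; enum mid_state state; };

/*
 * Hypothetical analogue of the guarded list_del_init(): unlink only
 * when reconnect has NOT claimed the entry for its private list.
 */
static void delete_mid(struct mid *m)
{
	if (m->state != MID_RETRY_NEEDED)
		list_del_init_node(&m->qhead);
}
```

Calling delete_mid() on an entry in MID_RETRY_NEEDED leaves the list length unchanged, so an iterator walking the private list never sees its current or next element unlinked underneath it.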


>                   list_del_init(&mid->qhead);
> 
> 
> >         spin_unlock(&GlobalMid_Lock);
> >  }
> > diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
> > index 308ad0f495e1..ba4b5ab9cf35 100644
> > --- a/fs/cifs/transport.c
> > +++ b/fs/cifs/transport.c
> > @@ -173,7 +173,8 @@ void
> >  cifs_delete_mid(struct mid_q_entry *mid)
> >  {
> >         spin_lock(&GlobalMid_Lock);
> > -       list_del_init(&mid->qhead);
> > +       if (!(mid->mid_flags & MID_RECONNECT))
> > +               list_del_init(&mid->qhead);
> 
> Same check as above.
> 
> 
> >         mid->mid_flags |= MID_DELETED;
> >         spin_unlock(&GlobalMid_Lock);
> >
> > @@ -872,7 +873,8 @@ cifs_sync_mid_result(struct mid_q_entry *mid, struct TCP_Server_Info *server)
> >                 rc = -EHOSTDOWN;
> >                 break;
> >         default:
> > -               list_del_init(&mid->qhead);
> > +               if (!(mid->mid_flags & MID_RECONNECT))
> > +                       list_del_init(&mid->qhead);
> 
> Same check as above.
> 
> >                 cifs_server_dbg(VFS, "%s: invalid mid state mid=%llu state=%d\n",
> >                          __func__, mid->mid, mid->mid_state);
> >                 rc = -EIO;
> >
> >
> > >
> > > --
> > > Best regards,
> > > Pavel Shilovsky
> > >
> > > -----Original Message-----
> > > From: Ronnie Sahlberg <lsahlber@redhat.com>
> > > Sent: Thursday, October 17, 2019 2:45 PM
> > > To: David Wysochanski <dwysocha@redhat.com>
> > > Cc: Pavel Shilovskiy <pshilov@microsoft.com>; linux-cifs
> > > <linux-cifs@vger.kernel.org>; Frank Sorenson <sorenson@redhat.com>
> > > Subject: Re: list_del corruption while iterating retry_list in
> > > cifs_reconnect
> > > still seen on 5.4-rc3
> > >
> > > Dave, Pavel
> > >
> > > If it takes longer to trigger it might indicate we are on the right path
> > > but
> > > there are additional places to fix.
> > >
> > > I still think you also need to protect the list-mutating functions with
> > > the global lock (GlobalMid_Lock), so something like this:
> > >
> > > diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> > > index bdea4b3e8005..16705a855818 100644
> > > --- a/fs/cifs/connect.c
> > > +++ b/fs/cifs/connect.c
> > > @@ -564,6 +564,7 @@ cifs_reconnect(struct TCP_Server_Info *server)
> > >         spin_lock(&GlobalMid_Lock);
> > >         list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
> > >                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> > > +               kref_get(&mid_entry->refcount);
> > >                 if (mid_entry->mid_state == MID_REQUEST_SUBMITTED)
> > >                         mid_entry->mid_state = MID_RETRY_NEEDED;
> > >                 list_move(&mid_entry->qhead, &retry_list);
> > > @@ -572,11 +573,18 @@ cifs_reconnect(struct TCP_Server_Info *server)
> > >         mutex_unlock(&server->srv_mutex);
> > >
> > >         cifs_dbg(FYI, "%s: issuing mid callbacks\n", __func__);
> > > +       spin_lock(&GlobalMid_Lock);
> > >         list_for_each_safe(tmp, tmp2, &retry_list) {
> > >                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> > >                 list_del_init(&mid_entry->qhead);
> > > +               spin_unlock(&GlobalMid_Lock);
> > > +
> > >                 mid_entry->callback(mid_entry);
> > > +               cifs_mid_q_entry_release(mid_entry);
> > > +
> > > +               spin_lock(&GlobalMid_Lock);
> > >         }
> > > +       spin_unlock(&GlobalMid_Lock);
> > >
> > >         if (cifs_rdma_enabled(server)) {
> > >                 mutex_lock(&server->srv_mutex);
> > >
> > >
> > > ----- Original Message -----
> > > From: "David Wysochanski" <dwysocha@redhat.com>
> > > To: "Pavel Shilovskiy" <pshilov@microsoft.com>
> > > Cc: "Ronnie Sahlberg" <lsahlber@redhat.com>, "linux-cifs"
> > > <linux-cifs@vger.kernel.org>, "Frank Sorenson" <sorenson@redhat.com>
> > > Sent: Friday, 18 October, 2019 6:34:53 AM
> > > Subject: Re: list_del corruption while iterating retry_list in
> > > cifs_reconnect
> > > still seen on 5.4-rc3
> > >
> > > Unfortunately that did not fix the list_del corruption.
> > > It did seem to run longer but I'm not sure runtime is meaningful.
> > >
> > > [ 1424.215537] list_del corruption. prev->next should be ffff8d9b74c84d80, but was a6787a60550c54a9
> > > [ 1424.232688] ------------[ cut here ]------------
> > > [ 1424.234535] kernel BUG at lib/list_debug.c:51!
> > > [ 1424.236502] invalid opcode: 0000 [#1] SMP PTI
> > > [ 1424.238334] CPU: 5 PID: 10212 Comm: cifsd Kdump: loaded Not tainted 5.4.0-rc3-fix1+ #33
> > > [ 1424.241489] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> > > [ 1424.243770] RIP: 0010:__list_del_entry_valid.cold+0x31/0x55
> > > [ 1424.245972] Code: 5e 15 b5 e8 54 a3 c5 ff 0f 0b 48 c7 c7 70 5f 15 b5 e8 46 a3 c5 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 30 5f 15 b5 e8 32 a3 c5 ff <0f> 0b 48 89 fe 4c 89 c2 48 c7 c7 f8 5e 15 b5 e8 1e a3 c5 ff 0f 0b
> > > [ 1424.253409] RSP: 0018:ffff9a12404b3d38 EFLAGS: 00010246
> > > [ 1424.255576] RAX: 0000000000000054 RBX: ffff8d9b6ece1000 RCX: 0000000000000000
> > > [ 1424.258504] RDX: 0000000000000000 RSI: ffff8d9b77b57908 RDI: ffff8d9b77b57908
> > > [ 1424.261404] RBP: ffff8d9b74c84d80 R08: ffff8d9b77b57908 R09: 0000000000000280
> > > [ 1424.264336] R10: ffff9a12404b3bf0 R11: ffff9a12404b3bf5 R12: ffff8d9b6ece11c0
> > > [ 1424.267285] R13: ffff9a12404b3d48 R14: a6787a60550c54a9 R15: ffff8d9b6fcec300
> > > [ 1424.270191] FS: 0000000000000000(0000) GS:ffff8d9b77b40000(0000) knlGS:0000000000000000
> > > [ 1424.273491] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 1424.275831] CR2: 0000562cdf4a2000 CR3: 000000023340c000 CR4: 00000000000406e0
> > > [ 1424.278733] Call Trace:
> > > [ 1424.279844]  cifs_reconnect+0x268/0x620 [cifs]
> > > [ 1424.281723]  cifs_readv_from_socket+0x220/0x250 [cifs]
> > > [ 1424.283876]  cifs_read_from_socket+0x4a/0x70 [cifs]
> > > [ 1424.285922]  ? try_to_wake_up+0x212/0x650
> > > [ 1424.287595]  ? cifs_small_buf_get+0x16/0x30 [cifs]
> > > [ 1424.289520]  ? allocate_buffers+0x66/0x120 [cifs]
> > > [ 1424.291421]  cifs_demultiplex_thread+0xdc/0xc30 [cifs]
> > > [ 1424.293506]  kthread+0xfb/0x130
> > > [ 1424.294789]  ? cifs_handle_standard+0x190/0x190 [cifs]
> > > [ 1424.296833]  ? kthread_park+0x90/0x90
> > > [ 1424.298295]  ret_from_fork+0x35/0x40
> > > [ 1424.299717] Modules linked in: cifs libdes libarc4 ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables crct10dif_pclmul crc32_pclmul ghash_clmulni_intel virtio_balloon joydev i2c_piix4 nfsd nfs_acl lockd auth_rpcgss grace sunrpc xfs libcrc32c crc32c_intel virtio_net net_failover ata_generic serio_raw virtio_console virtio_blk failover pata_acpi qemu_fw_cfg
> > > [ 1424.322374] ---[ end trace 214af7e68b58e94b ]---
> > > [ 1424.324305] RIP: 0010:__list_del_entry_valid.cold+0x31/0x55
> > > [ 1424.326551] Code: 5e 15 b5 e8 54 a3 c5 ff 0f 0b 48 c7 c7 70 5f 15 b5 e8 46 a3 c5 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 30 5f 15 b5 e8 32 a3 c5 ff <0f> 0b 48 89 fe 4c 89 c2 48 c7 c7 f8 5e 15 b5 e8 1e a3 c5 ff 0f 0b
> > > [ 1424.333874] RSP: 0018:ffff9a12404b3d38 EFLAGS: 00010246
> > > [ 1424.335976] RAX: 0000000000000054 RBX: ffff8d9b6ece1000 RCX: 0000000000000000
> > > [ 1424.338842] RDX: 0000000000000000 RSI: ffff8d9b77b57908 RDI: ffff8d9b77b57908
> > > [ 1424.341668] RBP: ffff8d9b74c84d80 R08: ffff8d9b77b57908 R09: 0000000000000280
> > > [ 1424.344511] R10: ffff9a12404b3bf0 R11: ffff9a12404b3bf5 R12: ffff8d9b6ece11c0
> > > [ 1424.347343] R13: ffff9a12404b3d48 R14: a6787a60550c54a9 R15: ffff8d9b6fcec300
> > > [ 1424.350184] FS: 0000000000000000(0000) GS:ffff8d9b77b40000(0000) knlGS:0000000000000000
> > > [ 1424.353394] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 1424.355699] CR2: 0000562cdf4a2000 CR3: 000000023340c000 CR4: 00000000000406e0
> > >
> > > On Thu, Oct 17, 2019 at 3:58 PM Pavel Shilovskiy <pshilov@microsoft.com>
> > > wrote:
> > > >
> > > >
> > > > The patch looks good. Let's see if it fixes the issue in your setup.
> > > >
> > > > --
> > > > Best regards,
> > > > Pavel Shilovsky
> > > >
> > > > -----Original Message-----
> > > > From: David Wysochanski <dwysocha@redhat.com>
> > > > Sent: Thursday, October 17, 2019 12:23 PM
> > > > To: Pavel Shilovskiy <pshilov@microsoft.com>
> > > > Cc: Ronnie Sahlberg <lsahlber@redhat.com>; linux-cifs
> > > > <linux-cifs@vger.kernel.org>; Frank Sorenson <sorenson@redhat.com>
> > > > Subject: Re: list_del corruption while iterating retry_list in
> > > > cifs_reconnect still seen on 5.4-rc3
> > > >
> > > > On Thu, Oct 17, 2019 at 2:29 PM Pavel Shilovskiy <pshilov@microsoft.com> wrote:
> > > > >
> > > > > The similar solution of taking an extra reference should apply to the
> > > > > case of reconnect as well. The reference should be taken during the
> > > > > process of moving mid entries to the private list. Once a callback
> > > > > completes, such a reference should be put back thus freeing the mid.
> > > > >
> > > >
> > > > Ah ok very good.  The above seems consistent with the traces I'm seeing
> > > > of
> > > > the race.
> > > > I am going to test this patch as it sounds like what you're describing
> > > > and
> > > > similar to what Ronnie suggested earlier:
> > > >
> > > > --- a/fs/cifs/connect.c
> > > > +++ b/fs/cifs/connect.c
> > > > @@ -564,6 +564,7 @@ cifs_reconnect(struct TCP_Server_Info *server)
> > > >         spin_lock(&GlobalMid_Lock);
> > > >         list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
> > > >                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> > > > +               kref_get(&mid_entry->refcount);
> > > >                 if (mid_entry->mid_state == MID_REQUEST_SUBMITTED)
> > > >                         mid_entry->mid_state = MID_RETRY_NEEDED;
> > > >                 list_move(&mid_entry->qhead, &retry_list);
> > > > @@ -576,6 +577,7 @@ cifs_reconnect(struct TCP_Server_Info *server)
> > > >                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> > > >                 list_del_init(&mid_entry->qhead);
> > > >                 mid_entry->callback(mid_entry);
> > > > +               cifs_mid_q_entry_release(mid_entry);
> > > >         }
> > > >
> > > >         if (cifs_rdma_enabled(server)) {
> > > >
> > >
> > >
> > > --
> > > Dave Wysochanski
> > > Principal Software Maintenance Engineer
> > > T: 919-754-4024
> > >
> 


* Re: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
  2019-10-18  9:27                           ` Ronnie Sahlberg
@ 2019-10-18 10:12                             ` David Wysochanski
  2019-10-18 20:59                               ` Pavel Shilovskiy
  0 siblings, 1 reply; 31+ messages in thread
From: David Wysochanski @ 2019-10-18 10:12 UTC (permalink / raw)
  To: Ronnie Sahlberg; +Cc: Pavel Shilovskiy, linux-cifs, Frank Sorenson

On Fri, Oct 18, 2019 at 5:27 AM Ronnie Sahlberg <lsahlber@redhat.com> wrote:
>
>
> ----- Original Message -----
> > From: "David Wysochanski" <dwysocha@redhat.com>
> > To: "Ronnie Sahlberg" <lsahlber@redhat.com>
> > Cc: "Pavel Shilovskiy" <pshilov@microsoft.com>, "linux-cifs" <linux-cifs@vger.kernel.org>, "Frank Sorenson"
> > <sorenson@redhat.com>
> > Sent: Friday, 18 October, 2019 6:16:45 PM
> > Subject: Re: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
> >
> > On Thu, Oct 17, 2019 at 6:53 PM Ronnie Sahlberg <lsahlber@redhat.com> wrote:
> > >
> > >
> > >
> > >
>
> Good comments.
> New version of the patch, please test and see comments inline below
>
> diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> index bdea4b3e8005..8a78358693a5 100644
> --- a/fs/cifs/connect.c
> +++ b/fs/cifs/connect.c
> @@ -564,8 +564,13 @@ cifs_reconnect(struct TCP_Server_Info *server)
>         spin_lock(&GlobalMid_Lock);
>         list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
>                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> -               if (mid_entry->mid_state == MID_REQUEST_SUBMITTED)
> -                       mid_entry->mid_state = MID_RETRY_NEEDED;
> +               kref_get(&mid_entry->refcount);
> +               WARN_ON(mid_entry->mid_state != MID_REQUEST_SUBMITTED);
> +               /*
> +                * Set MID_RETRY_NEEDED to prevent the demultiplex loop from
> +                * removing us, or our neighbours, from the linked list.
> +                */
> +               mid_entry->mid_state = MID_RETRY_NEEDED;
>                 list_move(&mid_entry->qhead, &retry_list);
>         }
>         spin_unlock(&GlobalMid_Lock);
> @@ -575,7 +580,9 @@ cifs_reconnect(struct TCP_Server_Info *server)
>         list_for_each_safe(tmp, tmp2, &retry_list) {
>                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
>                 list_del_init(&mid_entry->qhead);
> +
>                 mid_entry->callback(mid_entry);
> +               cifs_mid_q_entry_release(mid_entry);
>         }
>
>         if (cifs_rdma_enabled(server)) {
> @@ -895,7 +902,7 @@ dequeue_mid(struct mid_q_entry *mid, bool malformed)
>         if (mid->mid_flags & MID_DELETED)
>                 printk_once(KERN_WARNING
>                             "trying to dequeue a deleted mid\n");
> -       else
> +       else if (mid->mid_state != MID_RETRY_NEEDED)

I'm just using an 'if' here not 'else if'.  Do you see any issue with that?

Actually this section needed a little reorganizing because of where
the mid_state is set.  So I have this now for this hunk:

        mid->when_received = jiffies;
 #endif
        spin_lock(&GlobalMid_Lock);
-       if (!malformed)
-               mid->mid_state = MID_RESPONSE_RECEIVED;
-       else
-               mid->mid_state = MID_RESPONSE_MALFORMED;
        /*
         * Trying to handle/dequeue a mid after the send_recv()
         * function has finished processing it is a bug.
@@ -895,8 +893,14 @@ static inline int reconn_setup_dfs_targets(struct cifs_sb_info *cifs_sb,
        if (mid->mid_flags & MID_DELETED)
                printk_once(KERN_WARNING
                            "trying to dequeue a deleted mid\n");
-       else
+       if (mid->mid_state != MID_RETRY_NEEDED)
                list_del_init(&mid->qhead);
+
+       if (!malformed)
+               mid->mid_state = MID_RESPONSE_RECEIVED;
+       else
+               mid->mid_state = MID_RESPONSE_MALFORMED;
+
        spin_unlock(&GlobalMid_Lock);
 }
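
A minimal userspace sketch of why the ordering in this hunk matters (not
kernel code: the list helpers, struct mid and both dequeue variants are
simplified, hypothetical stand-ins, and no locking is modelled). If the
state is overwritten before the MID_RETRY_NEEDED check, the check can no
longer see that cifs_reconnect() owns the mid, and the mid gets unlinked
from retry_list anyway:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the kernel definitions (assumptions). */
enum mid_state {
	MID_REQUEST_SUBMITTED,
	MID_RESPONSE_RECEIVED,
	MID_RETRY_NEEDED,
	MID_RESPONSE_MALFORMED,
};

struct list_node { struct list_node *prev, *next; };

struct mid {
	enum mid_state state;
	struct list_node qhead;
};

static void node_init(struct list_node *n) { n->prev = n; n->next = n; }
static bool node_unlinked(const struct list_node *n) { return n->next == n; }

static void node_add_tail(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void node_del_init(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	node_init(n);
}

/* Old order: the state is clobbered first, so the check always passes. */
static void dequeue_state_first(struct mid *m, bool malformed)
{
	m->state = malformed ? MID_RESPONSE_MALFORMED : MID_RESPONSE_RECEIVED;
	if (m->state != MID_RETRY_NEEDED)
		node_del_init(&m->qhead);
}

/* Reorganized order: check the old state, then update it. */
static void dequeue_check_first(struct mid *m, bool malformed)
{
	if (m->state != MID_RETRY_NEEDED)
		node_del_init(&m->qhead);
	m->state = malformed ? MID_RESPONSE_MALFORMED : MID_RESPONSE_RECEIVED;
}

/* True if a mid owned by reconnect is still on retry_list after dequeue. */
bool mid_stays_on_retry_list(bool check_first)
{
	struct list_node retry_list;
	struct mid m = { .state = MID_RETRY_NEEDED };

	node_init(&retry_list);
	node_init(&m.qhead);
	node_add_tail(&m.qhead, &retry_list);

	if (check_first)
		dequeue_check_first(&m, false);
	else
		dequeue_state_first(&m, false);

	return !node_unlinked(&retry_list);
}
```

With check_first set the mid stays linked on retry_list; with the old
order it is removed behind the back of the reconnect loop.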



>                 list_del_init(&mid->qhead);
>         spin_unlock(&GlobalMid_Lock);
>  }
> diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
> index 308ad0f495e1..17a430b58673 100644
> --- a/fs/cifs/transport.c
> +++ b/fs/cifs/transport.c
> @@ -173,7 +173,8 @@ void
>  cifs_delete_mid(struct mid_q_entry *mid)
>  {
>         spin_lock(&GlobalMid_Lock);
> -       list_del_init(&mid->qhead);
> +       if (mid->mid_state != MID_RETRY_NEEDED)
> +               list_del_init(&mid->qhead);
>         mid->mid_flags |= MID_DELETED;
>         spin_unlock(&GlobalMid_Lock);
>
> @@ -872,7 +873,8 @@ cifs_sync_mid_result(struct mid_q_entry *mid, struct TCP_Server_Info *server)
>                 rc = -EHOSTDOWN;
>                 break;
>         default:
> -               list_del_init(&mid->qhead);
> +               if (mid->mid_state != MID_RETRY_NEEDED)
> +                       list_del_init(&mid->qhead);
>                 cifs_server_dbg(VFS, "%s: invalid mid state mid=%llu state=%d\n",
>                          __func__, mid->mid, mid->mid_state);
>                 rc = -EIO;
>
>
>
>
>
>
> > >
> > > ----- Original Message -----
> > > > From: "Pavel Shilovskiy" <pshilov@microsoft.com>
> > > > To: "Ronnie Sahlberg" <lsahlber@redhat.com>, "David Wysochanski"
> > > > <dwysocha@redhat.com>
> > > > Cc: "linux-cifs" <linux-cifs@vger.kernel.org>, "Frank Sorenson"
> > > > <sorenson@redhat.com>
> > > > Sent: Friday, 18 October, 2019 8:02:23 AM
> > > > Subject: RE: list_del corruption while iterating retry_list in
> > > > cifs_reconnect still seen on 5.4-rc3
> > > >
> > > > Ok, looking at cifs_delete_mid():
> > > >
> > > >  172 void
> > > >  173 cifs_delete_mid(struct mid_q_entry *mid)
> > > >  174 {
> > > >  175 >-------spin_lock(&GlobalMid_Lock);
> > > >  176 >-------list_del_init(&mid->qhead);
> > > >  177 >-------mid->mid_flags |= MID_DELETED;
> > > >  178 >-------spin_unlock(&GlobalMid_Lock);
> > > >  179
> > > >  180 >-------DeleteMidQEntry(mid);
> > > >  181 }
> > > >
> > > > > So, regardless of us taking references on the mid itself or not, the mid
> > > > > might be removed from the list. I also don't think taking GlobalMid_Lock
> > > > > would help much because the next mid in the list might be deleted from the
> > > > > list by another process while cifs_reconnect is calling callback for the
> > > > > current mid.
> > > >
> >
> > Yes the above is consistent with my tracing the crash after the first
> > initial refcount patch was applied.
> > After the simple refcount patch, when iterating the retry_loop, it was
> > processing an orphaned list with a single item over and over and
> > eventually ran itself down to refcount == 0 and crashed like before.
> >
> >
> > > > > Instead, shouldn't we try marking the mid as being reconnected? Once we took
> > > > > a reference, let's mark mid->mid_flags with a new flag MID_RECONNECT under
> > > > > the GlobalMid_Lock. Then modify cifs_delete_mid() to check for this flag and
> > > > > do not remove the mid from the list if the flag exists.
> > >
> > > That could work. But then we should also use that flag to suppress the
> > > other places where we do a list_del*, so something like this ?
> > >
> > > diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
> > > index 50dfd9049370..b324fff33e53 100644
> > > --- a/fs/cifs/cifsglob.h
> > > +++ b/fs/cifs/cifsglob.h
> > > @@ -1702,6 +1702,7 @@ static inline bool is_retryable_error(int error)
> > >  /* Flags */
> > > >  #define   MID_WAIT_CANCELLED    1 /* Cancelled while waiting for response */
> > > >  #define   MID_DELETED            2 /* Mid has been dequeued/deleted */
> > > > +#define   MID_RECONNECT          4 /* Mid is being used during reconnect */
> > >
> > Do we need this extra flag?  Can just use  mid_state ==
> > MID_RETRY_NEEDED in the necessary places?
>
> That is a good point.
> It saves us a redundant flag.
>
> >
> >
> > >  /* Types of response buffer returned from SendReceive2 */
> > >  #define   CIFS_NO_BUFFER        0    /* Response buffer not returned */
> > > diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> > > index bdea4b3e8005..b142bd2a3ef5 100644
> > > --- a/fs/cifs/connect.c
> > > +++ b/fs/cifs/connect.c
> > > @@ -564,6 +564,8 @@ cifs_reconnect(struct TCP_Server_Info *server)
> > >         spin_lock(&GlobalMid_Lock);
> > >         list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
> > >                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> > > +               kref_get(&mid_entry->refcount);
> > > +               mid_entry->mid_flags |= MID_RECONNECT;
> > >                 if (mid_entry->mid_state == MID_REQUEST_SUBMITTED)
> > >                         mid_entry->mid_state = MID_RETRY_NEEDED;
> >
> > What happens if the state is wrong going in there, and it is not set
> > to MID_RETRY_NEEDED, but yet we queue up the retry_list and run it
> > below?
> > Should the above 'if' check for MID_REQUEST_SUBMITTED be a WARN_ON
> > followed by unconditionally setting the state?
> >
> > WARN_ON(mid_entry->mid_state != MID_REQUEST_SUBMITTED);
> > /* Unconditionally set MID_RETRY_NEEDED */
> > > mid_entry->mid_state = MID_RETRY_NEEDED;
>
> Yepp.
>
> >
> >
> > >                 list_move(&mid_entry->qhead, &retry_list);
> > > @@ -575,7 +577,9 @@ cifs_reconnect(struct TCP_Server_Info *server)
> > >         list_for_each_safe(tmp, tmp2, &retry_list) {
> > >                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> > >                 list_del_init(&mid_entry->qhead);
> > > +
> > >                 mid_entry->callback(mid_entry);
> > > +               cifs_mid_q_entry_release(mid_entry);
> > >         }
> > >
> > >         if (cifs_rdma_enabled(server)) {
> > > @@ -895,7 +899,7 @@ dequeue_mid(struct mid_q_entry *mid, bool malformed)
> > >         if (mid->mid_flags & MID_DELETED)
> > >                 printk_once(KERN_WARNING
> > >                             "trying to dequeue a deleted mid\n");
> > > -       else
> > > +       else if (!(mid->mid_flags & MID_RECONNECT))
> >
> > Instead of the above,
> >
> >  -       else
> > +          else if (mid_entry->mid_state == MID_RETRY_NEEDED)
>
> Yes, but mid_state != MID_RETRY_NEEDED
>

Yeah good catch on that - somehow I reversed the logic, and when I
tested the former it blew up spectacularly almost instantaneously!
Doh!

So far the latest patch has been running for about 25 minutes, which
is I think the longest this test has survived.
I need a bit more runtime to be sure it's good, but if it keeps going
I'll plan to create a patch header and submit to list by end of today.
Thanks Ronnie and Pavel for the help tracking this down.
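
For readers following the thread, here is a minimal userspace sketch of
the reference-counting half of the patch (not kernel code: struct mid,
mid_get(), mid_put() and waiter_callback() are hypothetical stand-ins
for kref_get(), cifs_mid_q_entry_release() and the mid callback, and the
callback is assumed to end with the waiter dropping the last reference).
It only shows that the extra reference keeps the mid alive across the
callback; as noted above, it does not by itself stop other paths from
unlinking the mid, which is what the mid_state checks are for:

```c
#include <assert.h>

/* Hypothetical stand-in for struct mid_q_entry. */
struct mid {
	int refcount;
	int released;	/* how many times the release path ran */
};

static void mid_get(struct mid *m)
{
	m->refcount++;
}

static void mid_put(struct mid *m)
{
	assert(m->refcount > 0);
	if (--m->refcount == 0)
		m->released++;	/* the real code would free the mid here */
}

/* Assumption: the callback leads to the waiter dropping its reference. */
static void waiter_callback(struct mid *m)
{
	mid_put(m);
}

/*
 * Run one iteration of the retry loop and report how many references
 * are left right after the callback returns. Zero means the loop would
 * be touching a mid that has already been released.
 */
int refs_left_after_callback(int take_extra_ref)
{
	struct mid m = { .refcount = 1, .released = 0 };
	int left;

	if (take_extra_ref)
		mid_get(&m);		/* taken while moving to retry_list */

	waiter_callback(&m);
	left = m.refcount;

	if (take_extra_ref)
		mid_put(&m);		/* dropped after the callback */

	assert(m.released == 1);	/* released exactly once either way */
	return left;
}
```

With the extra reference one reference is still held when the callback
returns; without it the count has already dropped to zero.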


>
> >                   list_del_init(&mid->qhead);
> >
> >
> > >         spin_unlock(&GlobalMid_Lock);
> > >  }
> > > diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
> > > index 308ad0f495e1..ba4b5ab9cf35 100644
> > > --- a/fs/cifs/transport.c
> > > +++ b/fs/cifs/transport.c
> > > @@ -173,7 +173,8 @@ void
> > >  cifs_delete_mid(struct mid_q_entry *mid)
> > >  {
> > >         spin_lock(&GlobalMid_Lock);
> > > -       list_del_init(&mid->qhead);
> > > +       if (!(mid->mid_flags & MID_RECONNECT))
> > > +               list_del_init(&mid->qhead);
> >
> > Same check as above.
> >
> >
> > >         mid->mid_flags |= MID_DELETED;
> > >         spin_unlock(&GlobalMid_Lock);
> > >
> > > @@ -872,7 +873,8 @@ cifs_sync_mid_result(struct mid_q_entry *mid, struct
> > > TCP_Server_Info *server)
> > >                 rc = -EHOSTDOWN;
> > >                 break;
> > >         default:
> > > -               list_del_init(&mid->qhead);
> > > +               if (!(mid->mid_flags & MID_RECONNECT))
> > > +                       list_del_init(&mid->qhead);
> >
> > Same check as above.
> >
> > > >                 cifs_server_dbg(VFS, "%s: invalid mid state mid=%llu state=%d\n",
> > >                          __func__, mid->mid, mid->mid_state);
> > >                 rc = -EIO;
> > >
> > >
> > > >
> > > > --
> > > > Best regards,
> > > > Pavel Shilovsky
> > > >
> > > > -----Original Message-----
> > > > From: Ronnie Sahlberg <lsahlber@redhat.com>
> > > > Sent: Thursday, October 17, 2019 2:45 PM
> > > > To: David Wysochanski <dwysocha@redhat.com>
> > > > Cc: Pavel Shilovskiy <pshilov@microsoft.com>; linux-cifs
> > > > <linux-cifs@vger.kernel.org>; Frank Sorenson <sorenson@redhat.com>
> > > > Subject: Re: list_del corruption while iterating retry_list in
> > > > cifs_reconnect
> > > > still seen on 5.4-rc3
> > > >
> > > > Dave, Pavel
> > > >
> > > > > If it takes longer to trigger it might indicate we are on the right path but
> > > > > there are additional places to fix.
> > > >
> > > > I still think you also need to protect the list mutate functions as well
> > > > using the global mutex, so something like this :
> > > >
> > > > > diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> > > > > index bdea4b3e8005..16705a855818 100644
> > > > --- a/fs/cifs/connect.c
> > > > +++ b/fs/cifs/connect.c
> > > > @@ -564,6 +564,7 @@ cifs_reconnect(struct TCP_Server_Info *server)
> > > >         spin_lock(&GlobalMid_Lock);
> > > >         list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
> > > >                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> > > > +               kref_get(&mid_entry->refcount);
> > > >                 if (mid_entry->mid_state == MID_REQUEST_SUBMITTED)
> > > >                         mid_entry->mid_state = MID_RETRY_NEEDED;
> > > > >                 list_move(&mid_entry->qhead, &retry_list);
> > > > > @@ -572,11 +573,18 @@ cifs_reconnect(struct TCP_Server_Info *server)
> > > >         mutex_unlock(&server->srv_mutex);
> > > >
> > > >         cifs_dbg(FYI, "%s: issuing mid callbacks\n", __func__);
> > > > +       spin_lock(&GlobalMid_Lock);
> > > >         list_for_each_safe(tmp, tmp2, &retry_list) {
> > > >                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> > > >                 list_del_init(&mid_entry->qhead);
> > > > +               spin_unlock(&GlobalMid_Lock);
> > > > +
> > > >                 mid_entry->callback(mid_entry);
> > > > +               cifs_mid_q_entry_release(mid_entry);
> > > > +
> > > > +               spin_lock(&GlobalMid_Lock);
> > > >         }
> > > > +       spin_unlock(&GlobalMid_Lock);
> > > >
> > > >         if (cifs_rdma_enabled(server)) {
> > > >                 mutex_lock(&server->srv_mutex);
> > > >
> > > >
> > > > ----- Original Message -----
> > > > From: "David Wysochanski" <dwysocha@redhat.com>
> > > > To: "Pavel Shilovskiy" <pshilov@microsoft.com>
> > > > Cc: "Ronnie Sahlberg" <lsahlber@redhat.com>, "linux-cifs"
> > > > <linux-cifs@vger.kernel.org>, "Frank Sorenson" <sorenson@redhat.com>
> > > > Sent: Friday, 18 October, 2019 6:34:53 AM
> > > > Subject: Re: list_del corruption while iterating retry_list in
> > > > cifs_reconnect
> > > > still seen on 5.4-rc3
> > > >
> > > > Unfortunately that did not fix the list_del corruption.
> > > > It did seem to run longer but I'm not sure runtime is meaningful.
> > > >
> > > > [ 1424.215537] list_del corruption. prev->next should be
> > > > ffff8d9b74c84d80,
> > > > but was a6787a60550c54a9 [ 1424.232688] ------------[ cut here
> > > > ]------------
> > > > [ 1424.234535] kernel BUG at lib/list_debug.c:51!
> > > > [ 1424.236502] invalid opcode: 0000 [#1] SMP PTI [ 1424.238334] CPU: 5
> > > > PID:
> > > > 10212 Comm: cifsd Kdump: loaded Not tainted 5.4.0-rc3-fix1+ #33 [
> > > > 1424.241489] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [
> > > > 1424.243770] RIP: 0010:__list_del_entry_valid.cold+0x31/0x55
> > > > [ 1424.245972] Code: 5e 15 b5 e8 54 a3 c5 ff 0f 0b 48 c7 c7 70 5f 15
> > > > b5 e8 46 a3 c5 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 30 5f 15 b5 e8 32
> > > > a3 c5 ff <0f> 0b 48 89 fe 4c 89 c2 48 c7 c7 f8 5e 15 b5 e8 1e a3 c5 ff 0f
> > > > 0b
> > > > [ 1424.253409] RSP: 0018:ffff9a12404b3d38 EFLAGS: 00010246 [ 1424.255576]
> > > > RAX: 0000000000000054 RBX: ffff8d9b6ece1000 RCX: 0000000000000000 [
> > > > 1424.258504] RDX: 0000000000000000 RSI: ffff8d9b77b57908 RDI:
> > > > ffff8d9b77b57908 [ 1424.261404] RBP: ffff8d9b74c84d80 R08:
> > > > ffff8d9b77b57908
> > > > R09: 0000000000000280 [ 1424.264336] R10: ffff9a12404b3bf0 R11:
> > > > ffff9a12404b3bf5 R12: ffff8d9b6ece11c0 [ 1424.267285] R13:
> > > > ffff9a12404b3d48
> > > > R14: a6787a60550c54a9 R15: ffff8d9b6fcec300 [ 1424.270191] FS:
> > > > 0000000000000000(0000) GS:ffff8d9b77b40000(0000)
> > > > knlGS:0000000000000000
> > > > [ 1424.273491] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [
> > > > 1424.275831] CR2: 0000562cdf4a2000 CR3: 000000023340c000 CR4:
> > > > 00000000000406e0 [ 1424.278733] Call Trace:
> > > > [ 1424.279844]  cifs_reconnect+0x268/0x620 [cifs] [ 1424.281723]
> > > > cifs_readv_from_socket+0x220/0x250 [cifs] [ 1424.283876]
> > > > cifs_read_from_socket+0x4a/0x70 [cifs] [ 1424.285922]  ?
> > > > try_to_wake_up+0x212/0x650 [ 1424.287595]  ? cifs_small_buf_get+0x16/0x30
> > > > [cifs] [ 1424.289520]  ? allocate_buffers+0x66/0x120 [cifs] [
> > > > 1424.291421]
> > > > cifs_demultiplex_thread+0xdc/0xc30 [cifs] [ 1424.293506]
> > > > kthread+0xfb/0x130 [ 1424.294789]  ? cifs_handle_standard+0x190/0x190
> > > > [cifs] [ 1424.296833]  ? kthread_park+0x90/0x90 [ 1424.298295]
> > > > ret_from_fork+0x35/0x40 [ 1424.299717] Modules linked in: cifs libdes
> > > > libarc4 ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat
> > > > ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat
> > > > nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack
> > > > nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink ebtable_filter ebtables
> > > > ip6table_filter ip6_tables crct10dif_pclmul crc32_pclmul
> > > > ghash_clmulni_intel
> > > > virtio_balloon joydev i2c_piix4 nfsd nfs_acl lockd auth_rpcgss grace
> > > > sunrpc
> > > > xfs libcrc32c crc32c_intel virtio_net net_failover ata_generic serio_raw
> > > > virtio_console virtio_blk failover pata_acpi qemu_fw_cfg [ 1424.322374]
> > > > ---[
> > > > end trace 214af7e68b58e94b ]--- [ 1424.324305] RIP:
> > > > 0010:__list_del_entry_valid.cold+0x31/0x55
> > > > [ 1424.326551] Code: 5e 15 b5 e8 54 a3 c5 ff 0f 0b 48 c7 c7 70 5f 15
> > > > b5 e8 46 a3 c5 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 30 5f 15 b5 e8 32
> > > > a3 c5 ff <0f> 0b 48 89 fe 4c 89 c2 48 c7 c7 f8 5e 15 b5 e8 1e a3 c5 ff 0f
> > > > 0b
> > > > [ 1424.333874] RSP: 0018:ffff9a12404b3d38 EFLAGS: 00010246 [ 1424.335976]
> > > > RAX: 0000000000000054 RBX: ffff8d9b6ece1000 RCX: 0000000000000000 [
> > > > 1424.338842] RDX: 0000000000000000 RSI: ffff8d9b77b57908 RDI:
> > > > ffff8d9b77b57908 [ 1424.341668] RBP: ffff8d9b74c84d80 R08:
> > > > ffff8d9b77b57908
> > > > R09: 0000000000000280 [ 1424.344511] R10: ffff9a12404b3bf0 R11:
> > > > ffff9a12404b3bf5 R12: ffff8d9b6ece11c0 [ 1424.347343] R13:
> > > > ffff9a12404b3d48
> > > > R14: a6787a60550c54a9 R15: ffff8d9b6fcec300 [ 1424.350184] FS:
> > > > 0000000000000000(0000) GS:ffff8d9b77b40000(0000)
> > > > knlGS:0000000000000000
> > > > [ 1424.353394] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [
> > > > 1424.355699] CR2: 0000562cdf4a2000 CR3: 000000023340c000 CR4:
> > > > 00000000000406e0
> > > >
> > > > On Thu, Oct 17, 2019 at 3:58 PM Pavel Shilovskiy <pshilov@microsoft.com>
> > > > wrote:
> > > > >
> > > > >
> > > > > The patch looks good. Let's see if it fixes the issue in your setup.
> > > > >
> > > > > --
> > > > > Best regards,
> > > > > Pavel Shilovsky
> > > > >
> > > > > -----Original Message-----
> > > > > From: David Wysochanski <dwysocha@redhat.com>
> > > > > Sent: Thursday, October 17, 2019 12:23 PM
> > > > > To: Pavel Shilovskiy <pshilov@microsoft.com>
> > > > > Cc: Ronnie Sahlberg <lsahlber@redhat.com>; linux-cifs
> > > > > <linux-cifs@vger.kernel.org>; Frank Sorenson <sorenson@redhat.com>
> > > > > Subject: Re: list_del corruption while iterating retry_list in
> > > > > > cifs_reconnect still seen on 5.4-rc3
> > > > > >
> > > > > > On Thu, Oct 17, 2019 at 2:29 PM Pavel Shilovskiy <pshilov@microsoft.com> wrote:
> > > > > >
> > > > > > The similar solution of taking an extra reference should apply to the
> > > > > > case of reconnect as well. The reference should be taken during the
> > > > > > process of moving mid entries to the private list. Once a callback
> > > > > > completes, such a reference should be put back thus freeing the mid.
> > > > > >
> > > > >
> > > > > > Ah ok very good.  The above seems consistent with the traces I'm seeing of
> > > > > > the race.
> > > > > > I am going to test this patch as it sounds like what you're describing and
> > > > > > similar to what Ronnie suggested earlier:
> > > > >
> > > > > --- a/fs/cifs/connect.c
> > > > > +++ b/fs/cifs/connect.c
> > > > > @@ -564,6 +564,7 @@ cifs_reconnect(struct TCP_Server_Info *server)
> > > > >         spin_lock(&GlobalMid_Lock);
> > > > >         list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
> > > > > >                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> > > > > > +               kref_get(&mid_entry->refcount);
> > > > > >                 if (mid_entry->mid_state == MID_REQUEST_SUBMITTED)
> > > > > >                         mid_entry->mid_state = MID_RETRY_NEEDED;
> > > > > >                 list_move(&mid_entry->qhead, &retry_list);
> > > > > > @@ -576,6 +577,7 @@ cifs_reconnect(struct TCP_Server_Info *server)
> > > > >                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> > > > >                 list_del_init(&mid_entry->qhead);
> > > > >                 mid_entry->callback(mid_entry);
> > > > > +               cifs_mid_q_entry_release(mid_entry);
> > > > >         }
> > > > >
> > > > >         if (cifs_rdma_enabled(server)) {
> > > > >
> > > >
> > > >
> > > > --
> > > > Dave Wysochanski
> > > > Principal Software Maintenance Engineer
> > > > T: 919-754-4024
> > > >
> >

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
  2019-10-18 10:12                             ` David Wysochanski
@ 2019-10-18 20:59                               ` Pavel Shilovskiy
  2019-10-18 21:21                                 ` David Wysochanski
  2019-10-19 23:35                                 ` [RFC PATCH v2] cifs: Fix list_del corruption of retry_list in cifs_reconnect Dave Wysochanski
  0 siblings, 2 replies; 31+ messages in thread
From: Pavel Shilovskiy @ 2019-10-18 20:59 UTC (permalink / raw)
  To: David Wysochanski, Ronnie Sahlberg; +Cc: linux-cifs, Frank Sorenson

Thanks for the good news that the patch is stable in your workload!

The extra flag may not be necessary and we may rely on the MID state instead, but we would actually need to handle two states: MID_RETRY_NEEDED and MID_SHUTDOWN - see clean_demultiplex_info(), which does the same things with mids as cifs_reconnect(). Please add ref counting to both functions since they can both race with system call threads.

I also think that we need to make the patch as small as possible to avoid hidden regressions. That's why I don't think we should change the IF() to a WARN_ON() in the same patch; let's keep that change separate, without the stable tag.

Another general thought is that folding extra logic into the MID state may complicate the code. A flag like MID_QUEUED would reflect the meaning more directly: if the mid is queued then de-queue it (aka remove it from the list), else skip this step. This may be changed later if you think it would complicate the small stable patch.
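
A minimal userspace sketch of the MID_QUEUED idea (not kernel code: the bit value, the list helpers, mid_enqueue(), mid_dequeue_if_queued() and mid_claim() are all hypothetical, and no locking is modelled). The flag is set when a mid goes onto the pending queue and cleared by whoever takes it off, so every later de-queue attempt becomes a no-op:

```c
#include <assert.h>
#include <stdbool.h>

#define MID_QUEUED 0x8u	/* hypothetical bit; the thread assigns no value */

struct list_node { struct list_node *prev, *next; };

struct mid {
	unsigned int flags;
	struct list_node qhead;
};

static void node_init(struct list_node *n) { n->prev = n; n->next = n; }
static bool node_unlinked(const struct list_node *n) { return n->next == n; }

static void node_add_tail(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void node_del_init(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	node_init(n);
}

/* Put the mid on the pending queue and mark it as queued. */
static void mid_enqueue(struct mid *m, struct list_node *pending)
{
	node_add_tail(&m->qhead, pending);
	m->flags |= MID_QUEUED;
}

/* De-queue only if still queued; returns true if it unlinked the mid. */
static bool mid_dequeue_if_queued(struct mid *m)
{
	if (!(m->flags & MID_QUEUED))
		return false;
	node_del_init(&m->qhead);
	m->flags &= ~MID_QUEUED;
	return true;
}

/* Reconnect/shutdown path: move the mid to a private list and own it. */
static void mid_claim(struct mid *m, struct list_node *private_list)
{
	node_del_init(&m->qhead);
	node_add_tail(&m->qhead, private_list);
	m->flags &= ~MID_QUEUED;
}

bool second_dequeue_is_noop(void)
{
	struct list_node pending;
	struct mid m = { 0 };

	node_init(&pending);
	node_init(&m.qhead);
	mid_enqueue(&m, &pending);
	return mid_dequeue_if_queued(&m) && !mid_dequeue_if_queued(&m);
}

bool claimed_mid_is_left_alone(void)
{
	struct list_node pending, private_list;
	struct mid m = { 0 };

	node_init(&pending);
	node_init(&private_list);
	node_init(&m.qhead);
	mid_enqueue(&m, &pending);
	mid_claim(&m, &private_list);
	/* A late de-queue must not unlink the mid from the private list. */
	return !mid_dequeue_if_queued(&m) && !node_unlinked(&private_list);
}
```

One property of this shape is that the same check would cover both the reconnect and the shutdown path without naming either MID state.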

--
Best regards,
Pavel Shilovsky


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
  2019-10-18 20:59                               ` Pavel Shilovskiy
@ 2019-10-18 21:21                                 ` David Wysochanski
  2019-10-18 21:44                                   ` David Wysochanski
  2019-10-19 23:35                                 ` [RFC PATCH v2] cifs: Fix list_del corruption of retry_list in cifs_reconnect Dave Wysochanski
  1 sibling, 1 reply; 31+ messages in thread
From: David Wysochanski @ 2019-10-18 21:21 UTC (permalink / raw)
  To: Pavel Shilovskiy; +Cc: Ronnie Sahlberg, linux-cifs, Frank Sorenson

[-- Attachment #1: Type: text/plain, Size: 13016 bytes --]

On Fri, Oct 18, 2019 at 4:59 PM Pavel Shilovskiy <pshilov@microsoft.com> wrote:
>
> Thanks for the good news that the patch is stable in your workload!
>
The attached patch I ran on top of 5.4-rc3 for over 5 hrs today on the
reboot test - before it would crash after a few minutes tops.

> The extra flag may not be necessary and we may rely on a MID state but we would need to handle two states actually: MID_RETRY_NEEDED and MID_SHUTDOWN - see clean_demultiplex_info() which is doing the same things with mid as cifs_reconnect(). Please add ref counting to both functions since they both can race with system call threads.
>
> I also think that we need to create as small a patch as possible to avoid hidden regressions. That's why I don't think we should change IF() to WARN_ON() in the same patch; let's keep that change separate, without the stable tag.
>
IMO that 'if' statement is wrong, and should be removed unless it can
be defended.  Why are we _conditionally_ setting the state to
MID_RETRY_NEEDED in the same loop as we're putting mids on retry_list?
 What's the state machine supposed to be doing if it's ambiguous?

> Another general thought is that including extra logic into the MID state may complicate the code. Having a flag like MID_QUEUED would reflect the meaning more straightforwardly: if the mid is queued then de-queue it (aka remove it from the list), else - skip this step. This may be changed later if you think this will complicate the small stable patch.
>

You all know better than me.  I'll take another look next week and
look forward to more discussion.
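
To make the MID_QUEUED idea above concrete, a minimal userspace sketch
of how such a flag could gate the unlink.  Everything here is an
assumption for illustration: the toy_* names are made up, the flag value
is arbitrary, and the real code would do this under GlobalMid_Lock.

```c
#include <assert.h>

struct list_head { struct list_head *next, *prev; };

#define TOY_MID_QUEUED 8u  /* hypothetical value; not taken from any kernel tree */

struct toy_mid {
	struct list_head qhead;
	unsigned int flags;
};

/* Link the entry at the tail of the queue and record that it is queued. */
static void toy_enqueue(struct toy_mid *mid, struct list_head *queue)
{
	mid->qhead.prev = queue->prev;
	mid->qhead.next = queue;
	queue->prev->next = &mid->qhead;
	queue->prev = &mid->qhead;
	mid->flags |= TOY_MID_QUEUED;
}

/* Unlink only if the entry is still queued; returns 1 if unlinked, 0 if skipped. */
static int toy_dequeue(struct toy_mid *mid)
{
	if (!(mid->flags & TOY_MID_QUEUED))
		return 0;
	mid->qhead.prev->next = mid->qhead.next;
	mid->qhead.next->prev = mid->qhead.prev;
	mid->qhead.next = mid->qhead.prev = &mid->qhead;
	mid->flags &= ~TOY_MID_QUEUED;
	return 1;
}

/* Dequeues the same entry twice; the second attempt must be a no-op. */
int toy_demo_queued_flag(void)
{
	struct list_head queue = { &queue, &queue };
	struct toy_mid mid = { { 0, 0 }, 0 };
	int first, second;

	toy_enqueue(&mid, &queue);
	first = toy_dequeue(&mid);
	second = toy_dequeue(&mid);
	assert(queue.next == &queue && queue.prev == &queue);

	return first * 10 + second;
}
```

The point of the flag is that "am I on a list" is tracked separately
from the request state, so every list_del site asks the same question.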

> --
> Best regards,
> Pavel Shilovsky
>
> -----Original Message-----
> From: David Wysochanski <dwysocha@redhat.com>
> Sent: Friday, October 18, 2019 3:12 AM
> To: Ronnie Sahlberg <lsahlber@redhat.com>
> Cc: Pavel Shilovskiy <pshilov@microsoft.com>; linux-cifs <linux-cifs@vger.kernel.org>; Frank Sorenson <sorenson@redhat.com>
> Subject: Re: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
>
> On Fri, Oct 18, 2019 at 5:27 AM Ronnie Sahlberg <lsahlber@redhat.com> wrote:
> >
> >
> > ----- Original Message -----
> > > From: "David Wysochanski" <dwysocha@redhat.com>
> > > To: "Ronnie Sahlberg" <lsahlber@redhat.com>
> > > Cc: "Pavel Shilovskiy" <pshilov@microsoft.com>, "linux-cifs" <linux-cifs@vger.kernel.org>, "Frank Sorenson"
> > > <sorenson@redhat.com>
> > > Sent: Friday, 18 October, 2019 6:16:45 PM
> > > Subject: Re: list_del corruption while iterating retry_list in
> > > cifs_reconnect still seen on 5.4-rc3
> > >
> > > On Thu, Oct 17, 2019 at 6:53 PM Ronnie Sahlberg <lsahlber@redhat.com> wrote:
> > > >
> > > >
> > > >
> > > >
> >
> > Good comments.
> > New version of the patch, please test and see comments inline below
> >
> > diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> > index bdea4b3e8005..8a78358693a5 100644
> > --- a/fs/cifs/connect.c
> > +++ b/fs/cifs/connect.c
> > @@ -564,8 +564,13 @@ cifs_reconnect(struct TCP_Server_Info *server)
> >         spin_lock(&GlobalMid_Lock);
> >         list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
> >                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> > -               if (mid_entry->mid_state == MID_REQUEST_SUBMITTED)
> > -                       mid_entry->mid_state = MID_RETRY_NEEDED;
> > +               kref_get(&mid_entry->refcount);
> > +               WARN_ON(mid_entry->mid_state != MID_REQUEST_SUBMITTED);
> > +               /*
> > +                * Set MID_RETRY_NEEDED to prevent the demultiplex loop from
> > +                * removing us, or our neighbours, from the linked list.
> > +                */
> > +               mid_entry->mid_state = MID_RETRY_NEEDED;
> >                 list_move(&mid_entry->qhead, &retry_list);
> >         }
> >         spin_unlock(&GlobalMid_Lock);
> > @@ -575,7 +580,9 @@ cifs_reconnect(struct TCP_Server_Info *server)
> >         list_for_each_safe(tmp, tmp2, &retry_list) {
> >                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> >                 list_del_init(&mid_entry->qhead);
> > +
> >                 mid_entry->callback(mid_entry);
> > +               cifs_mid_q_entry_release(mid_entry);
> >         }
> >
> >         if (cifs_rdma_enabled(server)) {
> > @@ -895,7 +902,7 @@ dequeue_mid(struct mid_q_entry *mid, bool malformed)
> >         if (mid->mid_flags & MID_DELETED)
> >                 printk_once(KERN_WARNING
> >                             "trying to dequeue a deleted mid\n");
> > -       else
> > +       else if (mid->mid_state != MID_RETRY_NEEDED)
>
> I'm just using an 'if' here not 'else if'.  Do you see any issue with that?
>
> Actually this section needed a little reorganizing due to the setting of the mid_state.  So I have this now for this hunk:
>
>         mid->when_received = jiffies;
>  #endif
>         spin_lock(&GlobalMid_Lock);
> -       if (!malformed)
> -               mid->mid_state = MID_RESPONSE_RECEIVED;
> -       else
> -               mid->mid_state = MID_RESPONSE_MALFORMED;
>         /*
>          * Trying to handle/dequeue a mid after the send_recv()
>          * function has finished processing it is a bug.
> @@ -895,8 +893,14 @@ static inline int reconn_setup_dfs_targets(struct cifs_sb_info *cifs_sb,
>         if (mid->mid_flags & MID_DELETED)
>                 printk_once(KERN_WARNING
>                             "trying to dequeue a deleted mid\n");
> -       else
> +       if (mid->mid_state != MID_RETRY_NEEDED)
>                 list_del_init(&mid->qhead);
> +
> +       if (!malformed)
> +               mid->mid_state = MID_RESPONSE_RECEIVED;
> +       else
> +               mid->mid_state = MID_RESPONSE_MALFORMED;
> +
>         spin_unlock(&GlobalMid_Lock);
>  }
>
>
>
> >                 list_del_init(&mid->qhead);
> >         spin_unlock(&GlobalMid_Lock);
> >  }
> > diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
> > index 308ad0f495e1..17a430b58673 100644
> > --- a/fs/cifs/transport.c
> > +++ b/fs/cifs/transport.c
> > @@ -173,7 +173,8 @@ void
> >  cifs_delete_mid(struct mid_q_entry *mid)
> >  {
> >         spin_lock(&GlobalMid_Lock);
> > -       list_del_init(&mid->qhead);
> > +       if (mid->mid_state != MID_RETRY_NEEDED)
> > +               list_del_init(&mid->qhead);
> >         mid->mid_flags |= MID_DELETED;
> >         spin_unlock(&GlobalMid_Lock);
> >
> > @@ -872,7 +873,8 @@ cifs_sync_mid_result(struct mid_q_entry *mid, struct TCP_Server_Info *server)
> >                 rc = -EHOSTDOWN;
> >                 break;
> >         default:
> > -               list_del_init(&mid->qhead);
> > +               if (mid->mid_state != MID_RETRY_NEEDED)
> > +                       list_del_init(&mid->qhead);
> >                 cifs_server_dbg(VFS, "%s: invalid mid state mid=%llu state=%d\n",
> >                          __func__, mid->mid, mid->mid_state);
> >                 rc = -EIO;
> >
> >
> >
> >
> >
> >
> > > >
> > > > ----- Original Message -----
> > > > > From: "Pavel Shilovskiy" <pshilov@microsoft.com>
> > > > > To: "Ronnie Sahlberg" <lsahlber@redhat.com>, "David Wysochanski"
> > > > > <dwysocha@redhat.com>
> > > > > Cc: "linux-cifs" <linux-cifs@vger.kernel.org>, "Frank Sorenson"
> > > > > <sorenson@redhat.com>
> > > > > Sent: Friday, 18 October, 2019 8:02:23 AM
> > > > > Subject: RE: list_del corruption while iterating retry_list in
> > > > > cifs_reconnect still seen on 5.4-rc3
> > > > >
> > > > > Ok, looking at cifs_delete_mid():
> > > > >
> > > > > > void
> > > > > > cifs_delete_mid(struct mid_q_entry *mid)
> > > > > > {
> > > > > >         spin_lock(&GlobalMid_Lock);
> > > > > >         list_del_init(&mid->qhead);
> > > > > >         mid->mid_flags |= MID_DELETED;
> > > > > >         spin_unlock(&GlobalMid_Lock);
> > > > > >
> > > > > >         DeleteMidQEntry(mid);
> > > > > > }
> > > > >
> > > > > So, regardless of us taking references on the mid itself or not,
> > > > > the mid might be removed from the list. I also don't think
> > > > > taking GlobalMid_Lock would help much because the next mid in
> > > > > the list might be deleted from the list by another process while
> > > > > cifs_reconnect is calling callback for the current mid.
> > > > >
> > >
> > > Yes the above is consistent with my tracing the crash after the first
> > > initial refcount patch was applied.
> > > After the simple refcount patch, when iterating the retry_list, it was
> > > processing an orphaned list with a single item over and over and
> > > eventually ran itself down to refcount == 0 and crashed like before.
> > >
> > >
> > > > > Instead, shouldn't we try marking the mid as being reconnected? Once we
> > > > > took a reference, let's mark mid->mid_flags with a new flag MID_RECONNECT
> > > > > under the GlobalMid_Lock. Then modify cifs_delete_mid() to check for this
> > > > > flag and do not remove the mid from the list if the flag exists.
> > > >
> > > > That could work. But then we should also use that flag to suppress the
> > > > other places where we do a list_del*, so something like this ?
> > > >
> > > > diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
> > > > index 50dfd9049370..b324fff33e53 100644
> > > > --- a/fs/cifs/cifsglob.h
> > > > +++ b/fs/cifs/cifsglob.h
> > > > @@ -1702,6 +1702,7 @@ static inline bool is_retryable_error(int error)
> > > >  /* Flags */
> > > >  #define   MID_WAIT_CANCELLED    1 /* Cancelled while waiting for response */
> > > >  #define   MID_DELETED            2 /* Mid has been dequeued/deleted */
> > > > +#define   MID_RECONNECT          4 /* Mid is being used during reconnect */
> > > >
> > > Do we need this extra flag?  Can just use  mid_state ==
> > > MID_RETRY_NEEDED in the necessary places?
> >
> > That is a good point.
> > It saves us a redundant flag.
> >
> > >
> > >
> > > >  /* Types of response buffer returned from SendReceive2 */
> > > >  #define   CIFS_NO_BUFFER        0    /* Response buffer not returned */
> > > > diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> > > > index bdea4b3e8005..b142bd2a3ef5 100644
> > > > --- a/fs/cifs/connect.c
> > > > +++ b/fs/cifs/connect.c
> > > > @@ -564,6 +564,8 @@ cifs_reconnect(struct TCP_Server_Info *server)
> > > >         spin_lock(&GlobalMid_Lock);
> > > >         list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
> > > >                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> > > > +               kref_get(&mid_entry->refcount);
> > > > +               mid_entry->mid_flags |= MID_RECONNECT;
> > > >                 if (mid_entry->mid_state == MID_REQUEST_SUBMITTED)
> > > >                         mid_entry->mid_state = MID_RETRY_NEEDED;
> > >
> > > What happens if the state is wrong going in there, and it is not set
> > > to MID_RETRY_NEEDED, but yet we queue up the retry_list and run it
> > > below?
> > > Should the above 'if' check for MID_REQUEST_SUBMITTED be a WARN_ON
> > > followed by unconditionally setting the state?
> > >
> > > WARN_ON(mid_entry->mid_state != MID_REQUEST_SUBMITTED);
> > > /* Unconditionally set MID_RETRY_NEEDED */
> > > > mid_entry->mid_state = MID_RETRY_NEEDED;
> >
> > Yepp.
> >
> > >
> > >
> > > >                 list_move(&mid_entry->qhead, &retry_list);
> > > > @@ -575,7 +577,9 @@ cifs_reconnect(struct TCP_Server_Info *server)
> > > >         list_for_each_safe(tmp, tmp2, &retry_list) {
> > > >                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> > > >                 list_del_init(&mid_entry->qhead);
> > > > +
> > > >                 mid_entry->callback(mid_entry);
> > > > +               cifs_mid_q_entry_release(mid_entry);
> > > >         }
> > > >
> > > >         if (cifs_rdma_enabled(server)) {
> > > > @@ -895,7 +899,7 @@ dequeue_mid(struct mid_q_entry *mid, bool malformed)
> > > >         if (mid->mid_flags & MID_DELETED)
> > > >                 printk_once(KERN_WARNING
> > > >                             "trying to dequeue a deleted mid\n");
> > > > -       else
> > > > +       else if (!(mid->mid_flags & MID_RECONNECT))
> > >
> > > Instead of the above,
> > >
> > >  -       else
> > > +          else if (mid_entry->mid_state == MID_RETRY_NEEDED)
> >
> > Yes, but mid_state != MID_RETRY_NEEDED
> >
>
> Yeah good catch on that - somehow I reversed the logic, and when I
> tested the former it blew up spectacularly almost instantaneously!
> Doh!
>
> So far the latest patch has been running for about 25 minutes, which
> is I think the longest this test has survived.
> I need a bit more runtime to be sure it's good, but if it keeps going
> I'll plan to create a patch header and submit to list by end of today.
> Thanks Ronnie and Pavel for the help tracking this down.
>
>
>
>
>
>

[-- Attachment #2: 0001-cifs-Fix-list_del-corruption-of-retry_list-in-cifs_r.patch --]
[-- Type: text/x-patch, Size: 5816 bytes --]

From 32def41aac71b227dc11a5988754cbda4ba9ad8a Mon Sep 17 00:00:00 2001
From: Dave Wysochanski <dwysocha@redhat.com>
Date: Fri, 18 Oct 2019 04:28:56 -0400
Subject: [PATCH] cifs: Fix list_del corruption of retry_list in cifs_reconnect

There's a race between the demultiplexer thread and the request
issuing thread similar to the race described in
commit 696e420bb2a6 ("cifs: Fix use after free of a mid_q_entry")
where both threads may obtain and attempt to call list_del_init
on the same mid and a list_del corruption similar to the
following will result:

[  430.454897] list_del corruption. prev->next should be ffff98d3a8f316c0, but was 2e885cb266355469
[  430.464668] ------------[ cut here ]------------
[  430.466569] kernel BUG at lib/list_debug.c:51!
[  430.468476] invalid opcode: 0000 [#1] SMP PTI
[  430.470286] CPU: 0 PID: 13267 Comm: cifsd Kdump: loaded Not tainted 5.4.0-rc3+ #19
[  430.473472] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[  430.475872] RIP: 0010:__list_del_entry_valid.cold+0x31/0x55
[  430.478129] Code: 5e 15 8e e8 54 a3 c5 ff 0f 0b 48 c7 c7 78 5f 15 8e e8 46 a3 c5 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 38 5f 15 8e e8 32 a3 c5 ff <0f> 0b 48 89 fe 4c 89 c2 48 c7 c7 00 5f 15 8e e8 1e a3 c5 ff 0f 0b
[  430.485563] RSP: 0018:ffffb4db0042fd38 EFLAGS: 00010246
[  430.487665] RAX: 0000000000000054 RBX: ffff98d3aabb8800 RCX: 0000000000000000
[  430.490513] RDX: 0000000000000000 RSI: ffff98d3b7a17908 RDI: ffff98d3b7a17908
[  430.493383] RBP: ffff98d3a8f316c0 R08: ffff98d3b7a17908 R09: 0000000000000285
[  430.496258] R10: ffffb4db0042fbf0 R11: ffffb4db0042fbf5 R12: ffff98d3aabb89c0
[  430.499113] R13: ffffb4db0042fd48 R14: 2e885cb266355469 R15: ffff98d3b24c4480
[  430.501981] FS:  0000000000000000(0000) GS:ffff98d3b7a00000(0000) knlGS:0000000000000000
[  430.505232] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  430.507546] CR2: 00007f08cd17b9c0 CR3: 000000023484a000 CR4: 00000000000406f0
[  430.510426] Call Trace:
[  430.511500]  cifs_reconnect+0x25e/0x610 [cifs]
[  430.513350]  cifs_readv_from_socket+0x220/0x250 [cifs]
[  430.515464]  cifs_read_from_socket+0x4a/0x70 [cifs]
[  430.517452]  ? try_to_wake_up+0x212/0x650
[  430.519122]  ? cifs_small_buf_get+0x16/0x30 [cifs]
[  430.521086]  ? allocate_buffers+0x66/0x120 [cifs]
[  430.523019]  cifs_demultiplex_thread+0xdc/0xc30 [cifs]
[  430.525116]  kthread+0xfb/0x130
[  430.526421]  ? cifs_handle_standard+0x190/0x190 [cifs]
[  430.528514]  ? kthread_park+0x90/0x90
[  430.530019]  ret_from_fork+0x35/0x40

To fix the above, inside cifs_reconnect unconditionally set the
state to MID_RETRY_NEEDED, and then take a reference before we
move any mid_q_entry on server->pending_mid_q to the temporary
retry_list.  Then while processing retry_list drop the reference
after the mid_q_entry callback has been completed.  In the code
paths for request issuing thread, avoid calling list_del_init
if we notice mid->mid_state == MID_RETRY_NEEDED, avoiding the
race and duplicate call to list_del_init.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
---
 fs/cifs/connect.c   | 18 +++++++++++-------
 fs/cifs/transport.c |  6 ++++--
 2 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index a64dfa95a925..c8b8d4efe5a4 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -564,8 +564,9 @@ static inline int reconn_setup_dfs_targets(struct cifs_sb_info *cifs_sb,
 	spin_lock(&GlobalMid_Lock);
 	list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
 		mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
-		if (mid_entry->mid_state == MID_REQUEST_SUBMITTED)
-			mid_entry->mid_state = MID_RETRY_NEEDED;
+		kref_get(&mid_entry->refcount);
+		WARN_ON(mid_entry->mid_state != MID_REQUEST_SUBMITTED);
+		mid_entry->mid_state = MID_RETRY_NEEDED;
 		list_move(&mid_entry->qhead, &retry_list);
 	}
 	spin_unlock(&GlobalMid_Lock);
@@ -576,6 +577,7 @@ static inline int reconn_setup_dfs_targets(struct cifs_sb_info *cifs_sb,
 		mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
 		list_del_init(&mid_entry->qhead);
 		mid_entry->callback(mid_entry);
+		cifs_mid_q_entry_release(mid_entry);
 	}
 
 	if (cifs_rdma_enabled(server)) {
@@ -884,10 +886,6 @@ static inline int reconn_setup_dfs_targets(struct cifs_sb_info *cifs_sb,
 	mid->when_received = jiffies;
 #endif
 	spin_lock(&GlobalMid_Lock);
-	if (!malformed)
-		mid->mid_state = MID_RESPONSE_RECEIVED;
-	else
-		mid->mid_state = MID_RESPONSE_MALFORMED;
 	/*
 	 * Trying to handle/dequeue a mid after the send_recv()
 	 * function has finished processing it is a bug.
@@ -895,8 +893,14 @@ static inline int reconn_setup_dfs_targets(struct cifs_sb_info *cifs_sb,
 	if (mid->mid_flags & MID_DELETED)
 		printk_once(KERN_WARNING
 			    "trying to dequeue a deleted mid\n");
-	else
+	if (mid->mid_state != MID_RETRY_NEEDED)
 		list_del_init(&mid->qhead);
+
+	if (!malformed)
+		mid->mid_state = MID_RESPONSE_RECEIVED;
+	else
+		mid->mid_state = MID_RESPONSE_MALFORMED;
+
 	spin_unlock(&GlobalMid_Lock);
 }
 
diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
index 308ad0f495e1..17a430b58673 100644
--- a/fs/cifs/transport.c
+++ b/fs/cifs/transport.c
@@ -173,7 +173,8 @@ void cifs_mid_q_entry_release(struct mid_q_entry *midEntry)
 cifs_delete_mid(struct mid_q_entry *mid)
 {
 	spin_lock(&GlobalMid_Lock);
-	list_del_init(&mid->qhead);
+	if (mid->mid_state != MID_RETRY_NEEDED)
+		list_del_init(&mid->qhead);
 	mid->mid_flags |= MID_DELETED;
 	spin_unlock(&GlobalMid_Lock);
 
@@ -872,7 +873,8 @@ struct mid_q_entry *
 		rc = -EHOSTDOWN;
 		break;
 	default:
-		list_del_init(&mid->qhead);
+		if (mid->mid_state != MID_RETRY_NEEDED)
+			list_del_init(&mid->qhead);
 		cifs_server_dbg(VFS, "%s: invalid mid state mid=%llu state=%d\n",
 			 __func__, mid->mid, mid->mid_state);
 		rc = -EIO;
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
  2019-10-18 21:21                                 ` David Wysochanski
@ 2019-10-18 21:44                                   ` David Wysochanski
  2019-10-18 22:45                                     ` Pavel Shilovskiy
  0 siblings, 1 reply; 31+ messages in thread
From: David Wysochanski @ 2019-10-18 21:44 UTC (permalink / raw)
  To: Pavel Shilovskiy; +Cc: Ronnie Sahlberg, linux-cifs, Frank Sorenson

On Fri, Oct 18, 2019 at 5:21 PM David Wysochanski <dwysocha@redhat.com> wrote:
>
> On Fri, Oct 18, 2019 at 4:59 PM Pavel Shilovskiy <pshilov@microsoft.com> wrote:
> >
> > Thanks for the good news that the patch is stable in your workload!
> >
> The attached patch I ran on top of 5.4-rc3 for over 5 hrs today on the
> reboot test - before it would crash after a few minutes tops.
>
> > The extra flag may not be necessary and we may rely on a MID state but we would need to handle two states actually: MID_RETRY_NEEDED and MID_SHUTDOWN - see clean_demultiplex_info() which is doing the same things with mid as cifs_reconnect(). Please add ref counting to both functions since they both can race with system call threads.
> >

I agree that loop has the same problem.  I can add that you're ok with
the mid_state approach.  I think the only other option is probably a
flag like Ronnie suggested.
I will have to review the state machine more when I am more alert if
you are concerned about possible subtle regressions.
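
As a reminder of what the ref counting buys in both loops, a small
userspace sketch of the pin-then-callback-then-release order.  The
toy_* names are made up for illustration; in the patch under discussion
the corresponding calls are kref_get() and cifs_mid_q_entry_release().
The sketch sets a "freed" marker instead of freeing memory so the order
of events can be checked.

```c
#include <assert.h>

struct toy_mid {
	int refcount;
	int freed;                           /* set instead of actually freeing memory */
	void (*callback)(struct toy_mid *);
};

static void toy_get(struct toy_mid *mid) { mid->refcount++; }

static void toy_put(struct toy_mid *mid)
{
	assert(mid->refcount > 0);
	if (--mid->refcount == 0)
		mid->freed = 1;
}

/* Stand-in for a callback whose waiter drops the submitter's reference right away. */
static void toy_callback(struct toy_mid *mid) { toy_put(mid); }

/*
 * Returns 1 if the entry stayed alive across the callback and was
 * released only by the final put from the reconnect/cleanup side.
 */
int toy_demo_refcount(void)
{
	struct toy_mid mid = { 1, 0, toy_callback };  /* submitter holds one reference */
	int alive_after_callback;

	toy_get(&mid);             /* pin the entry before moving it to the private list */
	mid.callback(&mid);        /* the submitter's reference goes away here */
	alive_after_callback = !mid.freed;
	toy_put(&mid);             /* drop the extra reference last */

	return alive_after_callback && mid.freed;
}
```

Without the extra get, the entry would already be gone when the loop
touches it after the callback returns.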

> > I also think that we need to create as small a patch as possible to avoid hidden regressions. That's why I don't think we should change IF() to WARN_ON() in the same patch; let's keep that change separate, without the stable tag.
> >
> IMO that 'if' statement is wrong, and should be removed unless it can
> be defended.  Why are we _conditionally_ setting the state to
> MID_RETRY_NEEDED in the same loop as we're putting mids on retry_list?
>  What's the state machine supposed to be doing if it's ambiguous?
>
> > Another general thought is that including extra logic into the MID state may complicate the code. Having a flag like MID_QUEUED would reflect the meaning more straightforwardly: if the mid is queued then de-queue it (aka remove it from the list), else - skip this step. This may be changed later if you think this will complicate the small stable patch.
> >
>
> You all know better than me.  I'll take another look next week and
> look forward to more discussion.
>
> > --
> > Best regards,
> > Pavel Shilovsky
> >
> > -----Original Message-----
> > From: David Wysochanski <dwysocha@redhat.com>
> > Sent: Friday, October 18, 2019 3:12 AM
> > To: Ronnie Sahlberg <lsahlber@redhat.com>
> > Cc: Pavel Shilovskiy <pshilov@microsoft.com>; linux-cifs <linux-cifs@vger.kernel.org>; Frank Sorenson <sorenson@redhat.com>
> > Subject: Re: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
> >
> > On Fri, Oct 18, 2019 at 5:27 AM Ronnie Sahlberg <lsahlber@redhat.com> wrote:
> > >
> > >
> > > ----- Original Message -----
> > > > From: "David Wysochanski" <dwysocha@redhat.com>
> > > > To: "Ronnie Sahlberg" <lsahlber@redhat.com>
> > > > Cc: "Pavel Shilovskiy" <pshilov@microsoft.com>, "linux-cifs" <linux-cifs@vger.kernel.org>, "Frank Sorenson"
> > > > <sorenson@redhat.com>
> > > > Sent: Friday, 18 October, 2019 6:16:45 PM
> > > > Subject: Re: list_del corruption while iterating retry_list in
> > > > cifs_reconnect still seen on 5.4-rc3
> > > >
> > > > On Thu, Oct 17, 2019 at 6:53 PM Ronnie Sahlberg <lsahlber@redhat.com> wrote:
> > > > >
> > > > >
> > > > >
> > > > >
> > >
> > > Good comments.
> > > New version of the patch, please test and see comments inline below
> > >
> > > diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> > > index bdea4b3e8005..8a78358693a5 100644
> > > --- a/fs/cifs/connect.c
> > > +++ b/fs/cifs/connect.c
> > > @@ -564,8 +564,13 @@ cifs_reconnect(struct TCP_Server_Info *server)
> > >         spin_lock(&GlobalMid_Lock);
> > >         list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
> > >                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> > > -               if (mid_entry->mid_state == MID_REQUEST_SUBMITTED)
> > > -                       mid_entry->mid_state = MID_RETRY_NEEDED;
> > > +               kref_get(&mid_entry->refcount);
> > > +               WARN_ON(mid_entry->mid_state != MID_REQUEST_SUBMITTED);
> > > +               /*
> > > +                * Set MID_RETRY_NEEDED to prevent the demultiplex loop from
> > > +                * removing us, or our neighbours, from the linked list.
> > > +                */
> > > +               mid_entry->mid_state = MID_RETRY_NEEDED;
> > >                 list_move(&mid_entry->qhead, &retry_list);
> > >         }
> > >         spin_unlock(&GlobalMid_Lock);
> > > @@ -575,7 +580,9 @@ cifs_reconnect(struct TCP_Server_Info *server)
> > >         list_for_each_safe(tmp, tmp2, &retry_list) {
> > >                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> > >                 list_del_init(&mid_entry->qhead);
> > > +
> > >                 mid_entry->callback(mid_entry);
> > > +               cifs_mid_q_entry_release(mid_entry);
> > >         }
> > >
> > >         if (cifs_rdma_enabled(server)) {
> > > @@ -895,7 +902,7 @@ dequeue_mid(struct mid_q_entry *mid, bool malformed)
> > >         if (mid->mid_flags & MID_DELETED)
> > >                 printk_once(KERN_WARNING
> > >                             "trying to dequeue a deleted mid\n");
> > > -       else
> > > +       else if (mid->mid_state != MID_RETRY_NEEDED)
> >
> > I'm just using an 'if' here not 'else if'.  Do you see any issue with that?
> >
> > Actually this section needed a little reorganizing due to the setting of the mid_state.  So I have this now for this hunk:
> >
> >         mid->when_received = jiffies;
> >  #endif
> >         spin_lock(&GlobalMid_Lock);
> > -       if (!malformed)
> > -               mid->mid_state = MID_RESPONSE_RECEIVED;
> > -       else
> > -               mid->mid_state = MID_RESPONSE_MALFORMED;
> >         /*
> >          * Trying to handle/dequeue a mid after the send_recv()
> >          * function has finished processing it is a bug.
> > @@ -895,8 +893,14 @@ static inline int reconn_setup_dfs_targets(struct cifs_sb_info *cifs_sb,
> >         if (mid->mid_flags & MID_DELETED)
> >                 printk_once(KERN_WARNING
> >                             "trying to dequeue a deleted mid\n");
> > -       else
> > +       if (mid->mid_state != MID_RETRY_NEEDED)
> >                 list_del_init(&mid->qhead);
> > +
> > +       if (!malformed)
> > +               mid->mid_state = MID_RESPONSE_RECEIVED;
> > +       else
> > +               mid->mid_state = MID_RESPONSE_MALFORMED;
> > +
> >         spin_unlock(&GlobalMid_Lock);
> >  }
> >
> >
> >
> > >                 list_del_init(&mid->qhead);
> > >         spin_unlock(&GlobalMid_Lock);
> > >  }
> > > diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
> > > index 308ad0f495e1..17a430b58673 100644
> > > --- a/fs/cifs/transport.c
> > > +++ b/fs/cifs/transport.c
> > > @@ -173,7 +173,8 @@ void
> > >  cifs_delete_mid(struct mid_q_entry *mid)
> > >  {
> > >         spin_lock(&GlobalMid_Lock);
> > > -       list_del_init(&mid->qhead);
> > > +       if (mid->mid_state != MID_RETRY_NEEDED)
> > > +               list_del_init(&mid->qhead);
> > >         mid->mid_flags |= MID_DELETED;
> > >         spin_unlock(&GlobalMid_Lock);
> > >
> > > @@ -872,7 +873,8 @@ cifs_sync_mid_result(struct mid_q_entry *mid, struct TCP_Server_Info *server)
> > >                 rc = -EHOSTDOWN;
> > >                 break;
> > >         default:
> > > -               list_del_init(&mid->qhead);
> > > +               if (mid->mid_state != MID_RETRY_NEEDED)
> > > +                       list_del_init(&mid->qhead);
> > >                 cifs_server_dbg(VFS, "%s: invalid mid state mid=%llu state=%d\n",
> > >                          __func__, mid->mid, mid->mid_state);
> > >                 rc = -EIO;
> > >
> > >
> > >
> > >
> > >
> > >
> > > > >
> > > > > ----- Original Message -----
> > > > > > From: "Pavel Shilovskiy" <pshilov@microsoft.com>
> > > > > > To: "Ronnie Sahlberg" <lsahlber@redhat.com>, "David Wysochanski"
> > > > > > <dwysocha@redhat.com>
> > > > > > Cc: "linux-cifs" <linux-cifs@vger.kernel.org>, "Frank Sorenson"
> > > > > > <sorenson@redhat.com>
> > > > > > Sent: Friday, 18 October, 2019 8:02:23 AM
> > > > > > Subject: RE: list_del corruption while iterating retry_list in
> > > > > > cifs_reconnect still seen on 5.4-rc3
> > > > > >
> > > > > > Ok, looking at cifs_delete_mid():
> > > > > >
> > > > > > > void
> > > > > > > cifs_delete_mid(struct mid_q_entry *mid)
> > > > > > > {
> > > > > > >         spin_lock(&GlobalMid_Lock);
> > > > > > >         list_del_init(&mid->qhead);
> > > > > > >         mid->mid_flags |= MID_DELETED;
> > > > > > >         spin_unlock(&GlobalMid_Lock);
> > > > > > >
> > > > > > >         DeleteMidQEntry(mid);
> > > > > > > }
> > > > > >
> > > > > > So, regardless of us taking references on the mid itself or not,
> > > > > > the mid might be removed from the list. I also don't think
> > > > > > taking GlobalMid_Lock would help much because the next mid in
> > > > > > the list might be deleted from the list by another process while
> > > > > > cifs_reconnect is calling callback for the current mid.
> > > > > >
> > > >
> > > > Yes the above is consistent with my tracing the crash after the first
> > > > initial refcount patch was applied.
> > > > After the simple refcount patch, when iterating the retry_list, it was
> > > > processing an orphaned list with a single item over and over and
> > > > eventually ran itself down to refcount == 0 and crashed like before.
> > > >
> > > >
> > > > > > Instead, shouldn't we try marking the mid as being reconnected? Once we
> > > > > > took a reference, let's mark mid->mid_flags with a new flag MID_RECONNECT
> > > > > > under the GlobalMid_Lock. Then modify cifs_delete_mid() to check for this
> > > > > > flag and do not remove the mid from the list if the flag exists.
> > > > >
> > > > > That could work. But then we should also use that flag to suppress the
> > > > > other places where we do a list_del*, so something like this ?
> > > > >
> > > > > diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
> > > > > index 50dfd9049370..b324fff33e53 100644
> > > > > --- a/fs/cifs/cifsglob.h
> > > > > +++ b/fs/cifs/cifsglob.h
> > > > > @@ -1702,6 +1702,7 @@ static inline bool is_retryable_error(int error)
> > > > >  /* Flags */
> > > > >  #define   MID_WAIT_CANCELLED    1 /* Cancelled while waiting for response */
> > > > >  #define   MID_DELETED            2 /* Mid has been dequeued/deleted */
> > > > > +#define   MID_RECONNECT          4 /* Mid is being used during reconnect */
> > > > >
> > > > Do we need this extra flag?  Can just use  mid_state ==
> > > > MID_RETRY_NEEDED in the necessary places?
> > >
> > > That is a good point.
> > > It saves us a redundant flag.
> > >
> > > >
> > > >
> > > > >  /* Types of response buffer returned from SendReceive2 */
> > > > >  #define   CIFS_NO_BUFFER        0    /* Response buffer not returned */
> > > > > diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> > > > > index bdea4b3e8005..b142bd2a3ef5 100644
> > > > > --- a/fs/cifs/connect.c
> > > > > +++ b/fs/cifs/connect.c
> > > > > @@ -564,6 +564,8 @@ cifs_reconnect(struct TCP_Server_Info *server)
> > > > >         spin_lock(&GlobalMid_Lock);
> > > > >         list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
> > > > >                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> > > > > +               kref_get(&mid_entry->refcount);
> > > > > +               mid_entry->mid_flags |= MID_RECONNECT;
> > > > >                 if (mid_entry->mid_state == MID_REQUEST_SUBMITTED)
> > > > >                         mid_entry->mid_state = MID_RETRY_NEEDED;
> > > >
> > > > What happens if the state is wrong going in there, and it is not set
> > > > to MID_RETRY_NEEDED, but yet we queue up the retry_list and run it
> > > > below?
> > > > Should the above 'if' check for MID_REQUEST_SUBMITTED be a WARN_ON
> > > > followed by unconditionally setting the state?
> > > >
> > > > WARN_ON(mid_entry->mid_state != MID_REQUEST_SUBMITTED);
> > > > /* Unconditionally set MID_RETRY_NEEDED */
> > > > > mid_entry->mid_state = MID_RETRY_NEEDED;
> > >
> > > Yepp.
> > >
> > > >
> > > >
> > > > >                 list_move(&mid_entry->qhead, &retry_list);
> > > > > @@ -575,7 +577,9 @@ cifs_reconnect(struct TCP_Server_Info *server)
> > > > >         list_for_each_safe(tmp, tmp2, &retry_list) {
> > > > >                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> > > > >                 list_del_init(&mid_entry->qhead);
> > > > > +
> > > > >                 mid_entry->callback(mid_entry);
> > > > > +               cifs_mid_q_entry_release(mid_entry);
> > > > >         }
> > > > >
> > > > >         if (cifs_rdma_enabled(server)) {
> > > > > @@ -895,7 +899,7 @@ dequeue_mid(struct mid_q_entry *mid, bool malformed)
> > > > >         if (mid->mid_flags & MID_DELETED)
> > > > >                 printk_once(KERN_WARNING
> > > > >                             "trying to dequeue a deleted mid\n");
> > > > > -       else
> > > > > +       else if (!(mid->mid_flags & MID_RECONNECT))
> > > >
> > > > Instead of the above,
> > > >
> > > >  -       else
> > > > +          else if (mid_entry->mid_state == MID_RETRY_NEEDED)
> > >
> > > Yes, but mid_state != MID_RETRY_NEEDED
> > >
> >
> > Yeah good catch on that - somehow I reversed the logic, and when I
> > tested the former it blew up spectacularly almost instantaneously!
> > Doh!
> >
> > So far the latest patch has been running for about 25 minutes, which
> > is I think the longest this test has survived.
> > I need a bit more runtime to be sure it's good, but if it keeps going
> > I'll plan to create a patch header and submit to list by end of today.
> > Thanks Ronnie and Pavel for the help tracking this down.
> >
> >
> >
> >
> >
> >


* RE: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
  2019-10-18 21:44                                   ` David Wysochanski
@ 2019-10-18 22:45                                     ` Pavel Shilovskiy
  2019-10-19 11:09                                       ` David Wysochanski
  0 siblings, 1 reply; 31+ messages in thread
From: Pavel Shilovskiy @ 2019-10-18 22:45 UTC (permalink / raw)
  To: David Wysochanski; +Cc: Ronnie Sahlberg, linux-cifs, Frank Sorenson

On Fri, Oct 18, 2019 at 5:21 PM David Wysochanski <dwysocha@redhat.com> wrote:
>
> On Fri, Oct 18, 2019 at 4:59 PM Pavel Shilovskiy <pshilov@microsoft.com> wrote:
> >
> > Thanks for the good news that the patch is stable in your workload!
> >
> The attached patch I ran on top of 5.4-rc3 for over 5 hrs today on the 
> reboot test - before it would crash after a few minutes tops.

This is great! Thanks for verifying the fix.

> > The extra flag may not be necessary and we may rely on a MID state but we would need to handle two states actually: MID_RETRY_NEEDED and MID_SHUTDOWN - see clean_demultiplex_info() which is doing the same things with mid as cifs_reconnect(). Please add ref counting to both functions since they both can race with system call threads.
> >

> I agree that loop has the same problem.  I can add that you're ok with the mid_state approach.  I think the only other option is probably a flag like Ronnie
> suggested.
> I will have to review the state machine more when I am more alert if you are concerned about possible subtle regressions.

I am ok with both approaches as long as the stable patch is minimal. Thinking about this conditional assignment of the mid retry state: I don't think there is any case in the current code base where the WARN_ON you proposed would fire, but I can't be sure about all the stable kernels that the stable patch is going to be applied to.

Another more general thought: we have cifs_delete_mid -> DeleteMidQEntry -> _cifs_mid_q_entry_release chain of calls and every function frees its own part of the mid entry. I think we should merge the last two at least. It would allow us to guarantee that holding a reference to the mid means:

1) the mid itself is valid;
2) the mid response buffer is valid;
3) the mid is in a list if it is REQUEST_SUBMITTED, RETRY_NEEDED or SHUTDOWN and is not in a list if it is ALLOCATED, RESPONSE_RECEIVED, RESPONSE_MALFORMED or FREE; the release function should remove the mid from the list or warn appropriately depending on a state of the mid.

The mid state and list location are changed only when the GlobalMid_Lock is held. In this case cifs_delete_mid is not needed either, because everything it does will be done in the release function. I think this would allow us to avoid all the problems discussed in this thread, but it looks too risky for stable.
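
A rough userspace sketch of the "merged release" idea described above,
assuming a plain integer refcount and a boolean in place of the real list
linkage.  All names prefixed sk_/SK_ are made up for illustration; this is
not the cifs code, and the real functions would run under GlobalMid_Lock:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the mid states named in the list above. */
enum sk_state {
	SK_FREE, SK_ALLOCATED, SK_REQUEST_SUBMITTED, SK_RETRY_NEEDED,
	SK_SHUTDOWN, SK_RESPONSE_RECEIVED, SK_RESPONSE_MALFORMED
};

struct sk_mid {
	int refcount;		/* stands in for the kref */
	enum sk_state state;
	bool on_list;		/* stands in for qhead being linked */
	bool released;		/* stands in for buffers + mid being freed */
};

/* Invariant 3: these are the states in which a mid must be on a list. */
static bool sk_state_implies_listed(enum sk_state s)
{
	return s == SK_REQUEST_SUBMITTED || s == SK_RETRY_NEEDED ||
	       s == SK_SHUTDOWN;
}

static void sk_mid_get(struct sk_mid *m)
{
	m->refcount++;
}

/* Drop one reference; returns true if this call released the mid. */
static bool sk_mid_put(struct sk_mid *m)
{
	if (--m->refcount > 0)
		return false;
	if (sk_state_implies_listed(m->state))
		m->on_list = false;	/* where list_del_init() would go */
	else if (m->on_list)
		fprintf(stderr, "warning: mid listed in state %d\n",
			(int)m->state);
	m->released = true;
	return true;
}
```

The point of the sketch is only that unlinking happens in exactly one
place, when the last reference goes away, so a holder of a reference never
sees the mid disappear from under it.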

--
Best regards,
Pavel Shilovsky


* Re: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
  2019-10-18  8:16                         ` David Wysochanski
  2019-10-18  9:27                           ` Ronnie Sahlberg
@ 2019-10-19  9:44                           ` Ronnie Sahlberg
  1 sibling, 0 replies; 31+ messages in thread
From: Ronnie Sahlberg @ 2019-10-19  9:44 UTC (permalink / raw)
  To: David Wysochanski; +Cc: Pavel Shilovskiy, linux-cifs, Frank Sorenson

My only comment is that in the header where MID_RETRY_NEEDED is defined, please add a big comment describing the semantics of this state and how it interacts with the reconnect logic.



----- Original Message -----
From: "David Wysochanski" <dwysocha@redhat.com>
To: "Ronnie Sahlberg" <lsahlber@redhat.com>
Cc: "Pavel Shilovskiy" <pshilov@microsoft.com>, "linux-cifs" <linux-cifs@vger.kernel.org>, "Frank Sorenson" <sorenson@redhat.com>
Sent: Friday, 18 October, 2019 6:16:45 PM
Subject: Re: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3

On Thu, Oct 17, 2019 at 6:53 PM Ronnie Sahlberg <lsahlber@redhat.com> wrote:
>
>
>
>
>
> ----- Original Message -----
> > From: "Pavel Shilovskiy" <pshilov@microsoft.com>
> > To: "Ronnie Sahlberg" <lsahlber@redhat.com>, "David Wysochanski" <dwysocha@redhat.com>
> > Cc: "linux-cifs" <linux-cifs@vger.kernel.org>, "Frank Sorenson" <sorenson@redhat.com>
> > Sent: Friday, 18 October, 2019 8:02:23 AM
> > Subject: RE: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
> >
> > Ok, looking at cifs_delete_mid():
> >
> > void
> > cifs_delete_mid(struct mid_q_entry *mid)
> > {
> >         spin_lock(&GlobalMid_Lock);
> >         list_del_init(&mid->qhead);
> >         mid->mid_flags |= MID_DELETED;
> >         spin_unlock(&GlobalMid_Lock);
> >
> >         DeleteMidQEntry(mid);
> > }
> >
> > So, regardless of us taking references on the mid itself or not, the mid
> > might be removed from the list. I also don't think taking GlobalMid_Lock
> > would help much because the next mid in the list might be deleted from the
> > list by another process while cifs_reconnect is calling callback for the
> > current mid.
> >

Yes the above is consistent with my tracing the crash after the first
initial refcount patch was applied.
After the simple refcount patch, when iterating the retry_list, it was
processing an orphaned list with a single item over and over and
eventually ran itself down to refcount == 0 and crashed like before.
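
To make the "orphaned list with a single item" behaviour concrete, here is
a small userspace sketch.  It assumes a hand-rolled copy of the list
primitives (the trailing-underscore helpers are made up for this example,
not the kernel's list.h) and simulates the other thread with a plain
function call in the middle of the loop:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

static void init_list_head_(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail_(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Like list_del_init(): unlink, then leave the node pointing at itself. */
static void list_del_init_(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	init_list_head_(n);
}

/*
 * Walk the list the way list_for_each_safe() does.  While the first entry
 * is being processed, simulate another thread calling list_del_init() on
 * the entry the iterator has already cached as 'next'.
 * Returns how many times the raced entry was visited, capped at 'limit'.
 */
static int visits_of_raced_entry(int limit)
{
	struct list_head head, a, b, c;
	struct list_head *pos, *next;
	int first = 1, visits = 0;

	init_list_head_(&head);
	list_add_tail_(&a, &head);
	list_add_tail_(&b, &head);
	list_add_tail_(&c, &head);

	for (pos = head.next, next = pos->next; pos != &head;
	     pos = next, next = pos->next) {
		list_del_init_(pos);
		if (first) {
			/* The "other thread": removes b behind our back. */
			list_del_init_(&b);
			first = 0;
		}
		if (pos == &b && ++visits >= limit)
			break;	/* without this cap the loop never ends */
	}
	return visits;
}
```

Once b points at itself, the cached 'next' pointer is b and b->next is
also b, so the loop never reaches the list head again.  In the kernel
each pass also drops a reference, which is how the refcount runs down to
zero.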


> > Instead, shouldn't we try marking the mid as being reconnected? Once we took
> > a reference, let's mark mid->mid_flags with a new flag MID_RECONNECT under
> > the GlobalMid_Lock. Then modify cifs_delete_mid() to check for this flag and
> > do not remove the mid from the list if the flag exists.
>
> That could work. But then we should also use that flag to suppress the other places where we do a list_del*, so something like this ?
>
> diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
> index 50dfd9049370..b324fff33e53 100644
> --- a/fs/cifs/cifsglob.h
> +++ b/fs/cifs/cifsglob.h
> @@ -1702,6 +1702,7 @@ static inline bool is_retryable_error(int error)
>  /* Flags */
>  #define   MID_WAIT_CANCELLED    1 /* Cancelled while waiting for response */
>  #define   MID_DELETED            2 /* Mid has been dequeued/deleted */
> +#define   MID_RECONNECT          4 /* Mid is being used during reconnect */
>
Do we need this extra flag?  Can just use  mid_state ==
MID_RETRY_NEEDED in the necessary places?


>  /* Types of response buffer returned from SendReceive2 */
>  #define   CIFS_NO_BUFFER        0    /* Response buffer not returned */
> diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> index bdea4b3e8005..b142bd2a3ef5 100644
> --- a/fs/cifs/connect.c
> +++ b/fs/cifs/connect.c
> @@ -564,6 +564,8 @@ cifs_reconnect(struct TCP_Server_Info *server)
>         spin_lock(&GlobalMid_Lock);
>         list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
>                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> +               kref_get(&mid_entry->refcount);
> +               mid_entry->mid_flags |= MID_RECONNECT;
>                 if (mid_entry->mid_state == MID_REQUEST_SUBMITTED)
>                         mid_entry->mid_state = MID_RETRY_NEEDED;

What happens if the state is wrong going in there, and it is not set
to MID_RETRY_NEEDED, but yet we queue up the retry_list and run it
below?
Should the above 'if' check for MID_REQUEST_SUBMITTED be a WARN_ON
followed by unconditionally setting the state?

WARN_ON(mid_entry->mid_state != MID_REQUEST_SUBMITTED);
/* Unconditionally set MID_RETRY_NEEDED */
mid_entry->mid_state = MID_RETRY_NEEDED;


>                 list_move(&mid_entry->qhead, &retry_list);
> @@ -575,7 +577,9 @@ cifs_reconnect(struct TCP_Server_Info *server)
>         list_for_each_safe(tmp, tmp2, &retry_list) {
>                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
>                 list_del_init(&mid_entry->qhead);
> +
>                 mid_entry->callback(mid_entry);
> +               cifs_mid_q_entry_release(mid_entry);
>         }
>
>         if (cifs_rdma_enabled(server)) {
> @@ -895,7 +899,7 @@ dequeue_mid(struct mid_q_entry *mid, bool malformed)
>         if (mid->mid_flags & MID_DELETED)
>                 printk_once(KERN_WARNING
>                             "trying to dequeue a deleted mid\n");
> -       else
> +       else if (!(mid->mid_flags & MID_RECONNECT))

Instead of the above,

 -       else
+          else if (mid_entry->mid_state == MID_RETRY_NEEDED)
                  list_del_init(&mid->qhead);


>         spin_unlock(&GlobalMid_Lock);
>  }
> diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
> index 308ad0f495e1..ba4b5ab9cf35 100644
> --- a/fs/cifs/transport.c
> +++ b/fs/cifs/transport.c
> @@ -173,7 +173,8 @@ void
>  cifs_delete_mid(struct mid_q_entry *mid)
>  {
>         spin_lock(&GlobalMid_Lock);
> -       list_del_init(&mid->qhead);
> +       if (!(mid->mid_flags & MID_RECONNECT))
> +               list_del_init(&mid->qhead);

Same check as above.


>         mid->mid_flags |= MID_DELETED;
>         spin_unlock(&GlobalMid_Lock);
>
> @@ -872,7 +873,8 @@ cifs_sync_mid_result(struct mid_q_entry *mid, struct TCP_Server_Info *server)
>                 rc = -EHOSTDOWN;
>                 break;
>         default:
> -               list_del_init(&mid->qhead);
> +               if (!(mid->mid_flags & MID_RECONNECT))
> +                       list_del_init(&mid->qhead);

Same check as above.

>                 cifs_server_dbg(VFS, "%s: invalid mid state mid=%llu state=%d\n",
>                          __func__, mid->mid, mid->mid_state);
>                 rc = -EIO;
>
>
> >
> > --
> > Best regards,
> > Pavel Shilovsky
> >
> > -----Original Message-----
> > From: Ronnie Sahlberg <lsahlber@redhat.com>
> > Sent: Thursday, October 17, 2019 2:45 PM
> > To: David Wysochanski <dwysocha@redhat.com>
> > Cc: Pavel Shilovskiy <pshilov@microsoft.com>; linux-cifs
> > <linux-cifs@vger.kernel.org>; Frank Sorenson <sorenson@redhat.com>
> > Subject: Re: list_del corruption while iterating retry_list in cifs_reconnect
> > still seen on 5.4-rc3
> >
> > Dave, Pavel
> >
> > If it takes longer to trigger it might indicate we are on the right path but
> > there are additional places to fix.
> >
> > I still think you also need to protect the list mutate functions as well
> > using the global mutex, so something like this :
> >
> > diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> > index bdea4b3e8005..16705a855818 100644
> > --- a/fs/cifs/connect.c
> > +++ b/fs/cifs/connect.c
> > @@ -564,6 +564,7 @@ cifs_reconnect(struct TCP_Server_Info *server)
> >         spin_lock(&GlobalMid_Lock);
> >         list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
> >                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> > +               kref_get(&mid_entry->refcount);
> >                 if (mid_entry->mid_state == MID_REQUEST_SUBMITTED)
> >                         mid_entry->mid_state = MID_RETRY_NEEDED;
> >                 list_move(&mid_entry->qhead, &retry_list);
> > @@ -572,11 +573,18 @@ cifs_reconnect(struct TCP_Server_Info *server)
> >         mutex_unlock(&server->srv_mutex);
> >
> >         cifs_dbg(FYI, "%s: issuing mid callbacks\n", __func__);
> > +       spin_lock(&GlobalMid_Lock);
> >         list_for_each_safe(tmp, tmp2, &retry_list) {
> >                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> >                 list_del_init(&mid_entry->qhead);
> > +               spin_unlock(&GlobalMid_Lock);
> > +
> >                 mid_entry->callback(mid_entry);
> > +               cifs_mid_q_entry_release(mid_entry);
> > +
> > +               spin_lock(&GlobalMid_Lock);
> >         }
> > +       spin_unlock(&GlobalMid_Lock);
> >
> >         if (cifs_rdma_enabled(server)) {
> >                 mutex_lock(&server->srv_mutex);
> >
> >
> > ----- Original Message -----
> > From: "David Wysochanski" <dwysocha@redhat.com>
> > To: "Pavel Shilovskiy" <pshilov@microsoft.com>
> > Cc: "Ronnie Sahlberg" <lsahlber@redhat.com>, "linux-cifs"
> > <linux-cifs@vger.kernel.org>, "Frank Sorenson" <sorenson@redhat.com>
> > Sent: Friday, 18 October, 2019 6:34:53 AM
> > Subject: Re: list_del corruption while iterating retry_list in cifs_reconnect
> > still seen on 5.4-rc3
> >
> > Unfortunately that did not fix the list_del corruption.
> > It did seem to run longer but I'm not sure runtime is meaningful.
> >
> > [ 1424.215537] list_del corruption. prev->next should be ffff8d9b74c84d80, but was a6787a60550c54a9
> > [ 1424.232688] ------------[ cut here ]------------
> > [ 1424.234535] kernel BUG at lib/list_debug.c:51!
> > [ 1424.236502] invalid opcode: 0000 [#1] SMP PTI
> > [ 1424.238334] CPU: 5 PID: 10212 Comm: cifsd Kdump: loaded Not tainted 5.4.0-rc3-fix1+ #33
> > [ 1424.241489] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> > [ 1424.243770] RIP: 0010:__list_del_entry_valid.cold+0x31/0x55
> > [ 1424.245972] Code: 5e 15 b5 e8 54 a3 c5 ff 0f 0b 48 c7 c7 70 5f 15 b5 e8 46 a3 c5 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 30 5f 15 b5 e8 32 a3 c5 ff <0f> 0b 48 89 fe 4c 89 c2 48 c7 c7 f8 5e 15 b5 e8 1e a3 c5 ff 0f 0b
> > [ 1424.253409] RSP: 0018:ffff9a12404b3d38 EFLAGS: 00010246
> > [ 1424.255576] RAX: 0000000000000054 RBX: ffff8d9b6ece1000 RCX: 0000000000000000
> > [ 1424.258504] RDX: 0000000000000000 RSI: ffff8d9b77b57908 RDI: ffff8d9b77b57908
> > [ 1424.261404] RBP: ffff8d9b74c84d80 R08: ffff8d9b77b57908 R09: 0000000000000280
> > [ 1424.264336] R10: ffff9a12404b3bf0 R11: ffff9a12404b3bf5 R12: ffff8d9b6ece11c0
> > [ 1424.267285] R13: ffff9a12404b3d48 R14: a6787a60550c54a9 R15: ffff8d9b6fcec300
> > [ 1424.270191] FS:  0000000000000000(0000) GS:ffff8d9b77b40000(0000) knlGS:0000000000000000
> > [ 1424.273491] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 1424.275831] CR2: 0000562cdf4a2000 CR3: 000000023340c000 CR4: 00000000000406e0
> > [ 1424.278733] Call Trace:
> > [ 1424.279844]  cifs_reconnect+0x268/0x620 [cifs]
> > [ 1424.281723]  cifs_readv_from_socket+0x220/0x250 [cifs]
> > [ 1424.283876]  cifs_read_from_socket+0x4a/0x70 [cifs]
> > [ 1424.285922]  ? try_to_wake_up+0x212/0x650
> > [ 1424.287595]  ? cifs_small_buf_get+0x16/0x30 [cifs]
> > [ 1424.289520]  ? allocate_buffers+0x66/0x120 [cifs]
> > [ 1424.291421]  cifs_demultiplex_thread+0xdc/0xc30 [cifs]
> > [ 1424.293506]  kthread+0xfb/0x130
> > [ 1424.294789]  ? cifs_handle_standard+0x190/0x190 [cifs]
> > [ 1424.296833]  ? kthread_park+0x90/0x90
> > [ 1424.298295]  ret_from_fork+0x35/0x40
> > [ 1424.299717] Modules linked in: cifs libdes libarc4 ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables crct10dif_pclmul crc32_pclmul ghash_clmulni_intel virtio_balloon joydev i2c_piix4 nfsd nfs_acl lockd auth_rpcgss grace sunrpc xfs libcrc32c crc32c_intel virtio_net net_failover ata_generic serio_raw virtio_console virtio_blk failover pata_acpi qemu_fw_cfg
> > [ 1424.322374] ---[ end trace 214af7e68b58e94b ]---
> > [ 1424.324305] RIP: 0010:__list_del_entry_valid.cold+0x31/0x55
> > [ 1424.326551] Code: 5e 15 b5 e8 54 a3 c5 ff 0f 0b 48 c7 c7 70 5f 15 b5 e8 46 a3 c5 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 30 5f 15 b5 e8 32 a3 c5 ff <0f> 0b 48 89 fe 4c 89 c2 48 c7 c7 f8 5e 15 b5 e8 1e a3 c5 ff 0f 0b
> > [ 1424.333874] RSP: 0018:ffff9a12404b3d38 EFLAGS: 00010246
> > [ 1424.335976] RAX: 0000000000000054 RBX: ffff8d9b6ece1000 RCX: 0000000000000000
> > [ 1424.338842] RDX: 0000000000000000 RSI: ffff8d9b77b57908 RDI: ffff8d9b77b57908
> > [ 1424.341668] RBP: ffff8d9b74c84d80 R08: ffff8d9b77b57908 R09: 0000000000000280
> > [ 1424.344511] R10: ffff9a12404b3bf0 R11: ffff9a12404b3bf5 R12: ffff8d9b6ece11c0
> > [ 1424.347343] R13: ffff9a12404b3d48 R14: a6787a60550c54a9 R15: ffff8d9b6fcec300
> > [ 1424.350184] FS:  0000000000000000(0000) GS:ffff8d9b77b40000(0000) knlGS:0000000000000000
> > [ 1424.353394] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 1424.355699] CR2: 0000562cdf4a2000 CR3: 000000023340c000 CR4: 00000000000406e0
> >
> > On Thu, Oct 17, 2019 at 3:58 PM Pavel Shilovskiy <pshilov@microsoft.com>
> > wrote:
> > >
> > >
> > > The patch looks good. Let's see if it fixes the issue in your setup.
> > >
> > > --
> > > Best regards,
> > > Pavel Shilovsky
> > >
> > > -----Original Message-----
> > > From: David Wysochanski <dwysocha@redhat.com>
> > > Sent: Thursday, October 17, 2019 12:23 PM
> > > To: Pavel Shilovskiy <pshilov@microsoft.com>
> > > Cc: Ronnie Sahlberg <lsahlber@redhat.com>; linux-cifs
> > > <linux-cifs@vger.kernel.org>; Frank Sorenson <sorenson@redhat.com>
> > > Subject: Re: list_del corruption while iterating retry_list in
> > > cifs_reconnect still seen on 5.4-rc3
> > >
> > > On Thu, Oct 17, 2019 at 2:29 PM Pavel Shilovskiy <pshilov@microsoft.com> wrote:
> > > >
> > > > The similar solution of taking an extra reference should apply to the
> > > > case of reconnect as well. The reference should be taken during the
> > > > process of moving mid entries to the private list. Once a callback
> > > > completes, such a reference should be put back thus freeing the mid.
> > > >
> > >
> > > Ah ok very good.  The above seems consistent with the traces I'm seeing of
> > > the race.
> > > I am going to test this patch as it sounds like what you're describing and
> > > similar to what Ronnie suggested earlier:
> > >
> > > --- a/fs/cifs/connect.c
> > > +++ b/fs/cifs/connect.c
> > > @@ -564,6 +564,7 @@ cifs_reconnect(struct TCP_Server_Info *server)
> > >         spin_lock(&GlobalMid_Lock);
> > >         list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
> > >                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> > > +               kref_get(&mid_entry->refcount);
> > >                 if (mid_entry->mid_state == MID_REQUEST_SUBMITTED)
> > >                         mid_entry->mid_state = MID_RETRY_NEEDED;
> > >                 list_move(&mid_entry->qhead, &retry_list);
> > > @@ -576,6 +577,7 @@ cifs_reconnect(struct TCP_Server_Info *server)
> > >                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> > >                 list_del_init(&mid_entry->qhead);
> > >                 mid_entry->callback(mid_entry);
> > > +               cifs_mid_q_entry_release(mid_entry);
> > >         }
> > >
> > >         if (cifs_rdma_enabled(server)) {
> > >
> >
> >
> > --
> > Dave Wysochanski
> > Principal Software Maintenance Engineer
> > T: 919-754-4024
> >


* Re: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
  2019-10-18 22:45                                     ` Pavel Shilovskiy
@ 2019-10-19 11:09                                       ` David Wysochanski
  2019-10-21 21:54                                         ` Pavel Shilovsky
  0 siblings, 1 reply; 31+ messages in thread
From: David Wysochanski @ 2019-10-19 11:09 UTC (permalink / raw)
  To: Pavel Shilovskiy; +Cc: Ronnie Sahlberg, linux-cifs, Frank Sorenson

On Fri, Oct 18, 2019 at 6:45 PM Pavel Shilovskiy <pshilov@microsoft.com> wrote:
>
> On Fri, Oct 18, 2019 at 5:21 PM David Wysochanski <dwysocha@redhat.com> wrote:
> >
> > On Fri, Oct 18, 2019 at 4:59 PM Pavel Shilovskiy <pshilov@microsoft.com> wrote:
> > >
> > > Thanks for the good news that the patch is stable in your workload!
> > >
> > The attached patch I ran on top of 5.4-rc3 for over 5 hrs today on the
> > reboot test - before it would crash after a few minutes tops.
>
> This is great! Thanks for verifying the fix.
>
> > > The extra flag may not be necessary and we may rely on a MID state but we would need to handle two states actually: MID_RETRY_NEEDED and MID_SHUTDOWN - see clean_demultiplex_info() which is doing the same things with mid as cifs_reconnect(). Please add ref counting to both functions since they both can race with system call threads.
> > >
>
> > I agree that loop has the same problem.  I can add that you're ok with the mid_state approach.  I think the only other option is probably a flag like Ronnie
> > suggested.
> > I will have to review the state machine more when I am more alert if you are concerned about possible subtle regressions.
>
> I am ok with both approaches as long as the stable patch is minimal. Thinking about this conditional assignment of the mid retry state: I don't think there is any case in the current code base where the WARN_ON you proposed would fire but I can't be sure about all possible stable kernel that the stable patch is going to be applied.
>

Right, but look at it this way.  If we conditionally set the state,
then what prevents a duplicate list_del_init call?  Let's say we
hit the special case that you're not sure can happen
(mid_entry->mid_state == MID_REQUEST_SUBMITTED is false), so the
mid_state does not get set to MID_RETRY_NEEDED inside cifs_reconnect,
yet the mid still gets added to retry_list.  In that case both the
cifs_reconnect code path and the other code paths where we're adding
the conditional tests will call list_del_init, and that will cause a
blowup again because the cifs_reconnect retry_list loop will end up
in a singleton list and exhaust the refcount, leading to the same
crash.  This is exactly why the refcount-only patch crashed again:
it's erroneous to think it's ok to modify mid_entry->qhead without a)
taking GlobalMid_Lock and b) checking that mid_state is what you think
it should be.  But if you're really concerned about that 'if'
condition and want to leave it, and you want a stable patch, then the
extra flag seems like the way to go.  The downside is that it would
only be done for stable, so a later patch will (presumably) remove
it.  I am not sure what the policy on that is, or whether it is even
acceptable or allowed.
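
A toy model of the argument above, counting how many times a mid would
be unlinked.  It assumes a single thread and a simple counter in place
of list_del_init; the function names and the three-state enum are
invented for this sketch and are not the cifs code (in the kernel each
of these steps would run under GlobalMid_Lock):

```c
#include <assert.h>
#include <stdbool.h>

enum toy_state { TOY_REQUEST_SUBMITTED, TOY_RESPONSE_RECEIVED,
		 TOY_RETRY_NEEDED };

struct toy_mid {
	enum toy_state state;
	int unlink_calls;	/* how often list_del_init would have run */
};

/* Reconnect side, step 1: the mid is moved to the private retry list. */
static void toy_reconnect_collect(struct toy_mid *m, bool unconditional)
{
	if (unconditional || m->state == TOY_REQUEST_SUBMITTED)
		m->state = TOY_RETRY_NEEDED;
}

/* Reconnect side, step 2: the retry loop always unlinks the entry. */
static void toy_reconnect_drain(struct toy_mid *m)
{
	m->unlink_calls++;
}

/* Issuer side: skip the unlink only if reconnect has claimed the mid. */
static void toy_issuer_delete(struct toy_mid *m)
{
	if (m->state != TOY_RETRY_NEEDED)
		m->unlink_calls++;
}

/* Run collect, then the issuer path, then the retry loop. */
static int toy_unlink_calls(enum toy_state initial, bool unconditional)
{
	struct toy_mid m = { initial, 0 };

	toy_reconnect_collect(&m, unconditional);
	toy_issuer_delete(&m);
	toy_reconnect_drain(&m);
	return m.unlink_calls;
}
```

With the conditional assignment, a mid that enters in an unexpected
state is unlinked twice (once by each side); with the unconditional
assignment it is unlinked exactly once.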

> Another more general thought: we have cifs_delete_mid -> DeleteMidQEntry -> _cifs_mid_q_entry_release chain of calls and every function frees its own part of the mid entry. I think we should merge the last two at least. It would allow us to guarantee that holding a reference to the mid means:
>
> 1) the mid itself is valid;
> 2) the mid response buffer is valid;
> 3) the mid is in a list if it is REQUEST_SUBMITTED, RETRY_NEEDED or SHUTDOWN and is not in a list if it is ALLOCATED, RESPONSE_RECEIVED, RESPONSE_MALFORMED or FREE; the release function should remove the mid from the list or warn appropriately depending on a state of the mid.
>
> The mid state and list location are changed only when the GlobalMid_Lock is held. In this case cifs_delete_mid is not needed too because all what it does will be done in the release function. I think this would allow to avoid all the problems discussed in this thread but looks too risky for stable.
>
> --
> Best regards,
> Pavel Shilovsky


* [RFC PATCH v2] cifs: Fix list_del corruption of retry_list in cifs_reconnect
  2019-10-18 20:59                               ` Pavel Shilovskiy
  2019-10-18 21:21                                 ` David Wysochanski
@ 2019-10-19 23:35                                 ` Dave Wysochanski
  2019-10-21 22:34                                   ` Pavel Shilovsky
  1 sibling, 1 reply; 31+ messages in thread
From: Dave Wysochanski @ 2019-10-19 23:35 UTC (permalink / raw)
  To: Pavel Shilovskiy, Ronnie Sahlberg; +Cc: linux-cifs, Frank Sorenson

This is a second attempt at the fix for the list_del corruption
issue.  This patch adds a similar refcount approach in
clean_demultiplex_info(), since the same problem was noticed there.
However, for some reason the list_del corruption now comes back
fairly quickly (after a few minutes), followed eventually by a
softlockup.  I have not tracked down the problem yet.


There's a race between the demultiplexer thread and the request
issuing thread similar to the race described in
commit 696e420bb2a6 ("cifs: Fix use after free of a mid_q_entry")
where both threads may obtain and attempt to call list_del_init
on the same mid and a list_del corruption similar to the
following will result:

[  430.454897] list_del corruption. prev->next should be ffff98d3a8f316c0, but was 2e885cb266355469
[  430.464668] ------------[ cut here ]------------
[  430.466569] kernel BUG at lib/list_debug.c:51!
[  430.468476] invalid opcode: 0000 [#1] SMP PTI
[  430.470286] CPU: 0 PID: 13267 Comm: cifsd Kdump: loaded Not tainted 5.4.0-rc3+ #19
[  430.473472] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[  430.475872] RIP: 0010:__list_del_entry_valid.cold+0x31/0x55
[  430.478129] Code: 5e 15 8e e8 54 a3 c5 ff 0f 0b 48 c7 c7 78 5f 15 8e e8 46 a3 c5 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 38 5f 15 8e e8 32 a3 c5 ff <0f> 0b 48 89 fe 4c 89 c2 48 c7 c7 00 5f 15 8e e8 1e a3 c5 ff 0f 0b
[  430.485563] RSP: 0018:ffffb4db0042fd38 EFLAGS: 00010246
[  430.487665] RAX: 0000000000000054 RBX: ffff98d3aabb8800 RCX: 0000000000000000
[  430.490513] RDX: 0000000000000000 RSI: ffff98d3b7a17908 RDI: ffff98d3b7a17908
[  430.493383] RBP: ffff98d3a8f316c0 R08: ffff98d3b7a17908 R09: 0000000000000285
[  430.496258] R10: ffffb4db0042fbf0 R11: ffffb4db0042fbf5 R12: ffff98d3aabb89c0
[  430.499113] R13: ffffb4db0042fd48 R14: 2e885cb266355469 R15: ffff98d3b24c4480
[  430.501981] FS:  0000000000000000(0000) GS:ffff98d3b7a00000(0000) knlGS:0000000000000000
[  430.505232] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  430.507546] CR2: 00007f08cd17b9c0 CR3: 000000023484a000 CR4: 00000000000406f0
[  430.510426] Call Trace:
[  430.511500]  cifs_reconnect+0x25e/0x610 [cifs]
[  430.513350]  cifs_readv_from_socket+0x220/0x250 [cifs]
[  430.515464]  cifs_read_from_socket+0x4a/0x70 [cifs]
[  430.517452]  ? try_to_wake_up+0x212/0x650
[  430.519122]  ? cifs_small_buf_get+0x16/0x30 [cifs]
[  430.521086]  ? allocate_buffers+0x66/0x120 [cifs]
[  430.523019]  cifs_demultiplex_thread+0xdc/0xc30 [cifs]
[  430.525116]  kthread+0xfb/0x130
[  430.526421]  ? cifs_handle_standard+0x190/0x190 [cifs]
[  430.528514]  ? kthread_park+0x90/0x90
[  430.530019]  ret_from_fork+0x35/0x40

To fix the above, inside cifs_reconnect unconditionally set the
state to MID_RETRY_NEEDED, and then take a reference before we
move any mid_q_entry on server->pending_mid_q to the temporary
retry_list.  Then while processing retry_list make sure we check
the state is still MID_RETRY_NEEDED while holding GlobalMid_Lock
before calling list_del_init.  Then after mid_q_entry callback
has been completed, drop the reference.  In the code paths for
request issuing thread, avoid calling list_del_init if we
notice mid->mid_state != MID_RETRY_NEEDED, avoiding the
race and duplicate call to list_del_init.  In addition to
the above MID_RETRY_NEEDED case, handle the MID_SHUTDOWN case
in a similar fashion to avoid the possibility of a similar
crash.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
---
 fs/cifs/connect.c   | 30 ++++++++++++++++++++++--------
 fs/cifs/transport.c |  8 ++++++--
 2 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index a64dfa95a925..0327bace214d 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -564,8 +564,9 @@ cifs_reconnect(struct TCP_Server_Info *server)
 	spin_lock(&GlobalMid_Lock);
 	list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
 		mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
-		if (mid_entry->mid_state == MID_REQUEST_SUBMITTED)
-			mid_entry->mid_state = MID_RETRY_NEEDED;
+		kref_get(&mid_entry->refcount);
+		WARN_ON(mid_entry->mid_state != MID_REQUEST_SUBMITTED);
+		mid_entry->mid_state = MID_RETRY_NEEDED;
 		list_move(&mid_entry->qhead, &retry_list);
 	}
 	spin_unlock(&GlobalMid_Lock);
@@ -574,8 +575,12 @@ cifs_reconnect(struct TCP_Server_Info *server)
 	cifs_dbg(FYI, "%s: issuing mid callbacks\n", __func__);
 	list_for_each_safe(tmp, tmp2, &retry_list) {
 		mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
-		list_del_init(&mid_entry->qhead);
+		spin_lock(&GlobalMid_Lock);
+		if (mid_entry->mid_state == MID_RETRY_NEEDED)
+			list_del_init(&mid_entry->qhead);
+		spin_unlock(&GlobalMid_Lock);
 		mid_entry->callback(mid_entry);
+		cifs_mid_q_entry_release(mid_entry);
 	}
 
 	if (cifs_rdma_enabled(server)) {
@@ -884,10 +889,6 @@ dequeue_mid(struct mid_q_entry *mid, bool malformed)
 	mid->when_received = jiffies;
 #endif
 	spin_lock(&GlobalMid_Lock);
-	if (!malformed)
-		mid->mid_state = MID_RESPONSE_RECEIVED;
-	else
-		mid->mid_state = MID_RESPONSE_MALFORMED;
 	/*
 	 * Trying to handle/dequeue a mid after the send_recv()
 	 * function has finished processing it is a bug.
@@ -895,8 +896,15 @@ dequeue_mid(struct mid_q_entry *mid, bool malformed)
 	if (mid->mid_flags & MID_DELETED)
 		printk_once(KERN_WARNING
 			    "trying to dequeue a deleted mid\n");
-	else
+	if (mid->mid_state != MID_RETRY_NEEDED &&
+	    mid->mid_state != MID_SHUTDOWN)
 		list_del_init(&mid->qhead);
+
+	if (!malformed)
+		mid->mid_state = MID_RESPONSE_RECEIVED;
+	else
+		mid->mid_state = MID_RESPONSE_MALFORMED;
+
 	spin_unlock(&GlobalMid_Lock);
 }
 
@@ -966,6 +974,7 @@ static void clean_demultiplex_info(struct TCP_Server_Info *server)
 		list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
 			mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
 			cifs_dbg(FYI, "Clearing mid 0x%llx\n", mid_entry->mid);
+			kref_get(&mid_entry->refcount);
 			mid_entry->mid_state = MID_SHUTDOWN;
 			list_move(&mid_entry->qhead, &dispose_list);
 		}
@@ -975,8 +984,13 @@ static void clean_demultiplex_info(struct TCP_Server_Info *server)
 		list_for_each_safe(tmp, tmp2, &dispose_list) {
 			mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
 			cifs_dbg(FYI, "Callback mid 0x%llx\n", mid_entry->mid);
+			spin_lock(&GlobalMid_Lock);
+			if (mid_entry->mid_state == MID_SHUTDOWN)
+				list_del_init(&mid_entry->qhead);
+			spin_unlock(&GlobalMid_Lock);
 			list_del_init(&mid_entry->qhead);
 			mid_entry->callback(mid_entry);
+			cifs_mid_q_entry_release(mid_entry);
 		}
 		/* 1/8th of sec is more than enough time for them to exit */
 		msleep(125);
diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
index 308ad0f495e1..d1794bd664ae 100644
--- a/fs/cifs/transport.c
+++ b/fs/cifs/transport.c
@@ -173,7 +173,9 @@ void
 cifs_delete_mid(struct mid_q_entry *mid)
 {
 	spin_lock(&GlobalMid_Lock);
-	list_del_init(&mid->qhead);
+	if (mid->mid_state != MID_RETRY_NEEDED &&
+	    mid->mid_state != MID_SHUTDOWN)
+		list_del_init(&mid->qhead);
 	mid->mid_flags |= MID_DELETED;
 	spin_unlock(&GlobalMid_Lock);
 
@@ -872,7 +874,9 @@ cifs_sync_mid_result(struct mid_q_entry *mid, struct TCP_Server_Info *server)
 		rc = -EHOSTDOWN;
 		break;
 	default:
-		list_del_init(&mid->qhead);
+		if (mid->mid_state != MID_RETRY_NEEDED &&
+		    mid->mid_state != MID_SHUTDOWN)
+			list_del_init(&mid->qhead);
 		cifs_server_dbg(VFS, "%s: invalid mid state mid=%llu state=%d\n",
 			 __func__, mid->mid, mid->mid_state);
 		rc = -EIO;
-- 
2.21.0



* Re: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
  2019-10-19 11:09                                       ` David Wysochanski
@ 2019-10-21 21:54                                         ` Pavel Shilovsky
  2019-10-22 18:39                                           ` David Wysochanski
  0 siblings, 1 reply; 31+ messages in thread
From: Pavel Shilovsky @ 2019-10-21 21:54 UTC (permalink / raw)
  To: David Wysochanski
  Cc: Pavel Shilovskiy, Ronnie Sahlberg, linux-cifs, Frank Sorenson

[-- Attachment #1: Type: text/plain, Size: 2612 bytes --]

On Sat, Oct 19, 2019 at 04:10, David Wysochanski <dwysocha@redhat.com> wrote:
> Right but look at it this way.  If we conditionally set the state,
> then what is preventing a duplicate list_del_init call?  Let's say we
> get into the special case that you're not sure it could happen
> (mid_entry->mid_state == MID_REQUEST_SUBMITTED is false), and so the
> mid_state does not get set to MID_RETRY_NEEDED inside cifs_reconnect
> but yet the mid gets added to retry_list.  In that case both the
> cifs_reconnect code path will call list_del_init as well as the other
> code paths which we're adding the conditional tests and that will
> cause a blowup again because cifs_reconnect retry_list loop will end
> up in a singleton list and exhaust the refcount, leading to the same
> crash.  This is exactly why the refcount only patch crashed again -
> it's erroneous to think it's ok to modify mid_entry->qhead without a)
> taking globalMid_Lock and b) checking mid_state is what you think it
> should be.  But if you're really concerned about that 'if' condition
> and want to leave it, and you want a stable patch, then the extra flag
> seems like the way to go.  But that has the downside that it's only
> being done for stable, so a later patch will likely remove it
> (presumably).  I am not sure what such policy is or if that is even
> acceptable or allowed.

This is acceptable, and it is good practice to fix the existing issue
with the smallest possible patch and then enhance the code/fix for the
current master branch if needed. This simplifies backporting a lot.

Actually looking at the code:

cifsglob.h:

1692 #define   MID_DELETED            2 /* Mid has been dequeued/deleted */

                    ^^^
Isn't "dequeued" what we need? It seems so, because it serves the same
purpose: to indicate that a request has been deleted from the pending
queue. So I think we just need to make use of this existing flag and
mark the mid with MID_DELETED every time we remove the mid from the
pending list. We should also treat moving mids from the pending list to
the local lists in clean_demultiplex_info and cifs_reconnect as a
deletion, because those lists are not exposed globally and mids are
removed from those lists before the functions exit.

I made a patch that uses the MID_DELETED logic and merges
DeleteMidQEntry and cifs_mid_q_entry_release into one function to
avoid a possible use-after-free of mid->resp_buf.
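
A minimal userspace sketch of that idea, under stated assumptions:
struct mid, dequeue_once() and the list helpers below are hypothetical
stand-ins, not the code in the attached patch, and the real code would
do the check while holding GlobalMid_Lock. It only shows how testing and
setting MID_DELETED in one place makes the unlink happen at most once:

```c
#include <assert.h>

/* Minimal userspace stand-ins for the kernel list helpers. */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);
}

#define MID_DELETED 2	/* same value as the cifsglob.h define quoted above */

/* Hypothetical, cut-down mid: only the fields this sketch needs. */
struct mid {
	struct list_head qhead;
	unsigned int flags;
};

/*
 * Unlink the mid from whatever list it is on, but only once.
 * Returns 1 if this call did the unlink, 0 if the mid was already
 * marked as dequeued. Assumption: the caller holds the queue lock.
 */
static int dequeue_once(struct mid *m)
{
	assert(m);
	if (m->flags & MID_DELETED)
		return 0;
	list_del_init(&m->qhead);
	m->flags |= MID_DELETED;
	return 1;
}
```

With a mid queued via list_add_tail(), the first dequeue_once() call
unlinks it and returns 1; any later call sees the flag and returns 0
without touching qhead again.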

David, could you please test the attached patch in your environment? I
only did sanity testing of it.

--
Best regards,
Pavel Shilovsky

[-- Attachment #2: mid_dequeue.patch --]
[-- Type: application/octet-stream, Size: 4041 bytes --]


* Re: [RFC PATCH v2] cifs: Fix list_del corruption of retry_list in cifs_reconnect
  2019-10-19 23:35                                 ` [RFC PATCH v2] cifs: Fix list_del corruption of retry_list in cifs_reconnect Dave Wysochanski
@ 2019-10-21 22:34                                   ` Pavel Shilovsky
  0 siblings, 0 replies; 31+ messages in thread
From: Pavel Shilovsky @ 2019-10-21 22:34 UTC (permalink / raw)
  To: Dave Wysochanski
  Cc: Pavel Shilovskiy, Ronnie Sahlberg, linux-cifs, Frank Sorenson

On Sat, Oct 19, 2019 at 16:39, Dave Wysochanski <dwysocha@redhat.com> wrote:
>
> This is a second attempt at the fix for the list_del corruption
> issue.  This patch adds a similar refcount approach in
> clean_demultiplex_info() since that was noticed.  However, for
> some reason I now get the list_del corruption come back fairly
> quickly (after a few minutes) and eventually a softlockup.
> I have not tracked down the problem yet.
>

Please find a couple of comments below. I am not sure that they relate
to the crash you are observing.

>
> There's a race between the demultiplexer thread and the request
> issuing thread similar to the race described in
> commit 696e420bb2a6 ("cifs: Fix use after free of a mid_q_entry")
> where both threads may obtain and attempt to call list_del_init
> on the same mid and a list_del corruption similar to the
> following will result:
>
> [  430.454897] list_del corruption. prev->next should be ffff98d3a8f316c0, but was 2e885cb266355469
> [  430.464668] ------------[ cut here ]------------
> [  430.466569] kernel BUG at lib/list_debug.c:51!
> [  430.468476] invalid opcode: 0000 [#1] SMP PTI
> [  430.470286] CPU: 0 PID: 13267 Comm: cifsd Kdump: loaded Not tainted 5.4.0-rc3+ #19
> [  430.473472] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> [  430.475872] RIP: 0010:__list_del_entry_valid.cold+0x31/0x55
> [  430.478129] Code: 5e 15 8e e8 54 a3 c5 ff 0f 0b 48 c7 c7 78 5f 15 8e e8 46 a3 c5 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 38 5f 15 8e e8 32 a3 c5 ff <0f> 0b 48 89 fe 4c 89 c2 48 c7 c7 00 5f 15 8e e8 1e a3 c5 ff 0f 0b
> [  430.485563] RSP: 0018:ffffb4db0042fd38 EFLAGS: 00010246
> [  430.487665] RAX: 0000000000000054 RBX: ffff98d3aabb8800 RCX: 0000000000000000
> [  430.490513] RDX: 0000000000000000 RSI: ffff98d3b7a17908 RDI: ffff98d3b7a17908
> [  430.493383] RBP: ffff98d3a8f316c0 R08: ffff98d3b7a17908 R09: 0000000000000285
> [  430.496258] R10: ffffb4db0042fbf0 R11: ffffb4db0042fbf5 R12: ffff98d3aabb89c0
> [  430.499113] R13: ffffb4db0042fd48 R14: 2e885cb266355469 R15: ffff98d3b24c4480
> [  430.501981] FS:  0000000000000000(0000) GS:ffff98d3b7a00000(0000) knlGS:0000000000000000
> [  430.505232] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  430.507546] CR2: 00007f08cd17b9c0 CR3: 000000023484a000 CR4: 00000000000406f0
> [  430.510426] Call Trace:
> [  430.511500]  cifs_reconnect+0x25e/0x610 [cifs]
> [  430.513350]  cifs_readv_from_socket+0x220/0x250 [cifs]
> [  430.515464]  cifs_read_from_socket+0x4a/0x70 [cifs]
> [  430.517452]  ? try_to_wake_up+0x212/0x650
> [  430.519122]  ? cifs_small_buf_get+0x16/0x30 [cifs]
> [  430.521086]  ? allocate_buffers+0x66/0x120 [cifs]
> [  430.523019]  cifs_demultiplex_thread+0xdc/0xc30 [cifs]
> [  430.525116]  kthread+0xfb/0x130
> [  430.526421]  ? cifs_handle_standard+0x190/0x190 [cifs]
> [  430.528514]  ? kthread_park+0x90/0x90
> [  430.530019]  ret_from_fork+0x35/0x40
>
> To fix the above, inside cifs_reconnect unconditionally set the
> state to MID_RETRY_NEEDED, and then take a reference before we
> move any mid_q_entry on server->pending_mid_q to the temporary
> retry_list.  Then while processing retry_list make sure we check
> the state is still MID_RETRY_NEEDED while holding GlobalMid_Lock
> before calling list_del_init.  Then after mid_q_entry callback
> has been completed, drop the reference.  In the code paths for
> request issuing thread, avoid calling list_del_init if we
> notice mid->mid_state != MID_RETRY_NEEDED, avoiding the
> race and duplicate call to list_del_init.  In addition to
> the above MID_RETRY_NEEDED case, handle the MID_SHUTDOWN case
> in a similar fashion to avoid the possibility of a similar
> crash.
>
> Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
> ---
>  fs/cifs/connect.c   | 30 ++++++++++++++++++++++--------
>  fs/cifs/transport.c |  8 ++++++--
>  2 files changed, 28 insertions(+), 10 deletions(-)
>
> diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> index a64dfa95a925..0327bace214d 100644
> --- a/fs/cifs/connect.c
> +++ b/fs/cifs/connect.c
> @@ -564,8 +564,9 @@ cifs_reconnect(struct TCP_Server_Info *server)
>         spin_lock(&GlobalMid_Lock);
>         list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
>                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> -               if (mid_entry->mid_state == MID_REQUEST_SUBMITTED)
> -                       mid_entry->mid_state = MID_RETRY_NEEDED;
> +               kref_get(&mid_entry->refcount);
> +               WARN_ON(mid_entry->mid_state != MID_REQUEST_SUBMITTED);
> +               mid_entry->mid_state = MID_RETRY_NEEDED;
>                 list_move(&mid_entry->qhead, &retry_list);
>         }
>         spin_unlock(&GlobalMid_Lock);
> @@ -574,8 +575,12 @@ cifs_reconnect(struct TCP_Server_Info *server)
>         cifs_dbg(FYI, "%s: issuing mid callbacks\n", __func__);
>         list_for_each_safe(tmp, tmp2, &retry_list) {
>                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> -               list_del_init(&mid_entry->qhead);
> +               spin_lock(&GlobalMid_Lock);
> +               if (mid_entry->mid_state == MID_RETRY_NEEDED)
> +                       list_del_init(&mid_entry->qhead);

Here you are removing the entry from the local list. It shouldn't be
conditional, because the list is supposed to be empty when the function
exits.
Also, once removed, we are not adding the mid back to the pending list,
so it doesn't seem that holding a lock is required here.
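
To illustrate the point, here is a rough userspace sketch, assuming
hypothetical names (struct mid with a "called" field, drain_pending(),
and hand-written list helpers). Phase 1 is the part that would run under
the queue lock; phase 2 walks a list only this function can see, so every
entry is unlinked unconditionally and the list ends up empty:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace stand-ins for the kernel list helpers. */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);
}

static void list_move_tail(struct list_head *e, struct list_head *head)
{
	list_del_init(e);
	list_add_tail(e, head);
}

/* Hypothetical, cut-down mid; "called" stands in for running the callback. */
struct mid {
	struct list_head qhead;
	int called;
};

#define mid_of(ptr) \
	((struct mid *)((char *)(ptr) - offsetof(struct mid, qhead)))

/* Returns the number of mids whose callback was issued. */
static int drain_pending(struct list_head *pending)
{
	struct list_head local, *pos, *nxt;
	int n = 0;

	INIT_LIST_HEAD(&local);

	/* Phase 1: would run with the queue lock held. */
	for (pos = pending->next, nxt = pos->next; pos != pending;
	     pos = nxt, nxt = pos->next)
		list_move_tail(pos, &local);

	/* Phase 2: private list, so the unlink is unconditional. */
	for (pos = local.next, nxt = pos->next; pos != &local;
	     pos = nxt, nxt = pos->next) {
		list_del_init(pos);
		mid_of(pos)->called = 1;
		n++;
	}

	assert(local.next == &local);	/* nothing left behind on exit */
	return n;
}
```

This only holds while no other thread can reach the entries through the
private list, which is the assumption being discussed here.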


> +               spin_unlock(&GlobalMid_Lock);
>                 mid_entry->callback(mid_entry);
> +               cifs_mid_q_entry_release(mid_entry);
>         }
>
>         if (cifs_rdma_enabled(server)) {
> @@ -884,10 +889,6 @@ dequeue_mid(struct mid_q_entry *mid, bool malformed)
>         mid->when_received = jiffies;
>  #endif
>         spin_lock(&GlobalMid_Lock);
> -       if (!malformed)
> -               mid->mid_state = MID_RESPONSE_RECEIVED;
> -       else
> -               mid->mid_state = MID_RESPONSE_MALFORMED;
>         /*
>          * Trying to handle/dequeue a mid after the send_recv()
>          * function has finished processing it is a bug.
> @@ -895,8 +896,15 @@ dequeue_mid(struct mid_q_entry *mid, bool malformed)
>         if (mid->mid_flags & MID_DELETED)
>                 printk_once(KERN_WARNING
>                             "trying to dequeue a deleted mid\n");
> -       else
> +       if (mid->mid_state != MID_RETRY_NEEDED &&
> +           mid->mid_state != MID_SHUTDOWN)
>                 list_del_init(&mid->qhead);
> +
> +       if (!malformed)
> +               mid->mid_state = MID_RESPONSE_RECEIVED;
> +       else
> +               mid->mid_state = MID_RESPONSE_MALFORMED;
> +
>         spin_unlock(&GlobalMid_Lock);
>  }
>
> @@ -966,6 +974,7 @@ static void clean_demultiplex_info(struct TCP_Server_Info *server)
>                 list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
>                         mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
>                         cifs_dbg(FYI, "Clearing mid 0x%llx\n", mid_entry->mid);
> +                       kref_get(&mid_entry->refcount);
>                         mid_entry->mid_state = MID_SHUTDOWN;
>                         list_move(&mid_entry->qhead, &dispose_list);
>                 }
> @@ -975,8 +984,13 @@ static void clean_demultiplex_info(struct TCP_Server_Info *server)
>                 list_for_each_safe(tmp, tmp2, &dispose_list) {
>                         mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
>                         cifs_dbg(FYI, "Callback mid 0x%llx\n", mid_entry->mid);
> +                       spin_lock(&GlobalMid_Lock);
> +                       if (mid_entry->mid_state == MID_SHUTDOWN)
> +                               list_del_init(&mid_entry->qhead);
> +                       spin_unlock(&GlobalMid_Lock);
>                         list_del_init(&mid_entry->qhead);

Here list_del_init is possibly called twice if the mid state is MID_SHUTDOWN.

>                         mid_entry->callback(mid_entry);
> +                       cifs_mid_q_entry_release(mid_entry);
>                 }
>                 /* 1/8th of sec is more than enough time for them to exit */
>                 msleep(125);
> diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
> index 308ad0f495e1..d1794bd664ae 100644
> --- a/fs/cifs/transport.c
> +++ b/fs/cifs/transport.c
> @@ -173,7 +173,9 @@ void
>  cifs_delete_mid(struct mid_q_entry *mid)
>  {
>         spin_lock(&GlobalMid_Lock);
> -       list_del_init(&mid->qhead);
> +       if (mid->mid_state != MID_RETRY_NEEDED &&
> +           mid->mid_state != MID_SHUTDOWN)
> +               list_del_init(&mid->qhead);
>         mid->mid_flags |= MID_DELETED;
>         spin_unlock(&GlobalMid_Lock);
>
> @@ -872,7 +874,9 @@ cifs_sync_mid_result(struct mid_q_entry *mid, struct TCP_Server_Info *server)
>                 rc = -EHOSTDOWN;
>                 break;
>         default:
> -               list_del_init(&mid->qhead);
> +               if (mid->mid_state != MID_RETRY_NEEDED &&
> +                   mid->mid_state != MID_SHUTDOWN)
> +                       list_del_init(&mid->qhead);

No need to check for the state not being MID_RETRY_NEEDED or MID_SHUTDOWN,
because those cases are handled above in the switch.
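
A simplified sketch of why the check is redundant, assuming a
hypothetical sync_result() with made-up enum values and an assumed
-EAGAIN for the retry case (only -EHOSTDOWN and the default -EIO appear
in the hunk above). States with their own case label never reach
default:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical state values; only the names mirror the thread. */
enum mid_state {
	MID_REQUEST_SUBMITTED,
	MID_RESPONSE_RECEIVED,
	MID_RETRY_NEEDED,
	MID_RESPONSE_MALFORMED,
	MID_SHUTDOWN,
};

/*
 * Maps a state to a return code and records whether the default
 * branch was taken.
 */
static int sync_result(enum mid_state state, int *hit_default)
{
	assert(hit_default);
	*hit_default = 0;

	switch (state) {
	case MID_RESPONSE_RECEIVED:
		return 0;
	case MID_RETRY_NEEDED:
		return -EAGAIN;		/* assumed value for this sketch */
	case MID_RESPONSE_MALFORMED:
		return -EIO;
	case MID_SHUTDOWN:
		return -EHOSTDOWN;
	default:
		/* Only states without a case label above land here. */
		*hit_default = 1;
		return -EIO;
	}
}
```

Calling it with MID_RETRY_NEEDED or MID_SHUTDOWN leaves *hit_default at
0, so a state test inside default can never be false for those two
values.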

>                 cifs_server_dbg(VFS, "%s: invalid mid state mid=%llu state=%d\n",
>                          __func__, mid->mid, mid->mid_state);
>                 rc = -EIO;
> --
> 2.21.0
>


--
Best regards,
Pavel Shilovsky


* Re: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
  2019-10-21 21:54                                         ` Pavel Shilovsky
@ 2019-10-22 18:39                                           ` David Wysochanski
  2019-10-22 21:20                                             ` ronnie sahlberg
  0 siblings, 1 reply; 31+ messages in thread
From: David Wysochanski @ 2019-10-22 18:39 UTC (permalink / raw)
  To: Pavel Shilovsky
  Cc: Pavel Shilovskiy, Ronnie Sahlberg, linux-cifs, Frank Sorenson

On Mon, Oct 21, 2019 at 5:55 PM Pavel Shilovsky <piastryyy@gmail.com> wrote:
>
> On Sat, Oct 19, 2019 at 04:10, David Wysochanski <dwysocha@redhat.com> wrote:
> > Right but look at it this way.  If we conditionally set the state,
> > then what is preventing a duplicate list_del_init call?  Let's say we
> > get into the special case that you're not sure it could happen
> > (mid_entry->mid_state == MID_REQUEST_SUBMITTED is false), and so the
> > mid_state does not get set to MID_RETRY_NEEDED inside cifs_reconnect
> > but yet the mid gets added to retry_list.  In that case both the
> > cifs_reconnect code path will call list_del_init as well as the other
> > code paths which we're adding the conditional tests and that will
> > cause a blowup again because cifs_reconnect retry_list loop will end
> > up in a singleton list and exhaust the refcount, leading to the same
> > crash.  This is exactly why the refcount only patch crashed again -
> > it's erroneous to think it's ok to modify mid_entry->qhead without a)
> > taking globalMid_Lock and b) checking mid_state is what you think it
> > should be.  But if you're really concerned about that 'if' condition
> > and want to leave it, and you want a stable patch, then the extra flag
> > seems like the way to go.  But that has the downside that it's only
> > being done for stable, so a later patch will likely remove it
> > (presumably).  I am not sure what such policy is or if that is even
> > acceptable or allowed.
>
> This is acceptable and it is a good practice to fix the existing issue
> with the smallest possible patch and then enhance the code/fix for the
> current master branch if needed. This simplify backporting a lot.
>
> Actually looking at the code:
>
> cifsglob.h:
>
> 1692 #define   MID_DELETED            2 /* Mid has been dequeued/deleted */
>
>                     ^^^
> Isn't "deqeueued" what we need? It seems so because it serves the same
> purpose: to indicate that a request has been deleted from the pending
> queue. So, I think we need to just make use of this existing flag and
> mark the mid with MID_DELETED every time we remove the mid from the
> pending list. Also assume moving mids from the pending lists to the
> local lists in cleanup_demultiplex_info and cifs_reconnect as a
> deletion too because those lists are not exposed globally and mids are
> removed from those lists before the functions exit.
>
> I made a patch which is using MID_DELETED logic and merging
> DeleteMidQEntry and cifs_mid_q_entry_release into one function to
> avoid possible use-after free of mid->resp_buf.
>
> David, could you please test the attached patch in your environment? I
> only did sanity testing of it.
>
I ran 5.4-rc4 plus this patch with the reproducer, and it ran fine for
over 6 hours.
I verified that plain 5.4-rc4 still crashes too. At first I wasn't sure,
since it took about 30 minutes to crash, but it definitely does
(not surprising).

Your patch seems reasonable to me and is in the spirit of the existing
code and the flag idea that Ronnie had.

To be honest, when I look at the other flag (unrelated to this problem)
I am also not sure whether it should be a state or a flag, but you
probably know the history of mid_state vs. flag better than I do.  For
the purposes of this bug, I think your patch is fine, and if you want a
stable patch and this looks better, FWIW this is fine with me.  As you
noted in your earlier comments, there is probably more refactoring or
other work that can be done in this area, but that is beyond the scope
of a stable patch.

Thanks!


* Re: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
  2019-10-22 18:39                                           ` David Wysochanski
@ 2019-10-22 21:20                                             ` ronnie sahlberg
  2019-10-22 21:25                                               ` Pavel Shilovsky
  0 siblings, 1 reply; 31+ messages in thread
From: ronnie sahlberg @ 2019-10-22 21:20 UTC (permalink / raw)
  To: David Wysochanski
  Cc: Pavel Shilovsky, Pavel Shilovskiy, Ronnie Sahlberg, linux-cifs,
	Frank Sorenson

On Wed, Oct 23, 2019 at 4:40 AM David Wysochanski <dwysocha@redhat.com> wrote:
>
> On Mon, Oct 21, 2019 at 5:55 PM Pavel Shilovsky <piastryyy@gmail.com> wrote:
> >
> > On Sat, Oct 19, 2019 at 04:10, David Wysochanski <dwysocha@redhat.com> wrote:
> > > Right but look at it this way.  If we conditionally set the state,
> > > then what is preventing a duplicate list_del_init call?  Let's say we
> > > get into the special case that you're not sure it could happen
> > > (mid_entry->mid_state == MID_REQUEST_SUBMITTED is false), and so the
> > > mid_state does not get set to MID_RETRY_NEEDED inside cifs_reconnect
> > > but yet the mid gets added to retry_list.  In that case both the
> > > cifs_reconnect code path will call list_del_init as well as the other
> > > code paths which we're adding the conditional tests and that will
> > > cause a blowup again because cifs_reconnect retry_list loop will end
> > > up in a singleton list and exhaust the refcount, leading to the same
> > > crash.  This is exactly why the refcount only patch crashed again -
> > > it's erroneous to think it's ok to modify mid_entry->qhead without a)
> > > taking globalMid_Lock and b) checking mid_state is what you think it
> > > should be.  But if you're really concerned about that 'if' condition
> > > and want to leave it, and you want a stable patch, then the extra flag
> > > seems like the way to go.  But that has the downside that it's only
> > > being done for stable, so a later patch will likely remove it
> > > (presumably).  I am not sure what such policy is or if that is even
> > > acceptable or allowed.
> >
> > This is acceptable and it is a good practice to fix the existing issue
> > with the smallest possible patch and then enhance the code/fix for the
> > current master branch if needed. This simplify backporting a lot.
> >
> > Actually looking at the code:
> >
> > cifsglob.h:
> >
> > 1692 #define   MID_DELETED            2 /* Mid has been dequeued/deleted */
> >
> >                     ^^^
> > Isn't "deqeueued" what we need? It seems so because it serves the same
> > purpose: to indicate that a request has been deleted from the pending
> > queue. So, I think we need to just make use of this existing flag and
> > mark the mid with MID_DELETED every time we remove the mid from the
> > pending list. Also assume moving mids from the pending lists to the
> > local lists in cleanup_demultiplex_info and cifs_reconnect as a
> > deletion too because those lists are not exposed globally and mids are
> > removed from those lists before the functions exit.
> >
> > I made a patch which is using MID_DELETED logic and merging
> > DeleteMidQEntry and cifs_mid_q_entry_release into one function to
> > avoid possible use-after free of mid->resp_buf.
> >
> > David, could you please test the attached patch in your environment? I
> > only did sanity testing of it.
> >
> I ran 5.4-rc4 plus this patch with the reproducer, and it ran fine for
> over 6 hours.

That is great news, and it sounds like it is time to get this submitted
to for-next and stable.

Can you send this as a proper patch to the list so we can get it into
Steve's for-next branch?
Please add a CC: Stable <stable@vger.kernel.org> to it.


I think the patch looks good, so whoever sends it to the list, please add a
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>


> I verified 5.4-rc4 would still crash too - at first I wasn't sure
> since it took about 30 mins to crash, but it definitely crashes too
> (not surprising).
>
> Your patch seems reasonable to me and is in the spirit of the existing
> code and the flag idea that Ronnie had.
>
> To be honest when I look at the other flag (unrelated to this problem)
> I am also not sure if it should be a state or a flag, but you probably
> know the history on mid_state vs flag better than me.  For purposes of
> this bug, I think your patch is fine and if you're wanting a stable
> patch and this looks better, FWIW this is fine with me.  I think
> probably as your comments earlier there is probably more refactoring
> or work that can be done in this area, but is beyond the scope of a
> stable patch.
>
> Thanks!


* Re: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
  2019-10-22 21:20                                             ` ronnie sahlberg
@ 2019-10-22 21:25                                               ` Pavel Shilovsky
  2019-10-22 21:32                                                 ` ronnie sahlberg
  0 siblings, 1 reply; 31+ messages in thread
From: Pavel Shilovsky @ 2019-10-22 21:25 UTC (permalink / raw)
  To: ronnie sahlberg
  Cc: David Wysochanski, Pavel Shilovskiy, Ronnie Sahlberg, linux-cifs,
	Frank Sorenson

Hi Ronnie,

Thanks for reviewing the patch, I will add your Reviewed-by.

The mainline version (5.4-rc4) of the patch doesn't apply cleanly to
any active stable kernel. Do you think it still needs the Stable tag?
I was going to prepare a stable version and mention all dependencies
anyway.

--
Best regards,
Pavel Shilovsky



* Re: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
  2019-10-22 21:25                                               ` Pavel Shilovsky
@ 2019-10-22 21:32                                                 ` ronnie sahlberg
  0 siblings, 0 replies; 31+ messages in thread
From: ronnie sahlberg @ 2019-10-22 21:32 UTC (permalink / raw)
  To: Pavel Shilovsky
  Cc: David Wysochanski, Pavel Shilovskiy, Ronnie Sahlberg, linux-cifs,
	Frank Sorenson

On Wed, Oct 23, 2019 at 7:25 AM Pavel Shilovsky <piastryyy@gmail.com> wrote:
>
> Hi Ronnie,
>
> Thanks for reviewing the patch, I will add your Reviewed-by.
>
> The mainline version (5.4-rc4) of the patch doesn't apply cleanly to
> any active stable kernel. Do you think it still needs the Stable tag?
> I was going to prepare a stable version and mention all dependencies
> anyway.

Ok, in that case we won't need the stable tag.
Thanks.


>
> --
> Best regards,
> Pavel Shilovsky
>
> On Tue, Oct 22, 2019 at 14:20, ronnie sahlberg <ronniesahlberg@gmail.com> wrote:
> >
> > On Wed, Oct 23, 2019 at 4:40 AM David Wysochanski <dwysocha@redhat.com> wrote:
> > >
> > > On Mon, Oct 21, 2019 at 5:55 PM Pavel Shilovsky <piastryyy@gmail.com> wrote:
> > > >
> > > > On Sat, Oct 19, 2019 at 04:10, David Wysochanski <dwysocha@redhat.com> wrote:
> > > > > Right but look at it this way.  If we conditionally set the state,
> > > > > then what is preventing a duplicate list_del_init call?  Let's say we
> > > > > get into the special case that you're not sure it could happen
> > > > > (mid_entry->mid_state == MID_REQUEST_SUBMITTED is false), and so the
> > > > > mid_state does not get set to MID_RETRY_NEEDED inside cifs_reconnect
> > > > > but yet the mid gets added to retry_list.  In that case both the
> > > > > cifs_reconnect code path will call list_del_init as well as the other
> > > > > code paths which we're adding the conditional tests and that will
> > > > > cause a blowup again because cifs_reconnect retry_list loop will end
> > > > > up in a singleton list and exhaust the refcount, leading to the same
> > > > > crash.  This is exactly why the refcount only patch crashed again -
> > > > > it's erroneous to think it's ok to modify mid_entry->qhead without a)
> > > > > taking globalMid_Lock and b) checking mid_state is what you think it
> > > > > should be.  But if you're really concerned about that 'if' condition
> > > > > and want to leave it, and you want a stable patch, then the extra flag
> > > > > seems like the way to go.  But that has the downside that it's only
> > > > > being done for stable, so a later patch will likely remove it
> > > > > (presumably).  I am not sure what such policy is or if that is even
> > > > > acceptable or allowed.
> > > >
> > > > This is acceptable and it is a good practice to fix the existing issue
> > > > with the smallest possible patch and then enhance the code/fix for the
> > > > current master branch if needed. This simplify backporting a lot.
> > > >
> > > > Actually looking at the code:
> > > >
> > > > cifsglob.h:
> > > >
> > > > 1692 #define   MID_DELETED            2 /* Mid has been dequeued/deleted */
> > > >
> > > >                     ^^^
> > > > Isn't "deqeueued" what we need? It seems so because it serves the same
> > > > purpose: to indicate that a request has been deleted from the pending
> > > > queue. So, I think we need to just make use of this existing flag and
> > > > mark the mid with MID_DELETED every time we remove the mid from the
> > > > pending list. Also treat moving mids from the pending lists to the
> > > > local lists in cleanup_demultiplex_info and cifs_reconnect as a
> > > > deletion too because those lists are not exposed globally and mids are
> > > > removed from those lists before the functions exit.
> > > >
> > > > I made a patch which uses the MID_DELETED logic and merges
> > > > DeleteMidQEntry and cifs_mid_q_entry_release into one function to
> > > > avoid a possible use-after-free of mid->resp_buf.
> > > >
> > > > David, could you please test the attached patch in your environment? I
> > > > only did sanity testing of it.
> > > >
> > > I ran 5.4-rc4 plus this patch with the reproducer, and it ran fine for
> > > over 6 hours.
> >
> > That is great news and sounds like it is time to get this submitted to for-next
> > and stable.
> >
> > Can you send this as a proper patch to the list so we can get it into
> > Steve's for-next branch?
> > Please add a CC: Stable <stable@vger.kernel.org> to it.
> >
> >
> > I think the patch looks good so whoever sends it to the list, please add a
> > Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
> >
> >
> > > I verified 5.4-rc4 would still crash too - at first I wasn't sure
> > > since it took about 30 mins to crash, but it definitely crashes too
> > > (not surprising).
> > >
> > > Your patch seems reasonable to me and is in the spirit of the existing
> > > code and the flag idea that Ronnie had.
> > >
> > > To be honest when I look at the other flag (unrelated to this problem)
> > > I am also not sure if it should be a state or a flag, but you probably
> > > know the history on mid_state vs flag better than me.  For purposes of
> > > this bug, I think your patch is fine and if you're wanting a stable
> > > patch and this looks better, FWIW this is fine with me.  As your
> > > earlier comments suggest, there is probably more refactoring or other
> > > work that can be done in this area, but that is beyond the scope of a
> > > stable patch.
> > >
> > > Thanks!
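
The MID_DELETED idea discussed above can be illustrated with a small
userspace sketch. This is not the patch that was tested in this thread;
it only shows the general pattern of marking an entry as dequeued under
the lock so that a second code path does not unlink it again. The names
struct mid_entry, mid_lock, mid_dequeue, demo_double_dequeue and the
list helpers are hypothetical stand-ins; only the MID_DELETED name and
value come from the cifsglob.h line quoted above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h; h->prev = h; }
static bool list_is_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail_entry(struct list_head *e, struct list_head *h)
{
    e->prev = h->prev;
    e->next = h;
    h->prev->next = e;
    h->prev = e;
}

static void list_del_init_entry(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
    list_init(e);
}

#define MID_DELETED 2 /* value taken from the cifsglob.h line quoted above */

/* Hypothetical, heavily reduced version of a mid queue entry. */
struct mid_entry {
    struct list_head qhead;
    unsigned int flags;
};

/* Hypothetical stand-in for the lock protecting the pending queue. */
static pthread_mutex_t mid_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Unlink the entry only if no other path has already done so.
 * The flag check and the unlink happen under the same lock, so two
 * callers can never both unlink the same entry.
 * Returns true if this call performed the unlink.
 */
static bool mid_dequeue(struct mid_entry *mid)
{
    bool unlinked = false;

    pthread_mutex_lock(&mid_lock);
    if (!(mid->flags & MID_DELETED)) {
        list_del_init_entry(&mid->qhead);
        mid->flags |= MID_DELETED;
        unlinked = true;
    }
    pthread_mutex_unlock(&mid_lock);
    return unlinked;
}

/* Two paths try to dequeue entry 'a'; only the first one unlinks it. */
static int demo_double_dequeue(void)
{
    struct list_head pending;
    struct mid_entry a = { .flags = 0 }, b = { .flags = 0 };
    int unlinks = 0;

    list_init(&pending);
    list_add_tail_entry(&a.qhead, &pending);
    list_add_tail_entry(&b.qhead, &pending);

    unlinks += mid_dequeue(&a); /* first path, e.g. a reconnect */
    unlinks += mid_dequeue(&a); /* second path: flag is set, no-op */

    /* 'b' must still be the only entry on the pending list. */
    assert(pending.next == &b.qhead && pending.prev == &b.qhead);

    unlinks += mid_dequeue(&b);
    assert(list_is_empty(&pending));
    return unlinks;
}
```

Calling demo_double_dequeue() performs three dequeue attempts but only
two actual unlinks, one per entry. The second attempt on the same entry
is skipped because the flag was already set under the lock.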


Thread overview: 31+ messages
2019-10-16 19:27 list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3 David Wysochanski
2019-10-17  0:17 ` Ronnie Sahlberg
2019-10-17  9:05   ` Ronnie Sahlberg
2019-10-17 11:42     ` David Wysochanski
2019-10-17 14:08       ` Ronnie Sahlberg
2019-10-17 15:29         ` David Wysochanski
2019-10-17 18:29           ` Pavel Shilovskiy
2019-10-17 19:23             ` David Wysochanski
2019-10-17 19:58               ` Pavel Shilovskiy
2019-10-17 20:34                 ` David Wysochanski
2019-10-17 21:44                   ` Ronnie Sahlberg
2019-10-17 22:02                     ` Pavel Shilovskiy
2019-10-17 22:53                       ` Ronnie Sahlberg
2019-10-17 23:20                         ` Pavel Shilovskiy
2019-10-17 23:41                           ` Ronnie Sahlberg
2019-10-18  8:16                         ` David Wysochanski
2019-10-18  9:27                           ` Ronnie Sahlberg
2019-10-18 10:12                             ` David Wysochanski
2019-10-18 20:59                               ` Pavel Shilovskiy
2019-10-18 21:21                                 ` David Wysochanski
2019-10-18 21:44                                   ` David Wysochanski
2019-10-18 22:45                                     ` Pavel Shilovskiy
2019-10-19 11:09                                       ` David Wysochanski
2019-10-21 21:54                                         ` Pavel Shilovsky
2019-10-22 18:39                                           ` David Wysochanski
2019-10-22 21:20                                             ` ronnie sahlberg
2019-10-22 21:25                                               ` Pavel Shilovsky
2019-10-22 21:32                                                 ` ronnie sahlberg
2019-10-19 23:35                                 ` [RFC PATCH v2] cifs: Fix list_del corruption of retry_list in cifs_reconnect Dave Wysochanski
2019-10-21 22:34                                   ` Pavel Shilovsky
2019-10-19  9:44                           ` list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3 Ronnie Sahlberg
