linux-cifs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Pavel Shilovsky <piastryyy@gmail.com>
To: Dave Wysochanski <dwysocha@redhat.com>
Cc: Pavel Shilovskiy <pshilov@microsoft.com>,
	Ronnie Sahlberg <lsahlber@redhat.com>,
	linux-cifs <linux-cifs@vger.kernel.org>,
	Frank Sorenson <sorenson@redhat.com>
Subject: Re: [RFC PATCH v2] cifs: Fix list_del corruption of retry_list in cifs_reconnect
Date: Mon, 21 Oct 2019 15:34:29 -0700	[thread overview]
Message-ID: <CAKywueSdK1+Mg0FLea5Vu98A016NphfbSW8TEsvLEf9iHS3ZaA@mail.gmail.com> (raw)
In-Reply-To: <20191019233505.14191-1-dwysocha@redhat.com>

сб, 19 окт. 2019 г. в 16:39, Dave Wysochanski <dwysocha@redhat.com>:
>
> This is a second attempt at the fix for the list_del corruption
> issue.  This patch adds a similar refcount approach in
> clean_demultiplex_info() since that was noticed.  However, for
> some reason I now get the list_del corruption come back fairly
> quickly (after a few minutes) and eventually a softlockup.
> I have not tracked down the problem yet.
>

Please find a couple comments below. Not sure that they relate to the
crash you are observing.

>
> There's a race between the demultiplexer thread and the request
> issuing thread similar to the race described in
> commit 696e420bb2a6 ("cifs: Fix use after free of a mid_q_entry")
> where both threads may obtain and attempt to call list_del_init
> on the same mid and a list_del corruption similar to the
> following will result:
>
> [  430.454897] list_del corruption. prev->next should be ffff98d3a8f316c0, but was 2e885cb266355469
> [  430.464668] ------------[ cut here ]------------
> [  430.466569] kernel BUG at lib/list_debug.c:51!
> [  430.468476] invalid opcode: 0000 [#1] SMP PTI
> [  430.470286] CPU: 0 PID: 13267 Comm: cifsd Kdump: loaded Not tainted 5.4.0-rc3+ #19
> [  430.473472] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> [  430.475872] RIP: 0010:__list_del_entry_valid.cold+0x31/0x55
> [  430.478129] Code: 5e 15 8e e8 54 a3 c5 ff 0f 0b 48 c7 c7 78 5f 15 8e e8 46 a3 c5 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 38 5f 15 8e e8 32 a3 c5 ff <0f> 0b 48 89 fe 4c 89 c2 48 c7 c7 00 5f 15 8e e8 1e a3 c5 ff 0f 0b
> [  430.485563] RSP: 0018:ffffb4db0042fd38 EFLAGS: 00010246
> [  430.487665] RAX: 0000000000000054 RBX: ffff98d3aabb8800 RCX: 0000000000000000
> [  430.490513] RDX: 0000000000000000 RSI: ffff98d3b7a17908 RDI: ffff98d3b7a17908
> [  430.493383] RBP: ffff98d3a8f316c0 R08: ffff98d3b7a17908 R09: 0000000000000285
> [  430.496258] R10: ffffb4db0042fbf0 R11: ffffb4db0042fbf5 R12: ffff98d3aabb89c0
> [  430.499113] R13: ffffb4db0042fd48 R14: 2e885cb266355469 R15: ffff98d3b24c4480
> [  430.501981] FS:  0000000000000000(0000) GS:ffff98d3b7a00000(0000) knlGS:0000000000000000
> [  430.505232] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  430.507546] CR2: 00007f08cd17b9c0 CR3: 000000023484a000 CR4: 00000000000406f0
> [  430.510426] Call Trace:
> [  430.511500]  cifs_reconnect+0x25e/0x610 [cifs]
> [  430.513350]  cifs_readv_from_socket+0x220/0x250 [cifs]
> [  430.515464]  cifs_read_from_socket+0x4a/0x70 [cifs]
> [  430.517452]  ? try_to_wake_up+0x212/0x650
> [  430.519122]  ? cifs_small_buf_get+0x16/0x30 [cifs]
> [  430.521086]  ? allocate_buffers+0x66/0x120 [cifs]
> [  430.523019]  cifs_demultiplex_thread+0xdc/0xc30 [cifs]
> [  430.525116]  kthread+0xfb/0x130
> [  430.526421]  ? cifs_handle_standard+0x190/0x190 [cifs]
> [  430.528514]  ? kthread_park+0x90/0x90
> [  430.530019]  ret_from_fork+0x35/0x40
>
> To fix the above, inside cifs_reconnect unconditionally set the
> state to MID_RETRY_NEEDED, and then take a reference before we
> move any mid_q_entry on server->pending_mid_q to the temporary
> retry_list.  Then while processing retry_list make sure we check
> the state is still MID_RETRY_NEEDED while holding GlobalMid_Lock
> before calling list_del_init.  Then after mid_q_entry callback
> has been completed, drop the reference.  In the code paths for
> request issuing thread, avoid calling list_del_init if we
> notice mid->mid_state != MID_RETRY_NEEDED, avoiding the
> race and duplicate call to list_del_init.  In addition to
> the above MID_RETRY_NEEDED case, handle the MID_SHUTDOWN case
> in a similar fashion to avoid the possibility of a similar
> crash.
>
> Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
> ---
>  fs/cifs/connect.c   | 30 ++++++++++++++++++++++--------
>  fs/cifs/transport.c |  8 ++++++--
>  2 files changed, 28 insertions(+), 10 deletions(-)
>
> diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> index a64dfa95a925..0327bace214d 100644
> --- a/fs/cifs/connect.c
> +++ b/fs/cifs/connect.c
> @@ -564,8 +564,9 @@ cifs_reconnect(struct TCP_Server_Info *server)
>         spin_lock(&GlobalMid_Lock);
>         list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
>                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> -               if (mid_entry->mid_state == MID_REQUEST_SUBMITTED)
> -                       mid_entry->mid_state = MID_RETRY_NEEDED;
> +               kref_get(&mid_entry->refcount);
> +               WARN_ON(mid_entry->mid_state != MID_REQUEST_SUBMITTED);
> +               mid_entry->mid_state = MID_RETRY_NEEDED;
>                 list_move(&mid_entry->qhead, &retry_list);
>         }
>         spin_unlock(&GlobalMid_Lock);
> @@ -574,8 +575,12 @@ cifs_reconnect(struct TCP_Server_Info *server)
>         cifs_dbg(FYI, "%s: issuing mid callbacks\n", __func__);
>         list_for_each_safe(tmp, tmp2, &retry_list) {
>                 mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> -               list_del_init(&mid_entry->qhead);
> +               spin_lock(&GlobalMid_Lock);
> +               if (mid_entry->mid_state == MID_RETRY_NEEDED)
> +                       list_del_init(&mid_entry->qhead);

Here you are removing the entry from the local list - it shouldn't be
conditional because the list is supposed to be empty when the function
exists.
Also once removed, we are not adding the mid back to the pending list,
so, it doesn't seem that holding a lock is required here.


> +               spin_unlock(&GlobalMid_Lock);
>                 mid_entry->callback(mid_entry);
> +               cifs_mid_q_entry_release(mid_entry);
>         }
>
>         if (cifs_rdma_enabled(server)) {
> @@ -884,10 +889,6 @@ dequeue_mid(struct mid_q_entry *mid, bool malformed)
>         mid->when_received = jiffies;
>  #endif
>         spin_lock(&GlobalMid_Lock);
> -       if (!malformed)
> -               mid->mid_state = MID_RESPONSE_RECEIVED;
> -       else
> -               mid->mid_state = MID_RESPONSE_MALFORMED;
>         /*
>          * Trying to handle/dequeue a mid after the send_recv()
>          * function has finished processing it is a bug.
> @@ -895,8 +896,15 @@ dequeue_mid(struct mid_q_entry *mid, bool malformed)
>         if (mid->mid_flags & MID_DELETED)
>                 printk_once(KERN_WARNING
>                             "trying to dequeue a deleted mid\n");
> -       else
> +       if (mid->mid_state != MID_RETRY_NEEDED &&
> +           mid->mid_state != MID_SHUTDOWN)
>                 list_del_init(&mid->qhead);
> +
> +       if (!malformed)
> +               mid->mid_state = MID_RESPONSE_RECEIVED;
> +       else
> +               mid->mid_state = MID_RESPONSE_MALFORMED;
> +
>         spin_unlock(&GlobalMid_Lock);
>  }
>
> @@ -966,6 +974,7 @@ static void clean_demultiplex_info(struct TCP_Server_Info *server)
>                 list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
>                         mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
>                         cifs_dbg(FYI, "Clearing mid 0x%llx\n", mid_entry->mid);
> +                       kref_get(&mid_entry->refcount);
>                         mid_entry->mid_state = MID_SHUTDOWN;
>                         list_move(&mid_entry->qhead, &dispose_list);
>                 }
> @@ -975,8 +984,13 @@ static void clean_demultiplex_info(struct TCP_Server_Info *server)
>                 list_for_each_safe(tmp, tmp2, &dispose_list) {
>                         mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
>                         cifs_dbg(FYI, "Callback mid 0x%llx\n", mid_entry->mid);
> +                       spin_lock(&GlobalMid_Lock);
> +                       if (mid_entry->mid_state == MID_SHUTDOWN)
> +                               list_del_init(&mid_entry->qhead);
> +                       spin_unlock(&GlobalMid_Lock);
>                         list_del_init(&mid_entry->qhead)

Here list_del_init is possble called twice if mid state is SHUTDOWN.

;
>                         mid_entry->callback(mid_entry);
> +                       cifs_mid_q_entry_release(mid_entry);
>                 }
>                 /* 1/8th of sec is more than enough time for them to exit */
>                 msleep(125);
> diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
> index 308ad0f495e1..d1794bd664ae 100644
> --- a/fs/cifs/transport.c
> +++ b/fs/cifs/transport.c
> @@ -173,7 +173,9 @@ void
>  cifs_delete_mid(struct mid_q_entry *mid)
>  {
>         spin_lock(&GlobalMid_Lock);
> -       list_del_init(&mid->qhead);
> +       if (mid->mid_state != MID_RETRY_NEEDED &&
> +           mid->mid_state != MID_SHUTDOWN)
> +               list_del_init(&mid->qhead);
>         mid->mid_flags |= MID_DELETED;
>         spin_unlock(&GlobalMid_Lock);
>
> @@ -872,7 +874,9 @@ cifs_sync_mid_result(struct mid_q_entry *mid, struct TCP_Server_Info *server)
>                 rc = -EHOSTDOWN;
>                 break;
>         default:
> -               list_del_init(&mid->qhead);
> +               if (mid->mid_state != MID_RETRY_NEEDED &&
> +                   mid->mid_state != MID_SHUTDOWN)
> +                       list_del_init(&mid->qhead);

No need to check for the state not being RETRY_NEEDED or MID_SHUTDOWN
because those cases are handled above in the switch.

>                 cifs_server_dbg(VFS, "%s: invalid mid state mid=%llu state=%d\n",
>                          __func__, mid->mid, mid->mid_state);
>                 rc = -EIO;
> --
> 2.21.0
>


--
Best regards,
Pavel Shilovsky

  reply	other threads:[~2019-10-21 22:34 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-16 19:27 list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3 David Wysochanski
2019-10-17  0:17 ` Ronnie Sahlberg
2019-10-17  9:05   ` Ronnie Sahlberg
2019-10-17 11:42     ` David Wysochanski
2019-10-17 14:08       ` Ronnie Sahlberg
2019-10-17 15:29         ` David Wysochanski
2019-10-17 18:29           ` Pavel Shilovskiy
2019-10-17 19:23             ` David Wysochanski
2019-10-17 19:58               ` Pavel Shilovskiy
2019-10-17 20:34                 ` David Wysochanski
2019-10-17 21:44                   ` Ronnie Sahlberg
2019-10-17 22:02                     ` Pavel Shilovskiy
2019-10-17 22:53                       ` Ronnie Sahlberg
2019-10-17 23:20                         ` Pavel Shilovskiy
2019-10-17 23:41                           ` Ronnie Sahlberg
2019-10-18  8:16                         ` David Wysochanski
2019-10-18  9:27                           ` Ronnie Sahlberg
2019-10-18 10:12                             ` David Wysochanski
2019-10-18 20:59                               ` Pavel Shilovskiy
2019-10-18 21:21                                 ` David Wysochanski
2019-10-18 21:44                                   ` David Wysochanski
2019-10-18 22:45                                     ` Pavel Shilovskiy
2019-10-19 11:09                                       ` David Wysochanski
2019-10-21 21:54                                         ` Pavel Shilovsky
2019-10-22 18:39                                           ` David Wysochanski
2019-10-22 21:20                                             ` ronnie sahlberg
2019-10-22 21:25                                               ` Pavel Shilovsky
2019-10-22 21:32                                                 ` ronnie sahlberg
2019-10-19 23:35                                 ` [RFC PATCH v2] cifs: Fix list_del corruption of retry_list in cifs_reconnect Dave Wysochanski
2019-10-21 22:34                                   ` Pavel Shilovsky [this message]
2019-10-19  9:44                           ` list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3 Ronnie Sahlberg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAKywueSdK1+Mg0FLea5Vu98A016NphfbSW8TEsvLEf9iHS3ZaA@mail.gmail.com \
    --to=piastryyy@gmail.com \
    --cc=dwysocha@redhat.com \
    --cc=linux-cifs@vger.kernel.org \
    --cc=lsahlber@redhat.com \
    --cc=pshilov@microsoft.com \
    --cc=sorenson@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).