linux-cifs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: David Wysochanski <dwysocha@redhat.com>
To: Pavel Shilovsky <piastryyy@gmail.com>
Cc: Pavel Shilovskiy <pshilov@microsoft.com>,
	Ronnie Sahlberg <lsahlber@redhat.com>,
	linux-cifs <linux-cifs@vger.kernel.org>,
	Frank Sorenson <sorenson@redhat.com>
Subject: Re: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
Date: Tue, 22 Oct 2019 14:39:37 -0400	[thread overview]
Message-ID: <CALF+zOkji2d7=WJcSmPKhFgm53aCb3Qxy4t1O4=W+4fOR5Qa7A@mail.gmail.com> (raw)
In-Reply-To: <CAKywueRjL=Ob1jKFyV+ApxZPXWM91aQRD8UJxe0h6kLi-iDmpA@mail.gmail.com>

On Mon, Oct 21, 2019 at 5:55 PM Pavel Shilovsky <piastryyy@gmail.com> wrote:
>
> сб, 19 окт. 2019 г. в 04:10, David Wysochanski <dwysocha@redhat.com>:
> > Right but look at it this way.  If we conditionally set the state,
> > then what is preventing a duplicate list_del_init call?  Let's say we
> > get into the special case that you're not sure it could happen
> > (mid_entry->mid_state == MID_REQUEST_SUBMITTED is false), and so the
> > mid_state does not get set to MID_RETRY_NEEDED inside cifs_reconnect
> > but yet the mid gets added to retry_list.  In that case both the
> > cifs_reconnect code path will call list_del_init as well as the other
> > code paths which we're adding the conditional tests and that will
> > cause a blowup again because cifs_reconnect retry_list loop will end
> > up in a singleton list and exhaust the refcount, leading to the same
> > crash.  This is exactly why the refcount only patch crashed again -
> > it's erroneous to think it's ok to modify mid_entry->qhead without a)
> > taking globalMid_Lock and b) checking mid_state is what you think it
> > should be.  But if you're really concerned about that 'if' condition
> > and want to leave it, and you want a stable patch, then the extra flag
> > seems like the way to go.  But that has the downside that it's only
> > being done for stable, so a later patch will likely remove it
> > (presumably).  I am not sure what such policy is or if that is even
> > acceptable or allowed.
>
> This is acceptable and it is a good practice to fix the existing issue
> with the smallest possible patch and then enhance the code/fix for the
> current master branch if needed. This simplify backporting a lot.
>
> Actually looking at the code:
>
> cifsglob.h:
>
> 1692 #define   MID_DELETED            2 /* Mid has been dequeued/deleted */
>
>                     ^^^
> Isn't "deqeueued" what we need? It seems so because it serves the same
> purpose: to indicate that a request has been deleted from the pending
> queue. So, I think we need to just make use of this existing flag and
> mark the mid with MID_DELETED every time we remove the mid from the
> pending list. Also assume moving mids from the pending lists to the
> local lists in cleanup_demultiplex_info and cifs_reconnect as a
> deletion too because those lists are not exposed globally and mids are
> removed from those lists before the functions exit.
>
> I made a patch which is using MID_DELETED logic and merging
> DeleteMidQEntry and cifs_mid_q_entry_release into one function to
> avoid possible use-after free of mid->resp_buf.
>
> David, could you please test the attached patch in your environment? I
> only did sanity testing of it.
>
I ran 5.4-rc4 plus this patch with the reproducer, and it ran fine for
over 6 hours.
I verified 5.4-rc4 would still crash too - at first I wasn't sure
since it took about 30 mins to crash, but it definitely crashes too
(not surprising).

Your patch seems reasonable to me and is in the spirit of the existing
code and the flag idea that Ronnie had.

To be honest when I look at the other flag (unrelated to this problem)
I am also not sure if it should be a state or a flag, but you probably
know the history on mid_state vs flag better than me.  For purposes of
this bug, I think your patch is fine and if you're wanting a stable
patch and this looks better, FWIW this is fine with me.  I think
probably as your comments earlier there is probably more refactoring
or work that can be done in this area, but is beyond the scope of a
stable patch.

Thanks!

  reply	other threads:[~2019-10-22 18:40 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-16 19:27 list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3 David Wysochanski
2019-10-17  0:17 ` Ronnie Sahlberg
2019-10-17  9:05   ` Ronnie Sahlberg
2019-10-17 11:42     ` David Wysochanski
2019-10-17 14:08       ` Ronnie Sahlberg
2019-10-17 15:29         ` David Wysochanski
2019-10-17 18:29           ` Pavel Shilovskiy
2019-10-17 19:23             ` David Wysochanski
2019-10-17 19:58               ` Pavel Shilovskiy
2019-10-17 20:34                 ` David Wysochanski
2019-10-17 21:44                   ` Ronnie Sahlberg
2019-10-17 22:02                     ` Pavel Shilovskiy
2019-10-17 22:53                       ` Ronnie Sahlberg
2019-10-17 23:20                         ` Pavel Shilovskiy
2019-10-17 23:41                           ` Ronnie Sahlberg
2019-10-18  8:16                         ` David Wysochanski
2019-10-18  9:27                           ` Ronnie Sahlberg
2019-10-18 10:12                             ` David Wysochanski
2019-10-18 20:59                               ` Pavel Shilovskiy
2019-10-18 21:21                                 ` David Wysochanski
2019-10-18 21:44                                   ` David Wysochanski
2019-10-18 22:45                                     ` Pavel Shilovskiy
2019-10-19 11:09                                       ` David Wysochanski
2019-10-21 21:54                                         ` Pavel Shilovsky
2019-10-22 18:39                                           ` David Wysochanski [this message]
2019-10-22 21:20                                             ` ronnie sahlberg
2019-10-22 21:25                                               ` Pavel Shilovsky
2019-10-22 21:32                                                 ` ronnie sahlberg
2019-10-19 23:35                                 ` [RFC PATCH v2] cifs: Fix list_del corruption of retry_list in cifs_reconnect Dave Wysochanski
2019-10-21 22:34                                   ` Pavel Shilovsky
2019-10-19  9:44                           ` list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3 Ronnie Sahlberg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CALF+zOkji2d7=WJcSmPKhFgm53aCb3Qxy4t1O4=W+4fOR5Qa7A@mail.gmail.com' \
    --to=dwysocha@redhat.com \
    --cc=linux-cifs@vger.kernel.org \
    --cc=lsahlber@redhat.com \
    --cc=piastryyy@gmail.com \
    --cc=pshilov@microsoft.com \
    --cc=sorenson@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).