From: James Bottomley <James.Bottomley@HansenPartnership.com>
To: Tomas Henzl <thenzl@redhat.com>, linux-scsi@vger.kernel.org
Subject: Re: [PATCH] scsi: take module reference during async scan
Date: Mon, 07 Sep 2020 13:24:42 -0700 [thread overview]
Message-ID: <1599510282.4232.54.camel@HansenPartnership.com> (raw)
In-Reply-To: <f3095df4-4a34-1e0c-04e7-8983ffeac973@redhat.com>
On Mon, 2020-09-07 at 22:09 +0200, Tomas Henzl wrote:
> On 9/7/20 7:46 PM, James Bottomley wrote:
> > On Mon, 2020-09-07 at 17:47 +0200, Tomas Henzl wrote:
> > > During an async scan the driver shost->hostt structures are used,
> > > that may cause issues when the driver is removed at that time.
> > > As protection take the module reference.
> >
> > Can I just ask what issues? Today, our module model is that
> > scsi_device_get() bumps the module refcount and therefore makes the
> > module ineligible to be removed. scsi_host_get() doesn't do this
> > because the way the host model is supposed to be coded, we can call
> > remove at any time but the module won't get freed until the last
> > put of the host. I can see we have a potential problem with
> > scsi_forget_host() racing with the async scan thread ... is that
> > what you see? What's supposed to happen is that scsi_device_get()
> > starts failing as soon as the module begins it's exit routine, so
> > if a scan is in progress, it can't add any new devices ... in
> > theory this means that the list is stable for scsi_forget_host(),
> > so knowing how that assumption is breaking would be useful.
>
> I think that the problem is that async scan uses callbacks to the
> module and when the module is being removed during scan it is not
> protected.
As I said above: the module shouldn't be freed until the scans are
completed or aborted ... I don't think we have a use after free
problem. What you show below seems to be a deadlock:
> modprobe mpt3sas && rmmod mpt3sas
>
> [ 370.031614] INFO: task rmmod:3120 blocked for more than 120
> seconds.
> [ 370.037967] Not tainted 4.18.0-193.el8.x86_64 #1
> [ 370.043105] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [ 370.050931] rmmod D 0 3120 2460 0x00004080
> [ 370.056414] Call Trace:
> [ 370.058889] ? __schedule+0x24f/0x650
> [ 370.062554] schedule+0x2f/0xa0
> [ 370.065738] async_synchronize_cookie_domain+0xad/0x140
> [ 370.070983] ? finish_wait+0x80/0x80
> [ 370.074580] __x64_sys_delete_module+0x166/0x280
> [ 370.079198] do_syscall_64+0x5b/0x1a0
> [ 370.082876] entry_SYSCALL_64_after_hwframe+0x65/0xca
> [ 370.087946] RIP: 0033:0x7f6de460a7db
> [ 370.091534] Code: Bad RIP value.
> [ 370.094777] RSP: 002b:00007ffe9971e798 EFLAGS: 00000206 ORIG_RAX:
> 00000000000000b0
> [ 370.102341] RAX: ffffffffffffffda RBX: 00005592370d37b0 RCX:
> 00007f6de460a7db
> [ 370.109481] RDX: 000000000000000a RSI: 0000000000000800 RDI:
> 00005592370d3818
> [ 370.116606] RBP: 0000000000000000 R08: 00007ffe9971d711 R09:
> 0000000000000000
> [ 370.123748] R10: 00007f6de467c8e0 R11: 0000000000000206 R12:
> 00007ffe9971e9c0
> [ 370.130888] R13: 00007ffe99720333 R14: 00005592370d32a0 R15:
> 00005592370d37b0
This seems to be showing something different: I think the
async_synchronize_full() in delete_module is where we're stuck. That
seems to indicate something has just stopped inside the async scan code
... likely due to something reacting badly to scsi_device_get()
failing.
James
next prev parent reply other threads:[~2020-09-07 20:24 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-09-07 15:47 [PATCH] scsi: take module reference during async scan Tomas Henzl
2020-09-07 16:57 ` Bart Van Assche
2020-09-07 21:12 ` Tomas Henzl
2020-09-07 17:46 ` James Bottomley
2020-09-07 20:09 ` Tomas Henzl
2020-09-07 20:24 ` James Bottomley [this message]
2020-09-07 21:02 ` Tomas Henzl
2020-09-07 22:02 ` James Bottomley
2020-09-08 8:22 ` Tomas Henzl
2020-09-07 21:32 ` Douglas Gilbert
2020-09-07 22:56 ` James Bottomley
2020-09-08 14:04 ` John Garry
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1599510282.4232.54.camel@HansenPartnership.com \
--to=james.bottomley@hansenpartnership.com \
--cc=linux-scsi@vger.kernel.org \
--cc=thenzl@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).