* work queue of scsi fc transports should be serialized
@ 2017-05-19  9:36 Dashi DS1 Cao
From: Dashi DS1 Cao @ 2017-05-19  9:36 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-kernel

I ran into a series of crashes in scsi_remove_target on SUSE 12 SP1 (3.12.49-11-default). They happen very easily when there are many disks spread over many storage and host FC ports and all of the ports time out at the same time. In my case there were 50 disks per rport (the same 50 LUNs), 4 rports per FC port, and 3 FC ports.

[ 3837.726704] general protection fault: 0000 [#1] SMP
[ 3837.731706] Modules linked in: af_packet iscsi_ibft iscsi_boot_sysfs msr nls_iso8859_1 nls_ ... 
[ 3837.799437] Supported: Yes, External
[ 3837.803011] CPU: 8 PID: 7975 Comm: kworker/8:4 Tainted: G               X 3.12.49-11-default #1
[ 3837.811706] Hardware name: LENOVO ThinkServer RD650            /ThinkServer RD650            , BIOS PB2TS395 11/15/2016
[ 3837.822479] Workqueue: fc_wq_15 fc_starget_delete [scsi_transport_fc]
[ 3837.828934] task: ffff881010804dc0 ti: ffff881010c30000 task.ti: ffff881010c30000
[ 3837.836408] RIP: 0010:[<ffffffffa0021310>]  [<ffffffffa0021310>] scsi_remove_target+0x90/0x2b0 [scsi_mod]
[ 3837.845991] RSP: 0018:ffff881010c31de0  EFLAGS: 00010003
[ 3837.851295] RAX: 0000000000000292 RBX: 2f73656369766564 RCX: 0000000000000b1b
[ 3837.858423] RDX: 2f7365636976655c RSI: 2f73656369766564 RDI: ffff88101c5cf050
[ 3837.865549] RBP: ffff880f51535060 R08: ffff880859237580 R09: ffff88101c5cf000
[ 3837.872677] R10: 0001d270ffffffc0 R11: 0001d270ffffffc0 R12: ffff881010c31e00
[ 3837.879806] R13: ffff880f4cd00800 R14: ffff88101c5ceff0 R15: 2f7365636976655c
[ 3837.886935] FS:  0000000000000000(0000) GS:ffff88107fc00000(0000) knlGS:0000000000000000
[ 3837.895015] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3837.900754] CR2: 00007fdc09ba6000 CR3: 0000001019d63000 CR4: 00000000003407e0
[ 3837.907884] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3837.915008] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 3837.922137] Stack:
[ 3837.924150]  ffff88101c5cf010 ffff88101c5cf000 ffff88101c5cf000 0000000000000292
[ 3837.931606]  ffff880f4ff32408 ffff880f4ff32408 ffff880f51535420 ffff880f51a0ec40
[ 3837.939063]  ffff88107fc117c0 ffffe8ffffc00400 0000000000000000 0000000000000200
[ 3837.946521] Call Trace:
[ 3837.948990]  [<ffffffff810737a2>] process_one_work+0x172/0x420
[ 3837.954825]  [<ffffffff8107441a>] worker_thread+0x11a/0x3c0
[ 3837.960398]  [<ffffffff8107ad94>] kthread+0xb4/0xc0
[ 3837.965283]  [<ffffffff8152a518>] ret_from_fork+0x58/0x90
[ 3837.970681] Code: 8b 5b 10 48 83 c6 10 48 89 34 24 48 8b 13 48 39 f3 4c 8d 6b f8 4c 8d 7a f8 75 24 e9 8b 01 00 00 0f 1f 00 49 8d 5f 08 48 3b 1c 24 <49> 8b 77 08 48 8d 56 f8 0f 84 72 01 00 00 4d 89 fd 49 89 d7 41
[ 3837.990663] RIP  [<ffffffffa0021310>] scsi_remove_target+0x90/0x2b0 [scsi_mod]
[ 3837.997904]  RSP <ffff881010c31de0>

It seems there is a race between multiple "fc_starget_delete" work items for the same rport, and therefore for the same SCSI host. This leads to concurrent calls to scsi_remove_target, which cannot be prevented by the following code snippet alone, even in its most recent version:
        spin_lock_irqsave(shost->host_lock, flags);
        list_for_each_entry(starget, &shost->__targets, siblings) {
                if (starget->state == STARGET_DEL ||
                    starget->state == STARGET_REMOVE)
                        continue;
If a starget can be under deletion (state == STARGET_DEL) while the list is still being walked, then list_next_entry(starget, siblings) can read freed memory and cause an access violation.
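
For reference, list_for_each_entry() paraphrased from include/linux/list.h (shown here only to illustrate where the iterator reads the next pointer; please check the exact definition in the 3.12 tree):

        #define list_for_each_entry(pos, head, member)                      \
                for (pos = list_first_entry(head, typeof(*pos), member);    \
                     &pos->member != (head);                                \
                     pos = list_next_entry(pos, member))

        /* list_next_entry(pos, member) is essentially
         * list_entry(pos->member.next, ...), i.e. it dereferences
         * pos->member.next, a field inside the starget being walked. */

If that starget has already been unlinked, freed and reused, the iterator follows garbage pointers. In the oops above, RBX and RSI hold 0x2f73656369766564, which decodes to the ASCII string "devices/", suggesting the list pointers were loaded from an object that had already been freed and reused; the non-canonical pointer then triggers the general protection fault.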

In any case, the crashes stop once the following patch is applied:

diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
index 454cc28..35604ed 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -440,7 +440,8 @@ static int fc_host_setup(struct transport_container *tc, struct device *dev,

        snprintf(fc_host->work_q_name, sizeof(fc_host->work_q_name),
                 "fc_wq_%d", shost->host_no);
-       fc_host->work_q = alloc_workqueue("%s", 0, 0, fc_host->work_q_name);
+       fc_host->work_q = alloc_ordered_workqueue("%s", WQ_MEM_RECLAIM,
+                                       fc_host->work_q_name);
        if (!fc_host->work_q)
                return -ENOMEM;

@@ -2559,8 +2560,11 @@ fc_rport_final_delete(struct work_struct *work)
        spin_unlock_irqrestore(shost->host_lock, flags);

        /* Delete SCSI target and sdevs */
-       if (rport->scsi_target_id != -1)
-               fc_starget_delete(&rport->stgt_delete_work);
+       if (rport->scsi_target_id != -1) {
+               fc_flush_work(shost);
+               BUG_ON(ACCESS_ONCE(rport->scsi_target_id) != -1);
+       }
+

        /*
         * Notify the driver that the rport is now dead. The LLDD will
--

The patch does two things (a short sketch follows):
1. Make the work queue of the FC SCSI host single-threaded (an ordered workqueue), so deletions for the same host are serialized.
2. Wait for the already-queued target deletion to finish rather than performing it again.
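
A minimal sketch of the intended effect (illustrative only, not part of the patch; rport_a and rport_b stand for two rports of the same FC host): an ordered workqueue executes at most one work item at a time, so two stgt_delete_work items can no longer run fc_starget_delete() concurrently, and flushing the queue waits for any deletion that is already queued.

        struct workqueue_struct *wq;

        wq = alloc_ordered_workqueue("fc_wq_%d", WQ_MEM_RECLAIM,
                                     shost->host_no);

        queue_work(wq, &rport_a->stgt_delete_work);
        queue_work(wq, &rport_b->stgt_delete_work); /* starts only after the
                                                     * first item finished */

        flush_workqueue(wq);    /* roughly what fc_flush_work(shost) does:
                                 * returns after both deletions completed */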

Dashi Cao


* Re: work queue of scsi fc transports should be serialized
@ 2017-05-19 22:32 ` Bart Van Assche
From: Bart Van Assche @ 2017-05-19 22:32 UTC (permalink / raw)
  To: caods1, linux-scsi; +Cc: linux-kernel

On Fri, 2017-05-19 at 09:36 +0000, Dashi DS1 Cao wrote:
> It seems there is a race of multiple "fc_starget_delete" of the same rport,
> thus of the same SCSI host. The race leads to the race of scsi_remove_target
> and it cannot be prevented by the code snippet alone, even of the most recent
> version:
>         spin_lock_irqsave(shost->host_lock, flags);
>         list_for_each_entry(starget, &shost->__targets, siblings) {
>                 if (starget->state == STARGET_DEL ||
>                     starget->state == STARGET_REMOVE)
>                         continue;
> If there is a possibility that the starget is under deletion(state ==
> STARGET_DEL), it should be possible that list_next_entry(starget, siblings)
> could cause a read access violation.

Hello Dashi,

Something else must be going on. From scsi_remove_target():

restart:
	spin_lock_irqsave(shost->host_lock, flags);
	list_for_each_entry(starget, &shost->__targets, siblings) {
		if (starget->state == STARGET_DEL ||
		    starget->state == STARGET_REMOVE)
			continue;
		if (starget->dev.parent == dev || &starget->dev == dev) {
			kref_get(&starget->reap_ref);
			starget->state = STARGET_REMOVE;
			spin_unlock_irqrestore(shost->host_lock, flags);
			__scsi_remove_target(starget);
			scsi_target_reap(starget);
			goto restart;
		}
	}
	spin_unlock_irqrestore(shost->host_lock, flags);

In other words, before scsi_remove_target() decides to call
__scsi_remove_target(), it changes the target state into STARGET_REMOVE
while holding the host lock. This means that scsi_remove_target() won't
call __scsi_remove_target() twice and also that it won't invoke
list_next_entry(starget, siblings) after starget has been freed.

Bart.


* RE: work queue of scsi fc transports should be serialized
@ 2017-05-20  8:25   ` Dashi DS1 Cao
From: Dashi DS1 Cao @ 2017-05-20  8:25 UTC (permalink / raw)
  To: Bart Van Assche, linux-scsi; +Cc: linux-kernel

On Fri, 2017-05-19 at 09:36 +0000, Dashi DS1 Cao wrote:
> It seems there is a race of multiple "fc_starget_delete" of the same 
> rport, thus of the same SCSI host. The race leads to the race of 
> scsi_remove_target and it cannot be prevented by the code snippet 
> alone, even of the most recent
> version:
>         spin_lock_irqsave(shost->host_lock, flags);
>         list_for_each_entry(starget, &shost->__targets, siblings) {
>                 if (starget->state == STARGET_DEL ||
>                     starget->state == STARGET_REMOVE)
>                         continue;
> If there is a possibility that the starget is under deletion(state == 
> STARGET_DEL), it should be possible that list_next_entry(starget, 
> siblings) could cause a read access violation.

>Hello Dashi,

>Something else must be going on. From scsi_remove_target():

>restart:
>	spin_lock_irqsave(shost->host_lock, flags);
>	list_for_each_entry(starget, &shost->__targets, siblings) {
>		if (starget->state == STARGET_DEL ||
>		    starget->state == STARGET_REMOVE)
>			continue;
>		if (starget->dev.parent == dev || &starget->dev == dev) {
>			kref_get(&starget->reap_ref);
>			starget->state = STARGET_REMOVE;
>			spin_unlock_irqrestore(shost->host_lock, flags);
>			__scsi_remove_target(starget);
>			scsi_target_reap(starget);
>			goto restart;
>		}
>	}
>	spin_unlock_irqrestore(shost->host_lock, flags);

>In other words, before scsi_remove_target() decides to call __scsi_remove_target(), it changes the target state into STARGET_REMOVE while holding the host lock. 
>This means that scsi_remove_target() won't call __scsi_remove_target() twice and also that it won't invoke list_next_entry(starget, siblings) after starget has been 
>freed.
>Bart.

In the crashes on SUSE 12 SP1, the root cause is the deletion of a list node from shost->__targets without holding the host lock:
        spin_lock_irqsave(shost->host_lock, flags);
        list_for_each_entry_safe(starget, tmp, &shost->__targets, siblings) {
                if (starget->state == STARGET_DEL)
                        continue;
                if (starget->dev.parent == dev || &starget->dev == dev) {
                        /* assuming new targets arrive at the end */
                        kref_get(&starget->reap_ref);
                        spin_unlock_irqrestore(shost->host_lock, flags);

                        __scsi_remove_target(starget);
                        /* this deletion from the shost->__targets list
                         * is done without the lock: */
                        list_move_tail(&starget->siblings, &reap_list);
                        spin_lock_irqsave(shost->host_lock, flags);
                 }
          }
          spin_unlock_irqrestore(shost->host_lock, flags);
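
To make the race concrete, here is one possible interleaving (my reconstruction, not taken from a trace): with the multithreaded fc_wq, two fc_starget_delete work items for rports of the same Scsi_Host can run scsi_remove_target() at the same time.

        /*
         *  worker A                            worker B
         *  --------                            --------
         *  lock(host_lock)
         *  finds starget, kref_get()
         *  unlock(host_lock)
         *  __scsi_remove_target(starget)
         *                                      lock(host_lock)
         *                                      walks __targets and reaches
         *                                      the same starget (state is
         *                                      not yet STARGET_DEL)
         *  list_move_tail(&starget->siblings,
         *                 &reap_list)      <-- list modified with no lock
         *  scsi_target_reap(starget)       <-- may free the starget
         *                                      follows the now-stale
         *                                      siblings pointers and faults
         */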

A better solution is as follows, without introducing more states:

restart:
        spin_lock_irqsave(shost->host_lock, flags);
        list_for_each_entry_safe(starget, tmp, &shost->__targets, siblings) {
                if (starget->dev.parent == dev || &starget->dev == dev) {
                        /* assuming new targets arrive at the end */
                        kref_get(&starget->reap_ref);
                        list_move_tail(&starget->siblings, &reap_list);
                        spin_unlock_irqrestore(shost->host_lock, flags);
                        __scsi_remove_target(starget);
                        goto restart;
                }
        }
        spin_unlock_irqrestore(shost->host_lock, flags);
        list_for_each_entry_safe(starget, tmp, &reap_list, siblings)
                scsi_target_reap(starget);

Another place that should be modified is fc_rport_final_delete() in scsi_transport_fc.c:
From:
        if (rport->scsi_target_id != -1)
                fc_starget_delete(&rport->stgt_delete_work);
To:
        if (rport->scsi_target_id != -1) {
                fc_flush_work(shost);
                BUG_ON(ACCESS_ONCE(rport->scsi_target_id) != -1);
        }
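
For context, fc_flush_work() already exists in scsi_transport_fc.c. Paraphrased from the 3.12-era source (please double-check against the actual tree), it simply flushes the per-host FC work queue, so the change above waits for an already-queued stgt_delete_work instead of running fc_starget_delete() a second time:

        static int
        fc_flush_work(struct Scsi_Host *shost)
        {
                if (!fc_host_work_q(shost)) {
                        printk(KERN_ERR
                                "ERROR: FC host work queue not initialized.\n");
                        return 0;
                }

                flush_workqueue(fc_host_work_q(shost));

                return 1;
        }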

Regards,
Dashi Cao


* Re: work queue of scsi fc transports should be serialized
@ 2017-05-22 20:04     ` Martin Wilck
From: Martin Wilck @ 2017-05-22 20:04 UTC (permalink / raw)
  To: Dashi DS1 Cao, Bart Van Assche, linux-scsi; +Cc: linux-kernel

On Sat, 2017-05-20 at 08:25 +0000, Dashi DS1 Cao wrote:
> On Fri, 2017-05-19 at 09:36 +0000, Dashi DS1 Cao wrote:
> > It seems there is a race of multiple "fc_starget_delete" of the
> > same 
> > rport, thus of the same SCSI host. The race leads to the race of 
> > scsi_remove_target and it cannot be prevented by the code snippet 
> > alone, even of the most recent
> > version:
> >         spin_lock_irqsave(shost->host_lock, flags);
> >         list_for_each_entry(starget, &shost->__targets, siblings) {
> >                 if (starget->state == STARGET_DEL ||
> >                     starget->state == STARGET_REMOVE)
> >                         continue;
> > If there is a possibility that the starget is under deletion(state
> > == 
> > STARGET_DEL), it should be possible that list_next_entry(starget, 
> > siblings) could cause a read access violation.
> > Hello Dashi,
> > Something else must be going on. From scsi_remove_target():
> > restart:
> > 	spin_lock_irqsave(shost->host_lock, flags);
> > 	list_for_each_entry(starget, &shost->__targets, siblings) {
> > 		if (starget->state == STARGET_DEL ||
> > 		    starget->state == STARGET_REMOVE)
> > 			continue;
> > 		if (starget->dev.parent == dev || &starget->dev == dev) {
> > 			kref_get(&starget->reap_ref);
> > 			starget->state = STARGET_REMOVE;
> > 			spin_unlock_irqrestore(shost->host_lock, flags);
> > 			__scsi_remove_target(starget);
> > 			scsi_target_reap(starget);
> > 			goto restart;
> > 		}
> > 	}
> > 	spin_unlock_irqrestore(shost->host_lock, flags);
> > In other words, before scsi_remove_target() decides to call
> > __scsi_remove_target(), it changes the target state into
> > STARGET_REMOVE while holding the host lock. 
> > This means that scsi_remove_target() won't
> > call __scsi_remove_target() twice and also that it won't invoke
> > list_next_entry(starget, siblings) after starget has been 
> > freed.
> > Bart.
> 
> In the crashes of Suse 12 sp1, the root cause is the deletion of a
> list node without holding the lock:
>         spin_lock_irqsave(shost->host_lock, flags);
>         list_for_each_entry_safe(starget, tmp, &shost->__targets, siblings) {
>                 if (starget->state == STARGET_DEL)
>                         continue;
>                 if (starget->dev.parent == dev || &starget->dev == dev) {
>                         /* assuming new targets arrive at the end */
>                         kref_get(&starget->reap_ref);
>                         spin_unlock_irqrestore(shost->host_lock, flags);
> 
>                         __scsi_remove_target(starget);
>                         list_move_tail(&starget->siblings, &reap_list);  --this deletion from shost->__targets list is done without the lock.
>                         spin_lock_irqsave(shost->host_lock, flags);
>                  }
>           }
>           spin_unlock_irqrestore(shost->host_lock, flags);

I believe this is fixed in SLES12-SP1 kernel 3.12.53-60.30.1, with the
following patch:

* Mon Jan 18 2016 jthumshirn@suse.de
- scsi: restart list search after unlock in scsi_remove_target
  (bsc#944749, bsc#959257).
- Delete
  patches.fixes/0001-SCSI-Fix-hard-lockup-in-scsi_remove_target.patch.
- commit 2490876

Regards,
Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

