All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] fix id computation in scsi_eh_target_reset()
@ 2010-10-14 14:04 Hillf Danton
  2010-10-18  5:33 ` Mike Christie
  0 siblings, 1 reply; 14+ messages in thread
From: Hillf Danton @ 2010-10-14 14:04 UTC (permalink / raw)
  To: linux-scsi

There seems that the id of the tgtr_scmd processed by
scsi_try_target_reset() does not match the id in case that
SUCCESS is returned by scsi_try_target_reset().

The mismatch is fixed, and the loop to find the next highest
is also simplified.

Signed-off-by: Hillf Danton <dhillf@gmail.com>
---

--- a/drivers/scsi/scsi_error.c	2010-09-13 07:07:38.000000000 +0800
+++ b/drivers/scsi/scsi_error.c	2010-10-14 21:45:56.000000000 +0800
@@ -1173,14 +1173,19 @@ static int scsi_eh_target_reset(struct S
 			list_for_each_entry(scmd, work_q, eh_entry) {
 				if (scmd_id(scmd) > id &&
 				    (!tgtr_scmd ||
-				     scmd_id(tgtr_scmd) > scmd_id(scmd)))
+				     scmd_id(tgtr_scmd) > scmd_id(scmd))) {
 						tgtr_scmd = scmd;
+						if (1 + id == scmd_id(scmd))
+							break;
+				}
 			}
 		}
 		if (!tgtr_scmd)
 			/* no more commands, that's it */
 			break;

+		id = scmd_id(tgtr_scmd);
+
 		SCSI_LOG_ERROR_RECOVERY(3, printk("%s: Sending target reset "
 						  "to target %d\n",
 						  current->comm, id));

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fix id computation in scsi_eh_target_reset()
  2010-10-14 14:04 [PATCH] fix id computation in scsi_eh_target_reset() Hillf Danton
@ 2010-10-18  5:33 ` Mike Christie
  2010-10-25 20:53   ` James Bottomley
  0 siblings, 1 reply; 14+ messages in thread
From: Mike Christie @ 2010-10-18  5:33 UTC (permalink / raw)
  To: Hillf Danton; +Cc: linux-scsi

On 10/14/2010 09:04 AM, Hillf Danton wrote:
> There seems that the id of the tgtr_scmd processed by
> scsi_try_target_reset() does not match the id in case that
> SUCCESS is returned by scsi_try_target_reset().
>
> The mismatch is fixed, and the loop to find the next highest
> is also simplified.
>
> Signed-off-by: Hillf Danton<dhillf@gmail.com>
> ---
>
> --- a/drivers/scsi/scsi_error.c	2010-09-13 07:07:38.000000000 +0800
> +++ b/drivers/scsi/scsi_error.c	2010-10-14 21:45:56.000000000 +0800
> @@ -1173,14 +1173,19 @@ static int scsi_eh_target_reset(struct S
>   			list_for_each_entry(scmd, work_q, eh_entry) {
>   				if (scmd_id(scmd)>  id&&
>   				(!tgtr_scmd ||
> -				     scmd_id(tgtr_scmd)>  scmd_id(scmd)))
> +				     scmd_id(tgtr_scmd)>  scmd_id(scmd))) {
>   						tgtr_scmd = scmd;
> +						if (1 + id == scmd_id(scmd))
> +							break;
> +				}
>   			}
>   		}
>   		if (!tgtr_scmd)
>   			/* no more commands, that's it */
>   			break;
>
> +		id = scmd_id(tgtr_scmd);
> +
>   		SCSI_LOG_ERROR_RECOVERY(3, printk("%s: Sending target reset "
>   						  "to target %d\n",
>   						  current->comm, id));


Thanks. Fixes the extra target resets I have been seeing and commands 
not getting cleaned up properly.

Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fix id computation in scsi_eh_target_reset()
  2010-10-18  5:33 ` Mike Christie
@ 2010-10-25 20:53   ` James Bottomley
  2010-10-26 14:04     ` Hillf Danton
  2010-10-26 23:07     ` Mike Christie
  0 siblings, 2 replies; 14+ messages in thread
From: James Bottomley @ 2010-10-25 20:53 UTC (permalink / raw)
  To: Mike Christie; +Cc: Hillf Danton, linux-scsi

On Mon, 2010-10-18 at 00:33 -0500, Mike Christie wrote:
> On 10/14/2010 09:04 AM, Hillf Danton wrote:
> > There seems that the id of the tgtr_scmd processed by
> > scsi_try_target_reset() does not match the id in case that
> > SUCCESS is returned by scsi_try_target_reset().
> >
> > The mismatch is fixed, and the loop to find the next highest
> > is also simplified.
> >
> > Signed-off-by: Hillf Danton<dhillf@gmail.com>
> > ---
> >
> > --- a/drivers/scsi/scsi_error.c	2010-09-13 07:07:38.000000000 +0800
> > +++ b/drivers/scsi/scsi_error.c	2010-10-14 21:45:56.000000000 +0800
> > @@ -1173,14 +1173,19 @@ static int scsi_eh_target_reset(struct S
> >   			list_for_each_entry(scmd, work_q, eh_entry) {
> >   				if (scmd_id(scmd)>  id&&
> >   				(!tgtr_scmd ||
> > -				     scmd_id(tgtr_scmd)>  scmd_id(scmd)))
> > +				     scmd_id(tgtr_scmd)>  scmd_id(scmd))) {
> >   						tgtr_scmd = scmd;
> > +						if (1 + id == scmd_id(scmd))
> > +							break;
> > +				}
> >   			}
> >   		}
> >   		if (!tgtr_scmd)
> >   			/* no more commands, that's it */
> >   			break;
> >
> > +		id = scmd_id(tgtr_scmd);
> > +
> >   		SCSI_LOG_ERROR_RECOVERY(3, printk("%s: Sending target reset "
> >   						  "to target %d\n",
> >   						  current->comm, id));
> 
> 
> Thanks. Fixes the extra target resets I have been seeing and commands 
> not getting cleaned up properly.
> 
> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>

The routine is still a bit fiendishly complicated, though.  What about
unwinding all of the id processing? it's only required if we want to do
target resets in numerical order (a number which has no real meaning on
a hot plug bus anyway).

What about just a simple list processor that chunks down the work_q,
resets the target and pulls all equal targets out, like this?

James

---

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 1de30eb..484b128 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1156,51 +1156,40 @@ static int scsi_eh_target_reset(struct Scsi_Host *shost,
 				struct list_head *work_q,
 				struct list_head *done_q)
 {
-	struct scsi_cmnd *scmd, *tgtr_scmd, *next;
-	unsigned int id = 0;
-	int rtn;
+	LIST_HEAD(tmp_list);
 
-	do {
-		tgtr_scmd = NULL;
-		list_for_each_entry(scmd, work_q, eh_entry) {
-			if (id == scmd_id(scmd)) {
-				tgtr_scmd = scmd;
-				break;
-			}
-		}
-		if (!tgtr_scmd) {
-			/* not one exactly equal; find the next highest */
-			list_for_each_entry(scmd, work_q, eh_entry) {
-				if (scmd_id(scmd) > id &&
-				    (!tgtr_scmd ||
-				     scmd_id(tgtr_scmd) > scmd_id(scmd)))
-						tgtr_scmd = scmd;
-			}
-		}
-		if (!tgtr_scmd)
-			/* no more commands, that's it */
-			break;
+	list_splice(work_q, &tmp_list);
+
+	while (!list_empty(&tmp_list)) {
+		struct scsi_cmnd *next, *scmd; 
+		int rtn;
+		unsigned int id;
+
+		scmd = list_entry(tmp_list.next, struct scsi_cmnd, eh_entry);
+		id = scmd_id(scmd);
 
 		SCSI_LOG_ERROR_RECOVERY(3, printk("%s: Sending target reset "
 						  "to target %d\n",
 						  current->comm, id));
-		rtn = scsi_try_target_reset(tgtr_scmd);
-		if (rtn == SUCCESS || rtn == FAST_IO_FAIL) {
-			list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
-				if (id == scmd_id(scmd))
-					if (!scsi_device_online(scmd->device) ||
-					    rtn == FAST_IO_FAIL ||
-					    !scsi_eh_tur(tgtr_scmd))
-						scsi_eh_finish_cmd(scmd,
-								   done_q);
-			}
-		} else
+		rtn = scsi_try_target_reset(scmd);
+		if (rtn != SUCCESS && rtn != FAST_IO_FAIL)
 			SCSI_LOG_ERROR_RECOVERY(3, printk("%s: Target reset"
 							  " failed target: "
 							  "%d\n",
 							  current->comm, id));
-		id++;
-	} while(id != 0);
+		list_for_each_entry_safe(scmd, next, &tmp_list, eh_entry) {
+			if (scmd_id(scmd) != id)
+				continue;
+
+			if ((rtn == SUCCESS || rtn == FAST_IO_FAIL)
+			    && (!scsi_device_online(scmd->device) ||
+				 rtn == FAST_IO_FAIL || !scsi_eh_tur(scmd)))
+				scsi_eh_finish_cmd(scmd, done_q);
+			else
+				/* push back on work queue for further processing */
+				list_move(&scmd->eh_entry, work_q);
+		}
+	}
 
 	return list_empty(work_q);
 }



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] fix id computation in scsi_eh_target_reset()
  2010-10-25 20:53   ` James Bottomley
@ 2010-10-26 14:04     ` Hillf Danton
  2010-10-26 20:13       ` Mike Christie
  2010-10-26 23:07     ` Mike Christie
  1 sibling, 1 reply; 14+ messages in thread
From: Hillf Danton @ 2010-10-26 14:04 UTC (permalink / raw)
  To: James Bottomley; +Cc: Mike Christie, linux-scsi

On Tue, Oct 26, 2010 at 4:53 AM, James Bottomley
<James.Bottomley@suse.de> wrote:
> On Mon, 2010-10-18 at 00:33 -0500, Mike Christie wrote:
>> On 10/14/2010 09:04 AM, Hillf Danton wrote:
>> > There seems that the id of the tgtr_scmd processed by
>> > scsi_try_target_reset() does not match the id in case that
>> > SUCCESS is returned by scsi_try_target_reset().
>> >
>> > The mismatch is fixed, and the loop to find the next highest
>> > is also simplified.
>> >
>> > Signed-off-by: Hillf Danton<dhillf@gmail.com>
>> > ---
>> >
>> > --- a/drivers/scsi/scsi_error.c     2010-09-13 07:07:38.000000000 +0800
>> > +++ b/drivers/scsi/scsi_error.c     2010-10-14 21:45:56.000000000 +0800
>> > @@ -1173,14 +1173,19 @@ static int scsi_eh_target_reset(struct S
>> >                     list_for_each_entry(scmd, work_q, eh_entry) {
>> >                             if (scmd_id(scmd)>  id&&
>> >                             (!tgtr_scmd ||
>> > -                                scmd_id(tgtr_scmd)>  scmd_id(scmd)))
>> > +                                scmd_id(tgtr_scmd)>  scmd_id(scmd))) {
>> >                                             tgtr_scmd = scmd;
>> > +                                           if (1 + id == scmd_id(scmd))
>> > +                                                   break;
>> > +                           }
>> >                     }
>> >             }
>> >             if (!tgtr_scmd)
>> >                     /* no more commands, that's it */
>> >                     break;
>> >
>> > +           id = scmd_id(tgtr_scmd);
>> > +
>> >             SCSI_LOG_ERROR_RECOVERY(3, printk("%s: Sending target reset "
>> >                                               "to target %d\n",
>> >                                               current->comm, id));
>>
>>
>> Thanks. Fixes the extra target resets I have been seeing and commands
>> not getting cleaned up properly.
>>
>> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
>
> The routine is still a bit fiendishly complicated, though.  What about
> unwinding all of the id processing? it's only required if we want to do
> target resets in numerical order (a number which has no real meaning on
> a hot plug bus anyway).
>
> What about just a simple list processor that chunks down the work_q,
> resets the target and pulls all equal targets out, like this?
>
> James

The work looks fine. Thanks.

Reviewed-by: Hillf Danton <dhillf@gmail.com>

>
> ---
>
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 1de30eb..484b128 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -1156,51 +1156,40 @@ static int scsi_eh_target_reset(struct Scsi_Host *shost,
>                                struct list_head *work_q,
>                                struct list_head *done_q)
>  {
> -       struct scsi_cmnd *scmd, *tgtr_scmd, *next;
> -       unsigned int id = 0;
> -       int rtn;
> +       LIST_HEAD(tmp_list);
>
> -       do {
> -               tgtr_scmd = NULL;
> -               list_for_each_entry(scmd, work_q, eh_entry) {
> -                       if (id == scmd_id(scmd)) {
> -                               tgtr_scmd = scmd;
> -                               break;
> -                       }
> -               }
> -               if (!tgtr_scmd) {
> -                       /* not one exactly equal; find the next highest */
> -                       list_for_each_entry(scmd, work_q, eh_entry) {
> -                               if (scmd_id(scmd) > id &&
> -                                   (!tgtr_scmd ||
> -                                    scmd_id(tgtr_scmd) > scmd_id(scmd)))
> -                                               tgtr_scmd = scmd;
> -                       }
> -               }
> -               if (!tgtr_scmd)
> -                       /* no more commands, that's it */
> -                       break;
> +       list_splice(work_q, &tmp_list);
> +
> +       while (!list_empty(&tmp_list)) {
> +               struct scsi_cmnd *next, *scmd;
> +               int rtn;
> +               unsigned int id;
> +
> +               scmd = list_entry(tmp_list.next, struct scsi_cmnd, eh_entry);
> +               id = scmd_id(scmd);
>
>                SCSI_LOG_ERROR_RECOVERY(3, printk("%s: Sending target reset "
>                                                  "to target %d\n",
>                                                  current->comm, id));
> -               rtn = scsi_try_target_reset(tgtr_scmd);
> -               if (rtn == SUCCESS || rtn == FAST_IO_FAIL) {
> -                       list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
> -                               if (id == scmd_id(scmd))
> -                                       if (!scsi_device_online(scmd->device) ||
> -                                           rtn == FAST_IO_FAIL ||
> -                                           !scsi_eh_tur(tgtr_scmd))
> -                                               scsi_eh_finish_cmd(scmd,
> -                                                                  done_q);
> -                       }
> -               } else
> +               rtn = scsi_try_target_reset(scmd);
> +               if (rtn != SUCCESS && rtn != FAST_IO_FAIL)
>                        SCSI_LOG_ERROR_RECOVERY(3, printk("%s: Target reset"
>                                                          " failed target: "
>                                                          "%d\n",
>                                                          current->comm, id));
> -               id++;
> -       } while(id != 0);
> +               list_for_each_entry_safe(scmd, next, &tmp_list, eh_entry) {
> +                       if (scmd_id(scmd) != id)
> +                               continue;
> +
> +                       if ((rtn == SUCCESS || rtn == FAST_IO_FAIL)
> +                           && (!scsi_device_online(scmd->device) ||
> +                                rtn == FAST_IO_FAIL || !scsi_eh_tur(scmd)))
> +                               scsi_eh_finish_cmd(scmd, done_q);
> +                       else
> +                               /* push back on work queue for further processing */
> +                               list_move(&scmd->eh_entry, work_q);
> +               }
> +       }
>
>        return list_empty(work_q);
>  }
>
>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fix id computation in scsi_eh_target_reset()
  2010-10-26 20:13       ` Mike Christie
@ 2010-10-26 20:10         ` James Bottomley
  2010-10-27 13:12         ` Hillf Danton
  1 sibling, 0 replies; 14+ messages in thread
From: James Bottomley @ 2010-10-26 20:10 UTC (permalink / raw)
  To: Mike Christie; +Cc: Hillf Danton, linux-scsi

On Tue, 2010-10-26 at 15:13 -0500, Mike Christie wrote:
> I was not sure if you are in a rush to get this in.

No real rush ... although technically it's an enhancement rather than a
bug fix ...

>  I have been testing 
> it, but am hitting a bug below. This does not seem to be related to your 
> patch. I am hitting it with or without your patch.
> 
> Your patch looks ok, and it seems to work.
> 
> 
> Oct 27 07:51:11 noisymax kernel: ------------[ cut here ]------------
> Oct 27 07:51:11 noisymax kernel: WARNING: at lib/list_debug.c:30 
> __list_add+0x8f/0xa0()
> Oct 27 07:51:11 noisymax kernel: Hardware name: X7DB8
> Oct 27 07:51:11 noisymax kernel: list_add corruption. prev->next should 
> be next (ffff88013e86be50), but was ffff88012b7d2518. 
> (prev=ffff88012b7d2518).
> Oct 27 07:51:11 noisymax kernel: Modules linked in: bnx2i cnic uio 
> autofs4 fcoe libfcoe libfc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 
> iptable_filter ip_tables ip6t_REJECT xt_tcpudp nf_conntrack_ipv6 
> nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables x_tables 
> ipv6 ib_mthca ib_mad ib_core e1000e be2iscsi iscsi_boot_sysfs libiscsi 
> scsi_transport_iscsi bnx2x crc32c libcrc32c mdio be2net 8250_pnp 8250 
> serial_core shpchp button lpfc qla2xxx scsi_transport_fc [last unloaded: 
> mperf]
> Oct 27 07:51:11 noisymax kernel: Pid: 3076, comm: scsi_eh_12 Not tainted 
> 2.6.36+ #2
> Oct 27 07:51:11 noisymax kernel: Call Trace:
> Oct 27 07:51:11 noisymax kernel: [<ffffffff8104103a>] 
> warn_slowpath_common+0x7a/0xb0
> Oct 27 07:51:11 noisymax kernel: [<ffffffff81041111>] 
> warn_slowpath_fmt+0x41/0x50
> Oct 27 07:51:11 noisymax kernel: [<ffffffff812bf180>] ? dev_printk+0x40/0x50
> Oct 27 07:51:11 noisymax kernel: [<ffffffff8122e6df>] __list_add+0x8f/0xa0
> Oct 27 07:51:11 noisymax kernel: [<ffffffff812d2bd6>] 
> scsi_eh_finish_cmd+0x36/0x40
> Oct 27 07:51:11 noisymax kernel: [<ffffffff812d3600>] 
> scsi_eh_ready_devs+0x320/0x860
> Oct 27 07:51:11 noisymax kernel: [<ffffffff812d41c4>] 
> scsi_error_handler+0x4a4/0x640
> Oct 27 07:51:11 noisymax kernel: [<ffffffff812d3d20>] ? 
> scsi_error_handler+0x0/0x640
> Oct 27 07:51:11 noisymax kernel: [<ffffffff81062af6>] kthread+0x96/0xa0
> Oct 27 07:51:11 noisymax kernel: [<ffffffff81003bd4>] 
> kernel_thread_helper+0x4/0x10
> Oct 27 07:51:11 noisymax kernel: [<ffffffff813ed180>] ? 
> restore_args+0x0/0x30
> Oct 27 07:51:11 noisymax kernel: [<ffffffff81062a60>] ? kthread+0x0/0xa0
> Oct 27 07:51:11 noisymax kernel: [<ffffffff81003bd0>] ? 
> kernel_thread_helper+0x0/0x10
> Oct 27 07:51:11 noisymax kernel: ---[ end trace 74b14fae9ff067be ]---

This looks like it could possibly be a removal race error (like the one
there's a fix in scsi-misc for) ... what was going on on the box at the
time?

James



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fix id computation in scsi_eh_target_reset()
  2010-10-26 14:04     ` Hillf Danton
@ 2010-10-26 20:13       ` Mike Christie
  2010-10-26 20:10         ` James Bottomley
  2010-10-27 13:12         ` Hillf Danton
  0 siblings, 2 replies; 14+ messages in thread
From: Mike Christie @ 2010-10-26 20:13 UTC (permalink / raw)
  To: Hillf Danton; +Cc: James Bottomley, linux-scsi

On 10/26/2010 09:04 AM, Hillf Danton wrote:
> On Tue, Oct 26, 2010 at 4:53 AM, James Bottomley
> <James.Bottomley@suse.de>  wrote:
>> On Mon, 2010-10-18 at 00:33 -0500, Mike Christie wrote:
>>> On 10/14/2010 09:04 AM, Hillf Danton wrote:
>>>> There seems that the id of the tgtr_scmd processed by
>>>> scsi_try_target_reset() does not match the id in case that
>>>> SUCCESS is returned by scsi_try_target_reset().
>>>>
>>>> The mismatch is fixed, and the loop to find the next highest
>>>> is also simplified.
>>>>
>>>> Signed-off-by: Hillf Danton<dhillf@gmail.com>
>>>> ---
>>>>
>>>> --- a/drivers/scsi/scsi_error.c     2010-09-13 07:07:38.000000000 +0800
>>>> +++ b/drivers/scsi/scsi_error.c     2010-10-14 21:45:56.000000000 +0800
>>>> @@ -1173,14 +1173,19 @@ static int scsi_eh_target_reset(struct S
>>>>                      list_for_each_entry(scmd, work_q, eh_entry) {
>>>>                              if (scmd_id(scmd)>    id&&
>>>>                              (!tgtr_scmd ||
>>>> -                                scmd_id(tgtr_scmd)>    scmd_id(scmd)))
>>>> +                                scmd_id(tgtr_scmd)>    scmd_id(scmd))) {
>>>>                                              tgtr_scmd = scmd;
>>>> +                                           if (1 + id == scmd_id(scmd))
>>>> +                                                   break;
>>>> +                           }
>>>>                      }
>>>>              }
>>>>              if (!tgtr_scmd)
>>>>                      /* no more commands, that's it */
>>>>                      break;
>>>>
>>>> +           id = scmd_id(tgtr_scmd);
>>>> +
>>>>              SCSI_LOG_ERROR_RECOVERY(3, printk("%s: Sending target reset "
>>>>                                                "to target %d\n",
>>>>                                                current->comm, id));
>>>
>>>
>>> Thanks. Fixes the extra target resets I have been seeing and commands
>>> not getting cleaned up properly.
>>>
>>> Reviewed-by: Mike Christie<michaelc@cs.wisc.edu>
>>
>> The routine is still a bit fiendishly complicated, though.  What about
>> unwinding all of the id processing? it's only required if we want to do
>> target resets in numerical order (a number which has no real meaning on
>> a hot plug bus anyway).
>>
>> What about just a simple list processor that chunks down the work_q,
>> resets the target and pulls all equal targets out, like this?
>>

I was not sure if you are in a rush to get this in. I have been testing 
it, but am hitting a bug below. This does not seem to be related to your 
patch. I am hitting it with or without your patch.

Your patch looks ok, and it seems to work.


Oct 27 07:51:11 noisymax kernel: ------------[ cut here ]------------
Oct 27 07:51:11 noisymax kernel: WARNING: at lib/list_debug.c:30 
__list_add+0x8f/0xa0()
Oct 27 07:51:11 noisymax kernel: Hardware name: X7DB8
Oct 27 07:51:11 noisymax kernel: list_add corruption. prev->next should 
be next (ffff88013e86be50), but was ffff88012b7d2518. 
(prev=ffff88012b7d2518).
Oct 27 07:51:11 noisymax kernel: Modules linked in: bnx2i cnic uio 
autofs4 fcoe libfcoe libfc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 
iptable_filter ip_tables ip6t_REJECT xt_tcpudp nf_conntrack_ipv6 
nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables x_tables 
ipv6 ib_mthca ib_mad ib_core e1000e be2iscsi iscsi_boot_sysfs libiscsi 
scsi_transport_iscsi bnx2x crc32c libcrc32c mdio be2net 8250_pnp 8250 
serial_core shpchp button lpfc qla2xxx scsi_transport_fc [last unloaded: 
mperf]
Oct 27 07:51:11 noisymax kernel: Pid: 3076, comm: scsi_eh_12 Not tainted 
2.6.36+ #2
Oct 27 07:51:11 noisymax kernel: Call Trace:
Oct 27 07:51:11 noisymax kernel: [<ffffffff8104103a>] 
warn_slowpath_common+0x7a/0xb0
Oct 27 07:51:11 noisymax kernel: [<ffffffff81041111>] 
warn_slowpath_fmt+0x41/0x50
Oct 27 07:51:11 noisymax kernel: [<ffffffff812bf180>] ? dev_printk+0x40/0x50
Oct 27 07:51:11 noisymax kernel: [<ffffffff8122e6df>] __list_add+0x8f/0xa0
Oct 27 07:51:11 noisymax kernel: [<ffffffff812d2bd6>] 
scsi_eh_finish_cmd+0x36/0x40
Oct 27 07:51:11 noisymax kernel: [<ffffffff812d3600>] 
scsi_eh_ready_devs+0x320/0x860
Oct 27 07:51:11 noisymax kernel: [<ffffffff812d41c4>] 
scsi_error_handler+0x4a4/0x640
Oct 27 07:51:11 noisymax kernel: [<ffffffff812d3d20>] ? 
scsi_error_handler+0x0/0x640
Oct 27 07:51:11 noisymax kernel: [<ffffffff81062af6>] kthread+0x96/0xa0
Oct 27 07:51:11 noisymax kernel: [<ffffffff81003bd4>] 
kernel_thread_helper+0x4/0x10
Oct 27 07:51:11 noisymax kernel: [<ffffffff813ed180>] ? 
restore_args+0x0/0x30
Oct 27 07:51:11 noisymax kernel: [<ffffffff81062a60>] ? kthread+0x0/0xa0
Oct 27 07:51:11 noisymax kernel: [<ffffffff81003bd0>] ? 
kernel_thread_helper+0x0/0x10
Oct 27 07:51:11 noisymax kernel: ---[ end trace 74b14fae9ff067be ]---


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fix id computation in scsi_eh_target_reset()
  2010-10-25 20:53   ` James Bottomley
  2010-10-26 14:04     ` Hillf Danton
@ 2010-10-26 23:07     ` Mike Christie
  2010-10-26 23:09       ` James Bottomley
  1 sibling, 1 reply; 14+ messages in thread
From: Mike Christie @ 2010-10-26 23:07 UTC (permalink / raw)
  To: James Bottomley; +Cc: Hillf Danton, linux-scsi

On 10/25/2010 03:53 PM, James Bottomley wrote:
>
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 1de30eb..484b128 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -1156,51 +1156,40 @@ static int scsi_eh_target_reset(struct Scsi_Host *shost,
>   				struct list_head *work_q,
>   				struct list_head *done_q)
>   {
> -	struct scsi_cmnd *scmd, *tgtr_scmd, *next;
> -	unsigned int id = 0;
> -	int rtn;
> +	LIST_HEAD(tmp_list);
>
> -	do {
> -		tgtr_scmd = NULL;
> -		list_for_each_entry(scmd, work_q, eh_entry) {
> -			if (id == scmd_id(scmd)) {
> -				tgtr_scmd = scmd;
> -				break;
> -			}
> -		}
> -		if (!tgtr_scmd) {
> -			/* not one exactly equal; find the next highest */
> -			list_for_each_entry(scmd, work_q, eh_entry) {
> -				if (scmd_id(scmd)>  id&&
> -				    (!tgtr_scmd ||
> -				     scmd_id(tgtr_scmd)>  scmd_id(scmd)))
> -						tgtr_scmd = scmd;
> -			}
> -		}
> -		if (!tgtr_scmd)
> -			/* no more commands, that's it */
> -			break;
> +	list_splice(work_q,&tmp_list);


I think I am hitting multiple bugs. One might be with this patch, but 
strangely they both hit __list_add bugs in the scsi eh. I think this 
needs to be list_splice_init().

> +		list_for_each_entry_safe(scmd, next,&tmp_list, eh_entry) {
> +			if (scmd_id(scmd) != id)
> +				continue;
> +
> +			if ((rtn == SUCCESS || rtn == FAST_IO_FAIL)
> +			&&  (!scsi_device_online(scmd->device) ||
> +				 rtn == FAST_IO_FAIL || !scsi_eh_tur(scmd)))
> +				scsi_eh_finish_cmd(scmd, done_q);

Because if the target resets work and move all the commands to the 
done_q here.


> +			else
> +				/* push back on work queue for further processing */
> +				list_move(&scmd->eh_entry, work_q);
> +		}
>
>   	return list_empty(work_q);
>   }


Then this work_q is showing garbage in it from the splice I think. 
Without the list_splice_init I get that trace I cut and pasted in the 
other mail when I just force the scsi eh to run the target reset code.

In the test, the target reset succeeds. The scsi_eh_finish_cmd call 
above moves all the cmds to the done_q. But the work_q still has junk in 
it, and so we return that we need to do more work and eventually we end 
up running scsi_eh_offline_sdevs which calls scsi_eh_finish_cmd on a 
command we already moved to the done_q.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fix id computation in scsi_eh_target_reset()
  2010-10-26 23:07     ` Mike Christie
@ 2010-10-26 23:09       ` James Bottomley
  2010-10-27 18:18         ` Mike Christie
  0 siblings, 1 reply; 14+ messages in thread
From: James Bottomley @ 2010-10-26 23:09 UTC (permalink / raw)
  To: Mike Christie; +Cc: Hillf Danton, linux-scsi

On Tue, 2010-10-26 at 18:07 -0500, Mike Christie wrote:
> On 10/25/2010 03:53 PM, James Bottomley wrote:
> >
> > diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> > index 1de30eb..484b128 100644
> > --- a/drivers/scsi/scsi_error.c
> > +++ b/drivers/scsi/scsi_error.c
> > @@ -1156,51 +1156,40 @@ static int scsi_eh_target_reset(struct Scsi_Host *shost,
> >   				struct list_head *work_q,
> >   				struct list_head *done_q)
> >   {
> > -	struct scsi_cmnd *scmd, *tgtr_scmd, *next;
> > -	unsigned int id = 0;
> > -	int rtn;
> > +	LIST_HEAD(tmp_list);
> >
> > -	do {
> > -		tgtr_scmd = NULL;
> > -		list_for_each_entry(scmd, work_q, eh_entry) {
> > -			if (id == scmd_id(scmd)) {
> > -				tgtr_scmd = scmd;
> > -				break;
> > -			}
> > -		}
> > -		if (!tgtr_scmd) {
> > -			/* not one exactly equal; find the next highest */
> > -			list_for_each_entry(scmd, work_q, eh_entry) {
> > -				if (scmd_id(scmd)>  id&&
> > -				    (!tgtr_scmd ||
> > -				     scmd_id(tgtr_scmd)>  scmd_id(scmd)))
> > -						tgtr_scmd = scmd;
> > -			}
> > -		}
> > -		if (!tgtr_scmd)
> > -			/* no more commands, that's it */
> > -			break;
> > +	list_splice(work_q,&tmp_list);
> 
> 
> I think I am hitting multiple bugs. One might be with this patch, but 
> strangely they both hit __list_add bugs in the scsi eh. I think this 
> needs to be list_splice_init().

I think that's exactly right ... I keep getting tripped up by the _init
requirements for reusing lists (and list_heads).

> > +		list_for_each_entry_safe(scmd, next,&tmp_list, eh_entry) {
> > +			if (scmd_id(scmd) != id)
> > +				continue;
> > +
> > +			if ((rtn == SUCCESS || rtn == FAST_IO_FAIL)
> > +			&&  (!scsi_device_online(scmd->device) ||
> > +				 rtn == FAST_IO_FAIL || !scsi_eh_tur(scmd)))
> > +				scsi_eh_finish_cmd(scmd, done_q);
> 
> Because if the target resets work and move all the commands to the 
> done_q here.
> 
> 
> > +			else
> > +				/* push back on work queue for further processing */
> > +				list_move(&scmd->eh_entry, work_q);
> > +		}
> >
> >   	return list_empty(work_q);
> >   }
> 
> 
> Then this work_q is showing garbage in it from the splice I think. 
> Without the list_splice_init I get that trace I cut and pasted in the 
> other mail when I just force the scsi eh to run the target reset code.
> 
> In the test, the target reset succeeds. The scsi_eh_finish_cmd call 
> above moves all the cmds to the done_q. But the work_q still has junk in 
> it, and so we return that we need to do more work and eventually we end 
> up running scsi_eh_offline_sdevs which calls scsi_eh_finish_cmd on a 
> command we already moved to the done_q.

So does changing to the _init version fix at least the junk done_q
problem?

James



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fix id computation in scsi_eh_target_reset()
  2010-10-26 20:13       ` Mike Christie
  2010-10-26 20:10         ` James Bottomley
@ 2010-10-27 13:12         ` Hillf Danton
  1 sibling, 0 replies; 14+ messages in thread
From: Hillf Danton @ 2010-10-27 13:12 UTC (permalink / raw)
  To: Mike Christie; +Cc: James Bottomley, linux-scsi

On Wed, Oct 27, 2010 at 4:13 AM, Mike Christie <michaelc@cs.wisc.edu> wrote:
> On 10/26/2010 09:04 AM, Hillf Danton wrote:
>>
>> On Tue, Oct 26, 2010 at 4:53 AM, James Bottomley
>> <James.Bottomley@suse.de>  wrote:
>>>
>>> On Mon, 2010-10-18 at 00:33 -0500, Mike Christie wrote:
>>>>
>>>> On 10/14/2010 09:04 AM, Hillf Danton wrote:
>>>>>
>>>>> There seems that the id of the tgtr_scmd processed by
>>>>> scsi_try_target_reset() does not match the id in case that
>>>>> SUCCESS is returned by scsi_try_target_reset().
>>>>>
>>>>> The mismatch is fixed, and the loop to find the next highest
>>>>> is also simplified.
>>>>>
>>>>> Signed-off-by: Hillf Danton<dhillf@gmail.com>
>>>>> ---
>>>>>
>>>>> --- a/drivers/scsi/scsi_error.c     2010-09-13 07:07:38.000000000 +0800
>>>>> +++ b/drivers/scsi/scsi_error.c     2010-10-14 21:45:56.000000000 +0800
>>>>> @@ -1173,14 +1173,19 @@ static int scsi_eh_target_reset(struct S
>>>>>                     list_for_each_entry(scmd, work_q, eh_entry) {
>>>>>                             if (scmd_id(scmd)>    id&&
>>>>>                             (!tgtr_scmd ||
>>>>> -                                scmd_id(tgtr_scmd)>    scmd_id(scmd)))
>>>>> +                                scmd_id(tgtr_scmd)>    scmd_id(scmd)))
>>>>> {
>>>>>                                             tgtr_scmd = scmd;
>>>>> +                                           if (1 + id ==
>>>>> scmd_id(scmd))
>>>>> +                                                   break;
>>>>> +                           }
>>>>>                     }
>>>>>             }
>>>>>             if (!tgtr_scmd)
>>>>>                     /* no more commands, that's it */
>>>>>                     break;
>>>>>
>>>>> +           id = scmd_id(tgtr_scmd);
>>>>> +
>>>>>             SCSI_LOG_ERROR_RECOVERY(3, printk("%s: Sending target reset
>>>>> "
>>>>>                                               "to target %d\n",
>>>>>                                               current->comm, id));
>>>>
>>>>
>>>> Thanks. Fixes the extra target resets I have been seeing and commands
>>>> not getting cleaned up properly.
>>>>
>>>> Reviewed-by: Mike Christie<michaelc@cs.wisc.edu>
>>>
>>> The routine is still a bit fiendishly complicated, though.  What about
>>> unwinding all of the id processing? it's only required if we want to do
>>> target resets in numerical order (a number which has no real meaning on
>>> a hot plug bus anyway).
>>>
>>> What about just a simple list processor that chunks down the work_q,
>>> resets the target and pulls all equal targets out, like this?
>>>
>
> I was not sure if you are in a rush to get this in. I have been testing it,
> but am hitting a bug below. This does not seem to be related to your patch.
> I am hitting it with or without your patch.

Sorry for the rush, Mike.
Bug fix will be redelivered soon.

>
> Your patch looks ok, and it seems to work.
>
>
> Oct 27 07:51:11 noisymax kernel: ------------[ cut here ]------------
> Oct 27 07:51:11 noisymax kernel: WARNING: at lib/list_debug.c:30
> __list_add+0x8f/0xa0()
> Oct 27 07:51:11 noisymax kernel: Hardware name: X7DB8
> Oct 27 07:51:11 noisymax kernel: list_add corruption. prev->next should be
> next (ffff88013e86be50), but was ffff88012b7d2518. (prev=ffff88012b7d2518).
> Oct 27 07:51:11 noisymax kernel: Modules linked in: bnx2i cnic uio autofs4
> fcoe libfcoe libfc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4
> iptable_filter ip_tables ip6t_REJECT xt_tcpudp nf_conntrack_ipv6
> nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables x_tables
> ipv6 ib_mthca ib_mad ib_core e1000e be2iscsi iscsi_boot_sysfs libiscsi
> scsi_transport_iscsi bnx2x crc32c libcrc32c mdio be2net 8250_pnp 8250
> serial_core shpchp button lpfc qla2xxx scsi_transport_fc [last unloaded:
> mperf]
> Oct 27 07:51:11 noisymax kernel: Pid: 3076, comm: scsi_eh_12 Not tainted
> 2.6.36+ #2
> Oct 27 07:51:11 noisymax kernel: Call Trace:
> Oct 27 07:51:11 noisymax kernel: [<ffffffff8104103a>]
> warn_slowpath_common+0x7a/0xb0
> Oct 27 07:51:11 noisymax kernel: [<ffffffff81041111>]
> warn_slowpath_fmt+0x41/0x50
> Oct 27 07:51:11 noisymax kernel: [<ffffffff812bf180>] ? dev_printk+0x40/0x50
> Oct 27 07:51:11 noisymax kernel: [<ffffffff8122e6df>] __list_add+0x8f/0xa0
> Oct 27 07:51:11 noisymax kernel: [<ffffffff812d2bd6>]
> scsi_eh_finish_cmd+0x36/0x40
> Oct 27 07:51:11 noisymax kernel: [<ffffffff812d3600>]
> scsi_eh_ready_devs+0x320/0x860
> Oct 27 07:51:11 noisymax kernel: [<ffffffff812d41c4>]
> scsi_error_handler+0x4a4/0x640
> Oct 27 07:51:11 noisymax kernel: [<ffffffff812d3d20>] ?
> scsi_error_handler+0x0/0x640
> Oct 27 07:51:11 noisymax kernel: [<ffffffff81062af6>] kthread+0x96/0xa0
> Oct 27 07:51:11 noisymax kernel: [<ffffffff81003bd4>]
> kernel_thread_helper+0x4/0x10
> Oct 27 07:51:11 noisymax kernel: [<ffffffff813ed180>] ?
> restore_args+0x0/0x30
> Oct 27 07:51:11 noisymax kernel: [<ffffffff81062a60>] ? kthread+0x0/0xa0
> Oct 27 07:51:11 noisymax kernel: [<ffffffff81003bd0>] ?
> kernel_thread_helper+0x0/0x10
> Oct 27 07:51:11 noisymax kernel: ---[ end trace 74b14fae9ff067be ]---
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fix id computation in scsi_eh_target_reset()
  2010-10-26 23:09       ` James Bottomley
@ 2010-10-27 18:18         ` Mike Christie
  2010-10-28 13:29           ` Hillf Danton
  2010-12-14 20:57           ` Mike Christie
  0 siblings, 2 replies; 14+ messages in thread
From: Mike Christie @ 2010-10-27 18:18 UTC (permalink / raw)
  To: James Bottomley; +Cc: Hillf Danton, linux-scsi

On 10/26/2010 06:09 PM, James Bottomley wrote:
> On Tue, 2010-10-26 at 18:07 -0500, Mike Christie wrote:
>> On 10/25/2010 03:53 PM, James Bottomley wrote:
>>>
>>> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
>>> index 1de30eb..484b128 100644
>>> --- a/drivers/scsi/scsi_error.c
>>> +++ b/drivers/scsi/scsi_error.c
>>> @@ -1156,51 +1156,40 @@ static int scsi_eh_target_reset(struct Scsi_Host *shost,
>>>    				struct list_head *work_q,
>>>    				struct list_head *done_q)
>>>    {
>>> -	struct scsi_cmnd *scmd, *tgtr_scmd, *next;
>>> -	unsigned int id = 0;
>>> -	int rtn;
>>> +	LIST_HEAD(tmp_list);
>>>
>>> -	do {
>>> -		tgtr_scmd = NULL;
>>> -		list_for_each_entry(scmd, work_q, eh_entry) {
>>> -			if (id == scmd_id(scmd)) {
>>> -				tgtr_scmd = scmd;
>>> -				break;
>>> -			}
>>> -		}
>>> -		if (!tgtr_scmd) {
>>> -			/* not one exactly equal; find the next highest */
>>> -			list_for_each_entry(scmd, work_q, eh_entry) {
>>> -				if (scmd_id(scmd)>   id&&
>>> -				    (!tgtr_scmd ||
>>> -				     scmd_id(tgtr_scmd)>   scmd_id(scmd)))
>>> -						tgtr_scmd = scmd;
>>> -			}
>>> -		}
>>> -		if (!tgtr_scmd)
>>> -			/* no more commands, that's it */
>>> -			break;
>>> +	list_splice(work_q,&tmp_list);
>>
>>
>> I think I am hitting multiple bugs. One might be with this patch, but
>> strangely they both hit __list_add bugs in the scsi eh. I think this
>> needs to be list_splice_init().
>
> I think that's exactly right ... I keep getting tripped up by the _init
> requirements for reusing lists (and list_heads).
>
>>> +		list_for_each_entry_safe(scmd, next,&tmp_list, eh_entry) {
>>> +			if (scmd_id(scmd) != id)
>>> +				continue;
>>> +
>>> +			if ((rtn == SUCCESS || rtn == FAST_IO_FAIL)
>>> +			&&   (!scsi_device_online(scmd->device) ||
>>> +				 rtn == FAST_IO_FAIL || !scsi_eh_tur(scmd)))
>>> +				scsi_eh_finish_cmd(scmd, done_q);
>>
>> Because if the target resets work and move all the commands to the
>> done_q here.
>>
>>
>>> +			else
>>> +				/* push back on work queue for further processing */
>>> +				list_move(&scmd->eh_entry, work_q);
>>> +		}
>>>
>>>    	return list_empty(work_q);
>>>    }
>>
>>
>> Then this work_q is showing garbage in it from the splice I think.
>> Without the list_splice_init I get that trace I cut and pasted in the
>> other mail when I just force the scsi eh to run the target reset code.
>>
>> In the test, the target reset succeeds. The scsi_eh_finish_cmd call
>> above moves all the cmds to the done_q. But the work_q still has junk in
>> it, and so we return that we need to do more work and eventually we end
>> up running scsi_eh_offline_sdevs which calls scsi_eh_finish_cmd on a
>> command we already moved to the done_q.
>
> So does changing to the _init version fix at least the junk done_q
> problem?
>

Yeah.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fix id computation in scsi_eh_target_reset()
  2010-10-27 18:18         ` Mike Christie
@ 2010-10-28 13:29           ` Hillf Danton
  2010-12-14 20:57           ` Mike Christie
  1 sibling, 0 replies; 14+ messages in thread
From: Hillf Danton @ 2010-10-28 13:29 UTC (permalink / raw)
  To: Mike Christie; +Cc: James Bottomley, linux-scsi

On Thu, Oct 28, 2010 at 2:18 AM, Mike Christie <michaelc@cs.wisc.edu> wrote:
> On 10/26/2010 06:09 PM, James Bottomley wrote:
>>
>> On Tue, 2010-10-26 at 18:07 -0500, Mike Christie wrote:
>>>
>>> On 10/25/2010 03:53 PM, James Bottomley wrote:
>>>>
>>>> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
>>>> index 1de30eb..484b128 100644
>>>> --- a/drivers/scsi/scsi_error.c
>>>> +++ b/drivers/scsi/scsi_error.c
>>>> @@ -1156,51 +1156,40 @@ static int scsi_eh_target_reset(struct Scsi_Host
>>>> *shost,
>>>>                                struct list_head *work_q,
>>>>                                struct list_head *done_q)
>>>>   {
>>>> -       struct scsi_cmnd *scmd, *tgtr_scmd, *next;
>>>> -       unsigned int id = 0;
>>>> -       int rtn;
>>>> +       LIST_HEAD(tmp_list);
>>>>
>>>> -       do {
>>>> -               tgtr_scmd = NULL;
>>>> -               list_for_each_entry(scmd, work_q, eh_entry) {
>>>> -                       if (id == scmd_id(scmd)) {
>>>> -                               tgtr_scmd = scmd;
>>>> -                               break;
>>>> -                       }
>>>> -               }
>>>> -               if (!tgtr_scmd) {
>>>> -                       /* not one exactly equal; find the next highest
>>>> */
>>>> -                       list_for_each_entry(scmd, work_q, eh_entry) {
>>>> -                               if (scmd_id(scmd)>   id&&
>>>> -                                   (!tgtr_scmd ||
>>>> -                                    scmd_id(tgtr_scmd)>
>>>> scmd_id(scmd)))
>>>> -                                               tgtr_scmd = scmd;
>>>> -                       }
>>>> -               }
>>>> -               if (!tgtr_scmd)
>>>> -                       /* no more commands, that's it */
>>>> -                       break;
>>>> +       list_splice(work_q,&tmp_list);
>>>
>>>
>>> I think I am hitting multiple bugs. One might be with this patch, but
>>> strangely they both hit __list_add bugs in the scsi eh. I think this
>>> needs to be list_splice_init().
>>
>> I think that's exactly right ... I keep getting tripped up by the _init
>> requirements for reusing lists (and list_heads).
>>
>>>> +               list_for_each_entry_safe(scmd, next,&tmp_list, eh_entry)
>>>> {
>>>> +                       if (scmd_id(scmd) != id)
>>>> +                               continue;
>>>> +
>>>> +                       if ((rtn == SUCCESS || rtn == FAST_IO_FAIL)
>>>> +                       &&   (!scsi_device_online(scmd->device) ||
>>>> +                                rtn == FAST_IO_FAIL ||
>>>> !scsi_eh_tur(scmd)))
>>>> +                               scsi_eh_finish_cmd(scmd, done_q);
>>>
>>> Because if the target resets work and move all the commands to the
>>> done_q here.
>>>
>>>
>>>> +                       else
>>>> +                               /* push back on work queue for further
>>>> processing */
>>>> +                               list_move(&scmd->eh_entry, work_q);
>>>> +               }
>>>>
>>>>        return list_empty(work_q);
>>>>   }
>>>
>>>
>>> Then this work_q is showing garbage in it from the splice I think.
>>> Without the list_splice_init I get that trace I cut and pasted in the
>>> other mail when I just force the scsi eh to run the target reset code.
>>>
>>> In the test, the target reset succeeds. The scsi_eh_finish_cmd call
>>> above moves all the cmds to the done_q. But the work_q still has junk in
>>> it, and so we return that we need to do more work and eventually we end
>>> up running scsi_eh_offline_sdevs which calls scsi_eh_finish_cmd on a
>>> command we already moved to the done_q.
>>
>> So does changing to the _init version fix at least the junk done_q
>> problem?
>>
>
> Yeah.
>

Blame me if it is buggy, please.

---

--- a/drivers/scsi/scsi_error.c	2010-09-13 07:07:38.000000000 +0800
+++ b/drivers/scsi/scsi_error.c	2010-10-28 21:27:24.000000000 +0800
@@ -1156,51 +1156,42 @@ static int scsi_eh_target_reset(struct S
 				struct list_head *work_q,
 				struct list_head *done_q)
 {
-	struct scsi_cmnd *scmd, *tgtr_scmd, *next;
+	struct scsi_cmnd *scmd, *next;
 	unsigned int id = 0;
 	int rtn;
+	int success;
+	LIST_HEAD(head);

-	do {
-		tgtr_scmd = NULL;
-		list_for_each_entry(scmd, work_q, eh_entry) {
-			if (id == scmd_id(scmd)) {
-				tgtr_scmd = scmd;
-				break;
-			}
-		}
-		if (!tgtr_scmd) {
-			/* not one exactly equal; find the next highest */
-			list_for_each_entry(scmd, work_q, eh_entry) {
-				if (scmd_id(scmd) > id &&
-				    (!tgtr_scmd ||
-				     scmd_id(tgtr_scmd) > scmd_id(scmd)))
-						tgtr_scmd = scmd;
-			}
-		}
-		if (!tgtr_scmd)
-			/* no more commands, that's it */
-			break;
+	while (! list_empty(work_q)) {
+		scmd = list_entry(work_q->next, struct scsi_cmnd, eh_entry);
+		id = scmd_id(scmd);

 		SCSI_LOG_ERROR_RECOVERY(3, printk("%s: Sending target reset "
 						  "to target %d\n",
 						  current->comm, id));
-		rtn = scsi_try_target_reset(tgtr_scmd);
-		if (rtn == SUCCESS || rtn == FAST_IO_FAIL) {
-			list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
-				if (id == scmd_id(scmd))
-					if (!scsi_device_online(scmd->device) ||
-					    rtn == FAST_IO_FAIL ||
-					    !scsi_eh_tur(tgtr_scmd))
-						scsi_eh_finish_cmd(scmd,
-								   done_q);
-			}
-		} else
+		rtn = scsi_try_target_reset(scmd);
+		success = (rtn == SUCCESS || rtn == FAST_IO_FAIL);
+		if (! success)
 			SCSI_LOG_ERROR_RECOVERY(3, printk("%s: Target reset"
 							  " failed target: "
 							  "%d\n",
 							  current->comm, id));
-		id++;
-	} while(id != 0);
+		list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
+			if (scmd_id(scmd) != id)
+				continue;
+			if (! success) {
+				list_move(&scmd->eh_entry, &head);
+				continue;
+			}
+			if (!scsi_device_online(scmd->device) ||
+			    rtn == FAST_IO_FAIL ||
+			    !scsi_eh_tur(scmd))
+				scsi_eh_finish_cmd(scmd, done_q);
+			else
+				list_move(&scmd->eh_entry, &head);
+		}
+	}
+	list_splice(&head, work_q);

 	return list_empty(work_q);
 }
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fix id computation in scsi_eh_target_reset()
  2010-10-27 18:18         ` Mike Christie
  2010-10-28 13:29           ` Hillf Danton
@ 2010-12-14 20:57           ` Mike Christie
  2010-12-15  2:31             ` James Bottomley
  2010-12-16 15:10             ` Hillf Danton
  1 sibling, 2 replies; 14+ messages in thread
From: Mike Christie @ 2010-12-14 20:57 UTC (permalink / raw)
  To: James Bottomley; +Cc: Hillf Danton, linux-scsi

On 10/27/2010 01:18 PM, Mike Christie wrote:
> On 10/26/2010 06:09 PM, James Bottomley wrote:
>> On Tue, 2010-10-26 at 18:07 -0500, Mike Christie wrote:
>>> On 10/25/2010 03:53 PM, James Bottomley wrote:
>>>>
>>>> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
>>>> index 1de30eb..484b128 100644
>>>> --- a/drivers/scsi/scsi_error.c
>>>> +++ b/drivers/scsi/scsi_error.c
>>>> @@ -1156,51 +1156,40 @@ static int scsi_eh_target_reset(struct
>>>> Scsi_Host *shost,
>>>> struct list_head *work_q,
>>>> struct list_head *done_q)
>>>> {
>>>> - struct scsi_cmnd *scmd, *tgtr_scmd, *next;
>>>> - unsigned int id = 0;
>>>> - int rtn;
>>>> + LIST_HEAD(tmp_list);
>>>>
>>>> - do {
>>>> - tgtr_scmd = NULL;
>>>> - list_for_each_entry(scmd, work_q, eh_entry) {
>>>> - if (id == scmd_id(scmd)) {
>>>> - tgtr_scmd = scmd;
>>>> - break;
>>>> - }
>>>> - }
>>>> - if (!tgtr_scmd) {
>>>> - /* not one exactly equal; find the next highest */
>>>> - list_for_each_entry(scmd, work_q, eh_entry) {
>>>> - if (scmd_id(scmd)> id&&
>>>> - (!tgtr_scmd ||
>>>> - scmd_id(tgtr_scmd)> scmd_id(scmd)))
>>>> - tgtr_scmd = scmd;
>>>> - }
>>>> - }
>>>> - if (!tgtr_scmd)
>>>> - /* no more commands, that's it */
>>>> - break;
>>>> + list_splice(work_q,&tmp_list);
>>>
>>>
>>> I think I am hitting multiple bugs. One might be with this patch, but
>>> strangely they both hit __list_add bugs in the scsi eh. I think this
>>> needs to be list_splice_init().
>>
>> I think that's exactly right ... I keep getting tripped up by the _init
>> requirements for reusing lists (and list_heads).
>>
>>>> + list_for_each_entry_safe(scmd, next,&tmp_list, eh_entry) {
>>>> + if (scmd_id(scmd) != id)
>>>> + continue;
>>>> +
>>>> + if ((rtn == SUCCESS || rtn == FAST_IO_FAIL)
>>>> + && (!scsi_device_online(scmd->device) ||
>>>> + rtn == FAST_IO_FAIL || !scsi_eh_tur(scmd)))
>>>> + scsi_eh_finish_cmd(scmd, done_q);
>>>
>>> Because if the target resets work and move all the commands to the
>>> done_q here.
>>>
>>>
>>>> + else
>>>> + /* push back on work queue for further processing */
>>>> + list_move(&scmd->eh_entry, work_q);
>>>> + }
>>>>
>>>> return list_empty(work_q);
>>>> }
>>>
>>>
>>> Then this work_q is showing garbage in it from the splice I think.
>>> Without the list_splice_init I get that trace I cut and pasted in the
>>> other mail when I just force the scsi eh to run the target reset code.
>>>
>>> In the test, the target reset succeeds. The scsi_eh_finish_cmd call
>>> above moves all the cmds to the done_q. But the work_q still has junk in
>>> it, and so we return that we need to do more work and eventually we end
>>> up running scsi_eh_offline_sdevs which calls scsi_eh_finish_cmd on a
>>> command we already moved to the done_q.
>>
>> So does changing to the _init version fix at least the junk done_q
>> problem?
>>
>
> Yeah.


Hey James,

Your patch with the _init fixed the problem for me. Were you going to 
merge the patch or did you want me to test Hilf's version?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fix id computation in scsi_eh_target_reset()
  2010-12-14 20:57           ` Mike Christie
@ 2010-12-15  2:31             ` James Bottomley
  2010-12-16 15:10             ` Hillf Danton
  1 sibling, 0 replies; 14+ messages in thread
From: James Bottomley @ 2010-12-15  2:31 UTC (permalink / raw)
  To: Mike Christie; +Cc: Hillf Danton, linux-scsi

On Tue, 2010-12-14 at 14:57 -0600, Mike Christie wrote:
> Hey James,
> 
> Your patch with the _init fixed the problem for me. Were you going to 
> merge the patch or did you want me to test Hilf's version?

Well, it's actually been in scsi-misc for a while ... of course, I
forgot to do the _init fix (now corrected):

http://git.kernel.org/?p=linux/kernel/git/jejb/scsi-misc-2.6.git;a=commit;h=038d1001945812e49b7fcc29ce2960bcbfaecb16

James



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fix id computation in scsi_eh_target_reset()
  2010-12-14 20:57           ` Mike Christie
  2010-12-15  2:31             ` James Bottomley
@ 2010-12-16 15:10             ` Hillf Danton
  1 sibling, 0 replies; 14+ messages in thread
From: Hillf Danton @ 2010-12-16 15:10 UTC (permalink / raw)
  To: Mike Christie; +Cc: James Bottomley, linux-scsi

On Wed, Dec 15, 2010 at 4:57 AM, Mike Christie <michaelc@cs.wisc.edu> wrote:
>
> Hey James,
>
> Your patch with the _init fixed the problem for me. Were you going to merge
---> the patch or did you want me to test Hilf's version?
+++> the patch or did you want me to test Hillf's version?

haha, another typo cleaned up.

Signed-off-by: Hillf Danton

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2010-12-16 15:10 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-10-14 14:04 [PATCH] fix id computation in scsi_eh_target_reset() Hillf Danton
2010-10-18  5:33 ` Mike Christie
2010-10-25 20:53   ` James Bottomley
2010-10-26 14:04     ` Hillf Danton
2010-10-26 20:13       ` Mike Christie
2010-10-26 20:10         ` James Bottomley
2010-10-27 13:12         ` Hillf Danton
2010-10-26 23:07     ` Mike Christie
2010-10-26 23:09       ` James Bottomley
2010-10-27 18:18         ` Mike Christie
2010-10-28 13:29           ` Hillf Danton
2010-12-14 20:57           ` Mike Christie
2010-12-15  2:31             ` James Bottomley
2010-12-16 15:10             ` Hillf Danton

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.