All of lore.kernel.org
 help / color / mirror / Atom feed
* [ANNOUNCE] Status of unlocked_qcmds=1 operation for .37
@ 2010-10-20 20:49 ` Nicholas A. Bellinger
  0 siblings, 0 replies; 36+ messages in thread
From: Nicholas A. Bellinger @ 2010-10-20 20:49 UTC (permalink / raw)
  To: linux-kernel, linux-scsi
  Cc: Vasu Dev, Tim Chen, Andi Kleen, Matthew Wilcox, James Bottomley,
	Mike Christie, Jens Axboe, James Smart, Andrew Vasquez,
	FUJITA Tomonori, Hannes Reinecke, Joe Eykholt, Christoph Hellwig,
	Jon Hawley, MPTFusionLinux, eata.c maintainer, Luben Tuikov,
	mvsas maintainer, pm8001 maintainer Jack Wang, Brian King,
	Mike Anderson, Christof Schmitt, Tejun Heo, Andrew Morton,
	H. Peter Anvin

Greetings all,

So as we get closer to the .37 merge window, I wanted to take this
oppourtunity to recap the current status of the drop-host_lock /
unlocked_qcmds=1 patches, and what is required for the next RFCv5 and
hopefully a merge into .37.  The last RFCv4 was posted here:

http://marc.info/?l=linux-kernel&m=128563953114561&w=2

Since then, Christof Schmitt has sent a patch to drop struct
scsi_cmnd->serial_number usage in zfcp, and Tim Chen has sent an
important fix to drop an extra host_lock access that I originally missed
in qla2xxx SHT->queuecommand() that certainly would have deadlocked a
running machine.   Many thanks to Christof and Tim for your
contributions and review!

So at this point in the game the current score sits at:

*) core drivers/scsi remaining issue(s):

The issue raised by andmike during RFCv4 described as:

"If we skip __scsi_try_to_abort_cmd when REQ_ATOM_COMPLETE is set it
would be correct for the scsi_decide_disposition cases but it would
appear this would stop __scsi_try_to_abort_cmd from being called in the
time out case as REQ_ATOM_COMPLETE is set prior to calling
blk_rq_timed_out."

The complete discussion is here:

http://marc.info/?l=linux-scsi&m=128535319915212&w=2

We still need folks with experience to dig into this code, so you know
the scsi_error.c code please jump in!

*) LLD libraries running by default w/ unlocked_qcmds=1

libiscsi: need ack from mnc
libsas: need ack from jejb
libfc: remaining rport state + host_lock less issue.  Need more input
       from mnc for James Smart and Joe on this...
libata: jgarzik thinks this should be OK, review and ack from tejun
        would also be very helpful.

The main issue remaining here is the audit of libfc rport (and other..?)
code that assumes host_lock is held to protect state.  mnc, do you have
any more thoughts for James Smart and Joe here..?

*) Individual LLDs running by default w/ unlocked_qcmds=1

aic94xx: need ack maintainer at adaptec..?)
mvsas: need ack maintainer at marvell..?)
pm8001: need ack Jang Wang
qla4xxx, qla2xxx: need ack Andrew Vasquez
fnic:  need ack Joe Eykholt

Aside from the required ACKs, I am not aware of any other mainline LLDs
doing the legacy SHT->queuecommand() -> unlock() -> do_lld_work() ->
lock() that have not already been converted.

The main question here is if any out of tree SCSI LLDs use this legacy
optimization..  Should we come up with a compile time way to alert
vendors to this..?

*) Individual LLDs converted to use explict scsi_cmd_get_serial()

mpt2sas: Add scsi_cmd_get_serial() call
mpt/fusion: Add scsi_cmd_get_serial() call
dpt_i2o: Add scsi_cmd_get_serial() call
eata: Add scsi_cmd_get_serial() call
u14-34f: Add scsi_cmd_get_serial() call
zfcp: Remove scsi_cmnd->serial_number from debug traces

Aside from the required ACKs, I am not aware of any other mainline LLDs
that use struct scsi_cmnd->serial_number internally that have not
already been converted.  The same applies to out of tree modules for
this, but is certainly not critical.

So as the clock winds down to get this merged into .37, if anyone else
has any concerns or comments please let myself and the other relivent
maintainers know and we will try to address them accordingly.

Thanks!

--nab


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [ANNOUNCE] Status of unlocked_qcmds=1 operation for .37
@ 2010-10-20 20:49 ` Nicholas A. Bellinger
  0 siblings, 0 replies; 36+ messages in thread
From: Nicholas A. Bellinger @ 2010-10-20 20:49 UTC (permalink / raw)
  To: linux-kernel, linux-scsi
  Cc: Vasu Dev, Tim Chen, Andi Kleen, Matthew Wilcox, James Bottomley,
	Mike Christie, Jens Axboe, James Smart, Andrew Vasquez,
	FUJITA Tomonori, Hannes Reinecke, Joe Eykholt, Christoph Hellwig,
	Jon Hawley, MPTFusionLinux, eata.c maintainer, Luben Tuikov,
	mvsas maintainer, pm8001 maintainer Jack Wang, Brian King,
	Mike Anderson, Christof Schmitt

Greetings all,

So as we get closer to the .37 merge window, I wanted to take this
oppourtunity to recap the current status of the drop-host_lock /
unlocked_qcmds=1 patches, and what is required for the next RFCv5 and
hopefully a merge into .37.  The last RFCv4 was posted here:

http://marc.info/?l=linux-kernel&m=128563953114561&w=2

Since then, Christof Schmitt has sent a patch to drop struct
scsi_cmnd->serial_number usage in zfcp, and Tim Chen has sent an
important fix to drop an extra host_lock access that I originally missed
in qla2xxx SHT->queuecommand() that certainly would have deadlocked a
running machine.   Many thanks to Christof and Tim for your
contributions and review!

So at this point in the game the current score sits at:

*) core drivers/scsi remaining issue(s):

The issue raised by andmike during RFCv4 described as:

"If we skip __scsi_try_to_abort_cmd when REQ_ATOM_COMPLETE is set it
would be correct for the scsi_decide_disposition cases but it would
appear this would stop __scsi_try_to_abort_cmd from being called in the
time out case as REQ_ATOM_COMPLETE is set prior to calling
blk_rq_timed_out."

The complete discussion is here:

http://marc.info/?l=linux-scsi&m=128535319915212&w=2

We still need folks with experience to dig into this code, so you know
the scsi_error.c code please jump in!

*) LLD libraries running by default w/ unlocked_qcmds=1

libiscsi: need ack from mnc
libsas: need ack from jejb
libfc: remaining rport state + host_lock less issue.  Need more input
       from mnc for James Smart and Joe on this...
libata: jgarzik thinks this should be OK, review and ack from tejun
        would also be very helpful.

The main issue remaining here is the audit of libfc rport (and other..?)
code that assumes host_lock is held to protect state.  mnc, do you have
any more thoughts for James Smart and Joe here..?

*) Individual LLDs running by default w/ unlocked_qcmds=1

aic94xx: need ack maintainer at adaptec..?)
mvsas: need ack maintainer at marvell..?)
pm8001: need ack Jang Wang
qla4xxx, qla2xxx: need ack Andrew Vasquez
fnic:  need ack Joe Eykholt

Aside from the required ACKs, I am not aware of any other mainline LLDs
doing the legacy SHT->queuecommand() -> unlock() -> do_lld_work() ->
lock() that have not already been converted.

The main question here is if any out of tree SCSI LLDs use this legacy
optimization..  Should we come up with a compile time way to alert
vendors to this..?

*) Individual LLDs converted to use explict scsi_cmd_get_serial()

mpt2sas: Add scsi_cmd_get_serial() call
mpt/fusion: Add scsi_cmd_get_serial() call
dpt_i2o: Add scsi_cmd_get_serial() call
eata: Add scsi_cmd_get_serial() call
u14-34f: Add scsi_cmd_get_serial() call
zfcp: Remove scsi_cmnd->serial_number from debug traces

Aside from the required ACKs, I am not aware of any other mainline LLDs
that use struct scsi_cmnd->serial_number internally that have not
already been converted.  The same applies to out of tree modules for
this, but is certainly not critical.

So as the clock winds down to get this merged into .37, if anyone else
has any concerns or comments please let myself and the other relivent
maintainers know and we will try to address them accordingly.

Thanks!

--nab

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [ANNOUNCE] Status of unlocked_qcmds=1 operation for .37
  2010-10-20 20:49 ` Nicholas A. Bellinger
@ 2010-10-21 19:26   ` Luben Tuikov
  -1 siblings, 0 replies; 36+ messages in thread
From: Luben Tuikov @ 2010-10-21 19:26 UTC (permalink / raw)
  To: linux-kernel, linux-scsi, Nicholas A. Bellinger
  Cc: Vasu Dev, Tim Chen, Andi Kleen, Matthew Wilcox, James Bottomley,
	Mike Christie, Jens Axboe, James Smart, Andrew Vasquez,
	FUJITA Tomonori, Hannes Reinecke, Joe Eykholt, Christoph Hellwig,
	Jon Hawley, MPTFusionLinux, eata.c maintainer, mvsas maintainer,
	pm8001 maintainer Jack Wang, Brian King, Mike Anderson,
	Christof Schmitt, Tejun Heo, Andrew Morton, H. Peter Anvin

--- On Wed, 10/20/10, Nicholas A. Bellinger <nab@linux-iscsi.org> wrote:

> From: Nicholas A. Bellinger <nab@linux-iscsi.org>
> Subject: [ANNOUNCE] Status of unlocked_qcmds=1 operation for .37
> To: "linux-kernel" <linux-kernel@vger.kernel.org>, "linux-scsi" <linux-scsi@vger.kernel.org>
> Cc: "Vasu Dev" <vasu.dev@linux.intel.com>, "Tim Chen" <tim.c.chen@linux.intel.com>, "Andi Kleen" <ak@linux.intel.com>, "Matthew Wilcox" <willy@linux.intel.com>, "James Bottomley" <James.Bottomley@suse.de>, "Mike Christie" <michaelc@cs.wisc.edu>, "Jens Axboe" <jaxboe@fusionio.com>, "James Smart" <james.smart@emulex.com>, "Andrew Vasquez" <andrew.vasquez@qlogic.com>, "FUJITA Tomonori" <fujita.tomonori@lab.ntt.co.jp>, "Hannes Reinecke" <hare@suse.de>, "Joe Eykholt" <jeykholt@cisco.com>, "Christoph Hellwig" <hch@lst.de>, "Jon Hawley" <warthog9@kernel.org>, "MPTFusionLinux" <DL-MPTFusionLinux@lsi.com>, "eata.c maintainer" <dario.ballabio@inwind.it>, "Luben Tuikov" <ltuikov@yahoo.com>, "mvsas maintainer" <kewei@marvell.com>, "pm8001 maintainer Jack Wang" <jack_wang@usish.com>, "Brian King" <brking@linux.vnet.ibm.com>, "Mike Anderson" <andmike@linux.vnet.ibm.com>, "Christof Schmitt" <christof.schmitt@de.ibm.com>, "Tejun Heo" <tj@kernel.org>, "Andrew Morton"
 <akpm@linux-foundation.org>, "H. Peter Anvin" <hpa@zytor.com>
> Date: Wednesday, October 20, 2010, 1:49 PM
> Greetings all,
> 
> So as we get closer to the .37 merge window, I wanted to
> take this
> oppourtunity to recap the current status of the
> drop-host_lock /
> unlocked_qcmds=1 patches, and what is required for the next
> RFCv5 and
> hopefully a merge into .37.  The last RFCv4 was posted
> here:
> 
> http://marc.info/?l=linux-kernel&m=128563953114561&w=2
> 
> Since then, Christof Schmitt has sent a patch to drop
> struct
> scsi_cmnd->serial_number usage in zfcp, and Tim Chen has
> sent an
> important fix to drop an extra host_lock access that I
> originally missed
> in qla2xxx SHT->queuecommand() that certainly would have
> deadlocked a
> running machine.   Many thanks to Christof
> and Tim for your
> contributions and review!
> 
> So at this point in the game the current score sits at:
> 
> *) core drivers/scsi remaining issue(s):
> 
> The issue raised by andmike during RFCv4 described as:
> 
> "If we skip __scsi_try_to_abort_cmd when REQ_ATOM_COMPLETE
> is set it
> would be correct for the scsi_decide_disposition cases but
> it would
> appear this would stop __scsi_try_to_abort_cmd from being
> called in the
> time out case as REQ_ATOM_COMPLETE is set prior to calling
> blk_rq_timed_out."
> 
> The complete discussion is here:
> 
> http://marc.info/?l=linux-scsi&m=128535319915212&w=2
> 
> We still need folks with experience to dig into this code,
> so you know
> the scsi_error.c code please jump in!
> 
> *) LLD libraries running by default w/ unlocked_qcmds=1
> 
> libiscsi: need ack from mnc
> libsas: need ack from jejb
> libfc: remaining rport state + host_lock less issue. 
> Need more input
>        from mnc for James Smart
> and Joe on this...
> libata: jgarzik thinks this should be OK, review and ack
> from tejun
>         would also be very helpful.
> 
> The main issue remaining here is the audit of libfc rport
> (and other..?)
> code that assumes host_lock is held to protect state. 
> mnc, do you have
> any more thoughts for James Smart and Joe here..?
> 
> *) Individual LLDs running by default w/ unlocked_qcmds=1
> 
> aic94xx: need ack maintainer at adaptec..?)

Adaptec doesn't exist anymore--it's just a memory in the minds of
many good engineers.  Anyway, as a former Adaptec employee and
the author of aic94xx and the SAS code in the Linux kernel (which
Bottomley copied from a git tree from an Adaptec server or maybe
from the linux-scsi ML, munged it up and submitted as his own
into linux-scsi git tree) I can give an ACK on both. Both the
aic94xx and the SAS code were written with the host lock as a
kludge and we unlock/lock it as you can see in the code.

> mvsas: need ack maintainer at marvell..?)
> pm8001: need ack Jang Wang
> qla4xxx, qla2xxx: need ack Andrew Vasquez
> fnic:  need ack Joe Eykholt
> 
> Aside from the required ACKs, I am not aware of any other
> mainline LLDs
> doing the legacy SHT->queuecommand() -> unlock()
> -> do_lld_work() ->
> lock() that have not already been converted.
> 
> The main question here is if any out of tree SCSI LLDs use
> this legacy
> optimization..  Should we come up with a compile time
> way to alert
> vendors to this..?
> 
> *) Individual LLDs converted to use explict
> scsi_cmd_get_serial()
> 
> mpt2sas: Add scsi_cmd_get_serial() call
> mpt/fusion: Add scsi_cmd_get_serial() call
> dpt_i2o: Add scsi_cmd_get_serial() call
> eata: Add scsi_cmd_get_serial() call
> u14-34f: Add scsi_cmd_get_serial() call
> zfcp: Remove scsi_cmnd->serial_number from debug traces
> 
> Aside from the required ACKs, I am not aware of any other
> mainline LLDs
> that use struct scsi_cmnd->serial_number internally that
> have not
> already been converted.  The same applies to out of
> tree modules for
> this, but is certainly not critical.
> 
> So as the clock winds down to get this merged into .37, if
> anyone else
> has any concerns or comments please let myself and the
> other relivent
> maintainers know and we will try to address them
> accordingly.
> 
> Thanks!
> 
> --nab
> 
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [ANNOUNCE] Status of unlocked_qcmds=1 operation for .37
@ 2010-10-21 19:26   ` Luben Tuikov
  0 siblings, 0 replies; 36+ messages in thread
From: Luben Tuikov @ 2010-10-21 19:26 UTC (permalink / raw)
  To: linux-kernel, linux-scsi, Nicholas A. Bellinger
  Cc: Vasu Dev, Tim Chen, Andi Kleen, Matthew Wilcox, James Bottomley,
	Mike Christie, Jens Axboe, James Smart, Andrew Vasquez,
	FUJITA Tomonori, Hannes Reinecke, Joe Eykholt, Christoph Hellwig,
	Jon Hawley, MPTFusionLinux, eata.c maintainer, mvsas maintainer,
	pm8001 maintainer Jack Wang, Brian King, Mike Anderson,
	Christof Schmitt, Tejun Heo

--- On Wed, 10/20/10, Nicholas A. Bellinger <nab@linux-iscsi.org> wrote:

> From: Nicholas A. Bellinger <nab@linux-iscsi.org>
> Subject: [ANNOUNCE] Status of unlocked_qcmds=1 operation for .37
> To: "linux-kernel" <linux-kernel@vger.kernel.org>, "linux-scsi" <linux-scsi@vger.kernel.org>
> Cc: "Vasu Dev" <vasu.dev@linux.intel.com>, "Tim Chen" <tim.c.chen@linux.intel.com>, "Andi Kleen" <ak@linux.intel.com>, "Matthew Wilcox" <willy@linux.intel.com>, "James Bottomley" <James.Bottomley@suse.de>, "Mike Christie" <michaelc@cs.wisc.edu>, "Jens Axboe" <jaxboe@fusionio.com>, "James Smart" <james.smart@emulex.com>, "Andrew Vasquez" <andrew.vasquez@qlogic.com>, "FUJITA Tomonori" <fujita.tomonori@lab.ntt.co.jp>, "Hannes Reinecke" <hare@suse.de>, "Joe Eykholt" <jeykholt@cisco.com>, "Christoph Hellwig" <hch@lst.de>, "Jon Hawley" <warthog9@kernel.org>, "MPTFusionLinux" <DL-MPTFusionLinux@lsi.com>, "eata.c maintainer" <dario.ballabio@inwind.it>, "Luben Tuikov" <ltuikov@yahoo.com>, "mvsas maintainer" <kewei@marvell.com>, "pm8001 maintainer Jack Wang" <jack_wang@usish.com>, "Brian King" <brking@linux.vnet.ibm.com>, "Mike Anderson" <andmike@linux.vnet.ibm.com>, "Christof Schmitt" <christof.schmitt@de.ibm.com>, "Tejun Heo" <tj@kernel.org>, "Andrew Morton"
 <akpm@linux-foundation.org>, "H. Peter Anvin" <hpa@zytor.com>
> Date: Wednesday, October 20, 2010, 1:49 PM
> Greetings all,
> 
> So as we get closer to the .37 merge window, I wanted to
> take this
> oppourtunity to recap the current status of the
> drop-host_lock /
> unlocked_qcmds=1 patches, and what is required for the next
> RFCv5 and
> hopefully a merge into .37.  The last RFCv4 was posted
> here:
> 
> http://marc.info/?l=linux-kernel&m=128563953114561&w=2
> 
> Since then, Christof Schmitt has sent a patch to drop
> struct
> scsi_cmnd->serial_number usage in zfcp, and Tim Chen has
> sent an
> important fix to drop an extra host_lock access that I
> originally missed
> in qla2xxx SHT->queuecommand() that certainly would have
> deadlocked a
> running machine.   Many thanks to Christof
> and Tim for your
> contributions and review!
> 
> So at this point in the game the current score sits at:
> 
> *) core drivers/scsi remaining issue(s):
> 
> The issue raised by andmike during RFCv4 described as:
> 
> "If we skip __scsi_try_to_abort_cmd when REQ_ATOM_COMPLETE
> is set it
> would be correct for the scsi_decide_disposition cases but
> it would
> appear this would stop __scsi_try_to_abort_cmd from being
> called in the
> time out case as REQ_ATOM_COMPLETE is set prior to calling
> blk_rq_timed_out."
> 
> The complete discussion is here:
> 
> http://marc.info/?l=linux-scsi&m=128535319915212&w=2
> 
> We still need folks with experience to dig into this code,
> so you know
> the scsi_error.c code please jump in!
> 
> *) LLD libraries running by default w/ unlocked_qcmds=1
> 
> libiscsi: need ack from mnc
> libsas: need ack from jejb
> libfc: remaining rport state + host_lock less issue. 
> Need more input
>        from mnc for James Smart
> and Joe on this...
> libata: jgarzik thinks this should be OK, review and ack
> from tejun
>         would also be very helpful.
> 
> The main issue remaining here is the audit of libfc rport
> (and other..?)
> code that assumes host_lock is held to protect state. 
> mnc, do you have
> any more thoughts for James Smart and Joe here..?
> 
> *) Individual LLDs running by default w/ unlocked_qcmds=1
> 
> aic94xx: need ack maintainer at adaptec..?)

Adaptec doesn't exist anymore--it's just a memory in the minds of
many good engineers.  Anyway, as a former Adaptec employee and
the author of aic94xx and the SAS code in the Linux kernel (which
Bottomley copied from a git tree from an Adaptec server or maybe
from the linux-scsi ML, munged it up and submitted as his own
into linux-scsi git tree) I can give an ACK on both. Both the
aic94xx and the SAS code were written with the host lock as a
kludge and we unlock/lock it as you can see in the code.

> mvsas: need ack maintainer at marvell..?)
> pm8001: need ack Jang Wang
> qla4xxx, qla2xxx: need ack Andrew Vasquez
> fnic:  need ack Joe Eykholt
> 
> Aside from the required ACKs, I am not aware of any other
> mainline LLDs
> doing the legacy SHT->queuecommand() -> unlock()
> -> do_lld_work() ->
> lock() that have not already been converted.
> 
> The main question here is if any out of tree SCSI LLDs use
> this legacy
> optimization..  Should we come up with a compile time
> way to alert
> vendors to this..?
> 
> *) Individual LLDs converted to use explict
> scsi_cmd_get_serial()
> 
> mpt2sas: Add scsi_cmd_get_serial() call
> mpt/fusion: Add scsi_cmd_get_serial() call
> dpt_i2o: Add scsi_cmd_get_serial() call
> eata: Add scsi_cmd_get_serial() call
> u14-34f: Add scsi_cmd_get_serial() call
> zfcp: Remove scsi_cmnd->serial_number from debug traces
> 
> Aside from the required ACKs, I am not aware of any other
> mainline LLDs
> that use struct scsi_cmnd->serial_number internally that
> have not
> already been converted.  The same applies to out of
> tree modules for
> this, but is certainly not critical.
> 
> So as the clock winds down to get this merged into .37, if
> anyone else
> has any concerns or comments please let myself and the
> other relivent
> maintainers know and we will try to address them
> accordingly.
> 
> Thanks!
> 
> --nab
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [ANNOUNCE] Status of unlocked_qcmds=1 operation for .37
       [not found] ` <20101021150840.GA24309@linux.vnet.ibm.com>
@ 2010-10-26 22:08   ` Nicholas A. Bellinger
  2010-10-26 22:27     ` James Bottomley
  0 siblings, 1 reply; 36+ messages in thread
From: Nicholas A. Bellinger @ 2010-10-26 22:08 UTC (permalink / raw)
  To: Mike Anderson
  Cc: linux-kernel, linux-scsi, Vasu Dev, Tim Chen, Andi Kleen,
	Matthew Wilcox, James Bottomley, Mike Christie, Jens Axboe,
	James Smart, Andrew Vasquez, FUJITA Tomonori, Hannes Reinecke,
	Joe Eykholt, Christoph Hellwig, Jon Hawley, Brian King,
	Christof Schmitt, Tejun Heo, Andrew Morton, H. Peter Anvin

On Thu, 2010-10-21 at 08:08 -0700, Mike Anderson wrote:
> Nicholas A. Bellinger <nab@linux-iscsi.org> wrote:
> > *) core drivers/scsi remaining issue(s):
> > 
> > The issue raised by andmike during RFCv4 described as:
> > 
> > "If we skip __scsi_try_to_abort_cmd when REQ_ATOM_COMPLETE is set it
> > would be correct for the scsi_decide_disposition cases but it would
> > appear this would stop __scsi_try_to_abort_cmd from being called in the
> > time out case as REQ_ATOM_COMPLETE is set prior to calling
> > blk_rq_timed_out."
> > 
> > The complete discussion is here:
> > 
> > http://marc.info/?l=linux-scsi&m=128535319915212&w=2
> > 
> > We still need folks with experience to dig into this code, so you know
> > the scsi_error.c code please jump in!
> > 
> 
> I provided two logging traces below for the two cases mentioned in the
> above email thread. The second trace used a modified version of
> scsi_debug so that I could generate the needed unit attention codes.
> 1.) On the check for complete comment the logging below marked with
> "***" indicated that during a timeout that the complete bit is set. We
> would not want to have a check for complete skip the call to
> __scsi_try_to_abort_cmd. 
> 
> scsi_debug: cmd 28 00 00 00 00 00 00 00 20 00 
> sd 1:0:0:0: [sdb] Done: TIMEOUT
> sd 1:0:0:0: [sdb]  Result: hostbyte=DID_OK driverbyte=DRIVER_OK
> sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 00 00 00 00 00 00 20 00
> Waking error handler thread
> Error handler scsi_eh_1 waking up
> sd 1:0:0:0: scsi_eh_prt_fail_stats: cmds failed: 0, cancel: 1
> Total of 1 commands on 1 devices require eh work
> scsi_eh_1: aborting cmd:0xffff88019e171b80
> 
> ***
> sd 1:0:0:0: scsi_try_to_abort_cmd: Comp bit set
> ***
> 
> scsi_debug: abort
> scsi_debug: cmd 00 00 00 00 00 00 
> scsi_eh_done scmd: ffff88019e171b80 result: 0
> scsi_send_eh_cmnd: scmd: ffff88019e171b80, timeleft: 10000
> scsi_send_eh_cmnd: scsi_eh_completed_normally 2002
> scsi_eh_tur: scmd ffff88019e171b80 rtn 2002
> scsi_eh_1: flush retry cmd: ffff88019e171b80
> scsi_restart_operations: waking up host to restart
> scsi_debug: cmd 28 00 00 00 00 00 00 00 20 00 
> Error handler scsi_eh_1 sleeping
> 

Hi Mike and Co,

After considering a couple of different approches here, I ended up with
the following simple patch that has been tested on linus HEAD from this
afternoon w/ forcing scsi_debug cmd TIMEOUT during modprobe and via
sg_dd ops.  (See comments below)

[PATCH] scsi: Add SCSI_EH_SOFTIRQ_DONE usage

This patch introduces a SCSI_EH_SOFTIRQ_DONE flag that is set in scsi_softirq_done()
from block soft_irq context that is used to signal when scsi_try_to_abort_cmd() should
be calling __scsi_try_to_abort_cmd() for a timed out struct scsi_cmnd instead of
returning SUCCESS via checking only blk_test_rq_complete().  This is done because
blk_rq_timed_out_timer() calls blk_mark_rq_complete() before blk_rq_timed_out() ->
struct request_queue->rq_timed_out_fn().

Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
---
 drivers/scsi/scsi_error.c |    3 ++-
 drivers/scsi/scsi_lib.c   |    6 ++++++
 drivers/scsi/scsi_priv.h  |    1 +
 3 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 6e496c3..e75b388 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -651,7 +651,8 @@ static int scsi_try_to_abort_cmd(struct scsi_cmnd
*scmd)
         * block/blk.h:blk_mark_rq_complete() has already been called
         * from the block softirq
         */
-       if (blk_test_rq_complete(scmd->request))
+       if ((blk_test_rq_complete(scmd->request)) &&
+           (scmd->eh_eflags & SCSI_EH_SOFTIRQ_DONE))
                return SUCCESS;

        return __scsi_try_to_abort_cmd(scmd);
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 8041fe1..7e3c477 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1402,6 +1402,12 @@ static void scsi_softirq_done(struct request *rq)
        int disposition;

        INIT_LIST_HEAD(&cmd->eh_entry);
+       /*
+        * Set the SCSI_EH_SOFTIRQ_DONE flag to signal scsi_error.c:
+        * scsi_try_to_abort_cmd() that this cmd has reached
+        * scsi_softirq_done()
+        */
+       cmd->eh_eflags |= SCSI_EH_SOFTIRQ_DONE;

        /*
         * Set the serial numbers back to zero
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index b4056d1..2f64350 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -17,6 +17,7 @@ struct scsi_nl_hdr;
  * Scsi Error Handler Flags
  */
 #define SCSI_EH_CANCEL_CMD     0x0001  /* Cancel this cmd */
+#define SCSI_EH_SOFTIRQ_DONE   0x0002  /* cmd reached
scsi_softirq_done() */

 #define SCSI_SENSE_VALID(scmd) \
        (((scmd)->sense_buffer[0] & 0x70) == 0x70)


-----------------------------------------------------------------

So far with this patch I see the follow two concerns:

*) SCSI_EH_SOFTIRQ_DONE uses cmd->eh_eflags which gets explictly cleared
in scsi_eh_finish_cmd().  AFAICT, the checks for scsi_try_to_abort_cmd()
are always done before scsi_eh_finish_cmd() gets called.   Comments
here..?

*) A potential race between SCSI_EH_SOFTIRQ_DONE being set in
scsi_softirq_done(), and the check in scsi_try_to_abort_cmd().  As this
code originaly used a non atomic cmd->serial_number check here between
the same two pieces of code, this *should* be OK right..?

> 
> 2.) In the logging output below my previous comment on the START_UNIT
> path hitting the serial_number check in scsi_try_to_abort_cmd is shown
> to be incorrect as the START_UNIT is handled as a failed and not a
> cancel action.
> 
> scsi_debug: cmd 28 00 00 00 00 00 00 00 20 00 
> scsi_debug:    [sense_key,asc,ascq]: [0x6,0x4,0x2]
> scsi_debug:    <1 0 0 0> non-zero result=0x8000002
> sd 1:0:0:0: scsi_check_sense: return FAILED for restart
> sd 1:0:0:0: [sdb] Done: FAILED
> sd 1:0:0:0: [sdb]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 00 00 00 00 00 00 20 00
> sd 1:0:0:0: [sdb]  Sense Key : Unit Attention [current] 
> sd 1:0:0:0: [sdb]  Add. Sense: Logical unit not ready, initializing command required
> Waking error handler thread
> Error handler scsi_eh_1 waking up
> 
> ***
> sd 1:0:0:0: scsi_eh_prt_fail_stats: cmds failed: 1, cancel: 0
> ***
> 
> Total of 1 commands on 1 devices require eh work
> sd 1:0:0:0: scsi_check_sense: return FAILED for restart
> scsi_eh_1: Sending START_UNIT to sdev: 0xffff8801a2134000
> scsi_debug: cmd 1b 00 00 00 01 00 
> scsi_eh_done scmd: ffff8801a1730180 result: 0
> scsi_send_eh_cmnd: scmd: ffff8801a1730180, timeleft: 29999
> scsi_send_eh_cmnd: scsi_eh_completed_normally 2002
> scsi_debug: cmd 00 00 00 00 00 00 
> scsi_eh_done scmd: ffff8801a1730180 result: 0
> scsi_send_eh_cmnd: scmd: ffff8801a1730180, timeleft: 10000
> scsi_send_eh_cmnd: scsi_eh_completed_normally 2002
> scsi_eh_tur: scmd ffff8801a1730180 rtn 2002
> scsi_eh_1: flush retry cmd: ffff8801a1730180
> scsi_restart_operations: waking up host to restart
> scsi_debug: cmd 28 00 00 00 00 00 00 00 20 00 
> Error handler scsi_eh_1 sleeping
> 
> 

AFAICT this second case should be addressed by above patch
as well..  I will go ahead and test with your scsi_debug patch to
trigger this case, and everything looks good for the above two cases I
will respin against the latest linus HEAD.  Mike still wants to do a
bunch of testing to this code over the next weeks, and we will continue
to shake out any remaining items here.

That said, the libfc rport state + host_lock less issue would be the
only known remaining item to get this series merged for .37.  We can
always leave unlocked_qcmd=0 for libfc for the initial merge of these
patches, and have mnc and james smart dig into the nitty gritty details
post merge window.  Comments please libfc folks..?

Many, many thanks again to Mike Anderson for his detailed explanations
of these two cases.

--nab



^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [ANNOUNCE] Status of unlocked_qcmds=1 operation for .37
  2010-10-26 22:08   ` Nicholas A. Bellinger
@ 2010-10-26 22:27     ` James Bottomley
  2010-10-26 22:34       ` Nicholas A. Bellinger
  0 siblings, 1 reply; 36+ messages in thread
From: James Bottomley @ 2010-10-26 22:27 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Mike Anderson, linux-kernel, linux-scsi, Vasu Dev, Tim Chen,
	Andi Kleen, Matthew Wilcox, Mike Christie, Jens Axboe,
	James Smart, Andrew Vasquez, FUJITA Tomonori, Hannes Reinecke,
	Joe Eykholt, Christoph Hellwig, Jon Hawley, Brian King,
	Christof Schmitt, Tejun Heo, Andrew Morton, H. Peter Anvin

On Tue, 2010-10-26 at 15:08 -0700, Nicholas A. Bellinger wrote:
> On Thu, 2010-10-21 at 08:08 -0700, Mike Anderson wrote:
> > Nicholas A. Bellinger <nab@linux-iscsi.org> wrote:
> > > *) core drivers/scsi remaining issue(s):
> > > 
> > > The issue raised by andmike during RFCv4 described as:
> > > 
> > > "If we skip __scsi_try_to_abort_cmd when REQ_ATOM_COMPLETE is set it
> > > would be correct for the scsi_decide_disposition cases but it would
> > > appear this would stop __scsi_try_to_abort_cmd from being called in the
> > > time out case as REQ_ATOM_COMPLETE is set prior to calling
> > > blk_rq_timed_out."
> > > 
> > > The complete discussion is here:
> > > 
> > > http://marc.info/?l=linux-scsi&m=128535319915212&w=2
> > > 
> > > We still need folks with experience to dig into this code, so you know
> > > the scsi_error.c code please jump in!
> > > 
> > 
> > I provided two logging traces below for the two cases mentioned in the
> > above email thread. The second trace used a modified version of
> > scsi_debug so that I could generate the needed unit attention codes.
> > 1.) On the check for complete comment the logging below marked with
> > "***" indicated that during a timeout that the complete bit is set. We
> > would not want to have a check for complete skip the call to
> > __scsi_try_to_abort_cmd. 
> > 
> > scsi_debug: cmd 28 00 00 00 00 00 00 00 20 00 
> > sd 1:0:0:0: [sdb] Done: TIMEOUT
> > sd 1:0:0:0: [sdb]  Result: hostbyte=DID_OK driverbyte=DRIVER_OK
> > sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 00 00 00 00 00 00 20 00
> > Waking error handler thread
> > Error handler scsi_eh_1 waking up
> > sd 1:0:0:0: scsi_eh_prt_fail_stats: cmds failed: 0, cancel: 1
> > Total of 1 commands on 1 devices require eh work
> > scsi_eh_1: aborting cmd:0xffff88019e171b80
> > 
> > ***
> > sd 1:0:0:0: scsi_try_to_abort_cmd: Comp bit set
> > ***
> > 
> > scsi_debug: abort
> > scsi_debug: cmd 00 00 00 00 00 00 
> > scsi_eh_done scmd: ffff88019e171b80 result: 0
> > scsi_send_eh_cmnd: scmd: ffff88019e171b80, timeleft: 10000
> > scsi_send_eh_cmnd: scsi_eh_completed_normally 2002
> > scsi_eh_tur: scmd ffff88019e171b80 rtn 2002
> > scsi_eh_1: flush retry cmd: ffff88019e171b80
> > scsi_restart_operations: waking up host to restart
> > scsi_debug: cmd 28 00 00 00 00 00 00 00 20 00 
> > Error handler scsi_eh_1 sleeping
> > 
> 
> Hi Mike and Co,
> 
> After considering a couple of different approches here, I ended up with
> the following simple patch that has been tested on linus HEAD from this
> afternoon w/ forcing scsi_debug cmd TIMEOUT during modprobe and via
> sg_dd ops.  (See comments below)
> 
> [PATCH] scsi: Add SCSI_EH_SOFTIRQ_DONE usage
> 
> This patch introduces a SCSI_EH_SOFTIRQ_DONE flag that is set in scsi_softirq_done()
> from block soft_irq context that is used to signal when scsi_try_to_abort_cmd() should
> be calling __scsi_try_to_abort_cmd() for a timed out struct scsi_cmnd instead of
> returning SUCCESS via checking only blk_test_rq_complete().  This is done because
> blk_rq_timed_out_timer() calls blk_mark_rq_complete() before blk_rq_timed_out() ->
> struct request_queue->rq_timed_out_fn().

This is getting pretty far off into the weeds.  I think the first step
should be queue lock push down into ->queuecommand.  This would still
necessitate locking around the serial number.  The next step might be
serial number elimination/reduction which might need to address issues
like this (or not ... block already knows the information, there's not
really much need for us to track it twice).  There might also be other
lock aquisition reduction as part of step 2.

James



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [ANNOUNCE] Status of unlocked_qcmds=1 operation for .37
  2010-10-26 22:27     ` James Bottomley
@ 2010-10-26 22:34       ` Nicholas A. Bellinger
  2010-10-26 22:50         ` James Bottomley
  0 siblings, 1 reply; 36+ messages in thread
From: Nicholas A. Bellinger @ 2010-10-26 22:34 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mike Anderson, linux-kernel, linux-scsi, Vasu Dev, Tim Chen,
	Andi Kleen, Matthew Wilcox, Mike Christie, Jens Axboe,
	James Smart, Andrew Vasquez, FUJITA Tomonori, Hannes Reinecke,
	Joe Eykholt, Christoph Hellwig, Jon Hawley, Brian King,
	Christof Schmitt, Tejun Heo, Andrew Morton, H. Peter Anvin

On Tue, 2010-10-26 at 17:27 -0500, James Bottomley wrote:
> On Tue, 2010-10-26 at 15:08 -0700, Nicholas A. Bellinger wrote:
> > On Thu, 2010-10-21 at 08:08 -0700, Mike Anderson wrote:
> > > Nicholas A. Bellinger <nab@linux-iscsi.org> wrote:
> > > > *) core drivers/scsi remaining issue(s):
> > > > 
> > > > The issue raised by andmike during RFCv4 described as:
> > > > 
> > > > "If we skip __scsi_try_to_abort_cmd when REQ_ATOM_COMPLETE is set it
> > > > would be correct for the scsi_decide_disposition cases but it would
> > > > appear this would stop __scsi_try_to_abort_cmd from being called in the
> > > > time out case as REQ_ATOM_COMPLETE is set prior to calling
> > > > blk_rq_timed_out."
> > > > 
> > > > The complete discussion is here:
> > > > 
> > > > http://marc.info/?l=linux-scsi&m=128535319915212&w=2
> > > > 
> > > > We still need folks with experience to dig into this code, so you know
> > > > the scsi_error.c code please jump in!
> > > > 
> > > 
> > > I provided two logging traces below for the two cases mentioned in the
> > > above email thread. The second trace used a modified version of
> > > scsi_debug so that I could generate the needed unit attention codes.
> > > 1.) On the check for complete comment the logging below marked with
> > > "***" indicated that during a timeout that the complete bit is set. We
> > > would not want to have a check for complete skip the call to
> > > __scsi_try_to_abort_cmd. 
> > > 
> > > scsi_debug: cmd 28 00 00 00 00 00 00 00 20 00 
> > > sd 1:0:0:0: [sdb] Done: TIMEOUT
> > > sd 1:0:0:0: [sdb]  Result: hostbyte=DID_OK driverbyte=DRIVER_OK
> > > sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 00 00 00 00 00 00 20 00
> > > Waking error handler thread
> > > Error handler scsi_eh_1 waking up
> > > sd 1:0:0:0: scsi_eh_prt_fail_stats: cmds failed: 0, cancel: 1
> > > Total of 1 commands on 1 devices require eh work
> > > scsi_eh_1: aborting cmd:0xffff88019e171b80
> > > 
> > > ***
> > > sd 1:0:0:0: scsi_try_to_abort_cmd: Comp bit set
> > > ***
> > > 
> > > scsi_debug: abort
> > > scsi_debug: cmd 00 00 00 00 00 00 
> > > scsi_eh_done scmd: ffff88019e171b80 result: 0
> > > scsi_send_eh_cmnd: scmd: ffff88019e171b80, timeleft: 10000
> > > scsi_send_eh_cmnd: scsi_eh_completed_normally 2002
> > > scsi_eh_tur: scmd ffff88019e171b80 rtn 2002
> > > scsi_eh_1: flush retry cmd: ffff88019e171b80
> > > scsi_restart_operations: waking up host to restart
> > > scsi_debug: cmd 28 00 00 00 00 00 00 00 20 00 
> > > Error handler scsi_eh_1 sleeping
> > > 
> > 
> > Hi Mike and Co,
> > 
> > After considering a couple of different approches here, I ended up with
> > the following simple patch that has been tested on linus HEAD from this
> > afternoon w/ forcing scsi_debug cmd TIMEOUT during modprobe and via
> > sg_dd ops.  (See comments below)
> > 
> > [PATCH] scsi: Add SCSI_EH_SOFTIRQ_DONE usage
> > 
> > This patch introduces a SCSI_EH_SOFTIRQ_DONE flag that is set in scsi_softirq_done()
> > from block soft_irq context that is used to signal when scsi_try_to_abort_cmd() should
> > be calling __scsi_try_to_abort_cmd() for a timed out struct scsi_cmnd instead of
> > returning SUCCESS via checking only blk_test_rq_complete().  This is done because
> > blk_rq_timed_out_timer() calls blk_mark_rq_complete() before blk_rq_timed_out() ->
> > struct request_queue->rq_timed_out_fn().
> 
> This is getting pretty far off into the weeds.  I think the first step
> should be queue lock push down into ->queuecommand.  This would still
> necessitate locking around the serial number.

Hmmm, I am not sure I understand what you mean here.  My understanding
is that the whole point of the series was to remove any locking around
the serial number in order to make scsi_dispatch_cmd() lockless,
right..?

That is what the current patch series already does, in that it makes the
use of cmd->serial_number optional and requires LLDs who use this for
anything beyond an informational purpose to explictly call
scsi_cmd_get_serial(). 

> The next step might be
> serial number elimination/reduction which might need to address issues
> like this (or not ... block already knows the information, there's not
> really much need for us to track it twice).  There might also be other
> lock aquisition reduction as part of step 2.
> 

Correct, with this series cmd->serial_number still exists, but is
completly optional and no longer used for any SCSI ML functionality.  It
remains used only beyond an information purpose by a handful of LLDs in
their legacy error recovery handling, which have been updated in this
series to explictly call scsi_cmd_get_serial().

So other than those handful special cases, all of the LLDs included in
the series which explict set SHT->unlocked_qcmd=1 to run in
scsi_dispatch_cmd() lockless mode will no longer need ->serial_number.

--nab


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [ANNOUNCE] Status of unlocked_qcmds=1 operation for .37
  2010-10-26 22:34       ` Nicholas A. Bellinger
@ 2010-10-26 22:50         ` James Bottomley
  2010-10-26 23:00           ` Nicholas A. Bellinger
  0 siblings, 1 reply; 36+ messages in thread
From: James Bottomley @ 2010-10-26 22:50 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Mike Anderson, linux-kernel, linux-scsi, Vasu Dev, Tim Chen,
	Andi Kleen, Matthew Wilcox, Mike Christie, Jens Axboe,
	James Smart, Andrew Vasquez, FUJITA Tomonori, Hannes Reinecke,
	Joe Eykholt, Christoph Hellwig, Jon Hawley, Brian King,
	Christof Schmitt, Tejun Heo, Andrew Morton, H. Peter Anvin

On Tue, 2010-10-26 at 15:34 -0700, Nicholas A. Bellinger wrote:
> On Tue, 2010-10-26 at 17:27 -0500, James Bottomley wrote:
> > On Tue, 2010-10-26 at 15:08 -0700, Nicholas A. Bellinger wrote:
> > > [PATCH] scsi: Add SCSI_EH_SOFTIRQ_DONE usage
> > > 
> > > This patch introduces a SCSI_EH_SOFTIRQ_DONE flag that is set in scsi_softirq_done()
> > > from block soft_irq context that is used to signal when scsi_try_to_abort_cmd() should
> > > be calling __scsi_try_to_abort_cmd() for a timed out struct scsi_cmnd instead of
> > > returning SUCCESS via checking only blk_test_rq_complete().  This is done because
> > > blk_rq_timed_out_timer() calls blk_mark_rq_complete() before blk_rq_timed_out() ->
> > > struct request_queue->rq_timed_out_fn().
> > 
> > This is getting pretty far off into the weeds.  I think the first step
> > should be queue lock push down into ->queuecommand.  This would still
> > necessitate locking around the serial number.
> 
> Hmmm, I am not sure I understand what you mean here.  My understanding
> is that the whole point of the series was to remove any locking around
> the serial number in order to make scsi_dispatch_cmd() lockless,
> right..?

The point is that several things have to happen for that to be a
reality.  The easiest and most obvious thing is lock push down in
->queuecommand.

The next is most likely serial number elimination.

But the point is that we don't have to do the whole thing all at once
(and spend months trying to get the series right).

> That is what the current patch series already does, in that it makes the
> use of cmd->serial_number optional and requires LLDs who use this for
> anything beyond an informational purpose to explictly call
> scsi_cmd_get_serial(). 

OK, so this patch is a corner case where the error handler is using the
serial number value to deduce something the block layer already knows
(whether the command completed or not). I don't think introducing a
substitute flag is the right way, the information should just be
extracted properly.

But arguments about this don't have to impede the lock push down.

James

> > The next step might be
> > serial number elimination/reduction which might need to address issues
> > like this (or not ... block already knows the information, there's not
> > really much need for us to track it twice).  There might also be other
> > lock aquisition reduction as part of step 2.
> > 
> 
> Correct, with this series cmd->serial_number still exists, but is
> completly optional and no longer used for any SCSI ML functionality.  It
> remains used only beyond an information purpose by a handful of LLDs in
> their legacy error recovery handling, which have been updated in this
> series to explictly call scsi_cmd_get_serial().
> 
> So other than those handful special cases, all of the LLDs included in
> the series which explict set SHT->unlocked_qcmd=1 to run in
> scsi_dispatch_cmd() lockless mode will no longer need ->serial_number.
> 
> --nab
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [ANNOUNCE] Status of unlocked_qcmds=1 operation for .37
  2010-10-26 22:50         ` James Bottomley
@ 2010-10-26 23:00           ` Nicholas A. Bellinger
  2010-10-26 23:11             ` James Bottomley
  0 siblings, 1 reply; 36+ messages in thread
From: Nicholas A. Bellinger @ 2010-10-26 23:00 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mike Anderson, linux-kernel, linux-scsi, Vasu Dev, Tim Chen,
	Andi Kleen, Matthew Wilcox, Mike Christie, Jens Axboe,
	James Smart, Andrew Vasquez, FUJITA Tomonori, Hannes Reinecke,
	Joe Eykholt, Christoph Hellwig, Jon Hawley, Brian King,
	Christof Schmitt, Tejun Heo, Andrew Morton, H. Peter Anvin

On Tue, 2010-10-26 at 22:50 +0000, James Bottomley wrote:
> On Tue, 2010-10-26 at 15:34 -0700, Nicholas A. Bellinger wrote:
> > On Tue, 2010-10-26 at 17:27 -0500, James Bottomley wrote:
> > > On Tue, 2010-10-26 at 15:08 -0700, Nicholas A. Bellinger wrote:
> > > > [PATCH] scsi: Add SCSI_EH_SOFTIRQ_DONE usage
> > > > 
> > > > This patch introduces a SCSI_EH_SOFTIRQ_DONE flag that is set in scsi_softirq_done()
> > > > from block soft_irq context that is used to signal when scsi_try_to_abort_cmd() should
> > > > be calling __scsi_try_to_abort_cmd() for a timed out struct scsi_cmnd instead of
> > > > returning SUCCESS via checking only blk_test_rq_complete().  This is done because
> > > > blk_rq_timed_out_timer() calls blk_mark_rq_complete() before blk_rq_timed_out() ->
> > > > struct request_queue->rq_timed_out_fn().
> > > 
> > > This is getting pretty far off into the weeds.  I think the first step
> > > should be queue lock push down into ->queuecommand.  This would still
> > > necessitate locking around the serial number.
> > 
> > Hmmm, I am not sure I understand what you mean here.  My understanding
> > is that the whole point of the series was to remove any locking around
> > the serial number in order to make scsi_dispatch_cmd() lockless,
> > right..?
> 
> The point is that several things have to happen for that to be a
> reality.  The easiest and most obvious thing is lock push down in
> ->queuecommand.
> 

Ok, can you please explain to me what you mean here..?  As I really
thought we came to consensus that:

*) running in unlocked_qcmd=1 was to be made the default for LLDs using
the legacy optimization of ->queuecommand() -> unlock() ->
do_some_lld_work() -> lock() -> return to scsi_dispatch_cmd()

*) For the mpt-fusion / mpt2sas drivers which did not use the legacy
optimization, but Tim Chen has tested with the former and I am awaiting
an ACK from LSI on the latter.

*) All other drivers will function with host_lock held (eg: legacy mode)
in scsi_dispatch_cmd().

*) All drivers using cmd->serial_number for anything beyond an
informational purpose converted to use scsi_cmd_get_serial().

> The next is most likely serial number elimination.
> 
> But the point is that we don't have to do the whole thing all at once
> (and spend months trying to get the series right).

Not exactly correct, we have the whole thing ready right now with libfc
running unlocked_qcmd=0 legacy mode.

Once I verify START_STOP case in scsi_error.c, I am happy to respin a
mergeable tree from linus HEAD for you to pull ASAP.

> 
> > That is what the current patch series already does, in that it makes the
> > use of cmd->serial_number optional and requires LLDs who use this for
> > anything beyond an informational purpose to explictly call
> > scsi_cmd_get_serial(). 
> 
> OK, so this patch is a corner case where the error handler is using the
> serial number value to deduce something the block layer already knows
> (whether the command completed or not). I don't think introducing a
> substitute flag is the right way, the information should just be
> extracted properly.
> 

Well, according to andmike this is two corner cases that are a result of
the drop-host_lock-v4 series using only blk_test_rq_complete() in
scsi_try_to_abort_cmd(), which will be true because of:

blk_rq_timed_out_timer() ->  blk_mark_rq_complete()

> But arguments about this don't have to impede the lock push down.
> 

Sure, but I assume you mean the lock push down only for the legacy LLDs
that are not already internally unlocking host_lock in
SHT->queuecommand() in mainline code right..?  

--nab



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [ANNOUNCE] Status of unlocked_qcmds=1 operation for .37
  2010-10-26 23:00           ` Nicholas A. Bellinger
@ 2010-10-26 23:11             ` James Bottomley
  2010-10-26 23:31               ` Nicholas A. Bellinger
  0 siblings, 1 reply; 36+ messages in thread
From: James Bottomley @ 2010-10-26 23:11 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Mike Anderson, linux-kernel, linux-scsi, Vasu Dev, Tim Chen,
	Andi Kleen, Matthew Wilcox, Mike Christie, Jens Axboe,
	James Smart, Andrew Vasquez, FUJITA Tomonori, Hannes Reinecke,
	Joe Eykholt, Christoph Hellwig, Jon Hawley, Brian King,
	Christof Schmitt, Tejun Heo, Andrew Morton, H. Peter Anvin

On Tue, 2010-10-26 at 16:00 -0700, Nicholas A. Bellinger wrote:
> On Tue, 2010-10-26 at 22:50 +0000, James Bottomley wrote:
> > On Tue, 2010-10-26 at 15:34 -0700, Nicholas A. Bellinger wrote:
> > > On Tue, 2010-10-26 at 17:27 -0500, James Bottomley wrote:
> > > > On Tue, 2010-10-26 at 15:08 -0700, Nicholas A. Bellinger wrote:
> > > > > [PATCH] scsi: Add SCSI_EH_SOFTIRQ_DONE usage
> > > > > 
> > > > > This patch introduces a SCSI_EH_SOFTIRQ_DONE flag that is set in scsi_softirq_done()
> > > > > from block soft_irq context that is used to signal when scsi_try_to_abort_cmd() should
> > > > > be calling __scsi_try_to_abort_cmd() for a timed out struct scsi_cmnd instead of
> > > > > returning SUCCESS via checking only blk_test_rq_complete().  This is done because
> > > > > blk_rq_timed_out_timer() calls blk_mark_rq_complete() before blk_rq_timed_out() ->
> > > > > struct request_queue->rq_timed_out_fn().
> > > > 
> > > > This is getting pretty far off into the weeds.  I think the first step
> > > > should be queue lock push down into ->queuecommand.  This would still
> > > > necessitate locking around the serial number.
> > > 
> > > Hmmm, I am not sure I understand what you mean here.  My understanding
> > > is that the whole point of the series was to remove any locking around
> > > the serial number in order to make scsi_dispatch_cmd() lockless,
> > > right..?
> > 
> > The point is that several things have to happen for that to be a
> > reality.  The easiest and most obvious thing is lock push down in
> > ->queuecommand.
> > 
> 
> Ok, can you please explain to me what you mean here..?  As I really
> thought we came to consensus that:
> 
> *) running in unlocked_qcmd=1 was to be made the default for LLDs using
> the legacy optimization of ->queuecommand() -> unlock() ->
> do_some_lld_work() -> lock() -> return to scsi_dispatch_cmd()

So I was thinking no flag for changing locking behaviour ... simply push
the lock down into the ->queuecommand routines that need it, like we did
for the error handler.

James

> *) For the mpt-fusion / mpt2sas drivers which did not use the legacy
> optimization, but Tim Chen has tested with the former and I am awaiting
> an ACK from LSI on the latter.
> 
> *) All other drivers will function with host_lock held (eg: legacy mode)
> in scsi_dispatch_cmd().
> 
> *) All drivers using cmd->serial_number for anything beyond an
> informational purpose converted to use scsi_cmd_get_serial().
> 
> > The next is most likely serial number elimination.
> > 
> > But the point is that we don't have to do the whole thing all at once
> > (and spend months trying to get the series right).
> 
> Not exactly correct, we have the whole thing ready right now with libfc
> running unlocked_qcmd=0 legacy mode.
> 
> Once I verify START_STOP case in scsi_error.c, I am happy to respin a
> mergeable tree from linus HEAD for you to pull ASAP.
> 
> > 
> > > That is what the current patch series already does, in that it makes the
> > > use of cmd->serial_number optional and requires LLDs who use this for
> > > anything beyond an informational purpose to explictly call
> > > scsi_cmd_get_serial(). 
> > 
> > OK, so this patch is a corner case where the error handler is using the
> > serial number value to deduce something the block layer already knows
> > (whether the command completed or not). I don't think introducing a
> > substitute flag is the right way, the information should just be
> > extracted properly.
> > 
> 
> Well, according to andmike this is two corner cases that are a result of
> the drop-host_lock-v4 series using only blk_test_rq_complete() in
> scsi_try_to_abort_cmd(), which will be true because of:
> 
> blk_rq_timed_out_timer() ->  blk_mark_rq_complete()
> 
> > But arguments about this don't have to impede the lock push down.
> > 
> 
> Sure, but I assume you mean the lock push down only for the legacy LLDs
> that are not already internally unlocking host_lock in
> SHT->queuecommand() in mainline code right..?  
> 
> --nab
> 
> 



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [ANNOUNCE] Status of unlocked_qcmds=1 operation for .37
  2010-10-26 23:11             ` James Bottomley
@ 2010-10-26 23:31               ` Nicholas A. Bellinger
  2010-10-27  7:53                 ` Andi Kleen
  0 siblings, 1 reply; 36+ messages in thread
From: Nicholas A. Bellinger @ 2010-10-26 23:31 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mike Anderson, linux-kernel, linux-scsi, Vasu Dev, Tim Chen,
	Andi Kleen, Matthew Wilcox, Mike Christie, Jens Axboe,
	James Smart, Andrew Vasquez, FUJITA Tomonori, Hannes Reinecke,
	Joe Eykholt, Christoph Hellwig, Jon Hawley, Brian King,
	Christof Schmitt, Tejun Heo, Andrew Morton, H. Peter Anvin

On Tue, 2010-10-26 at 18:11 -0500, James Bottomley wrote:
> On Tue, 2010-10-26 at 16:00 -0700, Nicholas A. Bellinger wrote:
> > On Tue, 2010-10-26 at 22:50 +0000, James Bottomley wrote:
> > > On Tue, 2010-10-26 at 15:34 -0700, Nicholas A. Bellinger wrote:
> > > > On Tue, 2010-10-26 at 17:27 -0500, James Bottomley wrote:
> > > > > On Tue, 2010-10-26 at 15:08 -0700, Nicholas A. Bellinger wrote:
> > > > > > [PATCH] scsi: Add SCSI_EH_SOFTIRQ_DONE usage
> > > > > > 
> > > > > > This patch introduces a SCSI_EH_SOFTIRQ_DONE flag that is set in scsi_softirq_done()
> > > > > > from block soft_irq context that is used to signal when scsi_try_to_abort_cmd() should
> > > > > > be calling __scsi_try_to_abort_cmd() for a timed out struct scsi_cmnd instead of
> > > > > > returning SUCCESS via checking only blk_test_rq_complete().  This is done because
> > > > > > blk_rq_timed_out_timer() calls blk_mark_rq_complete() before blk_rq_timed_out() ->
> > > > > > struct request_queue->rq_timed_out_fn().
> > > > > 
> > > > > This is getting pretty far off into the weeds.  I think the first step
> > > > > should be queue lock push down into ->queuecommand.  This would still
> > > > > necessitate locking around the serial number.
> > > > 
> > > > Hmmm, I am not sure I understand what you mean here.  My understanding
> > > > is that the whole point of the series was to remove any locking around
> > > > the serial number in order to make scsi_dispatch_cmd() lockless,
> > > > right..?
> > > 
> > > The point is that several things have to happen for that to be a
> > > reality.  The easiest and most obvious thing is lock push down in
> > > ->queuecommand.
> > > 
> > 
> > Ok, can you please explain to me what you mean here..?  As I really
> > thought we came to consensus that:
> > 
> > *) running in unlocked_qcmd=1 was to be made the default for LLDs using
> > the legacy optimization of ->queuecommand() -> unlock() ->
> > do_some_lld_work() -> lock() -> return to scsi_dispatch_cmd()
> 
> So I was thinking no flag for changing locking behaviour ... simply push
> the lock down into the ->queuecommand routines that need it, like we did
> for the error handler.
> 

Yes, I think this makes alot of sense as the next step beyond what has
been currently proposed so far with these series for .37.  The main bit
with this approach is that is requires touching every single legacy
LLD's ->queuecommand() which honestly is a bit scary on some of the
legacy stuff.  This instead of the approach this series has taken so far
to simply to still be able to use the LLDs we are unsure about running
in host_lock less mode in mainline+ out of tree (who do not muck with
host_lock in ->queuecommand of course :) will still 'just work'.

Its hard to say how to say which approach is better at this point, but I
am still leaning toward IMHO the unlocked_qcmds=1 would be the safest
incremental first step for .37, and then do the second series for .38 to
get any LLD needing host_lock using it directly inside of their
->queuecommand().

This sounds like a pretty reasonable compromise that I think is slightly
less risky for the LLDs with the ghosts and cob-webs hanging off of
them.

What do you think..?

--nab

> James
> 
> > *) For the mpt-fusion / mpt2sas drivers which did not use the legacy
> > optimization, but Tim Chen has tested with the former and I am awaiting
> > an ACK from LSI on the latter.
> > 
> > *) All other drivers will function with host_lock held (eg: legacy mode)
> > in scsi_dispatch_cmd().
> > 
> > *) All drivers using cmd->serial_number for anything beyond an
> > informational purpose converted to use scsi_cmd_get_serial().
> > 
> > > The next is most likely serial number elimination.
> > > 
> > > But the point is that we don't have to do the whole thing all at once
> > > (and spend months trying to get the series right).
> > 
> > Not exactly correct, we have the whole thing ready right now with libfc
> > running unlocked_qcmd=0 legacy mode.
> > 
> > Once I verify START_STOP case in scsi_error.c, I am happy to respin a
> > mergeable tree from linus HEAD for you to pull ASAP.
> > 
> > > 
> > > > That is what the current patch series already does, in that it makes the
> > > > use of cmd->serial_number optional and requires LLDs who use this for
> > > > anything beyond an informational purpose to explictly call
> > > > scsi_cmd_get_serial(). 
> > > 
> > > OK, so this patch is a corner case where the error handler is using the
> > > serial number value to deduce something the block layer already knows
> > > (whether the command completed or not). I don't think introducing a
> > > substitute flag is the right way, the information should just be
> > > extracted properly.
> > > 
> > 
> > Well, according to andmike this is two corner cases that are a result of
> > the drop-host_lock-v4 series using only blk_test_rq_complete() in
> > scsi_try_to_abort_cmd(), which will be true because of:
> > 
> > blk_rq_timed_out_timer() ->  blk_mark_rq_complete()
> > 
> > > But arguments about this don't have to impede the lock push down.
> > > 
> > 
> > Sure, but I assume you mean the lock push down only for the legacy LLDs
> > that are not already internally unlocking host_lock in
> > SHT->queuecommand() in mainline code right..?  
> > 
> > --nab
> > 
> > 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [ANNOUNCE] Status of unlocked_qcmds=1 operation for .37
  2010-10-26 23:31               ` Nicholas A. Bellinger
@ 2010-10-27  7:53                 ` Andi Kleen
  2010-10-27 14:27                   ` James Bottomley
  2010-10-27 18:03                   ` Nicholas A. Bellinger
  0 siblings, 2 replies; 36+ messages in thread
From: Andi Kleen @ 2010-10-27  7:53 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: James Bottomley, Mike Anderson, linux-kernel, linux-scsi,
	Vasu Dev, Tim Chen, Matthew Wilcox, Mike Christie, Jens Axboe,
	James Smart, Andrew Vasquez, FUJITA Tomonori, Hannes Reinecke,
	Joe Eykholt, Christoph Hellwig, Jon Hawley, Brian King,
	Christof Schmitt, Tejun Heo, Andrew Morton, H. Peter Anvin

> This sounds like a pretty reasonable compromise that I think is slightly
> less risky for the LLDs with the ghosts and cob-webs hanging off of
> them.

They won't get tested either next release cycle. Essentially
near nobody uses them.

> 
> What do you think..?

Standard linux practice is to simply push the locks down. That's a pretty
mechanical operation and shouldn't be too risky

With some luck you could even do it with coccinelle.

-Andi

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [ANNOUNCE] Status of unlocked_qcmds=1 operation for .37
  2010-10-27  7:53                 ` Andi Kleen
@ 2010-10-27 14:27                   ` James Bottomley
  2010-10-27 18:06                     ` Nicholas A. Bellinger
  2010-10-28  9:10                     ` Andi Kleen
  2010-10-27 18:03                   ` Nicholas A. Bellinger
  1 sibling, 2 replies; 36+ messages in thread
From: James Bottomley @ 2010-10-27 14:27 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Nicholas A. Bellinger, Mike Anderson, linux-kernel, linux-scsi,
	Vasu Dev, Tim Chen, Matthew Wilcox, Mike Christie, Jens Axboe,
	James Smart, Andrew Vasquez, FUJITA Tomonori, Hannes Reinecke,
	Joe Eykholt, Christoph Hellwig, Jon Hawley, Brian King,
	Christof Schmitt, Tejun Heo, Andrew Morton, H. Peter Anvin

On Wed, 2010-10-27 at 09:53 +0200, Andi Kleen wrote:
> > This sounds like a pretty reasonable compromise that I think is slightly
> > less risky for the LLDs with the ghosts and cob-webs hanging off of
> > them.
> 
> They won't get tested either next release cycle. Essentially
> near nobody uses them.
> 
> > 
> > What do you think..?
> 
> Standard linux practice is to simply push the locks down. That's a pretty
> mechanical operation and shouldn't be too risky
> 
> With some luck you could even do it with coccinelle.

Precisely ... if we can do the push down now as a mechanical
transformation we can put it in the current merge window as a low risk
API change.  This gives us optimal exposure to the rc sequence to sort
out any problems that arise (or drivers that got missed) with the lowest
risk of such problems actually arising.  Given the corner cases and the
late arrival of fixes, the serial number changes are just too risky for
the current merge window.  Having an API that changes depending on a
flag is also a high risk process because it's prone to further sources
of error.

James





^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [ANNOUNCE] Status of unlocked_qcmds=1 operation for .37
  2010-10-27  7:53                 ` Andi Kleen
  2010-10-27 14:27                   ` James Bottomley
@ 2010-10-27 18:03                   ` Nicholas A. Bellinger
  1 sibling, 0 replies; 36+ messages in thread
From: Nicholas A. Bellinger @ 2010-10-27 18:03 UTC (permalink / raw)
  To: Andi Kleen
  Cc: James Bottomley, Mike Anderson, linux-kernel, linux-scsi,
	Vasu Dev, Tim Chen, Matthew Wilcox, Mike Christie, Jens Axboe,
	James Smart, Andrew Vasquez, FUJITA Tomonori, Hannes Reinecke,
	Joe Eykholt, Christoph Hellwig, Jon Hawley, Brian King,
	Christof Schmitt, Tejun Heo, Andrew Morton, H. Peter Anvin

On Wed, 2010-10-27 at 09:53 +0200, Andi Kleen wrote:
> > This sounds like a pretty reasonable compromise that I think is slightly
> > less risky for the LLDs with the ghosts and cob-webs hanging off of
> > them.
> 
> They won't get tested either next release cycle. Essentially
> near nobody uses them.
> 

This is exactly my point.  Using this series does not introduce
distruptive changes into LLDs that will be getting little or no testing
wrt the changes to enable lock-less operation with the modern LLDs that
we actually care about.  When running with the default of
SHT->unlocked_qcmd=0, the legacy LLDs will continue to function
*exactly* the same, minus those that are now using the explict
scsi_cmd_get_serial() call because they use cmd->serial_number for
something beyond simple informational purposes.

> > 
> > What do you think..?
> 
> Standard linux practice is to simply push the locks down. That's a pretty
> mechanical operation and shouldn't be too risky
> 

No disagreements here whatsoever, as I think that pushing the locks
approach does make alot sense as the final goal.  The question is if
starting with this series is less disruptive and less error prone than
adding a new host_lock -> lock() and unlock() in SHT->queuecommand() of
every single legacy LLD and every single failure path for that legacy
code.

The benfits on this series is having to add less LOC, not having to
touch lots legacy LLD code ->queuecommand() that will get little or no
testing, and the 'by default' setting of using SHT->unlocked_qcmd=0 (eg:
legacy mode).  I believe it makes sense that merging this approach first
and then transitioning to pushing the locks in per LLDs specific
SHT->queuecommand() would be the most logical two steps for a graceful
transition to optional host_lock less scsi_dispatch_cmd().

Best,

--nab


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [ANNOUNCE] Status of unlocked_qcmds=1 operation for .37
  2010-10-27 14:27                   ` James Bottomley
@ 2010-10-27 18:06                     ` Nicholas A. Bellinger
  2010-10-27 18:16                       ` James Bottomley
                                         ` (2 more replies)
  2010-10-28  9:10                     ` Andi Kleen
  1 sibling, 3 replies; 36+ messages in thread
From: Nicholas A. Bellinger @ 2010-10-27 18:06 UTC (permalink / raw)
  To: James Bottomley
  Cc: Andi Kleen, Mike Anderson, linux-kernel, linux-scsi, Vasu Dev,
	Tim Chen, Matthew Wilcox, Mike Christie, Jens Axboe, James Smart,
	Andrew Vasquez, FUJITA Tomonori, Hannes Reinecke, Joe Eykholt,
	Christoph Hellwig, Jon Hawley, Brian King, Christof Schmitt,
	Tejun Heo, Andrew Morton, H. Peter Anvin

On Wed, 2010-10-27 at 09:27 -0500, James Bottomley wrote:
> On Wed, 2010-10-27 at 09:53 +0200, Andi Kleen wrote:
> > > This sounds like a pretty reasonable compromise that I think is slightly
> > > less risky for the LLDs with the ghosts and cob-webs hanging off of
> > > them.
> > 
> > They won't get tested either next release cycle. Essentially
> > near nobody uses them.
> > 
> > > 
> > > What do you think..?
> > 
> > Standard linux practice is to simply push the locks down. That's a pretty
> > mechanical operation and shouldn't be too risky
> > 
> > With some luck you could even do it with coccinelle.
> 
> Precisely ... if we can do the push down now as a mechanical
> transformation we can put it in the current merge window as a low risk
> API change.

I disagree that touching every single legacy LLD's SHT->queuecommand()
and failure paths in that code is a low rist change.

>   This gives us optimal exposure to the rc sequence to sort
> out any problems that arise (or drivers that got missed) with the lowest
> risk of such problems actually arising.

Yes, 

> Given the corner cases and the
> late arrival of fixes, the serial number changes are just too risky for
> the current merge window.

I think with andmike's testing and ACKs for the necessary scsi_error.c
changes this would be an acceptable risk.

> Having an API that changes depending on a
> flag is also a high risk process because it's prone to further sources
> of error.
> 

I think this would be considered high risk if the setting of the flag
explictly was required to obtain the default legacy operation.  With
this series that is not the case, as the default SHT->unlocked_qcmd=0
will allow legacy LLDs to function exactly the manner they expect, while
allowing modern LLDs to run in host_lock-less mode.

Best,

--nab


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [ANNOUNCE] Status of unlocked_qcmds=1 operation for .37
  2010-10-27 18:06                     ` Nicholas A. Bellinger
@ 2010-10-27 18:16                       ` James Bottomley
  2010-10-27 19:20                       ` Mike Anderson
  2010-10-27 23:28                       ` Jeff Garzik
  2 siblings, 0 replies; 36+ messages in thread
From: James Bottomley @ 2010-10-27 18:16 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Andi Kleen, Mike Anderson, linux-kernel, linux-scsi, Vasu Dev,
	Tim Chen, Matthew Wilcox, Mike Christie, Jens Axboe, James Smart,
	Andrew Vasquez, FUJITA Tomonori, Hannes Reinecke, Joe Eykholt,
	Christoph Hellwig, Jon Hawley, Brian King, Christof Schmitt,
	Tejun Heo, Andrew Morton, H. Peter Anvin

On Wed, 2010-10-27 at 11:06 -0700, Nicholas A. Bellinger wrote:
> On Wed, 2010-10-27 at 09:27 -0500, James Bottomley wrote:
> > On Wed, 2010-10-27 at 09:53 +0200, Andi Kleen wrote:
> > > > This sounds like a pretty reasonable compromise that I think is slightly
> > > > less risky for the LLDs with the ghosts and cob-webs hanging off of
> > > > them.
> > > 
> > > They won't get tested either next release cycle. Essentially
> > > near nobody uses them.
> > > 
> > > > 
> > > > What do you think..?
> > > 
> > > Standard linux practice is to simply push the locks down. That's a pretty
> > > mechanical operation and shouldn't be too risky
> > > 
> > > With some luck you could even do it with coccinelle.
> > 
> > Precisely ... if we can do the push down now as a mechanical
> > transformation we can put it in the current merge window as a low risk
> > API change.
> 
> I disagree that touching every single legacy LLD's SHT->queuecommand()
> and failure paths in that code is a low rist change.

It can be done mechanically.

> >   This gives us optimal exposure to the rc sequence to sort
> > out any problems that arise (or drivers that got missed) with the lowest
> > risk of such problems actually arising.
> 
> Yes, 
> 
> > Given the corner cases and the
> > late arrival of fixes, the serial number changes are just too risky for
> > the current merge window.
> 
> I think with andmike's testing and ACKs for the necessary scsi_error.c
> changes this would be an acceptable risk.

I already said why I didn't like this change.  Without the serial
number, there's no problem.

> > Having an API that changes depending on a
> > flag is also a high risk process because it's prone to further sources
> > of error.
> > 
> 
> I think this would be considered high risk if the setting of the flag
> explictly was required to obtain the default legacy operation.  With
> this series that is not the case, as the default SHT->unlocked_qcmd=0
> will allow legacy LLDs to function exactly the manner they expect, while
> allowing modern LLDs to run in host_lock-less mode.

Having a variable API based on a flag elsewhere is always a bad idea.

James



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [ANNOUNCE] Status of unlocked_qcmds=1 operation for .37
  2010-10-27 18:06                     ` Nicholas A. Bellinger
  2010-10-27 18:16                       ` James Bottomley
@ 2010-10-27 19:20                       ` Mike Anderson
  2010-10-27 19:55                         ` Nicholas A. Bellinger
  2010-10-27 23:28                       ` Jeff Garzik
  2 siblings, 1 reply; 36+ messages in thread
From: Mike Anderson @ 2010-10-27 19:20 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: James Bottomley, Andi Kleen, linux-kernel, linux-scsi, Vasu Dev,
	Tim Chen, Matthew Wilcox, Mike Christie, Jens Axboe, James Smart,
	Andrew Vasquez, FUJITA Tomonori, Hannes Reinecke, Joe Eykholt,
	Christoph Hellwig, Jon Hawley, Brian King, Christof Schmitt,
	Tejun Heo, Andrew Morton, H. Peter Anvin

Nicholas A. Bellinger <nab@linux-iscsi.org> wrote:
> On Wed, 2010-10-27 at 09:27 -0500, James Bottomley wrote:
> > On Wed, 2010-10-27 at 09:53 +0200, Andi Kleen wrote:
> > > > This sounds like a pretty reasonable compromise that I think is slightly
> > > > less risky for the LLDs with the ghosts and cob-webs hanging off of
> > > > them.
> > > 
> > > They won't get tested either next release cycle. Essentially
> > > near nobody uses them.
> > > 
> > > > 
> > > > What do you think..?
> > > 
> > > Standard linux practice is to simply push the locks down. That's a pretty
> > > mechanical operation and shouldn't be too risky
> > > 
> > > With some luck you could even do it with coccinelle.
> > 
> > Precisely ... if we can do the push down now as a mechanical
> > transformation we can put it in the current merge window as a low risk
> > API change.
> 
> I disagree that touching every single legacy LLD's SHT->queuecommand()
> and failure paths in that code is a low rist change.
> 
> >   This gives us optimal exposure to the rc sequence to sort
> > out any problems that arise (or drivers that got missed) with the lowest
> > risk of such problems actually arising.
> 
> Yes, 
> 
> > Given the corner cases and the
> > late arrival of fixes, the serial number changes are just too risky for
> > the current merge window.
> 
> I think with andmike's testing and ACKs for the necessary scsi_error.c
> changes this would be an acceptable risk.
> 

Adding SCSI_EH_SOFTIRQ_DONE in scsi_softirq_done is not going to provide
value in scsi_try_to_abort_cmd. scsi_softirq_done calls scsi_eh_scmd_add
without the SCSI_EH_CANCEL_CMD flag set which will stop
scsi_try_to_abort_cmd from being called.

Removing the serial_number check in scsi_try_to_abort_cmd and not
replacing it may be the correct action as we should be relying on the
block complete checking. That said what James has indicated about
splitting the serial number change out seems like the lower risk approach
at this time.

-andmike
--
Michael Anderson
andmike@linux.vnet.ibm.com

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [ANNOUNCE] Status of unlocked_qcmds=1 operation for .37
  2010-10-27 19:20                       ` Mike Anderson
@ 2010-10-27 19:55                         ` Nicholas A. Bellinger
  0 siblings, 0 replies; 36+ messages in thread
From: Nicholas A. Bellinger @ 2010-10-27 19:55 UTC (permalink / raw)
  To: Mike Anderson
  Cc: James Bottomley, Andi Kleen, linux-kernel, linux-scsi, Vasu Dev,
	Tim Chen, Matthew Wilcox, Mike Christie, Jens Axboe, James Smart,
	Andrew Vasquez, FUJITA Tomonori, Hannes Reinecke, Joe Eykholt,
	Christoph Hellwig, Jon Hawley, Brian King, Christof Schmitt,
	Tejun Heo, Andrew Morton, H. Peter Anvin

On Wed, 2010-10-27 at 12:20 -0700, Mike Anderson wrote:
> Nicholas A. Bellinger <nab@linux-iscsi.org> wrote:
> > On Wed, 2010-10-27 at 09:27 -0500, James Bottomley wrote:
> > > On Wed, 2010-10-27 at 09:53 +0200, Andi Kleen wrote:
> > > > > This sounds like a pretty reasonable compromise that I think is slightly
> > > > > less risky for the LLDs with the ghosts and cob-webs hanging off of
> > > > > them.
> > > > 
> > > > They won't get tested either next release cycle. Essentially
> > > > near nobody uses them.
> > > > 
> > > > > 
> > > > > What do you think..?
> > > > 
> > > > Standard linux practice is to simply push the locks down. That's a pretty
> > > > mechanical operation and shouldn't be too risky
> > > > 
> > > > With some luck you could even do it with coccinelle.
> > > 
> > > Precisely ... if we can do the push down now as a mechanical
> > > transformation we can put it in the current merge window as a low risk
> > > API change.
> > 
> > I disagree that touching every single legacy LLD's SHT->queuecommand()
> > and failure paths in that code is a low rist change.
> > 
> > >   This gives us optimal exposure to the rc sequence to sort
> > > out any problems that arise (or drivers that got missed) with the lowest
> > > risk of such problems actually arising.
> > 
> > Yes, 
> > 
> > > Given the corner cases and the
> > > late arrival of fixes, the serial number changes are just too risky for
> > > the current merge window.
> > 
> > I think with andmike's testing and ACKs for the necessary scsi_error.c
> > changes this would be an acceptable risk.
> > 
> 
> Adding SCSI_EH_SOFTIRQ_DONE in scsi_softirq_done is not going to provide
> value in scsi_try_to_abort_cmd. scsi_softirq_done calls scsi_eh_scmd_add
> without the SCSI_EH_CANCEL_CMD flag set which will stop
> scsi_try_to_abort_cmd from being called.
> 
> Removing the serial_number check in scsi_try_to_abort_cmd and not
> replacing it may be the correct action as we should be relying on the
> block complete checking. That said what James has indicated about
> splitting the serial number change out seems like the lower risk approach
> at this time.
> 

Hmm, that is unfortuate..

So in this case it would make sense to drop the explict LLD usage of
scsi_cmd_get_serial(), and re-include this into scsi_dispatch_cmd() for
all LLDs and have to deal with a per scsi_host atomic_t serial_number
counter.  Anyways, I will go ahead an respin another series to follow
this logic shortly.

The other question that was mentioned in my email yesterday would be if
the clearing of a non atomic_t cmd->serial_number from
scsi_softirq_done() -> scsi_try_to_abort_cmd() is safe to begin with..?
Does this need to be converted to an atomic_t as well to present a
subtle race outside of any of the host_lock-less series of patches..?

--nab




^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [ANNOUNCE] Status of unlocked_qcmds=1 operation for .37
  2010-10-27 18:06                     ` Nicholas A. Bellinger
  2010-10-27 18:16                       ` James Bottomley
  2010-10-27 19:20                       ` Mike Anderson
@ 2010-10-27 23:28                       ` Jeff Garzik
  2 siblings, 0 replies; 36+ messages in thread
From: Jeff Garzik @ 2010-10-27 23:28 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: James Bottomley, Andi Kleen, Mike Anderson, linux-kernel,
	linux-scsi, Vasu Dev, Tim Chen, Matthew Wilcox, Mike Christie,
	Jens Axboe, James Smart, Andrew Vasquez, FUJITA Tomonori,
	Hannes Reinecke, Joe Eykholt, Christoph Hellwig, Jon Hawley,
	Brian King, Christof Schmitt, Tejun Heo, Andrew Morton,
	H. Peter Anvin

On 10/27/2010 02:06 PM, Nicholas A. Bellinger wrote:
> On Wed, 2010-10-27 at 09:27 -0500, James Bottomley wrote:
>> On Wed, 2010-10-27 at 09:53 +0200, Andi Kleen wrote:
>>>> This sounds like a pretty reasonable compromise that I think is slightly
>>>> less risky for the LLDs with the ghosts and cob-webs hanging off of
>>>> them.
>>>
>>> They won't get tested either next release cycle. Essentially
>>> near nobody uses them.
>>>
>>>>
>>>> What do you think..?
>>>
>>> Standard linux practice is to simply push the locks down. That's a pretty
>>> mechanical operation and shouldn't be too risky
>>>
>>> With some luck you could even do it with coccinelle.
>>
>> Precisely ... if we can do the push down now as a mechanical
>> transformation we can put it in the current merge window as a low risk
>> API change.
>
> I disagree that touching every single legacy LLD's SHT->queuecommand()
> and failure paths in that code is a low rist change.

Think of changes like steps in a math proof.  You want to make a series 
of equivalent transformations such that, each transformation takes the 
code (proof) one step closer to your goal.

unlocked_qcmds does nothing but increase complexity, as opposed to an 
equivalent-transformation mechanical code change much more easily 
provable as correct (or broken). unlocked_qcmds is the type of 
transition step you always want to avoid if possible.  Conditional 
locking logic tends to be complexity at its worst... and in this case, 
it's all avoidable.

	Jeff



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [ANNOUNCE] Status of unlocked_qcmds=1 operation for .37
  2010-10-27 14:27                   ` James Bottomley
  2010-10-27 18:06                     ` Nicholas A. Bellinger
@ 2010-10-28  9:10                     ` Andi Kleen
  2010-10-28 11:18                         ` Boaz Harrosh
  1 sibling, 1 reply; 36+ messages in thread
From: Andi Kleen @ 2010-10-28  9:10 UTC (permalink / raw)
  To: James Bottomley
  Cc: Nicholas A. Bellinger, Mike Anderson, linux-kernel, linux-scsi,
	Vasu Dev, Tim Chen, Matthew Wilcox, Mike Christie, Jens Axboe,
	James Smart, Andrew Vasquez, FUJITA Tomonori, Hannes Reinecke,
	Joe Eykholt, Christoph Hellwig, Jon Hawley, Brian King,
	Christof Schmitt, Tejun Heo, Andrew Morton, H. Peter Anvin,
	julia

On Wed, Oct 27, 2010 at 09:27:38AM -0500, James Bottomley wrote:
> On Wed, 2010-10-27 at 09:53 +0200, Andi Kleen wrote:
> > > This sounds like a pretty reasonable compromise that I think is slightly
> > > less risky for the LLDs with the ghosts and cob-webs hanging off of
> > > them.
> > 
> > They won't get tested either next release cycle. Essentially
> > near nobody uses them.
> > 
> > > 
> > > What do you think..?
> > 
> > Standard linux practice is to simply push the locks down. That's a pretty
> > mechanical operation and shouldn't be too risky
> > 
> > With some luck you could even do it with coccinelle.
> 
> Precisely ... if we can do the push down now as a mechanical
> transformation we can put it in the current merge window as a low risk
> API change.  This gives us optimal exposure to the rc sequence to sort
> out any problems that arise (or drivers that got missed) with the lowest
> risk of such problems actually arising.  Given the corner cases and the
> late arrival of fixes, the serial number changes are just too risky for
> the current merge window.  Having an API that changes depending on a
> flag is also a high risk process because it's prone to further sources
> of error.

Here's a coccinelle script I came up with that does the push down.
It still adds a bogus empty line in front of the irqflags declaration
which I haven't managed to avoid yet. Other than the it seems
to DTRT on the SCSI drivers I tried.

-Andi


@ rule1 @
struct scsi_host_template t;
identifier qc;
@@
t.queuecommand = qc;

@ rule2 @
identifier rule1.qc;
identifier cmnd;
expression E;
statement S, S2;
@@
int qc(struct scsi_cmnd *cmnd, ...) 
{
	... when != S
+	unsigned long irqflags;

+	spin_lock_irqsave(&cmnd->device->host->hostlock, irqflags);
	S2
	...
+	spin_unlock_irqrestore(&cmnd->device->host->hostlock, irqflags);
	return E;
}




^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [ANNOUNCE] Status of unlocked_qcmds=1 operation for .37
  2010-10-28  9:10                     ` Andi Kleen
@ 2010-10-28 11:18                         ` Boaz Harrosh
  0 siblings, 0 replies; 36+ messages in thread
From: Boaz Harrosh @ 2010-10-28 11:18 UTC (permalink / raw)
  To: Andi Kleen
  Cc: James Bottomley, Nicholas A. Bellinger, Mike Anderson,
	linux-kernel, linux-scsi, Vasu Dev, Tim Chen, Matthew Wilcox,
	Mike Christie, Jens Axboe, James Smart, Andrew Vasquez,
	FUJITA Tomonori, Hannes Reinecke, Joe Eykholt, Christoph Hellwig,
	Jon Hawley, Brian King, Christof Schmitt, Tejun Heo,
	Andrew Morton, H. Peter Anvin, julia

On 10/28/2010 11:10 AM, Andi Kleen wrote:
> On Wed, Oct 27, 2010 at 09:27:38AM -0500, James Bottomley wrote:
>> On Wed, 2010-10-27 at 09:53 +0200, Andi Kleen wrote:
>>>> This sounds like a pretty reasonable compromise that I think is slightly
>>>> less risky for the LLDs with the ghosts and cob-webs hanging off of
>>>> them.
>>>
>>> They won't get tested either next release cycle. Essentially
>>> near nobody uses them.
>>>
>>>>
>>>> What do you think..?
>>>
>>> Standard linux practice is to simply push the locks down. That's a pretty
>>> mechanical operation and shouldn't be too risky
>>>
>>> With some luck you could even do it with coccinelle.
>>
>> Precisely ... if we can do the push down now as a mechanical
>> transformation we can put it in the current merge window as a low risk
>> API change.  This gives us optimal exposure to the rc sequence to sort
>> out any problems that arise (or drivers that got missed) with the lowest
>> risk of such problems actually arising.  Given the corner cases and the
>> late arrival of fixes, the serial number changes are just too risky for
>> the current merge window.  Having an API that changes depending on a
>> flag is also a high risk process because it's prone to further sources
>> of error.
> 
> Here's a coccinelle script I came up with that does the push down.
> It still adds a bogus empty line in front of the irqflags declaration
> which I haven't managed to avoid yet. Other than the it seems
> to DTRT on the SCSI drivers I tried.
> 
> -Andi
> 
> 
> @ rule1 @
> struct scsi_host_template t;
> identifier qc;
> @@
> t.queuecommand = qc;
> 
> @ rule2 @
> identifier rule1.qc;
> identifier cmnd;
> expression E;
> statement S, S2;
> @@
> int qc(struct scsi_cmnd *cmnd, ...) 
> {
> 	... when != S
> +	unsigned long irqflags;
> 
> +	spin_lock_irqsave(&cmnd->device->host->hostlock, irqflags);
> 	S2
> 	...
> +	spin_unlock_irqrestore(&cmnd->device->host->hostlock, irqflags);
> 	return E;
> }
> 

I disagree with your approach this introduces a spin_unlock_irqrestore
call site at every return, in the usually huge queuecommand.

I'd say just do:
- Rename XXX_queuecommand => __XXX_queuecommand_unlocked
- Define new XXX_queuecommand

int qc(struct scsi_cmnd *cmnd, ...) 
{
	unsigned long irqflags;
	int ret;

	spin_lock_irqsave(&cmnd->device->host->hostlock, irqflags);
	ret = __XXX_queuecommand_unlocked(cmnd, ...)
	spin_unlock_irqrestore(&cmnd->device->host->hostlock, irqflags);
	return ret;
}

Then when the driver is manually converted the __queuecommand_unlocked
can be set into the scsi_host_template and the added function can
be dropped.

My $0.017
Boaz


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [ANNOUNCE] Status of unlocked_qcmds=1 operation for .37
@ 2010-10-28 11:18                         ` Boaz Harrosh
  0 siblings, 0 replies; 36+ messages in thread
From: Boaz Harrosh @ 2010-10-28 11:18 UTC (permalink / raw)
  To: Andi Kleen
  Cc: James Bottomley, Nicholas A. Bellinger, Mike Anderson,
	linux-kernel, linux-scsi, Vasu Dev, Tim Chen, Matthew Wilcox,
	Mike Christie, Jens Axboe, James Smart, Andrew Vasquez,
	FUJITA Tomonori, Hannes Reinecke, Joe Eykholt, Christoph Hellwig,
	Jon Hawley, Brian King, Christof Schmitt, Tejun Heo,
	Andrew Morton, H. Peter Anvin, julia

On 10/28/2010 11:10 AM, Andi Kleen wrote:
> On Wed, Oct 27, 2010 at 09:27:38AM -0500, James Bottomley wrote:
>> On Wed, 2010-10-27 at 09:53 +0200, Andi Kleen wrote:
>>>> This sounds like a pretty reasonable compromise that I think is slightly
>>>> less risky for the LLDs with the ghosts and cob-webs hanging off of
>>>> them.
>>>
>>> They won't get tested either next release cycle. Essentially
>>> near nobody uses them.
>>>
>>>>
>>>> What do you think..?
>>>
>>> Standard linux practice is to simply push the locks down. That's a pretty
>>> mechanical operation and shouldn't be too risky
>>>
>>> With some luck you could even do it with coccinelle.
>>
>> Precisely ... if we can do the push down now as a mechanical
>> transformation we can put it in the current merge window as a low risk
>> API change.  This gives us optimal exposure to the rc sequence to sort
>> out any problems that arise (or drivers that got missed) with the lowest
>> risk of such problems actually arising.  Given the corner cases and the
>> late arrival of fixes, the serial number changes are just too risky for
>> the current merge window.  Having an API that changes depending on a
>> flag is also a high risk process because it's prone to further sources
>> of error.
> 
> Here's a coccinelle script I came up with that does the push down.
> It still adds a bogus empty line in front of the irqflags declaration
> which I haven't managed to avoid yet. Other than the it seems
> to DTRT on the SCSI drivers I tried.
> 
> -Andi
> 
> 
> @ rule1 @
> struct scsi_host_template t;
> identifier qc;
> @@
> t.queuecommand = qc;
> 
> @ rule2 @
> identifier rule1.qc;
> identifier cmnd;
> expression E;
> statement S, S2;
> @@
> int qc(struct scsi_cmnd *cmnd, ...) 
> {
> 	... when != S
> +	unsigned long irqflags;
> 
> +	spin_lock_irqsave(&cmnd->device->host->hostlock, irqflags);
> 	S2
> 	...
> +	spin_unlock_irqrestore(&cmnd->device->host->hostlock, irqflags);
> 	return E;
> }
> 

I disagree with your approach this introduces a spin_unlock_irqrestore
call site at every return, in the usually huge queuecommand.

I'd say just do:
- Rename XXX_queuecommand => __XXX_queuecommand_unlocked
- Define new XXX_queuecommand

int qc(struct scsi_cmnd *cmnd, ...) 
{
	unsigned long irqflags;
	int ret;

	spin_lock_irqsave(&cmnd->device->host->hostlock, irqflags);
	ret = __XXX_queuecommand_unlocked(cmnd, ...)
	spin_unlock_irqrestore(&cmnd->device->host->hostlock, irqflags);
	return ret;
}

Then when the driver is manually converted the __queuecommand_unlocked
can be set into the scsi_host_template and the added function can
be dropped.

My $0.017
Boaz


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [ANNOUNCE] Status of unlocked_qcmds=1 operation for .37
  2010-10-28 11:18                         ` Boaz Harrosh
@ 2010-10-28 18:26                           ` Andi Kleen
  -1 siblings, 0 replies; 36+ messages in thread
From: Andi Kleen @ 2010-10-28 18:26 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: James Bottomley, Nicholas A. Bellinger, Mike Anderson,
	linux-kernel, linux-scsi, Vasu Dev, Tim Chen, Matthew Wilcox,
	Mike Christie, Jens Axboe, James Smart, Andrew Vasquez,
	FUJITA Tomonori, Hannes Reinecke, Joe Eykholt, Christoph Hellwig,
	Jon Hawley, Brian King, Christof Schmitt, Tejun Heo,
	Andrew Morton, H. Peter Anvin, julia

> I disagree with your approach this introduces a spin_unlock_irqrestore
> call site at every return, in the usually huge queuecommand.

I converted the full tree and in practice it turns out there 
are very few returns in nearly all queuecommands. So it won't
make much difference.

Longer term they will be all hopefully gone again anyways.

-Andi

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [ANNOUNCE] Status of unlocked_qcmds=1 operation for .37
@ 2010-10-28 18:26                           ` Andi Kleen
  0 siblings, 0 replies; 36+ messages in thread
From: Andi Kleen @ 2010-10-28 18:26 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: James Bottomley, Nicholas A. Bellinger, Mike Anderson,
	linux-kernel, linux-scsi, Vasu Dev, Tim Chen, Matthew Wilcox,
	Mike Christie, Jens Axboe, James Smart, Andrew Vasquez,
	FUJITA Tomonori, Hannes Reinecke, Joe Eykholt, Christoph Hellwig,
	Jon Hawley, Brian King, Christof Schmitt, Tejun Heo,
	Andrew Morton, H. Peter Anvin, julia

> I disagree with your approach this introduces a spin_unlock_irqrestore
> call site at every return, in the usually huge queuecommand.

I converted the full tree and in practice it turns out there 
are very few returns in nearly all queuecommands. So it won't
make much difference.

Longer term they will be all hopefully gone again anyways.

-Andi

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [ANNOUNCE] Status of unlocked_qcmds=1 operation for .37
  2010-10-28 11:18                         ` Boaz Harrosh
  (?)
  (?)
@ 2010-10-28 20:27                         ` Nicholas A. Bellinger
  2010-10-29  7:50                           ` Andi Kleen
  -1 siblings, 1 reply; 36+ messages in thread
From: Nicholas A. Bellinger @ 2010-10-28 20:27 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Andi Kleen, James Bottomley, Mike Anderson, linux-kernel,
	linux-scsi, Vasu Dev, Tim Chen, Matthew Wilcox, Mike Christie,
	Jens Axboe, James Smart, Andrew Vasquez, FUJITA Tomonori,
	Hannes Reinecke, Joe Eykholt, Christoph Hellwig, Jon Hawley,
	Brian King, Christof Schmitt, Tejun Heo, Andrew Morton,
	H. Peter Anvin, julia

On Thu, 2010-10-28 at 13:18 +0200, Boaz Harrosh wrote:
> On 10/28/2010 11:10 AM, Andi Kleen wrote:
> > On Wed, Oct 27, 2010 at 09:27:38AM -0500, James Bottomley wrote:
> >> On Wed, 2010-10-27 at 09:53 +0200, Andi Kleen wrote:
> >>>> This sounds like a pretty reasonable compromise that I think is slightly
> >>>> less risky for the LLDs with the ghosts and cob-webs hanging off of
> >>>> them.
> >>>
> >>> They won't get tested either next release cycle. Essentially
> >>> near nobody uses them.
> >>>
> >>>>
> >>>> What do you think..?
> >>>
> >>> Standard linux practice is to simply push the locks down. That's a pretty
> >>> mechanical operation and shouldn't be too risky
> >>>
> >>> With some luck you could even do it with coccinelle.
> >>
> >> Precisely ... if we can do the push down now as a mechanical
> >> transformation we can put it in the current merge window as a low risk
> >> API change.  This gives us optimal exposure to the rc sequence to sort
> >> out any problems that arise (or drivers that got missed) with the lowest
> >> risk of such problems actually arising.  Given the corner cases and the
> >> late arrival of fixes, the serial number changes are just too risky for
> >> the current merge window.  Having an API that changes depending on a
> >> flag is also a high risk process because it's prone to further sources
> >> of error.
> > 
> > Here's a coccinelle script I came up with that does the push down.
> > It still adds a bogus empty line in front of the irqflags declaration
> > which I haven't managed to avoid yet. Other than the it seems
> > to DTRT on the SCSI drivers I tried.
> > 
> > -Andi
> > 
> > 
> > @ rule1 @
> > struct scsi_host_template t;
> > identifier qc;
> > @@
> > t.queuecommand = qc;
> > 
> > @ rule2 @
> > identifier rule1.qc;
> > identifier cmnd;
> > expression E;
> > statement S, S2;
> > @@
> > int qc(struct scsi_cmnd *cmnd, ...) 
> > {
> > 	... when != S
> > +	unsigned long irqflags;
> > 
> > +	spin_lock_irqsave(&cmnd->device->host->hostlock, irqflags);
> > 	S2
> > 	...
> > +	spin_unlock_irqrestore(&cmnd->device->host->hostlock, irqflags);
> > 	return E;
> > }
> > 
> 
> I disagree with your approach this introduces a spin_unlock_irqrestore
> call site at every return, in the usually huge queuecommand.
> 
> I'd say just do:
> - Rename XXX_queuecommand => __XXX_queuecommand_unlocked
> - Define new XXX_queuecommand
> 
> int qc(struct scsi_cmnd *cmnd, ...) 
> {
> 	unsigned long irqflags;
> 	int ret;
> 
> 	spin_lock_irqsave(&cmnd->device->host->hostlock, irqflags);
> 	ret = __XXX_queuecommand_unlocked(cmnd, ...)
> 	spin_unlock_irqrestore(&cmnd->device->host->hostlock, irqflags);
> 	return ret;
> }
> 
> Then when the driver is manually converted the __queuecommand_unlocked
> can be set into the scsi_host_template and the added function can
> be dropped.
> 

I would have to agree that approach does make a bit more sense..  Now
can some brave soul (/me looks at ak) code another script to automate
this for the identified legacy LLDs cases that need push down..?

8-)

--nab



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [ANNOUNCE] Status of unlocked_qcmds=1 operation for .37
  2010-10-28 20:27                         ` Nicholas A. Bellinger
@ 2010-10-29  7:50                           ` Andi Kleen
  2010-10-29 23:50                               ` Nicholas A. Bellinger
  2010-10-29 23:57                             ` Nicholas A. Bellinger
  0 siblings, 2 replies; 36+ messages in thread
From: Andi Kleen @ 2010-10-29  7:50 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Boaz Harrosh, James Bottomley, Mike Anderson, linux-kernel,
	linux-scsi, Vasu Dev, Tim Chen, Matthew Wilcox, Mike Christie,
	Jens Axboe, James Smart, Andrew Vasquez, FUJITA Tomonori,
	Hannes Reinecke, Joe Eykholt, Christoph Hellwig, Jon Hawley,
	Brian King, Christof Schmitt, Tejun Heo, Andrew Morton,
	H. Peter Anvin, julia

> I would have to agree that approach does make a bit more sense..  Now
> can some brave soul (/me looks at ak) code another script to automate
> this for the identified legacy LLDs cases that need push down..?

For the drivers I looked at it doesn't really make too much difference,
I don't think there was any with a really large number of returns.
You can see that in the diffstat for the full changes.

Writing another script is probably not too hard, but the problem
is that I needed a significant amount of manual post processing
(both the select the right files to patch and to get rid
of misplaced newlines and some mismatches in cocci) 
So it's not a fully automated procedure.

-Andi


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [ANNOUNCE] Status of unlocked_qcmds=1 operation for .37
  2010-10-29  7:50                           ` Andi Kleen
@ 2010-10-29 23:50                               ` Nicholas A. Bellinger
  2010-10-29 23:57                             ` Nicholas A. Bellinger
  1 sibling, 0 replies; 36+ messages in thread
From: Nicholas A. Bellinger @ 2010-10-29 23:50 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Boaz Harrosh, James Bottomley, Mike Anderson, linux-kernel,
	linux-scsi, Vasu Dev, Tim Chen, Matthew Wilcox, Mike Christie,
	Jens Axboe, James Smart, Andrew Vasquez, FUJITA Tomonori,
	Hannes Reinecke, Joe Eykholt, Christoph Hellwig, Jon Hawley,
	Brian King, Christof Schmitt, Tejun Heo, Andrew Morton,
	H. Peter Anvin, julia, Jeff Garzik

On Fri, 2010-10-29 at 09:50 +0200, Andi Kleen wrote:
> > I would have to agree that approach does make a bit more sense..  Now
> > can some brave soul (/me looks at ak) code another script to automate
> > this for the identified legacy LLDs cases that need push down..?
> 
> For the drivers I looked at it doesn't really make too much difference,
> I don't think there was any with a really large number of returns.
> You can see that in the diffstat for the full changes.
> 

Yep, this is a valid point.  During the push down this week, I only ran
into a handful (say ~15) overly-complex SHT->queuecommand() w more than
a few return statements.  This was also isoloated for the most part into
legacy/ancient LLDs.

> Writing another script is probably not too hard, but the problem
> is that I needed a significant amount of manual post processing
> (both the select the right files to patch and to get rid
> of misplaced newlines and some mismatches in cocci) 
> So it's not a fully automated procedure.
> 

<nod>

So I really do not have a strong preference here.  I think Boaz's
approach is a bit cleaner, but in the end I think it should not hold up
on a global push-down merge for lio-4.0.git/host_lock-less-for-37-v9.

Also, at this point I have not received any other feedback from LLD
maintainers to say "My LLD should be running in lock-less mode" or "My
LLD cannot run in lock-less mode, and needs the host_lock push down".

One additional point that was raised by jgarzik was that those LLDs
running in 'lock-less' mode may need to have their internal
->queuecommand() spinlocks converted to spin_lock_irqsave() +
spin_unlock_irqrestore().  This has already been fixed for libata
following Jeff's recommendation, but we also identified a few other
potential LLDs that may need more work.  From the last response locatd
here:

http://marc.info/?l=linux-scsi&m=128825558912565&w=2

"Ok, so originally libiscsi, fnic, and lpfc where using
spin_unlock(host->host_lock) -> spin_lock(host->host_lock)

And libsas, qla2xxx, qla4xxx where using
spin_unlock_irq(host->host_lock) -> spin_lock_irq(host->host_lock)

Just to verify, you are thinking that those *not* using spin_unlock_irq
+ spin_lock_irq() for the legacy optimization dispatch should have their
per LLD host lock converted to spin_lock_irqsave() +
spin_unlock_irqrestore(), right..?"

So I think that means an audit of locks obtained in libiscsi, fnic and lpfc
will be in to ensure they do not assume irqs have alreadby been externally
disabled for lock-less operation..

Any comments folks..?

--nab


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [ANNOUNCE] Status of unlocked_qcmds=1 operation for .37
@ 2010-10-29 23:50                               ` Nicholas A. Bellinger
  0 siblings, 0 replies; 36+ messages in thread
From: Nicholas A. Bellinger @ 2010-10-29 23:50 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Boaz Harrosh, James Bottomley, Mike Anderson, linux-kernel,
	linux-scsi, Vasu Dev, Tim Chen, Matthew Wilcox, Mike Christie,
	Jens Axboe, James Smart, Andrew Vasquez, FUJITA Tomonori,
	Hannes Reinecke, Joe Eykholt, Christoph Hellwig, Jon Hawley,
	Brian King, Christof Schmitt, Tejun Heo, Andrew Morton,
	H. Peter Anvin, julia, Jeff

On Fri, 2010-10-29 at 09:50 +0200, Andi Kleen wrote:
> > I would have to agree that approach does make a bit more sense..  Now
> > can some brave soul (/me looks at ak) code another script to automate
> > this for the identified legacy LLDs cases that need push down..?
> 
> For the drivers I looked at it doesn't really make too much difference,
> I don't think there was any with a really large number of returns.
> You can see that in the diffstat for the full changes.
> 

Yep, this is a valid point.  During the push down this week, I only ran
into a handful (say ~15) overly-complex SHT->queuecommand() w more than
a few return statements.  This was also isoloated for the most part into
legacy/ancient LLDs.

> Writing another script is probably not too hard, but the problem
> is that I needed a significant amount of manual post processing
> (both the select the right files to patch and to get rid
> of misplaced newlines and some mismatches in cocci) 
> So it's not a fully automated procedure.
> 

<nod>

So I really do not have a strong preference here.  I think Boaz's
approach is a bit cleaner, but in the end I think it should not hold up
on a global push-down merge for lio-4.0.git/host_lock-less-for-37-v9.

Also, at this point I have not received any other feedback from LLD
maintainers to say "My LLD should be running in lock-less mode" or "My
LLD cannot run in lock-less mode, and needs the host_lock push down".

One additional point that was raised by jgarzik was that those LLDs
running in 'lock-less' mode may need to have their internal
->queuecommand() spinlocks converted to spin_lock_irqsave() +
spin_unlock_irqrestore().  This has already been fixed for libata
following Jeff's recommendation, but we also identified a few other
potential LLDs that may need more work.  From the last response locatd
here:

http://marc.info/?l=linux-scsi&m=128825558912565&w=2

"Ok, so originally libiscsi, fnic, and lpfc where using
spin_unlock(host->host_lock) -> spin_lock(host->host_lock)

And libsas, qla2xxx, qla4xxx where using
spin_unlock_irq(host->host_lock) -> spin_lock_irq(host->host_lock)

Just to verify, you are thinking that those *not* using spin_unlock_irq
+ spin_lock_irq() for the legacy optimization dispatch should have their
per LLD host lock converted to spin_lock_irqsave() +
spin_unlock_irqrestore(), right..?"

So I think that means an audit of locks obtained in libiscsi, fnic and lpfc
will be in to ensure they do not assume irqs have alreadby been externally
disabled for lock-less operation..

Any comments folks..?

--nab


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [ANNOUNCE] Status of unlocked_qcmds=1 operation for .37
  2010-10-29  7:50                           ` Andi Kleen
  2010-10-29 23:50                               ` Nicholas A. Bellinger
@ 2010-10-29 23:57                             ` Nicholas A. Bellinger
  1 sibling, 0 replies; 36+ messages in thread
From: Nicholas A. Bellinger @ 2010-10-29 23:57 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Boaz Harrosh, James Bottomley, Mike Anderson, linux-kernel,
	linux-scsi, Vasu Dev, Tim Chen, Mike Christie, James Smart,
	FUJITA Tomonori, Christoph Hellwig, Tejun Heo, Andrew Morton,
	H. Peter Anvin, julia, Jeff Garzik

<trimming the CC' list, as the last response appears to have bounced>

On Fri, 2010-10-29 at 09:50 +0200, Andi Kleen wrote:
> > I would have to agree that approach does make a bit more sense..  Now
> > can some brave soul (/me looks at ak) code another script to automate
> > this for the identified legacy LLDs cases that need push down..?
> 
> For the drivers I looked at it doesn't really make too much difference,
> I don't think there was any with a really large number of returns.
> You can see that in the diffstat for the full changes.
> 

Yep, this is a valid point.  During the push down this week, I only ran
into a handful (say ~15) overly-complex SHT->queuecommand() w more than
a few return statements.  This was also isoloated for the most part into
legacy/ancient LLDs.

> Writing another script is probably not too hard, but the problem
> is that I needed a significant amount of manual post processing
> (both the select the right files to patch and to get rid
> of misplaced newlines and some mismatches in cocci) 
> So it's not a fully automated procedure.
> 

<nod>

So I really do not have a strong preference here.  I think Boaz's
approach is a bit cleaner, but in the end I think it should not hold up
on a global push-down merge for lio-4.0.git/host_lock-less-for-37-v9.

Also, at this point I have not received any other feedback from LLD
maintainers to say "My LLD should be running in lock-less mode" or "My
LLD cannot run in lock-less mode, and needs the host_lock push down".

One additional point that was raised by jgarzik was that those LLDs
running in 'lock-less' mode may need to have their internal
->queuecommand() spinlocks converted to spin_lock_irqsave() +
spin_unlock_irqrestore().  This has already been fixed for libata
following Jeff's recommendation, but we also identified a few other
potential LLDs that may need more work.  From the last response locatd
here:

http://marc.info/?l=linux-scsi&m=128825558912565&w=2

"Ok, so originally libiscsi, fnic, and lpfc where using
spin_unlock(host->host_lock) -> spin_lock(host->host_lock)

And libsas, qla2xxx, qla4xxx where using
spin_unlock_irq(host->host_lock) -> spin_lock_irq(host->host_lock)

Just to verify, you are thinking that those *not* using spin_unlock_irq
+ spin_lock_irq() for the legacy optimization dispatch should have their
per LLD host lock converted to spin_lock_irqsave() +
spin_unlock_irqrestore(), right..?"

So I think that means an audit of locks obtained in libiscsi, fnic and
lpfc will be in to ensure they do not assume irqs have alreadby been
externally disabled for lock-less operation..

Comments folks..?

--nab



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [ANNOUNCE] Status of unlocked_qcmds=1 operation for .37
  2010-10-28 18:26                           ` Andi Kleen
@ 2010-10-31 12:14                             ` Boaz Harrosh
  -1 siblings, 0 replies; 36+ messages in thread
From: Boaz Harrosh @ 2010-10-31 12:14 UTC (permalink / raw)
  To: Andi Kleen
  Cc: James Bottomley, Nicholas A. Bellinger, Mike Anderson,
	linux-kernel, linux-scsi, Vasu Dev, Tim Chen, Matthew Wilcox,
	Mike Christie, Jens Axboe, James Smart, Andrew Vasquez,
	FUJITA Tomonori, Hannes Reinecke, Joe Eykholt, Christoph Hellwig,
	Jon Hawley, Brian King, Christof Schmitt, Tejun Heo,
	Andrew Morton, H. Peter Anvin, julia

On 10/28/2010 08:26 PM, Andi Kleen wrote:
>> I disagree with your approach this introduces a spin_unlock_irqrestore
>> call site at every return, in the usually huge queuecommand.
> 
> I converted the full tree and in practice it turns out there 
> are very few returns in nearly all queuecommands. So it won't
> make much difference.
> 

"few returns" is too much. Any thing bigger than 1 is a total wast.

And the mess?!? Where to add the flags? where the returns? Need a 
"{...}" or not. Lots of manual intervention, and possible errors.
I bet with my approach you wouldn't need to manually fix a single
file.

> Longer term they will be all hopefully gone again anyways.
> 

That one I'm most afraid of. These that did not get fixed in this
round, will not be fixed for a long time (if ever). I'd even go
anal and not open code the lock but actually call the original
__queue_command through a MACRO, that can be change in one place.
(And will solve your patchset bisect-ability)

- XXX_queue_command(...
+ XXX_queue_command_unlocked(...


+ XXX_queue_command(...
+ {
+ 	return SCSI_LOCKED_QUEUECOMMAND(XXX_queue_command_unlocked, ...);
+ }
> -Andi

But since I'm only blabing, the one that "do", gets to decide ;-) .
Perhaps next time.

Boaz

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [ANNOUNCE] Status of unlocked_qcmds=1 operation for .37
@ 2010-10-31 12:14                             ` Boaz Harrosh
  0 siblings, 0 replies; 36+ messages in thread
From: Boaz Harrosh @ 2010-10-31 12:14 UTC (permalink / raw)
  To: Andi Kleen
  Cc: James Bottomley, Nicholas A. Bellinger, Mike Anderson,
	linux-kernel, linux-scsi, Vasu Dev, Tim Chen, Matthew Wilcox,
	Mike Christie, Jens Axboe, James Smart, Andrew Vasquez,
	FUJITA Tomonori, Hannes Reinecke, Joe Eykholt, Christoph Hellwig,
	Jon Hawley, Brian King, Christof Schmitt, Tejun Heo,
	Andrew Morton, H. Peter Anvin, julia

On 10/28/2010 08:26 PM, Andi Kleen wrote:
>> I disagree with your approach this introduces a spin_unlock_irqrestore
>> call site at every return, in the usually huge queuecommand.
> 
> I converted the full tree and in practice it turns out there 
> are very few returns in nearly all queuecommands. So it won't
> make much difference.
> 

"few returns" is too much. Any thing bigger than 1 is a total wast.

And the mess?!? Where to add the flags? where the returns? Need a 
"{...}" or not. Lots of manual intervention, and possible errors.
I bet with my approach you wouldn't need to manually fix a single
file.

> Longer term they will be all hopefully gone again anyways.
> 

That one I'm most afraid of. These that did not get fixed in this
round, will not be fixed for a long time (if ever). I'd even go
anal and not open code the lock but actually call the original
__queue_command through a MACRO, that can be change in one place.
(And will solve your patchset bisect-ability)

- XXX_queue_command(...
+ XXX_queue_command_unlocked(...


+ XXX_queue_command(...
+ {
+ 	return SCSI_LOCKED_QUEUECOMMAND(XXX_queue_command_unlocked, ...);
+ }
> -Andi

But since I'm only blabing, the one that "do", gets to decide ;-) .
Perhaps next time.

Boaz

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [ANNOUNCE] Status of unlocked_qcmds=1 operation for .37
  2010-10-31 12:14                             ` Boaz Harrosh
@ 2010-11-01 11:45                               ` Andi Kleen
  -1 siblings, 0 replies; 36+ messages in thread
From: Andi Kleen @ 2010-11-01 11:45 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: James Bottomley, Nicholas A. Bellinger, Mike Anderson,
	linux-kernel, linux-scsi, Vasu Dev, Tim Chen, Matthew Wilcox,
	Mike Christie, Jens Axboe, James Smart, Andrew Vasquez,
	FUJITA Tomonori, Hannes Reinecke, Joe Eykholt, Christoph Hellwig,
	Jon Hawley, Brian King, Christof Schmitt, Tejun Heo,
	Andrew Morton, H. Peter Anvin, julia

> + XXX_queue_command(...
> + {
> + 	return SCSI_LOCKED_QUEUECOMMAND(XXX_queue_command_unlocked, ...);
> + }
> > -Andi
> 
> But since I'm only blabing, the one that "do", gets to decide ;-) .
> Perhaps next time.

With semantics patches "do"ing is actually not that much effort ...

-Andi

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [ANNOUNCE] Status of unlocked_qcmds=1 operation for .37
@ 2010-11-01 11:45                               ` Andi Kleen
  0 siblings, 0 replies; 36+ messages in thread
From: Andi Kleen @ 2010-11-01 11:45 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: James Bottomley, Nicholas A. Bellinger, Mike Anderson,
	linux-kernel, linux-scsi, Vasu Dev, Tim Chen, Matthew Wilcox,
	Mike Christie, Jens Axboe, James Smart, Andrew Vasquez,
	FUJITA Tomonori, Hannes Reinecke, Joe Eykholt, Christoph Hellwig,
	Jon Hawley, Brian King, Christof Schmitt, Tejun Heo,
	Andrew Morton, H. Peter Anvin, julia

> + XXX_queue_command(...
> + {
> + 	return SCSI_LOCKED_QUEUECOMMAND(XXX_queue_command_unlocked, ...);
> + }
> > -Andi
> 
> But since I'm only blabing, the one that "do", gets to decide ;-) .
> Perhaps next time.

With semantics patches "do"ing is actually not that much effort ...

-Andi

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [ANNOUNCE] Status of unlocked_qcmds=1 operation for .37
  2010-10-21  0:30   ` Giridhar Malavali
@ 2010-10-21  0:39     ` Nicholas A. Bellinger
  0 siblings, 0 replies; 36+ messages in thread
From: Nicholas A. Bellinger @ 2010-10-21  0:39 UTC (permalink / raw)
  To: Giridhar Malavali
  Cc: linux-kernel, LinuxSCSI, James Bottomley, Mike Christie, Andrew Vasquez

On Wed, 2010-10-20 at 17:30 -0700, Giridhar Malavali wrote:
> 
> 
> On 10/20/10 4:19 PM, "Nicholas A. Bellinger" <nab@linux-iscsi.org> wrote:
> 
> > On Wed, 2010-10-20 at 15:37 -0700, Giridhar Malavali wrote:
> >> 
> >> 
> > 
> > <Trimming long CC'list>
> > 
> > Hi Giri,
> > 
> >> On 10/20/10 1:49 PM, "Nicholas A. Bellinger" <nab@linux-iscsi.org> wrote:
> >> 
> >>> Greetings all,
> >>> 
> >>> *) Individual LLDs running by default w/ unlocked_qcmds=1
> >>> 
> >>> aic94xx: need ack maintainer at adaptec..?)
> >>> mvsas: need ack maintainer at marvell..?)
> >>> pm8001: need ack Jang Wang
> >>> qla4xxx, qla2xxx: need ack Andrew Vasquez
> >>> fnic:  need ack Joe Eykholt
> >> 
> >> The qla2xxx driver is modified not to depend on the host_lock and also to
> >> drop usage of scsi_cmnd->serial_number. Both the patches are submitted to
> >> linux-scsi and you can find more information at
> >> 
> >> http://marc.info/?l=linux-scsi=128716779923700=2
> >> <http://marc.info/?l=linux-scsi&m=128716779923700&w=2>
> > 
> > Sure, but for the new fast unlocked_qcmds=1 operation in
> > qla2xxx_queuecommand(), the host_lock access needs to be complete
> > removed from SHT->queuecommand().   The above patch just moves the
> > vha->host->host_lock unlock up in queuecommand(),  right..?
> > 
> > diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
> > index b0c7139..77203b0 100644
> > --- a/drivers/scsi/qla2xxx/qla_os.c
> > +++ b/drivers/scsi/qla2xxx/qla_os.c
> > @@ -545,6 +545,7 @@ qla2xxx_queuecommand(struct scsi_cmnd *cmd, void
> > (*done)(struct scsi_cmnd *))
> >         srb_t *sp;
> >         int rval;
> > 
> > +       spin_unlock_irq(vha->host->host_lock);
> >         if (ha->flags.eeh_busy) {
> >                 if (ha->flags.pci_channel_io_perm_failure)
> >                         cmd->result = DID_NO_CONNECT << 16;
> > 
> > <SNIP>
> > 
> > @@ -603,9 +599,11 @@ qc24_host_busy_lock:
> >         return SCSI_MLQUEUE_HOST_BUSY;
> > 
> >  qc24_target_busy:
> > +       spin_lock_irq(vha->host->host_lock);
> >         return SCSI_MLQUEUE_TARGET_BUSY;
> > 
> >  qc24_fail_command:
> > +       spin_lock_irq(vha->host->host_lock);
> >         done(cmd);
> > 
> >         return 0;
> > 
> >> http://marc.info/?l=linux-scsi=128716779623683=2
> >> <http://marc.info/?l=linux-scsi&m=128716779623683&w=2>
> >> 
> > 
> > <nod> I had been only updating LLDs that actually used ->serial_number
> > beyond a simple informational purposes for error recovery.  Thanks for
> > removing this one preemptively!  8-)
> > 
> > Best,
> > 
> > --nab
> > 
> 
> Hi Nicholas,
> 
> Yes, I understand. I was thinking that you are going to submit the patches
> for all LLD with your final submission.
> 
> I will submit the patch which removes host_lock in queuecommand routine
> completely then. 

Hmmmmm..

I think you will want to coordinate with James Bottomley here before
dropping the existing qla2xxx SHT->queuecomand() -> unlock() ->
do_lld_work() -> lock() code.  Doing this legacy optimization still does
provide some form of benefit (which btw I don't think anyone has ever
actually determined how much this), but lets make sure the legacy
optimization remains in place until we can resolve the remaining issues
so James can merge the initial pieces.  From there you can drop
host_lock in qla2xxx_queuecommand() and safely enable unlocked_qcmds=1
operation by default to realize the lock-less queue small block IOP
gains.

Best,

--nab

> 
> -- Giri
> 
> > 
> > 
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [ANNOUNCE] Status of unlocked_qcmds=1 operation for .37
  2010-10-20 23:19 ` Nicholas A. Bellinger
@ 2010-10-21  0:30   ` Giridhar Malavali
  2010-10-21  0:39     ` Nicholas A. Bellinger
  0 siblings, 1 reply; 36+ messages in thread
From: Giridhar Malavali @ 2010-10-21  0:30 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: linux-kernel, LinuxSCSI, James Bottomley, Mike Christie, Andrew Vasquez




On 10/20/10 4:19 PM, "Nicholas A. Bellinger" <nab@linux-iscsi.org> wrote:

> On Wed, 2010-10-20 at 15:37 -0700, Giridhar Malavali wrote:
>> 
>> 
> 
> <Trimming long CC'list>
> 
> Hi Giri,
> 
>> On 10/20/10 1:49 PM, "Nicholas A. Bellinger" <nab@linux-iscsi.org> wrote:
>> 
>>> Greetings all,
>>> 
>>> So as we get closer to the .37 merge window, I wanted to take this
>>> oppourtunity to recap the current status of the drop-host_lock /
>>> unlocked_qcmds=1 patches, and what is required for the next RFCv5 and
>>> hopefully a merge into .37.  The last RFCv4 was posted here:
>>> 
>>> http://marc.info/?l=linux-kernel=128563953114561=2
>>> <http://marc.info/?l=linux-kernel=128563953114561=2
>>> <http://marc.info/?l=linux-kernel&m=128563953114561&w=2> >
>>> 
>>> Since then, Christof Schmitt has sent a patch to drop struct
>>> scsi_cmnd->serial_number usage in zfcp, and Tim Chen has sent an
>>> important fix to drop an extra host_lock access that I originally missed
>>> in qla2xxx SHT->queuecommand() that certainly would have deadlocked a
>>> running machine.   Many thanks to Christof and Tim for your
>>> contributions and review!
>>> 
>>> So at this point in the game the current score sits at:
>>> 
>>> *) core drivers/scsi remaining issue(s):
>>> 
>>> The issue raised by andmike during RFCv4 described as:
>>> 
>>> "If we skip __scsi_try_to_abort_cmd when REQ_ATOM_COMPLETE is set it
>>> would be correct for the scsi_decide_disposition cases but it would
>>> appear this would stop __scsi_try_to_abort_cmd from being called in the
>>> time out case as REQ_ATOM_COMPLETE is set prior to calling
>>> blk_rq_timed_out."
>>> 
>>> The complete discussion is here:
>>> 
>>> http://marc.info/?l=linux-scsi=128535319915212=2
>>> <http://marc.info/?l=linux-scsi=128535319915212=2
>>> <http://marc.info/?l=linux-scsi&m=128535319915212&w=2> >
>>> 
>>> We still need folks with experience to dig into this code, so you know
>>> the scsi_error.c code please jump in!
>>> 
>>> *) LLD libraries running by default w/ unlocked_qcmds=1
>>> 
>>> libiscsi: need ack from mnc
>>> libsas: need ack from jejb
>>> libfc: remaining rport state + host_lock less issue.  Need more input
>>>        from mnc for James Smart and Joe on this...
>>> libata: jgarzik thinks this should be OK, review and ack from tejun
>>>         would also be very helpful.
>>> 
>>> The main issue remaining here is the audit of libfc rport (and other..?)
>>> code that assumes host_lock is held to protect state.  mnc, do you have
>>> any more thoughts for James Smart and Joe here..?
>>> 
>>> *) Individual LLDs running by default w/ unlocked_qcmds=1
>>> 
>>> aic94xx: need ack maintainer at adaptec..?)
>>> mvsas: need ack maintainer at marvell..?)
>>> pm8001: need ack Jang Wang
>>> qla4xxx, qla2xxx: need ack Andrew Vasquez
>>> fnic:  need ack Joe Eykholt
>> 
>> The qla2xxx driver is modified not to depend on the host_lock and also to
>> drop usage of scsi_cmnd->serial_number. Both the patches are submitted to
>> linux-scsi and you can find more information at
>> 
>> http://marc.info/?l=linux-scsi=128716779923700=2
>> <http://marc.info/?l=linux-scsi&m=128716779923700&w=2>
> 
> Sure, but for the new fast unlocked_qcmds=1 operation in
> qla2xxx_queuecommand(), the host_lock access needs to be complete
> removed from SHT->queuecommand().   The above patch just moves the
> vha->host->host_lock unlock up in queuecommand(),  right..?
> 
> diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
> index b0c7139..77203b0 100644
> --- a/drivers/scsi/qla2xxx/qla_os.c
> +++ b/drivers/scsi/qla2xxx/qla_os.c
> @@ -545,6 +545,7 @@ qla2xxx_queuecommand(struct scsi_cmnd *cmd, void
> (*done)(struct scsi_cmnd *))
>         srb_t *sp;
>         int rval;
> 
> +       spin_unlock_irq(vha->host->host_lock);
>         if (ha->flags.eeh_busy) {
>                 if (ha->flags.pci_channel_io_perm_failure)
>                         cmd->result = DID_NO_CONNECT << 16;
> 
> <SNIP>
> 
> @@ -603,9 +599,11 @@ qc24_host_busy_lock:
>         return SCSI_MLQUEUE_HOST_BUSY;
> 
>  qc24_target_busy:
> +       spin_lock_irq(vha->host->host_lock);
>         return SCSI_MLQUEUE_TARGET_BUSY;
> 
>  qc24_fail_command:
> +       spin_lock_irq(vha->host->host_lock);
>         done(cmd);
> 
>         return 0;
> 
>> http://marc.info/?l=linux-scsi=128716779623683=2
>> <http://marc.info/?l=linux-scsi&m=128716779623683&w=2>
>> 
> 
> <nod> I had been only updating LLDs that actually used ->serial_number
> beyond a simple informational purposes for error recovery.  Thanks for
> removing this one preemptively!  8-)
> 
> Best,
> 
> --nab
> 

Hi Nicholas,

Yes, I understand. I was thinking that you are going to submit the patches
for all LLD with your final submission.

I will submit the patch which removes host_lock in queuecommand routine
completely then. 

-- Giri

> 
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [ANNOUNCE] Status of unlocked_qcmds=1 operation for .37
       [not found] <C8E4BD3D.A1E3%giridhar.malavali@qlogic.com>
@ 2010-10-20 23:19 ` Nicholas A. Bellinger
  2010-10-21  0:30   ` Giridhar Malavali
  0 siblings, 1 reply; 36+ messages in thread
From: Nicholas A. Bellinger @ 2010-10-20 23:19 UTC (permalink / raw)
  To: Giridhar Malavali
  Cc: linux-kernel, LinuxSCSI, James Bottomley, Mike Christie, Andrew Vasquez

On Wed, 2010-10-20 at 15:37 -0700, Giridhar Malavali wrote:
> 
> 

<Trimming long CC'list>

Hi Giri,

> On 10/20/10 1:49 PM, "Nicholas A. Bellinger" <nab@linux-iscsi.org> wrote:
> 
> > Greetings all,
> > 
> > So as we get closer to the .37 merge window, I wanted to take this
> > oppourtunity to recap the current status of the drop-host_lock /
> > unlocked_qcmds=1 patches, and what is required for the next RFCv5 and
> > hopefully a merge into .37.  The last RFCv4 was posted here:
> > 
> > http://marc.info/?l=linux-kernel=128563953114561=2
> > <http://marc.info/?l=linux-kernel&m=128563953114561&w=2>
> > 
> > Since then, Christof Schmitt has sent a patch to drop struct
> > scsi_cmnd->serial_number usage in zfcp, and Tim Chen has sent an
> > important fix to drop an extra host_lock access that I originally missed
> > in qla2xxx SHT->queuecommand() that certainly would have deadlocked a
> > running machine.   Many thanks to Christof and Tim for your
> > contributions and review!
> > 
> > So at this point in the game the current score sits at:
> > 
> > *) core drivers/scsi remaining issue(s):
> > 
> > The issue raised by andmike during RFCv4 described as:
> > 
> > "If we skip __scsi_try_to_abort_cmd when REQ_ATOM_COMPLETE is set it
> > would be correct for the scsi_decide_disposition cases but it would
> > appear this would stop __scsi_try_to_abort_cmd from being called in the
> > time out case as REQ_ATOM_COMPLETE is set prior to calling
> > blk_rq_timed_out."
> > 
> > The complete discussion is here:
> > 
> > http://marc.info/?l=linux-scsi=128535319915212=2
> > <http://marc.info/?l=linux-scsi&m=128535319915212&w=2>
> > 
> > We still need folks with experience to dig into this code, so you know
> > the scsi_error.c code please jump in!
> > 
> > *) LLD libraries running by default w/ unlocked_qcmds=1
> > 
> > libiscsi: need ack from mnc
> > libsas: need ack from jejb
> > libfc: remaining rport state + host_lock less issue.  Need more input
> >        from mnc for James Smart and Joe on this...
> > libata: jgarzik thinks this should be OK, review and ack from tejun
> >         would also be very helpful.
> > 
> > The main issue remaining here is the audit of libfc rport (and other..?)
> > code that assumes host_lock is held to protect state.  mnc, do you have
> > any more thoughts for James Smart and Joe here..?
> > 
> > *) Individual LLDs running by default w/ unlocked_qcmds=1
> > 
> > aic94xx: need ack maintainer at adaptec..?)
> > mvsas: need ack maintainer at marvell..?)
> > pm8001: need ack Jang Wang
> > qla4xxx, qla2xxx: need ack Andrew Vasquez
> > fnic:  need ack Joe Eykholt
> 
> The qla2xxx driver is modified not to depend on the host_lock and also to
> drop usage of scsi_cmnd->serial_number. Both the patches are submitted to
> linux-scsi and you can find more information at
> 
> http://marc.info/?l=linux-scsi&m=128716779923700&w=2

Sure, but for the new fast unlocked_qcmds=1 operation in
qla2xxx_queuecommand(), the host_lock access needs to be complete
removed from SHT->queuecommand().   The above patch just moves the
vha->host->host_lock unlock up in queuecommand(),  right..?

diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index b0c7139..77203b0 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -545,6 +545,7 @@ qla2xxx_queuecommand(struct scsi_cmnd *cmd, void (*done)(struct scsi_cmnd *))
 	srb_t *sp;
 	int rval;
 
+	spin_unlock_irq(vha->host->host_lock);
 	if (ha->flags.eeh_busy) {
 		if (ha->flags.pci_channel_io_perm_failure)
 			cmd->result = DID_NO_CONNECT << 16;

<SNIP>

@@ -603,9 +599,11 @@ qc24_host_busy_lock:
 	return SCSI_MLQUEUE_HOST_BUSY;
 
 qc24_target_busy:
+	spin_lock_irq(vha->host->host_lock);
 	return SCSI_MLQUEUE_TARGET_BUSY;
 
 qc24_fail_command:
+	spin_lock_irq(vha->host->host_lock);
 	done(cmd);
 
 	return 0;

> http://marc.info/?l=linux-scsi&m=128716779623683&w=2
> 

<nod> I had been only updating LLDs that actually used ->serial_number
beyond a simple informational purposes for error recovery.  Thanks for
removing this one preemptively!  8-)

Best,

--nab



^ permalink raw reply related	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2010-11-01 10:42 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-10-20 20:49 [ANNOUNCE] Status of unlocked_qcmds=1 operation for .37 Nicholas A. Bellinger
2010-10-20 20:49 ` Nicholas A. Bellinger
2010-10-21 19:26 ` Luben Tuikov
2010-10-21 19:26   ` Luben Tuikov
     [not found] ` <20101021150840.GA24309@linux.vnet.ibm.com>
2010-10-26 22:08   ` Nicholas A. Bellinger
2010-10-26 22:27     ` James Bottomley
2010-10-26 22:34       ` Nicholas A. Bellinger
2010-10-26 22:50         ` James Bottomley
2010-10-26 23:00           ` Nicholas A. Bellinger
2010-10-26 23:11             ` James Bottomley
2010-10-26 23:31               ` Nicholas A. Bellinger
2010-10-27  7:53                 ` Andi Kleen
2010-10-27 14:27                   ` James Bottomley
2010-10-27 18:06                     ` Nicholas A. Bellinger
2010-10-27 18:16                       ` James Bottomley
2010-10-27 19:20                       ` Mike Anderson
2010-10-27 19:55                         ` Nicholas A. Bellinger
2010-10-27 23:28                       ` Jeff Garzik
2010-10-28  9:10                     ` Andi Kleen
2010-10-28 11:18                       ` Boaz Harrosh
2010-10-28 11:18                         ` Boaz Harrosh
2010-10-28 18:26                         ` Andi Kleen
2010-10-28 18:26                           ` Andi Kleen
2010-10-31 12:14                           ` Boaz Harrosh
2010-10-31 12:14                             ` Boaz Harrosh
2010-11-01 11:45                             ` Andi Kleen
2010-11-01 11:45                               ` Andi Kleen
2010-10-28 20:27                         ` Nicholas A. Bellinger
2010-10-29  7:50                           ` Andi Kleen
2010-10-29 23:50                             ` Nicholas A. Bellinger
2010-10-29 23:50                               ` Nicholas A. Bellinger
2010-10-29 23:57                             ` Nicholas A. Bellinger
2010-10-27 18:03                   ` Nicholas A. Bellinger
     [not found] <C8E4BD3D.A1E3%giridhar.malavali@qlogic.com>
2010-10-20 23:19 ` Nicholas A. Bellinger
2010-10-21  0:30   ` Giridhar Malavali
2010-10-21  0:39     ` Nicholas A. Bellinger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.