linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] firewire-sbp2: Initialise sbp2_orb->rcode for management ORBs
@ 2012-02-18 21:35 Chris Boot
  2012-03-04 12:48 ` Stefan Richter
  0 siblings, 1 reply; 3+ messages in thread
From: Chris Boot @ 2012-02-18 21:35 UTC (permalink / raw)
  To: linux1394-devel; +Cc: linux-kernel, Chris Boot, Stefan Richter

When sending ORBs the struct sbp2_orb->rcode field should be initialised
to -1 otherwise complete_transaction() assumes the request is successful
(RCODE_COMPLETE is 0). When sending managament ORBs, such as LOGIN or
LOGOUT, this was not done and so the initiator would wait for the
request to time out before trying again.

Without this, LOGINs are only retried when the management ORB times out,
rather than the initiator noticing an error occurred and retrying soon
after. For targets that advertise more than one LUN per unit, and can
only accept one management request at a time, this means LUNs are only
logged in one per timeout period.

Signed-off-by: Chris Boot <bootc@bootc.net>
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
---
 drivers/firewire/sbp2.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/firewire/sbp2.c b/drivers/firewire/sbp2.c
index b12c6ba..7776c18 100644
--- a/drivers/firewire/sbp2.c
+++ b/drivers/firewire/sbp2.c
@@ -572,6 +572,9 @@ static int sbp2_send_management_orb(struct sbp2_logical_unit *lu, int node_id,
 	if (dma_mapping_error(device->card->device, orb->response_bus))
 		goto fail_mapping_response;
 
+	/* Initialize rcode to something not RCODE_COMPLETE. */
+	orb->base.rcode = -1;
+
 	orb->request.response.high = 0;
 	orb->request.response.low  = cpu_to_be32(orb->response_bus);
 
-- 
1.7.9


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] firewire-sbp2: Initialise sbp2_orb->rcode for management ORBs
  2012-02-18 21:35 [PATCH] firewire-sbp2: Initialise sbp2_orb->rcode for management ORBs Chris Boot
@ 2012-03-04 12:48 ` Stefan Richter
  2012-05-19 10:29   ` Stefan Richter
  0 siblings, 1 reply; 3+ messages in thread
From: Stefan Richter @ 2012-03-04 12:48 UTC (permalink / raw)
  To: Chris Boot; +Cc: linux1394-devel, linux-kernel

On Feb 18 Chris Boot wrote:
> When sending ORBs the struct sbp2_orb->rcode field should be initialised
> to -1 otherwise complete_transaction() assumes the request is successful
> (RCODE_COMPLETE is 0). When sending managament ORBs, such as LOGIN or
> LOGOUT, this was not done and so the initiator would wait for the
> request to time out before trying again.
> 
> Without this, LOGINs are only retried when the management ORB times out,
> rather than the initiator noticing an error occurred and retrying soon
> after. For targets that advertise more than one LUN per unit, and can
> only accept one management request at a time, this means LUNs are only
> logged in one per timeout period.
> 
> Signed-off-by: Chris Boot <bootc@bootc.net>
> Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
> ---
>  drivers/firewire/sbp2.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/firewire/sbp2.c b/drivers/firewire/sbp2.c
> index b12c6ba..7776c18 100644
> --- a/drivers/firewire/sbp2.c
> +++ b/drivers/firewire/sbp2.c
> @@ -572,6 +572,9 @@ static int sbp2_send_management_orb(struct sbp2_logical_unit *lu, int node_id,
>  	if (dma_mapping_error(device->card->device, orb->response_bus))
>  		goto fail_mapping_response;
>  
> +	/* Initialize rcode to something not RCODE_COMPLETE. */
> +	orb->base.rcode = -1;
> +
>  	orb->request.response.high = 0;
>  	orb->request.response.low  = cpu_to_be32(orb->response_bus);
>  

I left this hanging in my inbox for too long, sorry...

While I agree that the current initialization of orb->base.rcode with 0 is
wrong, I don't think your change alone is sufficient:

Consider the case that a login request to LU 0 causes the target to pull
out the hardware behind that LU out of a powered-down state --- which may
take a very long time --- and login requests to LU 1 would be aborted by
the target with resp_conflict_error on any Management_Agent write
request.  Of course a reasonably clever target would accept login before
full power-up, but you never now.

We retry login 5 times in 0.2 seconds intervals, and this 1 s in total may
not be enough.

--------
And a few notes to myself, to be done on top of the above:

a) Turn the magic value -1 into a defined constant.  Use that constant
in the two ORB initializers and in the SBP-2 status write handler.

b) The reconnect failure handling seems a bit simplistic.  I changed it
from Kristian's original retry loop to a shortcut to re-login in commit
ce896d95cc7886ae05859c5b409a7b2f3b606ec1.  While re-login instead of
reconnect during management-agent-busy situations works too, retrying
the reconnect at least during reconnect-hold period may be more robust in
such situations.

So the generation check should be replaced by checks for 1394 transaction
completion with RCODE_CONFLICT_ERROR or RCODE_BUSY, and maybe for some
other types of 1394 transaction failure or SBP-2 transaction failure.

c) Perhaps retry logout in sbp2_remove() if we clearly detect a
management-agent-busy situation.

d) Reduce log spam of the sort of "management write failed" or "failed to
reconnect" in situations where we know that we deal with such failures
correctly.
-- 
Stefan Richter
-=====-===-- --== --=--
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] firewire-sbp2: Initialise sbp2_orb->rcode for management ORBs
  2012-03-04 12:48 ` Stefan Richter
@ 2012-05-19 10:29   ` Stefan Richter
  0 siblings, 0 replies; 3+ messages in thread
From: Stefan Richter @ 2012-05-19 10:29 UTC (permalink / raw)
  To: Chris Boot; +Cc: linux1394-devel, linux-kernel

On Mar 04 Stefan Richter wrote:
> On Feb 18 Chris Boot wrote:
> > When sending ORBs the struct sbp2_orb->rcode field should be initialised
> > to -1 otherwise complete_transaction() assumes the request is successful
> > (RCODE_COMPLETE is 0). When sending managament ORBs, such as LOGIN or
> > LOGOUT, this was not done and so the initiator would wait for the
> > request to time out before trying again.
> > 
> > Without this, LOGINs are only retried when the management ORB times out,
> > rather than the initiator noticing an error occurred and retrying soon
> > after. For targets that advertise more than one LUN per unit, and can
> > only accept one management request at a time, this means LUNs are only
> > logged in one per timeout period.
[...]
> I left this hanging in my inbox for too long, sorry...
> 
> While I agree that the current initialization of orb->base.rcode with 0 is
> wrong, I don't think your change alone is sufficient:
> 
> Consider the case that a login request to LU 0 causes the target to pull
> out the hardware behind that LU out of a powered-down state --- which may
> take a very long time --- and login requests to LU 1 would be aborted by
> the target with resp_conflict_error on any Management_Agent write
> request.  Of course a reasonably clever target would accept login before
> full power-up, but you never now.
> 
> We retry login 5 times in 0.2 seconds intervals, and this 1 s in total may
> not be enough.
[...]

Chris, I obviously haven't done anything about this potentially too short
retry period yet; it is still on my list.

Perhaps we should not count the number of retries but watch the time that
retries take.  I.e. accumulate the time that each try takes; break out of
the retry loop after a maximum time; but reset the accumulated time at a
bus reset as a precaution for buses with many nodes coming online at
different times.
-- 
Stefan Richter
-=====-===-- -=-= =--==
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2012-05-19 10:30 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-02-18 21:35 [PATCH] firewire-sbp2: Initialise sbp2_orb->rcode for management ORBs Chris Boot
2012-03-04 12:48 ` Stefan Richter
2012-05-19 10:29   ` Stefan Richter

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).