From mboxrd@z Thu Jan 1 00:00:00 1970 From: Hannes Reinecke Subject: Re: [PATCHv3 4/6] scsi_error: do not escalate failed EH command Date: Thu, 23 Mar 2017 14:11:34 +0100 Message-ID: <89441754-d40f-08ba-4531-359ef97b75aa@suse.de> References: <1488359720-130871-1-git-send-email-hare@suse.de> <1488359720-130871-5-git-send-email-hare@suse.de> <20170314175611.GC19037@bblock-ThinkPad-W530> <20170316110142.GB11833@bblock-ThinkPad-W530> <6c23c34a-a59b-3c4b-7c43-92b87eabcf61@suse.de> <20170321190545.GB16847@bblock-ThinkPad-W530> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit Return-path: Received: from mx2.suse.de ([195.135.220.15]:47983 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932997AbdCWNLi (ORCPT ); Thu, 23 Mar 2017 09:11:38 -0400 In-Reply-To: <20170321190545.GB16847@bblock-ThinkPad-W530> Sender: linux-scsi-owner@vger.kernel.org List-Id: linux-scsi@vger.kernel.org To: Benjamin Block Cc: "Martin K. Petersen" , James Bottomley , Christoph Hellwig , Bart van Assche , linux-scsi@vger.kernel.org On 03/21/2017 08:05 PM, Benjamin Block wrote: > On Thu, Mar 16, 2017 at 12:53:45PM +0100, Hannes Reinecke wrote: >> On 03/16/2017 12:01 PM, Benjamin Block wrote: >>> On Wed, Mar 15, 2017 at 02:54:16PM +0100, Hannes Reinecke wrote: >>>> On 03/14/2017 06:56 PM, Benjamin Block wrote: >>>>> Hello Hannes, >>>>> >>>>> On Wed, Mar 01, 2017 at 10:15:18AM +0100, Hannes Reinecke wrote: >>>>>> When a command is sent as part of the error handling there >>>>>> is not point whatsoever to start EH escalation when that >>>>>> command fails; we are _already_ in the error handler, >>>>>> and the escalation is about to commence anyway. >>>>>> So just call 'scsi_try_to_abort_cmd()' to abort outstanding >>>>>> commands and let the main EH routine handle the rest. >>>>>> >>>>>> Signed-off-by: Hannes Reinecke >>>>>> Reviewed-by: Johannes Thumshirn >>>>>> Reviewed-by: Bart Van Assche >>>>>> --- >>>>>> drivers/scsi/scsi_error.c | 11 +---------- >>>>>> 1 file changed, 1 insertion(+), 10 deletions(-) >>>>>> >>>>>> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c >>>>>> index e1ca3b8..4613aa1 100644 >>>>>> --- a/drivers/scsi/scsi_error.c >>>>>> +++ b/drivers/scsi/scsi_error.c >>>>>> @@ -889,15 +889,6 @@ static int scsi_try_to_abort_cmd(struct scsi_host_template *hostt, >>>>>> return hostt->eh_abort_handler(scmd); >>>>>> } >>>>>> >>>>>> -static void scsi_abort_eh_cmnd(struct scsi_cmnd *scmd) >>>>>> -{ >>>>>> - if (scsi_try_to_abort_cmd(scmd->device->host->hostt, scmd) != SUCCESS) >>>>>> - if (scsi_try_bus_device_reset(scmd) != SUCCESS) >>>>>> - if (scsi_try_target_reset(scmd) != SUCCESS) >>>>>> - if (scsi_try_bus_reset(scmd) != SUCCESS) >>>>>> - scsi_try_host_reset(scmd); >>>>>> -} >>>>>> - >>>>>> /** >>>>>> * scsi_eh_prep_cmnd - Save a scsi command info as part of error recovery >>>>>> * @scmd: SCSI command structure to hijack >>>>>> @@ -1082,7 +1073,7 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd, >>>>>> break; >>>>>> } >>>>>> } else if (rtn != FAILED) { >>>>>> - scsi_abort_eh_cmnd(scmd); >>>>>> + scsi_try_to_abort_cmd(shost->hostt, scmd); >>>>>> rtn = FAILED; >>>>>> } >>>>> >>>>> The idea is sound, but this implementation would cause "use-after-free"s. >>>>> >>>>> I only know our own LLD well enough to judge, but with zFCP there will >>>>> always be a chance that an abort fails - be it memory pressure, >>>>> hardware/firmware behavior or internal EH in zFCP. >>>>> >>>>> Calling queuecommand() will mean for us in the LLD, that we allocate a >>>>> unique internal request struct for the scsi_cmnd (struct >>>>> zfcp_fsf_request) and add that to our internal hash-table with >>>>> outstanding commands. We assume this scsi_cmnd-pointer is ours till we >>>>> complete it via scsi_done are yield it via successful EH-actions. >>>>> >>>>> In case the abort fails, you fail to take back the ownership over the >>>>> scsi command. Which in turn means possible "use-after-free"s when we >>>>> still thinks the scsi command is ours, but EH has already overwritten >>>>> the scsi-command with the original one. When we still get an answer or >>>>> otherwise use the scsi_cmnd-pointer we would access an invalid one. >>>>> >>>> That is actually not try. >>>> As soon as we're calling 'scsi_try_to_abort_command()' ownership is >>>> assumed to reside in the SCSI midlayer; >>> >>> That can not be true. First of all, look at the function itself (v4.10): >>> >>> static int scsi_try_to_abort_cmd... >>> { >>> if (!hostt->eh_abort_handler) >>> return FAILED; >>> >>> return hostt->eh_abort_handler(scmd); >>> } >>> >>> If what you say is true, then this whole API of LLDs providing or >>> choosing not to provide implementations for these function would be >>> fundamentally broken. >>> The function itself returns FAILED when there is no such function.. how >>> is a LLD that does not implement it ever to know that you took ownership >>> by calling scsi_try_to_abort_cmd()? >>> >> Well. Ok. >> >> _Actually_, the whole concept of 'ownership' in SCSI EH is a bit flaky. >> >> There are two ways of entering the error handler: >> - SCSI command timeout >> - Failing to evaluate the SCSI command status >> >> For the latter case ownership already _is_ with the SCSI midlayer, as >> the LLDD called 'scsi_done' and with that moved ownership to the midlayer. >> >> The interesting part is command timeout. >> Once a command timeout triggers the block layer is calling >> 'blk_mark_rq_complete' to avoid any concurrent completions. >> IE any calls to scsi_done() will be short-circuited with that, >> effectively transferring ownership to SCSI midlayer. >> >> Now the SCSI midlayer has to inform the LLDD that it has taken >> ownership; for that it calls the various eh_XXX callbacks into the LLDD. >> While it's quite clear that SUCCESS signals a transfer of ownership to >> SCSI ML, it's less clear what happens in the case of FAILED. >> Problem here is that the eh_XXX callbacks actually serve a dual purpose; >> one it to signal the transfer of ownership to SCSI ML and the other is >> to actually _do_ some action on that command. >> >> But as FAILED is just _one_ value we have no idea in the midlayer if the >> change of ownership actually took place. >> >> Which leads to the curious effect that _ultimatively_ control still >> resides with the LLDD when host_reset fails, so we actually should >> _never_ release the scsi command once host reset fails. >> >> With scsi_try_to_abort() things are slightly different in the way that >> it's called _without_ SCSI EH being engaged. >> However, as scsi_try_to_abort() is called from the timeout handler >> (which assumes that ownership does now reside with the midlayer) I don't >> see a problem with that. >> > > Yeah, I was aware of these things, but I think we are talking about 2 > different things/implications when we talk about 'ownership' in this > context here. Which makes things seem to be worse than they are. > > But we seem to agree on this: if eh_XXX returns FAILED, ownership > ultimatively stays with the LLD. In which case the midlayer has to > accommodate for the possibility that for example scsi_done is called. > > Anyway, see below. > >> >> Where you are right, in fact, is that we should not return FAILED when >> calling scsi_try_to_abort() when cleaning up EH commands; if the driver >> does not implement this function then no cleanup can be done, so calling >> scsi_try_to_abort() is just us being nice. >> >> And I actually can see a problem with cleaning up EH commands if >> scsi_try_to_abort() returns FAILED; then the LLDD has potential _two_ >> stale references, one for the original command and one for the command >> send from SCSI EH. >> The only way I would imagine this ever worked was by _reusing_ the >> references to the original command, effectively sending the TMF with the >> same credentials the original SCSI command had. >> If a driver (for whatever reason) does _not_ do this things will fall >> apart indeed. >> >> However, this was always a problem with SCSI EH; even with the original >> codepath we would have this problem, so I don't think it's a problem >> with this patchset. >> >> Nevertheless, I'll be adding a fix for eh_try_to_abort() in the context >> of cleaning up EH commands. >> > > I slept over this a night or two and then I remembered that to get to > this point in the first place, at least one eh_XXX callback must have > returned SUCCESS for the command that is reused to send the EH command. > So the original reference to it should be forgotten already, if not, > then that is indeed a LLD bug. > > That makes it much less troubling. But then again, sorry to say, but > that still leaves me with one objection: > > When you ignore the FAIL return for the abort, it opens up the > possibility to have an unrelated EH-Command completion being triggered > from a previous EH-Command reference. > > Lets say we have a command A for LUN 1 behind Port α, command B for LUN > 2 behind Port β; and lets assume both FAIL the abort, so they get to the > stage where EH-Commands might be send. Both will be successfully 'freed' > after a LUN-Reset is issued for both 1 and 2. > > After each LUN-Reset we send a TUR, which is the EH-Command. If A_{TUR} > times out and the abort fails (again), then the LLD might still have a > reference to A_{TUR} and with that also to its overwritten scsi_done > function-pointer. > > When we later send B_{TUR}, A_{TUR} might concurrently complete, call > the scsi_done function-pointer to scsi_eh_done and complete the > host-wide EH-completion that was setup for B_{TUR} (shost->eh_action). > This is possible because shost->eh_action has a value different from > NULL for B_{TUR}. > Hmm. Yes, that's true. But that can be fixed by setting some new eh_eflag in the command to indicate a TUR timeout. > And apart from this clear problem, I also find it still troubling that > we call scsi_eh_restore_cmnd() before being sure that the reference to > the EH-Command was forgotten. Because of that, we all of a sudden > 'restore' a reference to a command that the LLD already has forgotten > before, just via a different 'LLD-Object'. And fields like > scsi_cmnd->result from A become writeable again, although the > 'LLD-Object' should only point to A_{TUR}. And at the same time we > 'forget' references to parts of A_{TUR} that we queued that command > with.. such as cmnd, cmnd_len, data_direction, ... . That is kinda the > original point I complained about. > Yeah, I see your point. > All that was handled by the original code that escalated throughout the > whole process, as ugly as it might be, before restoring the command and > continuing with further possible EH-Commands. And yes, you are right, > host-reset might also fail in the old code, leaving us essentially in > the same mess. But then again, if host-reset fails, SCSI EH is pretty > much lost anyway... > Okay. So let's drop this patch until we've sorted this out. Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage hare@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)