linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control
@ 2017-07-18  0:44 Jason L Tibbitts III
  2017-07-18  7:37 ` Johannes Thumshirn
  0 siblings, 1 reply; 15+ messages in thread
From: Jason L Tibbitts III @ 2017-07-18  0:44 UTC (permalink / raw)
  To: linux-scsi
  Cc: linux-kernel, jthumshirn, dvyukov, hare, jthumshirn, hch,
	martin.petersen

After updating my tape backup server to 4.12 I found that mtx had issues
controlling the tape library.  Good behavior:

[root@backup2 ~]# mtx -f /dev/sg7 next 0
Unloading drive 0 into Storage Element 4...done
Loading media from Storage Element 5 into drive 0...done

Bad behavior:

[root@backup2 ~]# mtx -f /dev/sg7 next 0
Unloading drive 0 into Storage Element 46...mtx: Request Sense: Long
Report=yes
mtx: Request Sense: Valid Residual=no
mtx: Request Sense: Error Code=0 (Unknown?!)
mtx: Request Sense: Sense Key=No Sense
mtx: Request Sense: FileMark=no
mtx: Request Sense: EOM=no
mtx: Request Sense: ILI=no
mtx: Request Sense: Additional Sense Code = 00
mtx: Request Sense: Additional Sense Qualifier = 00
mtx: Request Sense: BPV=no
mtx: Request Sense: Error in CDB=no
mtx: Request Sense: SKSV=no
MOVE MEDIUM from Element Address 1 to 1046 Failed

This was seen on a machine running Fedora 25 as well as an Ubuntu
machine.  Relevant tickets:
  https://bugzilla.redhat.com/show_bug.cgi?id=1471302
  http://bugzilla.kernel.org/show_bug.cgi?id=196375
  https://bugs.launchpad.net/bugs/1704512

mtx in all cases is 1.3.12; in the Fedora case that's
mtx-1.3.12-14.fc24.x86_64.  I see this with an Overland Neo T48s library
but the Ubuntu user had a Dell ML6000 and we both have completely
different HBAs and cabling (LSI3008 SAS and qla2462 FC).

I bisected this down to:

commit 28676d869bbb5257b5f14c0c95ad3af3a7019dd5
Author: Johannes Thumshirn <jthumshirn@suse.de>
Date:   Fri Apr 7 09:34:15 2017 +0200

    scsi: sg: check for valid direction before starting the request
    
    Check for a valid direction before starting the request, otherwise we
    risk running into an assertion in the scsi midlayer checking for valid
    requests.
    
    [mkp: fixed typo]
    
    Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
    Link: http://www.spinics.net/lists/linux-scsi/msg104400.html
    Reported-by: Dmitry Vyukov <dvyukov@google.com>
    Signed-off-by: Hannes Reinecke <hare@suse.com>
    Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

and confirmed that clean unpatched 4.12 shows the problem, while
reverting just that patch fixes the issue.  Unfortunately I don't know
enough to actually fix this, but I can easily test patches.

 - J<

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control
  2017-07-18  0:44 [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control Jason L Tibbitts III
@ 2017-07-18  7:37 ` Johannes Thumshirn
  2017-07-18 16:53   ` Jason L Tibbitts III
  2017-07-18 17:33   ` Jason L Tibbitts III
  0 siblings, 2 replies; 15+ messages in thread
From: Johannes Thumshirn @ 2017-07-18  7:37 UTC (permalink / raw)
  To: Jason L Tibbitts III
  Cc: linux-scsi, linux-kernel, dvyukov, hare, hch, martin.petersen

On Mon, Jul 17, 2017 at 07:44:50PM -0500, Jason L Tibbitts III wrote:
> and confirmed that clean unpatched 4.12 shows the problem, while
> reverting just that patch fixes the issue.  Unfortunately I don't know
> enough to actually fix this, but I can easily test patches.

This is fixed with:
commit 68c59fcea1f2c6a54c62aa896cc623c1b5bc9b47
Author: Johannes Thumshirn <jthumshirn@suse.de>
Date:   Fri Jul 7 10:56:38 2017 +0200

    scsi: sg: fix SG_DXFER_FROM_DEV transfers
    
    SG_DXFER_FROM_DEV transfers do not necessarily have a dxferp as we set
    it to NULL for the old sg_io read/write interface, but must have a
    length bigger than 0. This fixes a regression introduced by commit
    28676d869bbb ("scsi: sg: check for valid direction before starting the
    request")
    
    Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
    Fixes: 28676d869bbb ("scsi: sg: check for valid direction before starting the request")
    Reported-by: Chris Clayton <chris2553@googlemail.com>
    Tested-by: Chris Clayton <chris2553@googlemail.com>
    Cc: Douglas Gilbert <dgilbert@interlog.com>
    Reviewed-by: Hannes Reinecke <hare@suse.com>
    Tested-by: Chris Clayton <chris2553@googlemail.com>
    Acked-by: Douglas Gilbert <dgilbert@interlog.com>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control
  2017-07-18  7:37 ` Johannes Thumshirn
@ 2017-07-18 16:53   ` Jason L Tibbitts III
  2017-07-18 17:33   ` Jason L Tibbitts III
  1 sibling, 0 replies; 15+ messages in thread
From: Jason L Tibbitts III @ 2017-07-18 16:53 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: linux-scsi, linux-kernel, dvyukov, hare, hch, martin.petersen

>>>>> "JT" == Johannes Thumshirn <jthumshirn@suse.de> writes:

JT> This is fixed with: commit 68c59fcea1f2c6a54c62aa896cc623c1b5bc9b47

Hmm, well, I just pulled and built mainline, which does appear to
contain that patch (though it wasn't there when I first started
investigating this last week) and the problem is still there.  I'll try
building clean 4.12 and applying just that patch over the top.

 - J<

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control
  2017-07-18  7:37 ` Johannes Thumshirn
  2017-07-18 16:53   ` Jason L Tibbitts III
@ 2017-07-18 17:33   ` Jason L Tibbitts III
  2017-07-19  7:03     ` Johannes Thumshirn
  1 sibling, 1 reply; 15+ messages in thread
From: Jason L Tibbitts III @ 2017-07-18 17:33 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: linux-scsi, linux-kernel, dvyukov, hare, hch, martin.petersen

I have verified that building a clean v4.12 with
68c59fcea1f2c6a54c62aa896cc623c1b5bc9b47 cherry picked on top still
shows the problem:

[root@backup2 ~]# mtx -f /dev/sg7 next 0
Unloading drive 0 into Storage Element 45...mtx: Request Sense: Long
Report=yes
mtx: Request Sense: Valid Residual=no
mtx: Request Sense: Error Code=0 (Unknown?!)
mtx: Request Sense: Sense Key=No Sense
mtx: Request Sense: FileMark=no
mtx: Request Sense: EOM=no
mtx: Request Sense: ILI=no
mtx: Request Sense: Additional Sense Code = 00
mtx: Request Sense: Additional Sense Qualifier = 00
mtx: Request Sense: BPV=no
mtx: Request Sense: Error in CDB=no
mtx: Request Sense: SKSV=no
MOVE MEDIUM from Element Address 1 to 1045 Failed

Nothing appears to be logged; is there any kind of debugging information
I can collect which might help to track this down?  I'm not particularly
good at this but I am pretty sure that I'm building everything properly
and am actually booting the patched kernel.

 - J<

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control
  2017-07-18 17:33   ` Jason L Tibbitts III
@ 2017-07-19  7:03     ` Johannes Thumshirn
  2017-07-19  8:13       ` Jason L Tibbitts III
  0 siblings, 1 reply; 15+ messages in thread
From: Johannes Thumshirn @ 2017-07-19  7:03 UTC (permalink / raw)
  To: Jason L Tibbitts III
  Cc: linux-scsi, linux-kernel, dvyukov, hare, hch, martin.petersen

On Tue, Jul 18, 2017 at 12:33:59PM -0500, Jason L Tibbitts III wrote:
> I have verified that building a clean v4.12 with
> 68c59fcea1f2c6a54c62aa896cc623c1b5bc9b47 cherry picked on top still
> shows the problem:
> 
> [root@backup2 ~]# mtx -f /dev/sg7 next 0
> Unloading drive 0 into Storage Element 45...mtx: Request Sense: Long
> Report=yes
> mtx: Request Sense: Valid Residual=no
> mtx: Request Sense: Error Code=0 (Unknown?!)
> mtx: Request Sense: Sense Key=No Sense
> mtx: Request Sense: FileMark=no
> mtx: Request Sense: EOM=no
> mtx: Request Sense: ILI=no
> mtx: Request Sense: Additional Sense Code = 00
> mtx: Request Sense: Additional Sense Qualifier = 00
> mtx: Request Sense: BPV=no
> mtx: Request Sense: Error in CDB=no
> mtx: Request Sense: SKSV=no
> MOVE MEDIUM from Element Address 1 to 1045 Failed
> 
> Nothing appears to be logged; is there any kind of debugging information
> I can collect which might help to track this down?  I'm not particularly
> good at this but I am pretty sure that I'm building everything properly
> and am actually booting the patched kernel.


Can you please apply this debugging patch, so I can see what's going on.

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 1e82d4128a84..2505fa0b2062 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -753,6 +753,10 @@ sg_new_write(Sg_fd *sfp, struct file *file, const char __user *buf,
 
 static bool sg_is_valid_dxfer(sg_io_hdr_t *hp)
 {
+
+	pr_info("%s: dxfer_direction: %d, dxfer_len: %d\n",
+		__func__, hp->dxfer_direction, hp->dxfer_len);
+
 	switch (hp->dxfer_direction) {
 	case SG_DXFER_NONE:
 		if (hp->dxferp || hp->dxfer_len > 0)

Thanks,
	Johannes
-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control
  2017-07-19  7:03     ` Johannes Thumshirn
@ 2017-07-19  8:13       ` Jason L Tibbitts III
  2017-07-19  8:36         ` Johannes Thumshirn
  0 siblings, 1 reply; 15+ messages in thread
From: Jason L Tibbitts III @ 2017-07-19  8:13 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: linux-scsi, linux-kernel, dvyukov, hare, hch, martin.petersen

>>>>> "JT" == Johannes Thumshirn <jthumshirn@suse.de> writes:

JT> Can you please apply this debugging patch, so I can see what's going
JT> on.

Sure, no problem.

I generally run "mtx -f /dev/sg7 status" first just to make sure the
library is there; this has always worked as expected.  With the debug
patch applied, this is sent to the console:
[   33.933422] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 56
[   33.940526] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 136
[   33.982429] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 4240
[   34.569986] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 4240
[   34.623898] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 4240

Then running "mtx -f /dev/sg7 next 0" gives this as stdout/err:

Unloading drive 0 into Storage Element 46...mtx: Request Sense: Long
Report=yes
mtx: Request Sense: Valid Residual=no
mtx: Request Sense: Error Code=0 (Unknown?!)
mtx: Request Sense: Sense Key=No Sense
mtx: Request Sense: FileMark=no
mtx: Request Sense: EOM=no
mtx: Request Sense: ILI=no
mtx: Request Sense: Additional Sense Code = 00
mtx: Request Sense: Additional Sense Qualifier = 00
mtx: Request Sense: BPV=no
mtx: Request Sense: Error in CDB=no
mtx: Request Sense: SKSV=no
MOVE MEDIUM from Element Address 1 to 1046 Failed

And this to the console:
[   45.552524] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 56
[   45.559626] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 136
[   45.603544] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 4240
[   46.204614] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 4240
[   46.258463] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 4240
[   46.304530] sg_is_valid_dxfer: dxfer_direction: -2, dxfer_len: 0

Would you also want to see the output from that patch applied to a
functioning kernel?

 - J<

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control
  2017-07-19  8:13       ` Jason L Tibbitts III
@ 2017-07-19  8:36         ` Johannes Thumshirn
  2017-07-21 16:43           ` Johannes Thumshirn
  2017-07-25 19:09           ` Douglas Gilbert
  0 siblings, 2 replies; 15+ messages in thread
From: Johannes Thumshirn @ 2017-07-19  8:36 UTC (permalink / raw)
  To: Jason L Tibbitts III, Doug Gilbert
  Cc: linux-scsi, linux-kernel, dvyukov, hare, hch, martin.petersen

On Wed, Jul 19, 2017 at 03:13:34AM -0500, Jason L Tibbitts III wrote:
> [   46.304530] sg_is_valid_dxfer: dxfer_direction: -2, dxfer_len: 0

Ahh now I see the -2 (SG_DXFER_TO_DEV) is the crucial point here. It is 0 in
your case.

This would "fix" it but I'm not generally sure it is _the_ solution:

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 1e82d4128a84..b421ec81d775 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -764,7 +764,7 @@ static bool sg_is_valid_dxfer(sg_io_hdr_t *hp)
 		return true;
 	case SG_DXFER_TO_DEV:
 	case SG_DXFER_TO_FROM_DEV:
-		if (!hp->dxferp || hp->dxfer_len == 0)
+		if (!hp->dxferp)
 			return false;
 		return true;
 	case SG_DXFER_UNKNOWN:

Doug, what are the rules for SG_DXFER_TO_{FROM_}DEV? Apparently we can't
be sure len > 0, can we rely on dxferp being present?

Thanks,
	Johannes
-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control
  2017-07-19  8:36         ` Johannes Thumshirn
@ 2017-07-21 16:43           ` Johannes Thumshirn
  2017-07-21 19:23             ` Jason L Tibbitts III
  2017-07-25 19:09           ` Douglas Gilbert
  1 sibling, 1 reply; 15+ messages in thread
From: Johannes Thumshirn @ 2017-07-21 16:43 UTC (permalink / raw)
  To: Jason L Tibbitts III, Doug Gilbert
  Cc: linux-scsi, linux-kernel, dvyukov, hare, hch, martin.petersen, jth

Doug,

On Wed, Jul 19, 2017 at 10:36:54AM +0200, Johannes Thumshirn wrote:
> On Wed, Jul 19, 2017 at 03:13:34AM -0500, Jason L Tibbitts III wrote:
> > [   46.304530] sg_is_valid_dxfer: dxfer_direction: -2, dxfer_len: 0
> 
> Ahh now I see the -2 (SG_DXFER_TO_DEV) is the crucial point here. It is 0 in
> your case.
> 
> This would "fix" it but I'm not generally sure it is _the_ solution:
> 
> diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
> index 1e82d4128a84..b421ec81d775 100644
> --- a/drivers/scsi/sg.c
> +++ b/drivers/scsi/sg.c
> @@ -764,7 +764,7 @@ static bool sg_is_valid_dxfer(sg_io_hdr_t *hp)
>  		return true;
>  	case SG_DXFER_TO_DEV:
>  	case SG_DXFER_TO_FROM_DEV:
> -		if (!hp->dxferp || hp->dxfer_len == 0)
> +		if (!hp->dxferp)
>  			return false;
>  		return true;
>  	case SG_DXFER_UNKNOWN:
> 
> Doug, what are the rules for SG_DXFER_TO_{FROM_}DEV? Apparently we can't
> be sure len > 0, can we rely on dxferp being present?

Any comments on this?

Jason, can you try the above? If it works and Doug doesn't respond, I'm
inclined yo submit this band aid.

Thanks,
	Johannes

-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control
  2017-07-21 16:43           ` Johannes Thumshirn
@ 2017-07-21 19:23             ` Jason L Tibbitts III
  2017-07-25  7:18               ` Johannes Thumshirn
  0 siblings, 1 reply; 15+ messages in thread
From: Jason L Tibbitts III @ 2017-07-21 19:23 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Doug Gilbert, linux-scsi, linux-kernel, dvyukov, hare, hch,
	martin.petersen, jth

>>>>> "JT" == Johannes Thumshirn <jthumshirn@suse.de> writes:

JT> Jason, can you try the above? If it works and Doug doesn't respond,
JT> I'm inclined yo submit this band aid.

Unfortunately it doesn't appear to work for me.  Maybe I'm building the
wrong thing, though.  I checked out 4.12, cherry picked
68c59fcea1f2c6a54c62aa896cc623c1b5bc9b47 and then applied your one liner
on top of that.  There appears to be no change in behavior:

[root@backup2 ~]# mtx -f /dev/sg7 next 0
Unloading drive 0 into Storage Element 47...mtx: Request Sense: Long
Report=yes
mtx: Request Sense: Valid Residual=no
mtx: Request Sense: Error Code=0 (Unknown?!)
mtx: Request Sense: Sense Key=No Sense
mtx: Request Sense: FileMark=no
mtx: Request Sense: EOM=no
mtx: Request Sense: ILI=no
mtx: Request Sense: Additional Sense Code = 00
mtx: Request Sense: Additional Sense Qualifier = 00
mtx: Request Sense: BPV=no
mtx: Request Sense: Error in CDB=no
mtx: Request Sense: SKSV=no
MOVE MEDIUM from Element Address 1 to 1047 Failed

I can also apply the debugging patch and try again if that would give
you more useful information.

 - J<

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control
  2017-07-21 19:23             ` Jason L Tibbitts III
@ 2017-07-25  7:18               ` Johannes Thumshirn
  2017-07-25 18:25                 ` Jason L Tibbitts III
  0 siblings, 1 reply; 15+ messages in thread
From: Johannes Thumshirn @ 2017-07-25  7:18 UTC (permalink / raw)
  To: Jason L Tibbitts III
  Cc: Doug Gilbert, linux-scsi, linux-kernel, dvyukov, hare, hch,
	martin.petersen

On Fri, Jul 21, 2017 at 02:23:16PM -0500, Jason L Tibbitts III wrote:
> I can also apply the debugging patch and try again if that would give
> you more useful information.

Yes please (on top of the snippet I've sent you last).

Thanks a lot,
	Johannes

-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control
  2017-07-25  7:18               ` Johannes Thumshirn
@ 2017-07-25 18:25                 ` Jason L Tibbitts III
  2017-07-26  7:39                   ` Johannes Thumshirn
  0 siblings, 1 reply; 15+ messages in thread
From: Jason L Tibbitts III @ 2017-07-25 18:25 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Doug Gilbert, linux-scsi, linux-kernel, dvyukov, hare, hch,
	martin.petersen

>>>>> "JT" == Johannes Thumshirn <jthumshirn@suse.de> writes:

JT> Yes please (on top of the snippet I've sent you last).

OK, I'm at 4.12 with 68c59fcea1f2c6a54c62aa896cc623c1b5bc9b47 cherry
picked, plus the fix patch and the debug patch you've sent previously.
To make sure we're on the same page, I'll include the patch at the end.

Running "mtx -f /dev/sg7 status" gives proper output with this logged to
the console:

[   36.742905] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 56
[   36.750036] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 136
[   36.791673] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 4240
[   37.339790] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 4240
[   37.393597] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 4240

And running "mtx -f /dev/sg7 next 0" gives the following output:

[root@backup2 ~]# mtx -f /dev/sg7 next 0
Unloading drive 0 into Storage Element 1...mtx: Request Sense: Long
Report=yes
mtx: Request Sense: Valid Residual=no
mtx: Request Sense: Error Code=0 (Unknown?!)
mtx: Request Sense: Sense Key=No Sense
mtx: Request Sense: FileMark=no
mtx: Request Sense: EOM=no
mtx: Request Sense: ILI=no
mtx: Request Sense: Additional Sense Code = 00
mtx: Request Sense: Additional Sense Qualifier = 00
mtx: Request Sense: BPV=no
mtx: Request Sense: Error in CDB=no
mtx: Request Sense: SKSV=no
MOVE MEDIUM from Element Address 1 to 1001 Failed

And the following is logged to the console:

[  192.732294] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 56
[  192.739492] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 136
[  192.781507] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 4240
[  193.392401] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 4240
[  193.448970] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 4240
[  193.495130] sg_is_valid_dxfer: dxfer_direction: -2, dxfer_len: 0

That's not any different than what I provided before, and honestly I
wouldn't expect it to be.

Is there something else I can log or some debugging switch I can twiddle
to give you any more information?  I can also try to be more available
to try and avoid the timezone-induced day-long cycle time.  I'm
available on IRC (tibbs on freenode and oftc) and can try to stay up
late or get up early or something to try and avoid this time zone
mismatch.

Here's what an strace of the last mtx call says:

open("/dev/sg7", O_RDWR)                = 3
ioctl(3, SG_GET_VERSION_NUM, [30536])   = 0
ioctl(3, SG_SET_TIMEOUT, [30000])       = 0
brk(NULL)                               = 0x55d65f68a000
brk(0x55d65f6ab000)                     = 0x55d65f6ab000
brk(NULL)                               = 0x55d65f6ab000
ioctl(3, SG_IO, {interface_id='S', dxfer_direction=SG_DXFER_FROM_DEV, cmd_len=6, cmdp="\x12\x00\x00\x00\x38\x00", mx_sb_len=20, iovec_count=0, dxfer_len=56, timeout=30000, flags=0, dxferp="\x08\x80\x05\x02\x45\x00\x00\x02\x42\x44\x54\x20\x20\x20\x20\x20\x46\x6c\x65\x78\x53\x74\x6f\x72\x20\x49\x49\x20\x20\x20\x20\x20"..., status=0, masked_status=0, msg_status=0, sb_len_wr=0, sbp="", host_status=0, driver_status=0, resid=0, duration=1, info=0}) = 0
ioctl(3, SG_IO, {interface_id='S', dxfer_direction=SG_DXFER_FROM_DEV, cmd_len=6, cmdp="\x1a\x08\x1d\x00\x88\x00", mx_sb_len=20, iovec_count=0, dxfer_len=136, timeout=300000, flags=0, dxferp="\x17\x00\x00\x00\x9d\x12\x00\x00\x00\x01\x03\xe9\x00\x30\x00\x65\x00\x00\x00\x01\x00\x01\x00\x00", status=0, masked_status=0, msg_status=0, sb_len_wr=0, sbp="", host_status=0, driver_status=0, resid=112, duration=61, info=0}) = 0
ioctl(3, CDROMAUDIOBUFSIZ or SCSI_IOCTL_GET_IDLUN, 0x7ffdbfaee7a0) = 0
ioctl(3, SG_IO, {interface_id='S', dxfer_direction=SG_DXFER_FROM_DEV, cmd_len=12, cmdp="\xb8\x32\x03\xe9\x00\x30\x00\x00\x10\x90\x00\x00", mx_sb_len=20, iovec_count=0, dxfer_len=4240, timeout=300000, flags=0, dxferp="\x03\xe9\x00\x30\x00\x00\x09\xc8\x02\x80\x00\x34\x00\x00\x09\xc0\x03\xe9\x08\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"..., status=0, masked_status=0, msg_status=0, sb_len_wr=0, sbp="", host_status=0, driver_status=0, resid=1728, duration=542, info=0}) = 0
ioctl(3, CDROMAUDIOBUFSIZ or SCSI_IOCTL_GET_IDLUN, 0x7ffdbfaee7a0) = 0
ioctl(3, SG_IO, {interface_id='S', dxfer_direction=SG_DXFER_FROM_DEV, cmd_len=12, cmdp="\xb8\x34\x00\x01\x00\x01\x00\x00\x10\x90\x00\x00", mx_sb_len=20, iovec_count=0, dxfer_len=4240, timeout=300000, flags=0, dxferp="\x00\x01\x00\x01\x00\x00\x00\x3c\x04\x80\x00\x34\x00\x00\x00\x34\x00\x01\x09\x00\x00\x00\x00\x00\x00\x81\x03\xe9\x43\x30\x30\x30"..., status=0, masked_status=0, msg_status=0, sb_len_wr=0, sbp="", host_status=0, driver_status=0, resid=4172, duration=47, info=0}) = 0
ioctl(3, CDROMAUDIOBUFSIZ or SCSI_IOCTL_GET_IDLUN, 0x7ffdbfaee7a0) = 0
ioctl(3, SG_IO, {interface_id='S', dxfer_direction=SG_DXFER_FROM_DEV, cmd_len=12, cmdp="\xb8\x31\x00\x00\x00\x01\x00\x00\x10\x90\x00\x00", mx_sb_len=20, iovec_count=0, dxfer_len=4240, timeout=300000, flags=0, dxferp="\x00\x00\x00\x01\x00\x00\x00\x3c\x01\x80\x00\x34\x00\x00\x00\x34\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"..., status=0, masked_status=0, msg_status=0, sb_len_wr=0, sbp="", host_status=0, driver_status=0, resid=4172, duration=37, info=0}) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
write(1, "Unloading drive 0 into Storage E"..., 43) = 43
ioctl(3, SG_IO, {interface_id='S', dxfer_direction=SG_DXFER_TO_DEV, cmd_len=12, cmdp="\xa5\x00\x00\x00\x00\x01\x03\xe9\x00\x00\x00\x00", mx_sb_len=20, iovec_count=0, dxfer_len=0, timeout=300000, flags=0, dxferp=NULL, status=0, masked_status=0, msg_status=0, sb_len_wr=0, sbp="", host_status=0, driver_status=0, resid=0, duration=0, info=0}) = -1 EINVAL (Invalid argument)
write(2, "mtx: Request Sense: Long Report="..., 36) = 36

And here's what I'm testing on top of 4.12:

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 82c33a6..6870e88 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -753,15 +753,22 @@ sg_new_write(Sg_fd *sfp, struct file *file, const char __user *buf,
 
 static bool sg_is_valid_dxfer(sg_io_hdr_t *hp)
 {
+
+	pr_info("%s: dxfer_direction: %d, dxfer_len: %d\n",
+		__func__, hp->dxfer_direction, hp->dxfer_len);
+
 	switch (hp->dxfer_direction) {
 	case SG_DXFER_NONE:
 		if (hp->dxferp || hp->dxfer_len > 0)
 			return false;
 		return true;
-	case SG_DXFER_TO_DEV:
 	case SG_DXFER_FROM_DEV:
+		if (hp->dxfer_len < 0)
+			return false;
+		return true;
+	case SG_DXFER_TO_DEV:
 	case SG_DXFER_TO_FROM_DEV:
-		if (!hp->dxferp || hp->dxfer_len == 0)
+		if (!hp->dxferp)
 			return false;
 		return true;
 	case SG_DXFER_UNKNOWN:

 - J<

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control
  2017-07-19  8:36         ` Johannes Thumshirn
  2017-07-21 16:43           ` Johannes Thumshirn
@ 2017-07-25 19:09           ` Douglas Gilbert
  2017-07-26  7:22             ` Johannes Thumshirn
  1 sibling, 1 reply; 15+ messages in thread
From: Douglas Gilbert @ 2017-07-25 19:09 UTC (permalink / raw)
  To: Johannes Thumshirn, Jason L Tibbitts III
  Cc: linux-scsi, linux-kernel, dvyukov, hare, hch, martin.petersen

On 2017-07-19 04:36 AM, Johannes Thumshirn wrote:
> On Wed, Jul 19, 2017 at 03:13:34AM -0500, Jason L Tibbitts III wrote:
>> [   46.304530] sg_is_valid_dxfer: dxfer_direction: -2, dxfer_len: 0
> 
> Ahh now I see the -2 (SG_DXFER_TO_DEV) is the crucial point here. It is 0 in
> your case.
> 
> This would "fix" it but I'm not generally sure it is _the_ solution:
> 
> diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
> index 1e82d4128a84..b421ec81d775 100644
> --- a/drivers/scsi/sg.c
> +++ b/drivers/scsi/sg.c
> @@ -764,7 +764,7 @@ static bool sg_is_valid_dxfer(sg_io_hdr_t *hp)
>   		return true;
>   	case SG_DXFER_TO_DEV:
>   	case SG_DXFER_TO_FROM_DEV:
> -		if (!hp->dxferp || hp->dxfer_len == 0)
> +		if (!hp->dxferp)
>   			return false;
>   		return true;
>   	case SG_DXFER_UNKNOWN:
> 
> Doug, what are the rules for SG_DXFER_TO_{FROM_}DEV? Apparently we can't
> be sure len > 0, can we rely on dxferp being present?

_TO_ is toward the device (i.e. WRITE) and what T10 call "data-out".
_FROM_ is from the device (e.g. INQUIRY) and what T10 call "data-in".

_TO_FROM_ is basically _FROM_ (and has nothing to do with bidi).
_TO_FROM_ is very old and is meant to try and detect "short reads"
by prefilling the indirect buffer (by reading from dxferp) before
it is overwritten by the _FROM_ (and then writing to dxferp). It
is from the time when not all LLDs or HBAs provided an indication
of a "short read". Today users have the 'sg_io_hdr_t::resid' for
that purpose. Whether the sg driver in lk 4.12 still does that
I haven't checked.

The only limit that should be placed on dxfer_len is something like
<= 2**28 (256M) in my opinion. Big enough that the kernel would
reject it and small enough to catch negative values placed in an
unsigned.


The overall problem is that sg_is_valid_dxfer() introduced in lk 4.12
is doing more sanity checks than were done before. My policy was
to ignore ("don't care") combinations of dxfer_direction, dxferp and
dxfer_len that were harmless. Anyway those three variables are
incomplete since the SCSI command and the device also dictate the
length of the data-in transfer.

Doug Gilbert

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control
  2017-07-25 19:09           ` Douglas Gilbert
@ 2017-07-26  7:22             ` Johannes Thumshirn
  0 siblings, 0 replies; 15+ messages in thread
From: Johannes Thumshirn @ 2017-07-26  7:22 UTC (permalink / raw)
  To: Douglas Gilbert
  Cc: Jason L Tibbitts III, linux-scsi, linux-kernel, dvyukov, hare,
	hch, martin.petersen

On Tue, Jul 25, 2017 at 03:09:00PM -0400, Douglas Gilbert wrote:
> _TO_ is toward the device (i.e. WRITE) and what T10 call "data-out".
> _FROM_ is from the device (e.g. INQUIRY) and what T10 call "data-in".
> 
> _TO_FROM_ is basically _FROM_ (and has nothing to do with bidi).
> _TO_FROM_ is very old and is meant to try and detect "short reads"
> by prefilling the indirect buffer (by reading from dxferp) before
> it is overwritten by the _FROM_ (and then writing to dxferp). It
> is from the time when not all LLDs or HBAs provided an indication
> of a "short read". Today users have the 'sg_io_hdr_t::resid' for
> that purpose. Whether the sg driver in lk 4.12 still does that
> I haven't checked.
> 
> The only limit that should be placed on dxfer_len is something like
> <= 2**28 (256M) in my opinion. Big enough that the kernel would
> reject it and small enough to catch negative values placed in an
> unsigned.

OK. I'll give it a shot, thanks.

> 
> 
> The overall problem is that sg_is_valid_dxfer() introduced in lk 4.12
> is doing more sanity checks than were done before. My policy was
> to ignore ("don't care") combinations of dxfer_direction, dxferp and
> dxfer_len that were harmless. Anyway those three variables are
> incomplete since the SCSI command and the device also dictate the
> length of the data-in transfer.

The problem with these don't care combinations was, that user-space could then
easily crash the kernel. This is the reason I introduced sg_is_valid_dxfer().
It's sole purpuse was to avoid more CVEs, but unfortunately it turned into
quite some regressions.

Thanks,
	Johannes
-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control
  2017-07-25 18:25                 ` Jason L Tibbitts III
@ 2017-07-26  7:39                   ` Johannes Thumshirn
  2017-07-26 20:19                     ` Jason L Tibbitts III
  0 siblings, 1 reply; 15+ messages in thread
From: Johannes Thumshirn @ 2017-07-26  7:39 UTC (permalink / raw)
  To: Jason L Tibbitts III
  Cc: Doug Gilbert, linux-scsi, linux-kernel, dvyukov, hare, hch,
	martin.petersen

On Tue, Jul 25, 2017 at 01:25:07PM -0500, Jason L Tibbitts III wrote:
> >>>>> "JT" == Johannes Thumshirn <jthumshirn@suse.de> writes:
> ioctl(3, SG_IO, {interface_id='S', dxfer_direction=SG_DXFER_TO_DEV, cmd_len=12, cmdp="\xa5\x00\x00\x00\x00\x01\x03\xe9\x00\x00\x00\x00", mx_sb_len=20, iovec_count=0, dxfer_len=0, timeout=300000, flags=0, dxferp=NULL, status=0, masked_status=0, msg_status=0, sb_len_wr=0, sbp="", host_status=0, driver_status=0, resid=0, duration=0, info=0}) = -1 EINVAL (Invalid argument)

[...]
 	case SG_DXFER_TO_DEV:
  	case SG_DXFER_TO_FROM_DEV:
 		if (!hp->dxferp)
  			return false;
  		return true;

So here we go, dxfer_direction is SG_DXFER_TO_DEV, dxfer_len is 0 and dxferp
is NULL. We bail out because dxferp is NULL here.

I only looked at sg_write() and not sg_new_write() which is called by the
SG_IO path not the fops->write() path.

It's probably best to just check for dxfer_len <= 2^28 to be valid as Doug
suggested:

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 4fe606b000b4..d7ff71e0c85c 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -751,35 +751,6 @@ sg_new_write(Sg_fd *sfp, struct file *file, const char __user *buf,
 	return count;
 }
 
-static bool sg_is_valid_dxfer(sg_io_hdr_t *hp)
-{
-	switch (hp->dxfer_direction) {
-	case SG_DXFER_NONE:
-		if (hp->dxferp || hp->dxfer_len > 0)
-			return false;
-		return true;
-	case SG_DXFER_FROM_DEV:
-		/*
-		 * for SG_DXFER_FROM_DEV we always set dxfer_len to > 0. dxferp
-		 * can either be NULL or != NULL so there's no point in checking
-		 * it either. So just return true.
-		 */
-		return true;
-	case SG_DXFER_TO_DEV:
-	case SG_DXFER_TO_FROM_DEV:
-		if (!hp->dxferp || hp->dxfer_len == 0)
-			return false;
-		return true;
-	case SG_DXFER_UNKNOWN:
-		if ((!hp->dxferp && hp->dxfer_len) ||
-		    (hp->dxferp && hp->dxfer_len == 0))
-			return false;
-		return true;
-	default:
-		return false;
-	}
-}
-
 static int
 sg_common_write(Sg_fd * sfp, Sg_request * srp,
 		unsigned char *cmnd, int timeout, int blocking)
@@ -800,7 +771,7 @@ sg_common_write(Sg_fd * sfp, Sg_request * srp,
 			"sg_common_write:  scsi opcode=0x%02x, cmd_size=%d\n",
 			(int) cmnd[0], (int) hp->cmd_len));
 
-	if (!sg_is_valid_dxfer(hp))
+	if (hp->dxfer_len >= SZ_256M)
 		return -EINVAL;
 
 	k = sg_start_req(srp, cmnd);

-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control
  2017-07-26  7:39                   ` Johannes Thumshirn
@ 2017-07-26 20:19                     ` Jason L Tibbitts III
  0 siblings, 0 replies; 15+ messages in thread
From: Jason L Tibbitts III @ 2017-07-26 20:19 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Doug Gilbert, linux-scsi, linux-kernel, dvyukov, hare, hch,
	martin.petersen

>>>>> "JT" == Johannes Thumshirn <jthumshirn@suse.de> writes:

JT> It's probably best to just check for dxfer_len <= 2^28 to be valid
JT> as Doug suggested:

I can verify that patch on top of git head (as of a few hours ago) does
function properly.

It didn't apply directly on top of 4.12 but even I can handle fixing
that up.  The result (just deleting the function and changing the call
to a check for hp->dxfer_len >= SZ_256M) works fine and is at the end.

So thanks.  If this goes in, please CC to stable.

 - J<

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 82c33a6..aa6f1de 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -751,29 +751,6 @@ sg_new_write(Sg_fd *sfp, struct file *file, const char __user *buf,
 	return count;
 }
 
-static bool sg_is_valid_dxfer(sg_io_hdr_t *hp)
-{
-	switch (hp->dxfer_direction) {
-	case SG_DXFER_NONE:
-		if (hp->dxferp || hp->dxfer_len > 0)
-			return false;
-		return true;
-	case SG_DXFER_TO_DEV:
-	case SG_DXFER_FROM_DEV:
-	case SG_DXFER_TO_FROM_DEV:
-		if (!hp->dxferp || hp->dxfer_len == 0)
-			return false;
-		return true;
-	case SG_DXFER_UNKNOWN:
-		if ((!hp->dxferp && hp->dxfer_len) ||
-		    (hp->dxferp && hp->dxfer_len == 0))
-			return false;
-		return true;
-	default:
-		return false;
-	}
-}
-
 static int
 sg_common_write(Sg_fd * sfp, Sg_request * srp,
 		unsigned char *cmnd, int timeout, int blocking)
@@ -794,7 +771,7 @@ sg_common_write(Sg_fd * sfp, Sg_request * srp,
 			"sg_common_write:  scsi opcode=0x%02x, cmd_size=%d\n",
 			(int) cmnd[0], (int) hp->cmd_len));
 
-	if (!sg_is_valid_dxfer(hp))
+	if (hp->dxfer_len >= SZ_256M)
 		return -EINVAL;
 
 	k = sg_start_req(srp, cmnd);

^ permalink raw reply related	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2017-07-26 20:19 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-18  0:44 [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control Jason L Tibbitts III
2017-07-18  7:37 ` Johannes Thumshirn
2017-07-18 16:53   ` Jason L Tibbitts III
2017-07-18 17:33   ` Jason L Tibbitts III
2017-07-19  7:03     ` Johannes Thumshirn
2017-07-19  8:13       ` Jason L Tibbitts III
2017-07-19  8:36         ` Johannes Thumshirn
2017-07-21 16:43           ` Johannes Thumshirn
2017-07-21 19:23             ` Jason L Tibbitts III
2017-07-25  7:18               ` Johannes Thumshirn
2017-07-25 18:25                 ` Jason L Tibbitts III
2017-07-26  7:39                   ` Johannes Thumshirn
2017-07-26 20:19                     ` Jason L Tibbitts III
2017-07-25 19:09           ` Douglas Gilbert
2017-07-26  7:22             ` Johannes Thumshirn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).