All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/3] scsi-generic and BLKSECTGET
@ 2017-01-16 21:11 ` Eric Farman
  0 siblings, 0 replies; 9+ messages in thread
From: Eric Farman @ 2017-01-16 21:11 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: kwolf, pbonzini, Eric Farman, linux-scsi, mreitz

(cc'ing linux-scsi for the cover-letter; patches only to QEMU lists.)

In the Linux kernel, I see two (three) places where the BLKSECTGET ioctl is
handled:

(1) block/(compat_)ioctl.c -- (compat_)blkdev_ioctl
(2) drivers/scsi/sg.c -- sg_ioctl

The former has been around forever[1], and returns a short value measured in
sectors.  A sector is generally assumed to be 512 bytes.

The latter has been around for slightly less than forever[2], and returns an
int that measures the value in bytes.  A change to return the block count
was brought up a few years ago[3] and nacked.

As a convenient example, if I use the blockdev tool to drive the ioctl to a
SCSI disk and its scsi-generic equivalent, I get different results:

  # lsscsi -g
  [0:0:8:1077166114]disk    IBM      2107900          .217  /dev/sda /dev/sg0
  # blockdev --getmaxsect /dev/sda
  2560
  # blockdev --getmaxsect /dev/sg0
  20

Now, the value for /dev/sda looks "correct" to me.

  # cd /sys/devices/css0/0.0.0125/0.0.1f69/host0/rport-0\:0-8/
  # cd target0\:0\:8/0\:0\:8\:1077166114/
  # cat block/sda/queue/max_sectors_kb
  1280
  # cat block/sda/queue/hw_sector_size
  512

And the math checks out:

  max_sectors_kb * 1024 / hw_sector_size == getmaxsect
  -OR-
  1280 * 1024 / 512 = 2560

For /dev/sg0, it appears the answer is coming from the sg_ioctl result
which is already multiplied by the block size, and then looking at only the
upper half (short) of the returned big-endian fullword:

  (1280 * 1024 / 512) * 512 = 1310720 = x00140000 => x0014 = 20

The reason for all this?  Well, QEMU recently added a BLKSECTGET ioctl
call[4] which we see during guest boot.  This code presumes the value is in
blocks/sectors, and converts it to bytes[5].  Not that this matters, because
the short/int discrepancy gives us "zero" on s390x.

Also, that code doesn't execute for scsi-generic devices, so the conversion
to bytes is correct, but I'd like to extend this code to interrogate
scsi-generic devices as well.  This is important because libvirt converts
a specified virtio-scsi device to its /dev/sgX address for the QEMU
commandline.

So, do I have to code around the different field sizes (int vs short) as
well as scaling (bytes vs blocks)?  Obviously doable, but looking at the
resulting commits, I find myself feeling a little ill.

[1] The initial kernel git commit
[2] kernel commit 44ec95425c1d9dce6e4638c29e4362cfb44814e7
[3] https://lkml.org/lkml/2012/6/27/78
[4] qemu commit 6f6071745bd0366221f5a0160ed7d18d0e38b9f7
[5] qemu commit 5def6b80e1eca696c1fc6099e7f4d36729686402

Eric Farman (3):
  hw/scsi: Fix debug message of cdb structure in scsi-generic
  block: Fix target variable of BLKSECTGET ioctl
  block: get max_transfer limit for char (scsi-generic) devices

 block/file-posix.c     | 16 +++++++++++++---
 hw/scsi/scsi-generic.c |  5 +++--
 2 files changed, 16 insertions(+), 5 deletions(-)

-- 
2.8.4

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Qemu-devel] [RFC PATCH 0/3] scsi-generic and BLKSECTGET
@ 2017-01-16 21:11 ` Eric Farman
  0 siblings, 0 replies; 9+ messages in thread
From: Eric Farman @ 2017-01-16 21:11 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: kwolf, mreitz, pbonzini, Eric Farman, linux-scsi

(cc'ing linux-scsi for the cover-letter; patches only to QEMU lists.)

In the Linux kernel, I see two (three) places where the BLKSECTGET ioctl is
handled:

(1) block/(compat_)ioctl.c -- (compat_)blkdev_ioctl
(2) drivers/scsi/sg.c -- sg_ioctl

The former has been around forever[1], and returns a short value measured in
sectors.  A sector is generally assumed to be 512 bytes.

The latter has been around for slightly less than forever[2], and returns an
int that measures the value in bytes.  A change to return the block count
was brought up a few years ago[3] and nacked.

As a convenient example, if I use the blockdev tool to drive the ioctl to a
SCSI disk and its scsi-generic equivalent, I get different results:

  # lsscsi -g
  [0:0:8:1077166114]disk    IBM      2107900          .217  /dev/sda /dev/sg0
  # blockdev --getmaxsect /dev/sda
  2560
  # blockdev --getmaxsect /dev/sg0
  20

Now, the value for /dev/sda looks "correct" to me.

  # cd /sys/devices/css0/0.0.0125/0.0.1f69/host0/rport-0\:0-8/
  # cd target0\:0\:8/0\:0\:8\:1077166114/
  # cat block/sda/queue/max_sectors_kb
  1280
  # cat block/sda/queue/hw_sector_size
  512

And the math checks out:

  max_sectors_kb * 1024 / hw_sector_size == getmaxsect
  -OR-
  1280 * 1024 / 512 = 2560

For /dev/sg0, it appears the answer is coming from the sg_ioctl result
which is already multiplied by the block size, and then looking at only the
upper half (short) of the returned big-endian fullword:

  (1280 * 1024 / 512) * 512 = 1310720 = x00140000 => x0014 = 20

The reason for all this?  Well, QEMU recently added a BLKSECTGET ioctl
call[4] which we see during guest boot.  This code presumes the value is in
blocks/sectors, and converts it to bytes[5].  Not that this matters, because
the short/int discrepancy gives us "zero" on s390x.

Also, that code doesn't execute for scsi-generic devices, so the conversion
to bytes is correct, but I'd like to extend this code to interrogate
scsi-generic devices as well.  This is important because libvirt converts
a specified virtio-scsi device to its /dev/sgX address for the QEMU
commandline.

So, do I have to code around the different field sizes (int vs short) as
well as scaling (bytes vs blocks)?  Obviously doable, but looking at the
resulting commits, I find myself feeling a little ill.

[1] The initial kernel git commit
[2] kernel commit 44ec95425c1d9dce6e4638c29e4362cfb44814e7
[3] https://lkml.org/lkml/2012/6/27/78
[4] qemu commit 6f6071745bd0366221f5a0160ed7d18d0e38b9f7
[5] qemu commit 5def6b80e1eca696c1fc6099e7f4d36729686402

Eric Farman (3):
  hw/scsi: Fix debug message of cdb structure in scsi-generic
  block: Fix target variable of BLKSECTGET ioctl
  block: get max_transfer limit for char (scsi-generic) devices

 block/file-posix.c     | 16 +++++++++++++---
 hw/scsi/scsi-generic.c |  5 +++--
 2 files changed, 16 insertions(+), 5 deletions(-)

-- 
2.8.4

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Qemu-devel] [PATCH 1/3] hw/scsi: Fix debug message of cdb structure in scsi-generic
  2017-01-16 21:11 ` [Qemu-devel] " Eric Farman
  (?)
@ 2017-01-16 21:11 ` Eric Farman
  -1 siblings, 0 replies; 9+ messages in thread
From: Eric Farman @ 2017-01-16 21:11 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: kwolf, mreitz, pbonzini, Eric Farman

When running with debug enabled, the scsi-generic cdb that is
dumped skips byte 0 of the command, which is the opcode.  This
makes identifying which command is being issued/completed a
little difficult.  Example:

  0x00 0x00 0x01 0x00 0x00
  scsi-generic: scsi_read_data 0x0
  scsi-generic: Data ready tag=0x0 len=164
  scsi-generic: scsi_read_data 0x0
  scsi-generic: Command complete 0x0x10a42c60 tag=0x0 status=0

Improve this by adding a message prior to the loop, similar to
what exists for scsi-disk.  Clean up a few other messages to be
more explicit of what is being represented.  Example:

  scsi-generic: Command: data=0x12 0x00 0x00 0x01 0x00 0x00
  scsi-generic: scsi_read_data tag=0x0
  scsi-generic: Data ready tag=0x0 len=164
  scsi-generic: scsi_read_data tag=0x0
  scsi-generic: Command complete 0x0x10a452d0 tag=0x0 status=0

Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
---
 hw/scsi/scsi-generic.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
index 7a588a7..92f091a 100644
--- a/hw/scsi/scsi-generic.c
+++ b/hw/scsi/scsi-generic.c
@@ -246,7 +246,7 @@ static void scsi_read_data(SCSIRequest *req)
     SCSIDevice *s = r->req.dev;
     int ret;
 
-    DPRINTF("scsi_read_data 0x%x\n", req->tag);
+    DPRINTF("scsi_read_data tag=0x%x\n", req->tag);
 
     /* The request is used as the AIO opaque value, so add a ref.  */
     scsi_req_ref(&r->req);
@@ -294,7 +294,7 @@ static void scsi_write_data(SCSIRequest *req)
     SCSIDevice *s = r->req.dev;
     int ret;
 
-    DPRINTF("scsi_write_data 0x%x\n", req->tag);
+    DPRINTF("scsi_write_data tag=0x%x\n", req->tag);
     if (r->len == 0) {
         r->len = r->buflen;
         scsi_req_data(&r->req, r->len);
@@ -329,6 +329,7 @@ static int32_t scsi_send_command(SCSIRequest *req, uint8_t *cmd)
     int ret;
 
 #ifdef DEBUG_SCSI
+    DPRINTF("Command: data=0x%02x", cmd[0]);
     {
         int i;
         for (i = 1; i < r->req.cmd.len; i++) {
-- 
2.8.4

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [Qemu-devel] [PATCH 2/3] block: Fix target variable of BLKSECTGET ioctl
  2017-01-16 21:11 ` [Qemu-devel] " Eric Farman
  (?)
  (?)
@ 2017-01-16 21:12 ` Eric Farman
  -1 siblings, 0 replies; 9+ messages in thread
From: Eric Farman @ 2017-01-16 21:12 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: kwolf, mreitz, pbonzini, Eric Farman

Commit 6f607174 introduced a routine to call the kernel BLKSECTGET
ioctl, which stores the result back to user space.  However, the
size of the data returned depends on the routine handling the ioctl.
The (compat_)blkdev_ioctl returns a short, while sg_ioctl returns
an int.  Thus, on big-endian systems, we can find ourselves
accidentally shifting the result to a much larger value.
(On s390x, a short is 16 bits while an int is 32 bits.)

Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
---
 block/file-posix.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 28b47d9..2115155 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -651,12 +651,15 @@ static void raw_reopen_abort(BDRVReopenState *state)
     state->opaque = NULL;
 }
 
-static int hdev_get_max_transfer_length(int fd)
+static int hdev_get_max_transfer_length(BlockDriverState *bs, int fd)
 {
 #ifdef BLKSECTGET
     int max_sectors = 0;
-    if (ioctl(fd, BLKSECTGET, &max_sectors) == 0) {
+    short max_sectors_short = 0;
+    if (bs->sg && ioctl(fd, BLKSECTGET, &max_sectors) == 0) {
         return max_sectors;
+    } else if (!bs->sg && ioctl(fd, BLKSECTGET, &max_sectors_short) == 0) {
+        return max_sectors_short;
     } else {
         return -errno;
     }
@@ -672,7 +675,7 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
 
     if (!fstat(s->fd, &st)) {
         if (S_ISBLK(st.st_mode)) {
-            int ret = hdev_get_max_transfer_length(s->fd);
+            int ret = hdev_get_max_transfer_length(bs, s->fd);
             if (ret > 0 && ret <= BDRV_REQUEST_MAX_SECTORS) {
                 bs->bl.max_transfer = pow2floor(ret << BDRV_SECTOR_BITS);
             }
-- 
2.8.4

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [Qemu-devel] [PATCH 3/3] block: get max_transfer limit for char (scsi-generic) devices
  2017-01-16 21:11 ` [Qemu-devel] " Eric Farman
                   ` (2 preceding siblings ...)
  (?)
@ 2017-01-16 21:12 ` Eric Farman
  2017-01-17  7:04   ` Fam Zheng
  -1 siblings, 1 reply; 9+ messages in thread
From: Eric Farman @ 2017-01-16 21:12 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: kwolf, mreitz, pbonzini, Eric Farman

Commit 6f607174 introduced a routine to get the maximum number
of bytes for a single I/O transfer for block devices, however
scsi generic devices are character devices, not block.  Add
a condition for this, with slightly different logic because
the value is already in bytes, and need not be converted from
blocks as happens for block devices.

Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
---
 block/file-posix.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/block/file-posix.c b/block/file-posix.c
index 2115155..c0843c2 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -679,6 +679,13 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
             if (ret > 0 && ret <= BDRV_REQUEST_MAX_SECTORS) {
                 bs->bl.max_transfer = pow2floor(ret << BDRV_SECTOR_BITS);
             }
+        } else if (S_ISCHR(st.st_mode)) {
+            /* sg returns transfer length in bytes already */
+            int ret = hdev_get_max_transfer_length(bs, s->fd);
+            if (ret > 0 &&
+                (ret >> BDRV_SECTOR_BITS) <= BDRV_REQUEST_MAX_SECTORS) {
+                bs->bl.max_transfer = pow2floor(ret);
+            }
         }
     }
 
-- 
2.8.4

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] block: get max_transfer limit for char (scsi-generic) devices
  2017-01-16 21:12 ` [Qemu-devel] [PATCH 3/3] block: get max_transfer limit for char (scsi-generic) devices Eric Farman
@ 2017-01-17  7:04   ` Fam Zheng
  2017-01-17 14:49     ` Eric Farman
  0 siblings, 1 reply; 9+ messages in thread
From: Fam Zheng @ 2017-01-17  7:04 UTC (permalink / raw)
  To: Eric Farman; +Cc: qemu-devel, qemu-block, kwolf, pbonzini, mreitz

On Mon, 01/16 22:12, Eric Farman wrote:
> Commit 6f607174 introduced a routine to get the maximum number
> of bytes for a single I/O transfer for block devices, however
> scsi generic devices are character devices, not block.  Add
> a condition for this, with slightly different logic because
> the value is already in bytes, and need not be converted from
> blocks as happens for block devices.
> 
> Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
> ---
>  block/file-posix.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 2115155..c0843c2 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -679,6 +679,13 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
>              if (ret > 0 && ret <= BDRV_REQUEST_MAX_SECTORS) {
>                  bs->bl.max_transfer = pow2floor(ret << BDRV_SECTOR_BITS);
>              }
> +        } else if (S_ISCHR(st.st_mode)) {
> +            /* sg returns transfer length in bytes already */
> +            int ret = hdev_get_max_transfer_length(bs, s->fd);
> +            if (ret > 0 &&
> +                (ret >> BDRV_SECTOR_BITS) <= BDRV_REQUEST_MAX_SECTORS) {
> +                bs->bl.max_transfer = pow2floor(ret);
> +            }

Please keep the sectors/bytes quirk in hdev_get_max_transfer_length and always
return bytes from there.

Fam

>          }
>      }
>  
> -- 
> 2.8.4
> 
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 0/3] scsi-generic and BLKSECTGET
  2017-01-16 21:11 ` [Qemu-devel] " Eric Farman
@ 2017-01-17  7:08   ` Fam Zheng
  -1 siblings, 0 replies; 9+ messages in thread
From: Fam Zheng @ 2017-01-17  7:08 UTC (permalink / raw)
  To: Eric Farman; +Cc: kwolf, linux-scsi, qemu-block, qemu-devel, mreitz, pbonzini

On Mon, 01/16 22:11, Eric Farman wrote:
> (cc'ing linux-scsi for the cover-letter; patches only to QEMU lists.)
> 
> In the Linux kernel, I see two (three) places where the BLKSECTGET ioctl is
> handled:
> 
> (1) block/(compat_)ioctl.c -- (compat_)blkdev_ioctl
> (2) drivers/scsi/sg.c -- sg_ioctl
> 
> The former has been around forever[1], and returns a short value measured in
> sectors.  A sector is generally assumed to be 512 bytes.
> 
> The latter has been around for slightly less than forever[2], and returns an
> int that measures the value in bytes.  A change to return the block count
> was brought up a few years ago[3] and nacked.
> 
> As a convenient example, if I use the blockdev tool to drive the ioctl to a
> SCSI disk and its scsi-generic equivalent, I get different results:

Fun! :-/

The patch looks correct but I don't like how it is written a lot, but still
thanks for bringing it up so we won't be bitten in the future, and your detailed
explanation is much appreciated!

Fam

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 0/3] scsi-generic and BLKSECTGET
@ 2017-01-17  7:08   ` Fam Zheng
  0 siblings, 0 replies; 9+ messages in thread
From: Fam Zheng @ 2017-01-17  7:08 UTC (permalink / raw)
  To: Eric Farman; +Cc: qemu-devel, qemu-block, kwolf, pbonzini, linux-scsi, mreitz

On Mon, 01/16 22:11, Eric Farman wrote:
> (cc'ing linux-scsi for the cover-letter; patches only to QEMU lists.)
> 
> In the Linux kernel, I see two (three) places where the BLKSECTGET ioctl is
> handled:
> 
> (1) block/(compat_)ioctl.c -- (compat_)blkdev_ioctl
> (2) drivers/scsi/sg.c -- sg_ioctl
> 
> The former has been around forever[1], and returns a short value measured in
> sectors.  A sector is generally assumed to be 512 bytes.
> 
> The latter has been around for slightly less than forever[2], and returns an
> int that measures the value in bytes.  A change to return the block count
> was brought up a few years ago[3] and nacked.
> 
> As a convenient example, if I use the blockdev tool to drive the ioctl to a
> SCSI disk and its scsi-generic equivalent, I get different results:

Fun! :-/

The patch looks correct but I don't like how it is written a lot, but still
thanks for bringing it up so we won't be bitten in the future, and your detailed
explanation is much appreciated!

Fam

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] block: get max_transfer limit for char (scsi-generic) devices
  2017-01-17  7:04   ` Fam Zheng
@ 2017-01-17 14:49     ` Eric Farman
  0 siblings, 0 replies; 9+ messages in thread
From: Eric Farman @ 2017-01-17 14:49 UTC (permalink / raw)
  To: Fam Zheng; +Cc: kwolf, pbonzini, qemu-devel, qemu-block, mreitz



On 01/17/2017 02:04 AM, Fam Zheng wrote:
> On Mon, 01/16 22:12, Eric Farman wrote:
>> Commit 6f607174 introduced a routine to get the maximum number
>> of bytes for a single I/O transfer for block devices, however
>> scsi generic devices are character devices, not block.  Add
>> a condition for this, with slightly different logic because
>> the value is already in bytes, and need not be converted from
>> blocks as happens for block devices.
>>
>> Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
>> ---
>>  block/file-posix.c | 7 +++++++
>>  1 file changed, 7 insertions(+)
>>
>> diff --git a/block/file-posix.c b/block/file-posix.c
>> index 2115155..c0843c2 100644
>> --- a/block/file-posix.c
>> +++ b/block/file-posix.c
>> @@ -679,6 +679,13 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
>>              if (ret > 0 && ret <= BDRV_REQUEST_MAX_SECTORS) {
>>                  bs->bl.max_transfer = pow2floor(ret << BDRV_SECTOR_BITS);
>>              }
>> +        } else if (S_ISCHR(st.st_mode)) {
>> +            /* sg returns transfer length in bytes already */
>> +            int ret = hdev_get_max_transfer_length(bs, s->fd);
>> +            if (ret > 0 &&
>> +                (ret >> BDRV_SECTOR_BITS) <= BDRV_REQUEST_MAX_SECTORS) {
>> +                bs->bl.max_transfer = pow2floor(ret);
>> +            }
>
> Please keep the sectors/bytes quirk in hdev_get_max_transfer_length and always
> return bytes from there.

That's easy enough.  I'll allow a day or two before sending a v2, in 
case there's other considerations for the rats nest I've wandered into.

Thanks!

  - Eric

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2017-01-17 14:49 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-16 21:11 [RFC PATCH 0/3] scsi-generic and BLKSECTGET Eric Farman
2017-01-16 21:11 ` [Qemu-devel] " Eric Farman
2017-01-16 21:11 ` [Qemu-devel] [PATCH 1/3] hw/scsi: Fix debug message of cdb structure in scsi-generic Eric Farman
2017-01-16 21:12 ` [Qemu-devel] [PATCH 2/3] block: Fix target variable of BLKSECTGET ioctl Eric Farman
2017-01-16 21:12 ` [Qemu-devel] [PATCH 3/3] block: get max_transfer limit for char (scsi-generic) devices Eric Farman
2017-01-17  7:04   ` Fam Zheng
2017-01-17 14:49     ` Eric Farman
2017-01-17  7:08 ` [Qemu-devel] [RFC PATCH 0/3] scsi-generic and BLKSECTGET Fam Zheng
2017-01-17  7:08   ` Fam Zheng

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.