All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] possible privilege escalation via SG_IO ioctl (CVE-2011-4127)
@ 2011-12-22 18:02 Paolo Bonzini
  2011-12-22 18:02 ` [PATCH 1/3] block: add and use scsi_blk_cmd_ioctl Paolo Bonzini
                   ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Paolo Bonzini @ 2011-12-22 18:02 UTC (permalink / raw)
  To: linux-kernel, security, pmatouse, agk, jbottomley, mchristi,
	msnitzer, torvalds

Partition block devices or LVM volumes can be sent SCSI commands via
SG_IO, which are then passed down to the underlying device.  (Yes, that
is it.  I tried hard to build a climax in the cover letter, but couldn't
get anything that satisfied me).

It has been this way forever.  I found a reference from 2004 at
https://lkml.org/lkml/2004/8/12/218 and it is even documented in the
sg_dd man page:

    blk_sgio=1
              when set to 0, block devices (e.g. /dev/sda) are treated
              like normal files (i.e. read(2) and write(2) are used for
              IO). When set to 1, block devices are assumed to accept the
              SG_IO ioctl and  SCSI commands are issued for IO. [...]
              If the input or output device is a block device partition
              (e.g. /dev/sda3) then setting this option causes the
              partition information to be ignored (since access is
              directly to the underlying device).

This is quite nasty, because "safe" SCSI commands, including READ
or WRITE, can be sent to the disk without any particular capability.  All
that is required is having a file descriptor for the block device, and
permission (e.g. from SELinux) to send a ioctl.  However, when a user lets
a program access /dev/sda2, it still should not be able to read/write
outside the boundaries of /dev/sda2.

This is also entirely independent of capabilities.  Continuing the
previous example, if the same user gives CAP_SYS_RAWIO to the program and
write access to /dev/sdb, the program should be able to send arbitrary
SCSI commands to /dev/sdb, but still should not be able to access /dev/sda
outside the boundaries of /dev/sda2.

These rules are quite obvious once you consider a real attack that can be
performed using SG_IO.  The attack lets a virtual machine, whose disk is
hosted on a partition, read and write arbitrary data from the host's disk.
To work, the guest needs to be able to send SG_IO ioctls to its devices.
As far as I know it affects virtual machine monitors that use virtio-blk,
and also OpenVZ.  It does not affect Xen, which does not support SG_IO
on its paravirtualized block devices.

In the virtio case, virtio-blk supports a limited form of SCSI passthrough
via the SG_IO ioctl; the virtio-blk device model on the host (e.g. qemu)
relies on the kernel to filter commands, so it will always forward the ioctl
to the block device and pass back the results to the VM.  When the block
device is a partition, the VM can read as well as clobber blocks outside
its assigned volume.  Note that SELinux cannot do anything to stop the
attack, because qemu _is_ supposed to send ioctls to disks.

With OpenVZ, a container would also be able to read and write outside
the partition limits with SG_IO ioctls.  Passing block devices is not
too common with OpenVZ but nevertheless possible.

In the virtio case the vulnerability can be mitigated by disabling SCSI
passthrough for the virtio-blk device; however, the root cause is in
the kernel and needs to be fixed there.

The patches implement a simple global whitelist for both partitions
and partial disk mappings.  While it is also possible to introduce a more
flexible per-device whitelist mechanism, in our testing the patches turned
out to be surprisingly tricky to get right.  Hence, I prefer to send a
version that more closely matches what we applied to the RHEL kernel.
Refactoring can then be done outside security@kernel.org.

Patch 1 refactors the code to prepare for introduction of the whitelist,
while patch 2 actually implements it for the SCSI ioctls.  drivers/ide/
has several ioctls that should only be restricted to the full block
device (for example HDIO_SET_*, HDIO_DRIVE_CMD, HDIO_DRIVE_TASK,
HDIO_DRIVE_RESET).  However, all of them require either CAP_SYS_ADMIN
or CAP_SYS_RAWIO, so they have a much smaller security impact.

Logical volumes are also affected if they have only one target, and this
target can pass ioctls to the underlying block device.  Patch 3 thus adds
the whitelist to logical volumes as well.

Paolo

Paolo Bonzini (3):
  block: add and use scsi_blk_cmd_ioctl
  block: fail SCSI passthrough ioctls on partition devices
  dm: do not forward ioctls from logical volumes to the underlying device

 block/scsi_ioctl.c             |   41 ++++++++++++++++++++++++++++++++++++++++
 drivers/block/cciss.c          |    6 ++--
 drivers/block/ub.c             |   14 +------------
 drivers/block/virtio_blk.c     |    4 +-
 drivers/cdrom/cdrom.c          |    3 +-
 drivers/ide/ide-floppy_ioctl.c |    3 +-
 drivers/md/dm-flakey.c         |   11 +++++++++-
 drivers/md/dm-linear.c         |   12 ++++++++++-
 drivers/md/dm-mpath.c          |    6 +++++
 drivers/scsi/sd.c              |   13 +++++++++--
 include/linux/blkdev.h         |    3 ++
 11 files changed, 89 insertions(+), 27 deletions(-)

-- 
1.7.7.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/3] block: add and use scsi_blk_cmd_ioctl
  2011-12-22 18:02 [PATCH 0/3] possible privilege escalation via SG_IO ioctl (CVE-2011-4127) Paolo Bonzini
@ 2011-12-22 18:02 ` Paolo Bonzini
  2011-12-22 18:02 ` [PATCH 2/3] block: fail SCSI passthrough ioctls on partition devices Paolo Bonzini
  2011-12-22 18:02 ` [PATCH 3/3] dm: do not forward ioctls from logical volumes to the underlying device Paolo Bonzini
  2 siblings, 0 replies; 27+ messages in thread
From: Paolo Bonzini @ 2011-12-22 18:02 UTC (permalink / raw)
  To: linux-kernel, security, pmatouse, agk, jbottomley, mchristi,
	msnitzer, torvalds

Introduce a wrapper around scsi_cmd_ioctl that takes a block device.
The function will then be enhanced to detect partition block devices and,
in that case, subject the ioctls to whitelisting.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/scsi_ioctl.c             |    7 +++++++
 drivers/block/cciss.c          |    6 +++---
 drivers/block/ub.c             |   14 +-------------
 drivers/block/virtio_blk.c     |    4 ++--
 drivers/cdrom/cdrom.c          |    3 +--
 drivers/ide/ide-floppy_ioctl.c |    3 +--
 drivers/scsi/sd.c              |    2 +-
 include/linux/blkdev.h         |    2 ++
 8 files changed, 18 insertions(+), 23 deletions(-)

diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index e5b1001..48dfbe7 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -675,6 +675,13 @@ int scsi_cmd_ioctl(struct request_queue *q, struct gendisk *bd_disk, fmode_t mod
 }
 EXPORT_SYMBOL(scsi_cmd_ioctl);
 
+int scsi_cmd_blk_ioctl(struct block_device *bd, fmode_t mode,
+		       unsigned int cmd, void __user *arg)
+{
+	return scsi_cmd_ioctl(bd->bd_disk->queue, bd->bd_disk, mode, cmd, arg);
+}
+EXPORT_SYMBOL(scsi_cmd_blk_ioctl);
+
 static int __init blk_scsi_ioctl_init(void)
 {
 	blk_set_cmd_filter_defaults(&blk_default_cmd_filter);
diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c
index b8f2c02..ca4b52a 100644
--- a/drivers/block/cciss.c
+++ b/drivers/block/cciss.c
@@ -1703,7 +1703,7 @@ static int cciss_ioctl(struct block_device *bdev, fmode_t mode,
 	case CCISS_BIG_PASSTHRU:
 		return cciss_bigpassthru(h, argp);
 
-	/* scsi_cmd_ioctl handles these, below, though some are not */
+	/* scsi_cmd_blk_ioctl handles these, below, though some are not */
 	/* very meaningful for cciss.  SG_IO is the main one people want. */
 
 	case SG_GET_VERSION_NUM:
@@ -1714,9 +1714,9 @@ static int cciss_ioctl(struct block_device *bdev, fmode_t mode,
 	case SG_EMULATED_HOST:
 	case SG_IO:
 	case SCSI_IOCTL_SEND_COMMAND:
-		return scsi_cmd_ioctl(disk->queue, disk, mode, cmd, argp);
+		return scsi_cmd_blk_ioctl(bdev, mode, cmd, argp);
 
-	/* scsi_cmd_ioctl would normally handle these, below, but */
+	/* scsi_cmd_blk_ioctl would normally handle these, below, but */
 	/* they aren't a good fit for cciss, as CD-ROMs are */
 	/* not supported, and we don't have any bus/target/lun */
 	/* which we present to the kernel. */
diff --git a/drivers/block/ub.c b/drivers/block/ub.c
index 761ee6d..d3b15ff 100644
--- a/drivers/block/ub.c
+++ b/drivers/block/ub.c
@@ -1721,12 +1721,11 @@ static int ub_bd_release(struct gendisk *disk, fmode_t mode)
 static int ub_bd_ioctl(struct block_device *bdev, fmode_t mode,
     unsigned int cmd, unsigned long arg)
 {
-	struct gendisk *disk = bdev->bd_disk;
 	void __user *usermem = (void __user *) arg;
 	int ret;
 
 	mutex_lock(&ub_mutex);
-	ret = scsi_cmd_ioctl(disk->queue, disk, mode, cmd, usermem);
+	ret = scsi_cmd_blk_ioctl(bdev, mode, cmd, usermem);
 	mutex_unlock(&ub_mutex);
 
 	return ret;
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index cc4f6ad..14f9a2a 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -233,8 +233,8 @@ static int virtblk_ioctl(struct block_device *bdev, fmode_t mode,
 	if (!virtio_has_feature(vblk->vdev, VIRTIO_BLK_F_SCSI))
 		return -ENOTTY;
 
-	return scsi_cmd_ioctl(disk->queue, disk, mode, cmd,
-			      (void __user *)data);
+	return scsi_cmd_blk_ioctl(bdev, mode, cmd,
+				  (void __user *)data);
 }
 
 /* We provide getgeo only to please some old bootloader/partitioning tools */
diff --git a/drivers/cdrom/cdrom.c b/drivers/cdrom/cdrom.c
index 3bba1a2..20f5157 100644
--- a/drivers/cdrom/cdrom.c
+++ b/drivers/cdrom/cdrom.c
@@ -2688,12 +2688,11 @@ int cdrom_ioctl(struct cdrom_device_info *cdi, struct block_device *bdev,
 {
 	void __user *argp = (void __user *)arg;
 	int ret;
-	struct gendisk *disk = bdev->bd_disk;
 
 	/*
 	 * Try the generic SCSI command ioctl's first.
 	 */
-	ret = scsi_cmd_ioctl(disk->queue, disk, mode, cmd, argp);
+	ret = scsi_cmd_blk_ioctl(bdev, mode, cmd, argp);
 	if (ret != -ENOTTY)
 		return ret;
 
diff --git a/drivers/ide/ide-floppy_ioctl.c b/drivers/ide/ide-floppy_ioctl.c
index 9c22882..05f024c 100644
--- a/drivers/ide/ide-floppy_ioctl.c
+++ b/drivers/ide/ide-floppy_ioctl.c
@@ -287,8 +287,7 @@ int ide_floppy_ioctl(ide_drive_t *drive, struct block_device *bdev,
 	 * and CDROM_SEND_PACKET (legacy) ioctls
 	 */
 	if (cmd != CDROM_SEND_PACKET && cmd != SCSI_IOCTL_SEND_COMMAND)
-		err = scsi_cmd_ioctl(bdev->bd_disk->queue, bdev->bd_disk,
-				mode, cmd, argp);
+		err = scsi_cmd_blk_ioctl(bdev, mode, cmd, argp);
 
 	if (err == -ENOTTY)
 		err = generic_ide_ioctl(drive, bdev, cmd, arg);
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 5de155d..c6c449a 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1079,7 +1079,7 @@ static int sd_ioctl(struct block_device *bdev, fmode_t mode,
 			error = scsi_ioctl(sdp, cmd, p);
 			break;
 		default:
-			error = scsi_cmd_ioctl(disk->queue, disk, mode, cmd, p);
+			error = scsi_cmd_blk_ioctl(bdev, mode, cmd, p);
 			if (error != -ENOTTY)
 				break;
 			error = scsi_ioctl(sdp, cmd, p);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index a7f9867..03a00a6 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -761,6 +761,8 @@ extern void blk_plug_device(struct request_queue *);
 				     struct request *rq);
 extern void blk_delay_queue(struct request_queue *, unsigned long);
 extern void blk_recount_segments(struct request_queue *, struct bio *);
+extern int scsi_cmd_blk_ioctl(struct block_device *, fmode_t,
+			      unsigned int, void __user *);
 extern int scsi_cmd_ioctl(struct request_queue *, struct gendisk *, fmode_t,
 			  unsigned int, void __user *);
 extern int sg_scsi_ioctl(struct request_queue *, struct gendisk *, fmode_t,
-- 
1.7.7.1



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 2/3] block: fail SCSI passthrough ioctls on partition devices
  2011-12-22 18:02 [PATCH 0/3] possible privilege escalation via SG_IO ioctl (CVE-2011-4127) Paolo Bonzini
  2011-12-22 18:02 ` [PATCH 1/3] block: add and use scsi_blk_cmd_ioctl Paolo Bonzini
@ 2011-12-22 18:02 ` Paolo Bonzini
  2011-12-22 18:37   ` Linus Torvalds
  2011-12-22 18:02 ` [PATCH 3/3] dm: do not forward ioctls from logical volumes to the underlying device Paolo Bonzini
  2 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2011-12-22 18:02 UTC (permalink / raw)
  To: linux-kernel, security, pmatouse, agk, jbottomley, mchristi,
	msnitzer, torvalds

Linux allows executing the SG_IO ioctl on a partition or even on an
LVM volume, and will pass the command to the underlying block device.
This is well-known, but it is also a large security problem when (via
Unix permissions, ACLs, SELinux or a combination thereof) a program or
user needs to be granted access to a particular partition or logical
volume but not to the full device.

This patch limits the ioctls that are forwarded to non-SCSI devices to
a few ones that are harmless.  This restriction includes programs
running with the CAP_SYS_RAWIO.  If for example I let a program access
/dev/sda2 and /dev/sdb, it still should not be able to read/write outside
the boundaries of /dev/sda2 independent of the capabilities.

This patch does not affect the non-libata IDE driver.  That driver however
already tests for bd != bd->bd_contains before issuing some ioctl; so,
programs that do not require CAP_SYS_ADMIN or CAP_SYS_RAWIO are safe.
Whenever possible a workaround is just to use libata, of course.

Encryption on the host is a mitigating factor, but it does not provide
a full solution.  In particular it doesn't protect against DoS (write
random data), replay attacks (reinstate old ciphertext sectors), or
writes to unencrypted areas including the MBR, the partition table, or
/boot.

Thanks to Daniel Berrange, Milan Broz, Mike Christie, Alasdair Kergon,
Petr Matousek, Jeff Moyer, Mike Snitzer and others for help discussing
this issue.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/scsi_ioctl.c     |   34 ++++++++++++++++++++++++++++++++++
 drivers/scsi/sd.c      |   11 +++++++++--
 include/linux/blkdev.h |    1 +
 3 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index 48dfbe7..6411f8c 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -675,9 +675,43 @@ int scsi_cmd_ioctl(struct request_queue *q, struct gendisk *bd_disk, fmode_t mod
 }
 EXPORT_SYMBOL(scsi_cmd_ioctl);
 
+int scsi_verify_blk_ioctl(struct block_device *bd, unsigned int cmd)
+{
+	if (bd && bd == bd->bd_contains)
+		return 0;
+
+	/* Actually none of this is particularly useful on a partition
+	 * device, but let's play it safe.
+	 */
+	switch (cmd) {
+	case SCSI_IOCTL_GET_IDLUN:
+	case SCSI_IOCTL_GET_BUS_NUMBER:
+	case SCSI_IOCTL_GET_PCI:
+	case SCSI_IOCTL_PROBE_HOST:
+	case SG_GET_VERSION_NUM:
+	case SG_SET_TIMEOUT:
+	case SG_GET_TIMEOUT:
+	case SG_GET_RESERVED_SIZE:
+	case SG_SET_RESERVED_SIZE:
+	case SG_EMULATED_HOST:
+		return 0;
+	default:
+		break;
+	}
+	/* In particular, rule out all resets and host-specific ioctls.  */
+	return -ENOTTY;
+}
+EXPORT_SYMBOL(scsi_verify_blk_ioctl);
+
 int scsi_cmd_blk_ioctl(struct block_device *bd, fmode_t mode,
 		       unsigned int cmd, void __user *arg)
 {
+	int ret;
+
+	ret = scsi_verify_blk_ioctl(bd, cmd);
+	if (ret < 0)
+		return ret;
+
 	return scsi_cmd_ioctl(bd->bd_disk->queue, bd->bd_disk, mode, cmd, arg);
 }
 EXPORT_SYMBOL(scsi_cmd_blk_ioctl);
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index c6c449a..0c5954c 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1058,6 +1058,10 @@ static int sd_ioctl(struct block_device *bdev, fmode_t mode,
 	SCSI_LOG_IOCTL(1, sd_printk(KERN_INFO, sdkp, "sd_ioctl: disk=%s, "
 				    "cmd=0x%x\n", disk->disk_name, cmd));
 
+	error = scsi_verify_blk_ioctl(bdev, cmd);
+	if (error < 0)
+		return error;
+
 	/*
 	 * If we are in the middle of error recovery, don't let anyone
 	 * else try and use this device.  Also, if error recovery fails, it
@@ -1228,6 +1232,11 @@ static int sd_compat_ioctl(struct block_device *bdev, fmode_t mode,
 			   unsigned int cmd, unsigned long arg)
 {
 	struct scsi_device *sdev = scsi_disk(bdev->bd_disk)->device;
+	int ret;
+
+	ret = scsi_verify_blk_ioctl(bdev, cmd);
+	if (ret < 0)
+		return ret == -ENOTTY ? -ENOIOCTLCMD : ret;
 
 	/*
 	 * If we are in the middle of error recovery, don't let anyone
@@ -1239,8 +1248,6 @@ static int sd_compat_ioctl(struct block_device *bdev, fmode_t mode,
 		return -ENODEV;
 	       
 	if (sdev->host->hostt->compat_ioctl) {
-		int ret;
-
 		ret = sdev->host->hostt->compat_ioctl(sdev, cmd, (void __user *)arg);
 
 		return ret;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 03a00a6..11cf6ca 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -761,6 +761,7 @@ extern void blk_plug_device(struct request_queue *);
 				     struct request *rq);
 extern void blk_delay_queue(struct request_queue *, unsigned long);
 extern void blk_recount_segments(struct request_queue *, struct bio *);
+extern int scsi_verify_blk_ioctl(struct block_device *, unsigned int);
 extern int scsi_cmd_blk_ioctl(struct block_device *, fmode_t,
 			      unsigned int, void __user *);
 extern int scsi_cmd_ioctl(struct request_queue *, struct gendisk *, fmode_t,
-- 
1.7.7.1



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 3/3] dm: do not forward ioctls from logical volumes to the underlying device
  2011-12-22 18:02 [PATCH 0/3] possible privilege escalation via SG_IO ioctl (CVE-2011-4127) Paolo Bonzini
  2011-12-22 18:02 ` [PATCH 1/3] block: add and use scsi_blk_cmd_ioctl Paolo Bonzini
  2011-12-22 18:02 ` [PATCH 2/3] block: fail SCSI passthrough ioctls on partition devices Paolo Bonzini
@ 2011-12-22 18:02 ` Paolo Bonzini
  2 siblings, 0 replies; 27+ messages in thread
From: Paolo Bonzini @ 2011-12-22 18:02 UTC (permalink / raw)
  To: linux-kernel, security, pmatouse, agk, jbottomley, mchristi,
	msnitzer, torvalds

A logical volume can map to just part of underlying physical volume.
In this case, it must be treated like a partition.

Based on a patch from Alasdair G Kergon.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 drivers/md/dm-flakey.c |   11 ++++++++++-
 drivers/md/dm-linear.c |   12 +++++++++++-
 drivers/md/dm-mpath.c  |    6 ++++++
 3 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/drivers/md/dm-flakey.c b/drivers/md/dm-flakey.c
index f84c080..9fb18c1 100644
--- a/drivers/md/dm-flakey.c
+++ b/drivers/md/dm-flakey.c
@@ -368,8 +368,17 @@ static int flakey_status(struct dm_target *ti, status_type_t type,
 static int flakey_ioctl(struct dm_target *ti, unsigned int cmd, unsigned long arg)
 {
 	struct flakey_c *fc = ti->private;
+	struct dm_dev *dev = fc->dev;
+	int r = 0;
 
-	return __blkdev_driver_ioctl(fc->dev->bdev, fc->dev->mode, cmd, arg);
+	/*
+	 * Only pass ioctls through if the device sizes match exactly.
+	 */
+	if (fc->start ||
+	    ti->len != i_size_read(dev->bdev->bd_inode) >> SECTOR_SHIFT)
+		r = scsi_verify_blk_ioctl(NULL, cmd);
+
+	return r ? : __blkdev_driver_ioctl(dev->bdev, dev->mode, cmd, arg);
 }
 
 static int flakey_merge(struct dm_target *ti, struct bvec_merge_data *bvm,
diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 3921e3b..9728839 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -116,7 +116,17 @@ static int linear_ioctl(struct dm_target *ti, unsigned int cmd,
 			unsigned long arg)
 {
 	struct linear_c *lc = (struct linear_c *) ti->private;
-	return __blkdev_driver_ioctl(lc->dev->bdev, lc->dev->mode, cmd, arg);
+	struct dm_dev *dev = lc->dev;
+	int r = 0;
+
+	/*
+	 * Only pass ioctls through if the device sizes match exactly.
+	 */
+	if (lc->start ||
+	    ti->len != i_size_read(dev->bdev->bd_inode) >> SECTOR_SHIFT)
+		r = scsi_verify_blk_ioctl(NULL, cmd);
+
+	return r ? : __blkdev_driver_ioctl(dev->bdev, dev->mode, cmd, arg);
 }
 
 static int linear_merge(struct dm_target *ti, struct bvec_merge_data *bvm,
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 8ab377f..2bfbdb9 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -1521,6 +1521,12 @@ static int multipath_ioctl(struct dm_target *ti, unsigned int cmd,
 
 	spin_unlock_irqrestore(&m->lock, flags);
 
+	/*
+	 * Only pass ioctls through if the device sizes match exactly.
+	 */
+	if (!r && ti->len != i_size_read(bdev->bd_inode) >> SECTOR_SHIFT)
+		r = scsi_verify_blk_ioctl(NULL, cmd);
+
 	return r ? : __blkdev_driver_ioctl(bdev, mode, cmd, arg);
 }
 
-- 
1.7.7.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] block: fail SCSI passthrough ioctls on partition devices
  2011-12-22 18:02 ` [PATCH 2/3] block: fail SCSI passthrough ioctls on partition devices Paolo Bonzini
@ 2011-12-22 18:37   ` Linus Torvalds
  2011-12-22 19:11     ` Willy Tarreau
  2011-12-22 19:18     ` Paolo Bonzini
  0 siblings, 2 replies; 27+ messages in thread
From: Linus Torvalds @ 2011-12-22 18:37 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, security, pmatouse, agk, jbottomley, mchristi, msnitzer

On Thu, Dec 22, 2011 at 10:02 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Linux allows executing the SG_IO ioctl on a partition or even on an
> LVM volume, and will pass the command to the underlying block device.
> This is well-known, but it is also a large security problem when (via
> Unix permissions, ACLs, SELinux or a combination thereof) a program or
> user needs to be granted access to a particular partition or logical
> volume but not to the full device.

So who actually *does* this in practice?

> +       /* In particular, rule out all resets and host-specific ioctls.  */
> +       return -ENOTTY;

This kind of crazy needs to go away.

If it's a permission problem, state that. Don't turn it into ENOTTY that then:

> +               return ret == -ENOTTY ? -ENOIOCTLCMD : ret;

gets turned into another random error number.

That's just crazy. No way am I ever applying patches *this* confused
without some serious explanation. And quite frankly, probably not even
then.

If it's a permission issue, return EPERM. Nothing else.

If returning EPERM causes problems for eject or other user tools,
explain it. And then just say "ok, we allow eject on a partition, and
it ejects the disk".

                       Linus

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] block: fail SCSI passthrough ioctls on partition devices
  2011-12-22 18:37   ` Linus Torvalds
@ 2011-12-22 19:11     ` Willy Tarreau
  2011-12-22 19:18     ` Paolo Bonzini
  1 sibling, 0 replies; 27+ messages in thread
From: Willy Tarreau @ 2011-12-22 19:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Paolo Bonzini, linux-kernel, security, pmatouse, agk, jbottomley,
	mchristi, msnitzer

On Thu, Dec 22, 2011 at 10:37:56AM -0800, Linus Torvalds wrote:
> On Thu, Dec 22, 2011 at 10:02 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> > Linux allows executing the SG_IO ioctl on a partition or even on an
> > LVM volume, and will pass the command to the underlying block device.
> > This is well-known, but it is also a large security problem when (via
> > Unix permissions, ACLs, SELinux or a combination thereof) a program or
> > user needs to be granted access to a particular partition or logical
> > volume but not to the full device.
> 
> So who actually *does* this in practice?

I've seen this in the past when mtools were used a lot to access FAT
partitions on dual-boot systems. I've also seen it with vmware, where
a user is allowed to boot the other OS within vmware without rebooting.
But granted this is not the most common scheme.

Willy


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] block: fail SCSI passthrough ioctls on partition devices
  2011-12-22 18:37   ` Linus Torvalds
  2011-12-22 19:11     ` Willy Tarreau
@ 2011-12-22 19:18     ` Paolo Bonzini
  2011-12-22 19:44       ` Linus Torvalds
  1 sibling, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2011-12-22 19:18 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, security, pmatouse, agk, jbottomley, mchristi, msnitzer

On 12/22/2011 07:37 PM, Linus Torvalds wrote:
> On Thu, Dec 22, 2011 at 10:02 AM, Paolo Bonzini<pbonzini@redhat.com>  wrote:
>> Linux allows executing the SG_IO ioctl on a partition or even on an
>> LVM volume, and will pass the command to the underlying block device.
>> This is well-known, but it is also a large security problem when (via
>> Unix permissions, ACLs, SELinux or a combination thereof) a program or
>> user needs to be granted access to a particular partition or logical
>> volume but not to the full device.
>
> So who actually *does* this in practice?

Virtualization, as explained in the cover letter.

>> +       /* In particular, rule out all resets and host-specific ioctls.  */
>> +       return -ENOTTY;
>
> This kind of crazy needs to go away.

What crazy?  It's not a permission problem.  Sending a SCSI command to a 
partition makes no sense.  A permission problem implies that somehow you 
should be able to fix it by granting additional permissions, which is 
not the case here.

> If it's a permission problem, state that. Don't turn it into ENOTTY that then:
>
>> +               return ret == -ENOTTY ? -ENOIOCTLCMD : ret;
>
> gets turned into another random error number.

That's existing craziness of the compat_ioctl mechanism:

/* Most of the generic ioctls are handled in the normal fallback path.
    This assumes the blkdev's low level compat_ioctl always returns
    ENOIOCTLCMD for unknown ioctls. */

The logic is quite intricate:

1. process generic block layer ioctls that require compat handling 
(compat_blkdev_ioctl)

2. process device-specific ioctls that require special 32-on-64 
handling, whose implementation is outside block/ (sd_compat_ioctl).

3. process device-specific ioctls that require special 32-on-64 
handling, whose implementation is in block/compat_ioctl.c 
(compat_blkdev_driver_ioctl).

4. fallback to the normal ioctl implementation for ioctls that do not 
require 32-on-64 (__blkdev_driver_ioctl).

If I return ENOTTY (or EPERM for that matter: anything but ENOIOCTLCMD), 
then I rule out execution of steps 3 and especially 4.  This means 
32-on-64 systems will get ENOTTY for BLKGETSIZE64 and will fail to boot.

Paolo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] block: fail SCSI passthrough ioctls on partition devices
  2011-12-22 19:18     ` Paolo Bonzini
@ 2011-12-22 19:44       ` Linus Torvalds
  2011-12-22 20:23         ` Paolo Bonzini
  0 siblings, 1 reply; 27+ messages in thread
From: Linus Torvalds @ 2011-12-22 19:44 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, security, pmatouse, agk, jbottomley, mchristi, msnitzer

On Thu, Dec 22, 2011 at 11:18 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> This kind of crazy needs to go away.
>
> What crazy?  It's not a permission problem.  Sending a SCSI command to a
> partition makes no sense.  A permission problem implies that somehow you
> should be able to fix it by granting additional permissions, which is not
> the case here.

Ahh, I misread the intention here, and didn't notice that it was doing
it on the stupid SCSI ioctl commands, not the lowlevel SCSI "cmd". The
fact that the changelog talked about sending read/write commands down
to the disk confused me.

But please do use ENOIOCTLCMD directly then, instead of using ENOTTY
and turning it into ENOIOCTLCMD.

                   Linus

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] block: fail SCSI passthrough ioctls on partition devices
  2011-12-22 19:44       ` Linus Torvalds
@ 2011-12-22 20:23         ` Paolo Bonzini
  2011-12-22 20:52           ` Linus Torvalds
  0 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2011-12-22 20:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, security, pmatouse, agk, jbottomley, mchristi, msnitzer

On 12/22/2011 08:44 PM, Linus Torvalds wrote:
>>> >>  This kind of crazy needs to go away.
>> >
>> >  What crazy?  It's not a permission problem.  Sending a SCSI command to a
>> >  partition makes no sense.  A permission problem implies that somehow you
>> >  should be able to fix it by granting additional permissions, which is not
>> >  the case here.
> Ahh, I misread the intention here, and didn't notice that it was doing
> it on the stupid SCSI ioctl commands, not the lowlevel SCSI "cmd". The
> fact that the changelog talked about sending read/write commands down
> to the disk confused me.
>
> But please do use ENOIOCTLCMD directly then, instead of using ENOTTY
> and turning it into ENOIOCTLCMD.

I disagree.  ENOTTY is perfect in all cases except the compat_ioctl 
(which I'm not denying is ugly, but beautifying it would make everything 
else ugly).

In fact ENOTTY means "fail", ENOIOCTLCMD means "handle this elsewhere". 
  Only with compat_ioctl it makes sense to "handle this elsewhere" (we 
know that we will get it again in the non-compat fallback path, and 
return -ENOTTY).

Secondarily, ENOIOCTLCMD is ultimately turned into EINVAL when the 
system call returns (not ENOTTY).

Paolo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] block: fail SCSI passthrough ioctls on partition devices
  2011-12-22 20:23         ` Paolo Bonzini
@ 2011-12-22 20:52           ` Linus Torvalds
  2011-12-22 22:08             ` Paolo Bonzini
  0 siblings, 1 reply; 27+ messages in thread
From: Linus Torvalds @ 2011-12-22 20:52 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, security, pmatouse, agk, jbottomley, mchristi, msnitzer

On Thu, Dec 22, 2011 at 12:23 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> I disagree.  ENOTTY is perfect in all cases except the compat_ioctl (which
> I'm not denying is ugly, but beautifying it would make everything else
> ugly).

No it's not.

ENOTTTY isn't ever perfect. We should never have used it to begin with
inside the kernel.

Why would it make *anything* else uglier? Just return it, don't try to
change it anywhere. Does it break anything?

> Secondarily, ENOIOCTLCMD is ultimately turned into EINVAL when the system
> call returns (not ENOTTY).

That's a completely independent bug that has been discussed several
times. We probably should just bite the bullet and fix it. EINVAL is
never the right return value (unless the per-ioctl arguments
themselves are invalid, not for bad ioctls).

The EINVAL return comes from some people who didn't understand that
ENOTTY means "no such ioctl" despite the name.

But regardless, for your patch, it shouldn't matter.

                  Linus

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] block: fail SCSI passthrough ioctls on partition devices
  2011-12-22 20:52           ` Linus Torvalds
@ 2011-12-22 22:08             ` Paolo Bonzini
  2011-12-22 22:25               ` Linus Torvalds
  0 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2011-12-22 22:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, security, pmatouse, agk, jbottomley, mchristi, msnitzer

On 12/22/2011 09:52 PM, Linus Torvalds wrote:
> On Thu, Dec 22, 2011 at 12:23 PM, Paolo Bonzini<pbonzini@redhat.com>  wrote:
>>
>> I disagree.  ENOTTY is perfect in all cases except the compat_ioctl (which
>> I'm not denying is ugly, but beautifying it would make everything else
>> ugly).
>
> No it's not.
>
> ENOTTTY isn't ever perfect. We should never have used it to begin with
> inside the kernel.
>
> Why would it make *anything* else uglier? Just return it, don't try to
> change it anywhere. Does it break anything?

I don't know.  But given the ways in which earlier versions broke during 
testing, I wouldn't be able to tell before running many tests.  Which 
with the Christmas break would easily mean a little less than 2 weeks.

>> Secondarily, ENOIOCTLCMD is ultimately turned into EINVAL when the system
>> call returns (not ENOTTY).
>
> That's a completely independent bug that has been discussed several
> times. We probably should just bite the bullet and fix it. EINVAL is
> never the right return value (unless the per-ioctl arguments
> themselves are invalid, not for bad ioctls).

Makes a lot of sense and I was wondering about it myself, but I'm not 
sure stable would accept that.

I understand that it's kind of backwards to commit as is and cleanup 
later, but it's also backwards to force all distros and stable to 
deviate from upstream for the sake of cleaning up.

> The EINVAL return comes from some people who didn't understand that
> ENOTTY means "no such ioctl" despite the name.

I did. :)

Paolo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] block: fail SCSI passthrough ioctls on partition devices
  2011-12-22 22:08             ` Paolo Bonzini
@ 2011-12-22 22:25               ` Linus Torvalds
  2011-12-22 23:48                 ` Alasdair G Kergon
  2011-12-23  0:17                 ` H. Peter Anvin
  0 siblings, 2 replies; 27+ messages in thread
From: Linus Torvalds @ 2011-12-22 22:25 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, security, pmatouse, agk, jbottomley, mchristi, msnitzer

On Thu, Dec 22, 2011 at 2:08 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> I don't know.  But given the ways in which earlier versions broke during
> testing, I wouldn't be able to tell before running many tests.  Which with
> the Christmas break would easily mean a little less than 2 weeks.

I don't think I'm ready to apply this as-is anyway, exactly because
I'd worry about some distribution really depending on being able to
eject disks by partition etc.

I don't *think* anybody does something as crazy as giving actual block
device ownership to people, but I don't know - and I could easily
imagine that even if they don't give the block device to the user, the
suid binary might do "eject /dev/sda1" etc. And one way to do that is
to do the "door unlock" scsi command itself.

So at this point I'd much *much* rather see this as a "we merge it for
3.3 and mark it for stable", so that we can at least get the testing
in the git tree, rather than me taking what could potentially break
peoples setups very late in the -rc series just before a release.

>> That's a completely independent bug that has been discussed several
>> times. We probably should just bite the bullet and fix it. EINVAL is
>> never the right return value (unless the per-ioctl arguments
>> themselves are invalid, not for bad ioctls).
>
> Makes a lot of sense and I was wondering about it myself, but I'm not sure
> stable would accept that.

Again, I think it's something we should do during the merge window
(early at that).

It really has come up several times, it just always comes up at the
wrong moment. If you remind me after the 3.2 release, I'll make sure
to "just do it", and we'll see if anything break.

I doubt anything will, because ENOTTY really is the correct error
return (to user space - for traditional reasons) for "no such ioctl".

Inside the kernel we generally want to use ENOIOCTLCMD, which is much
better named - and thus avoids the confusion that some people have
about the ENOTTY name - and that also has the whole "let's just try
another approach one then".

>> The EINVAL return comes from some people who didn't understand that
>> ENOTTY means "no such ioctl" despite the name.
>
> I did. :)

I suspect most people who have used Unix for a long time kind of take
the "ENOTTY - no such ioctl" thing for granted and don't even think
about it, but we do have a lot of people who write drivers etc and
that started with Linux, and I can understand why they think "ENOTTY"
is an odd name for a SCSI device driver to use.

                Linus

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] block: fail SCSI passthrough ioctls on partition devices
  2011-12-22 22:25               ` Linus Torvalds
@ 2011-12-22 23:48                 ` Alasdair G Kergon
  2011-12-23  0:07                   ` Linus Torvalds
  2011-12-23  0:17                 ` H. Peter Anvin
  1 sibling, 1 reply; 27+ messages in thread
From: Alasdair G Kergon @ 2011-12-22 23:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Paolo Bonzini, linux-kernel, security, pmatouse, agk, jbottomley,
	mchristi, msnitzer

On Thu, Dec 22, 2011 at 02:25:56PM -0800, Linus Torvalds wrote:
> I don't *think* anybody does something as crazy as giving actual block
> device ownership to people, 

That can happen when running virtual machines backed by logical volumes.

Say I am running a server that offers virtual machines to different
people, and I allow those people to have root access within their own
guest, but, naturally, I don't give them any access to other people's
guests.

I pool my disks on the server into a Volume Group and create one simple
Logical Volume per guest VM to hold its filesystem.

Due to this bug, a root user inside one guest VM can see and modify the
contents of other VMs that don't belong to them (and in some situations
perhaps even take control of the host machine by modifying the host's
LVM metadata).

Alasdair


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] block: fail SCSI passthrough ioctls on partition devices
  2011-12-22 23:48                 ` Alasdair G Kergon
@ 2011-12-23  0:07                   ` Linus Torvalds
  2011-12-23  6:26                     ` Willy Tarreau
  0 siblings, 1 reply; 27+ messages in thread
From: Linus Torvalds @ 2011-12-23  0:07 UTC (permalink / raw)
  To: Linus Torvalds, Paolo Bonzini, linux-kernel, security, pmatouse,
	agk, jbottomley, mchristi, msnitzer

On Thu, Dec 22, 2011 at 3:48 PM, Alasdair G Kergon <agk@redhat.com> wrote:
> On Thu, Dec 22, 2011 at 02:25:56PM -0800, Linus Torvalds wrote:
>> I don't *think* anybody does something as crazy as giving actual block
>> device ownership to people,
>
> That can happen when running virtual machines backed by logical volumes.

So my worry is not so much the security fix for the vm case, but
simply normal people - who don't hit the issue to begin with - now
being screwed because what their distro does no longer works.

For example, I just traced it, and "eject /dev/sdb1" does a CDROMEJECT
ioctl when used as the root user. I haven't tested the patch, but just
reading it, I'd expect it to break that.

And that's the *natural* way to eject a mounted device. Look at the
USB memory sticks you have. They are almost all partitioned to have
one partition, and that one partition doesn't cover the whole device.
And it's that one partition you use to interact with it - it's what
you mount, and what you eject.

Breaking things like that is not an option. It's stuff people do every
day. And there may be some very non-obvious reason that I don't see
why it's not broken by this patch-series, but that's the kind of thing
that I worry about.

And I worry about it a *lot* more than I worry about some broken
virtualization setup where you have system engineers that could patch
their own kernel if they feel strongly about this. We simply
*must*not* break things.

I absolutely do not get the feeling that this has been tested so much
and is so obvious that there is no risk of breakage.

I suspect one thing that would be reasonable would be to just say
"Root can do any ioctl's he damn well wants on a partial device". That
would make me worry much less about the "normal" setup breaking where
you don't give regular users direct access to partitions to begin
with.

But even that might not work reliably. Maybe the system deamon that
actually does the eject (when a normal user does an eject) has been
setguid to 'disk' instead, and isn't root. I don't know. I doubt you
know either.  Maybe some of them have fallback code to find the parent
device. Do you know? Do you know that all do? I doubt it.

                          Linus

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] block: fail SCSI passthrough ioctls on partition devices
  2011-12-22 22:25               ` Linus Torvalds
  2011-12-22 23:48                 ` Alasdair G Kergon
@ 2011-12-23  0:17                 ` H. Peter Anvin
  1 sibling, 0 replies; 27+ messages in thread
From: H. Peter Anvin @ 2011-12-23  0:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Paolo Bonzini, linux-kernel, security, pmatouse, agk, jbottomley,
	mchristi, msnitzer

On 12/22/2011 02:25 PM, Linus Torvalds wrote:
> I don't *think* anybody does something as crazy as giving actual block
> device ownership to people, but I don't know...

It's quite common and often very necessary to do, to allow a user to
manipulate a device that is not intended to be mounted.

	-hpa


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] block: fail SCSI passthrough ioctls on partition devices
  2011-12-23  0:07                   ` Linus Torvalds
@ 2011-12-23  6:26                     ` Willy Tarreau
  2011-12-23  9:22                       ` Linus Torvalds
  2011-12-26  1:41                       ` Daniel Barkalow
  0 siblings, 2 replies; 27+ messages in thread
From: Willy Tarreau @ 2011-12-23  6:26 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Paolo Bonzini, linux-kernel, security, pmatouse, agk, jbottomley,
	mchristi, msnitzer

On Thu, Dec 22, 2011 at 04:07:46PM -0800, Linus Torvalds wrote:
> For example, I just traced it, and "eject /dev/sdb1" does a CDROMEJECT
> ioctl when used as the root user. I haven't tested the patch, but just
> reading it, I'd expect it to break that.
> 
> And that's the *natural* way to eject a mounted device. Look at the
> USB memory sticks you have. They are almost all partitioned to have
> one partition, and that one partition doesn't cover the whole device.
> And it's that one partition you use to interact with it - it's what
> you mount, and what you eject.

Call me dumb, but why would someone use "eject" on a non-physically
ejectable device such as a memory stick ? I use it on CDs, I've used
it on Sun floppy drives, but USB memory stick ??? After the umount,
I just have to pull it from the plug and that's all. I don't catch
what an eject command can bring me on top of that :-/

Willy


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] block: fail SCSI passthrough ioctls on partition devices
  2011-12-23  6:26                     ` Willy Tarreau
@ 2011-12-23  9:22                       ` Linus Torvalds
  2011-12-23  9:45                         ` Willy Tarreau
  2011-12-23 14:15                         ` Paolo Bonzini
  2011-12-26  1:41                       ` Daniel Barkalow
  1 sibling, 2 replies; 27+ messages in thread
From: Linus Torvalds @ 2011-12-23  9:22 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Paolo Bonzini, linux-kernel, security, pmatouse, agk, jbottomley,
	mchristi, msnitzer

On Thu, Dec 22, 2011 at 10:26 PM, Willy Tarreau <w@1wt.eu> wrote:
>
> Call me dumb, but why would someone use "eject" on a non-physically
> ejectable device such as a memory stick ?

Perhaps because that's the operation that works everywhere and is the
simplest one?

Just look at the icon on your desktop - it probably has "Eject" above
the silly "Safely remove drive" when you right-click it. It has for
me.

And I've taught myself (and my wife) to always eject media before
removing them. I don't "unmount" them, because then for cdroms I need
to first unmount them and then eject them. Or I'd need to do different
operations for different devices. Both of which are just stupid.

So yes, I claim that "eject" is actually the *natural* thing to do
before you physically remove the medium, because it works across
different media.

But more importantly, we don't break what works. So what the f&^k was
the point of even asking? THAT was the "dumb" thing here.

                                    Linus

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] block: fail SCSI passthrough ioctls on partition devices
  2011-12-23  9:22                       ` Linus Torvalds
@ 2011-12-23  9:45                         ` Willy Tarreau
  2011-12-23 14:15                         ` Paolo Bonzini
  1 sibling, 0 replies; 27+ messages in thread
From: Willy Tarreau @ 2011-12-23  9:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Paolo Bonzini, linux-kernel, security, pmatouse, agk, jbottomley,
	mchristi, msnitzer

On Fri, Dec 23, 2011 at 01:22:24AM -0800, Linus Torvalds wrote:
> On Thu, Dec 22, 2011 at 10:26 PM, Willy Tarreau <w@1wt.eu> wrote:
> >
> > Call me dumb, but why would someone use "eject" on a non-physically
> > ejectable device such as a memory stick ?
> 
> Perhaps because that's the operation that works everywhere and is the
> simplest one?

I didn't know people were doing that. For me eject was just used to
activate the motor of ejectable devices. I've just found this in the
man which explains what you're doing and which I did not know :

       If the device is currently mounted, it is unmounted before
       ejecting.

> Just look at the icon on your desktop - it probably has "Eject" above
> the silly "Safely remove drive" when you right-click it. It has for
> me.
> 
> And I've taught myself (and my wife) to always eject media before
> removing them. I don't "unmount" them, because then for cdroms I need
> to first unmount them and then eject them. Or I'd need to do different
> operations for different devices. Both of which are just stupid.

It makes sense. However it requires setting user permissions on a device,
which "umount" would not require.

> So yes, I claim that "eject" is actually the *natural* thing to do
> before you physically remove the medium, because it works across
> different media.
> 
> But more importantly, we don't break what works. So what the f&^k was
> the point of even asking? THAT was the "dumb" thing here.

The point was that I didn't even imagine people were using this command
for this. It did not seem more natural to me than to use "mt", "chvt" or
"stty" to unmount this type of device precisely because there was no motor
to eject the device. The way you present it kind of makes sense and explains
why people are doing it this way.

Thanks,
Willy


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] block: fail SCSI passthrough ioctls on partition devices
  2011-12-23  9:22                       ` Linus Torvalds
  2011-12-23  9:45                         ` Willy Tarreau
@ 2011-12-23 14:15                         ` Paolo Bonzini
  2011-12-23 22:46                           ` Linus Torvalds
  1 sibling, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2011-12-23 14:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Willy Tarreau, linux-kernel, security, pmatouse, agk, jbottomley,
	mchristi, msnitzer

On 12/23/2011 10:22 AM, Linus Torvalds wrote:
> On Thu, Dec 22, 2011 at 10:26 PM, Willy Tarreau<w@1wt.eu>  wrote:
>>
>> Call me dumb, but why would someone use "eject" on a non-physically
>> ejectable device such as a memory stick ?
>
> Perhaps because that's the operation that works everywhere and is the
> simplest one?

But does it actually do anything?  For me, "eject /dev/sdf1" does 
nothing beyond unmounting the disk.  Yes, it sends a CDROMEJECT ioctl 
which becomes a start/stop unit SCSI command, but it has no effect on 
the USB stick I tried.

I tried it on a card reader, and indeed it started reporting "no medium 
found" after ejecting.

And actually there it fixes a potential data corruption.  Because yes, 
most USB sticks and cards have one partition.  But if you had two, 
"eject /dev/sdf1" would unmount one partition and CDROMEJECT the whole 
card.  Which means the other partition is left unclean.  (Invoked with 
"eject /dev/sdb", eject at least tries to go through the partitions and 
unmounts all of them.  Not bulletproof, but almost there).

So "eject /dev/sdb1" on USB sticks is just an unmount.  On card readers 
it will indeed be affected by the patch.  But the actual eject operation 
that it does after unmounting ought to fail.  It hasn't so far?  Just 
another instance of the bug I'm fixing.

> Just look at the icon on your desktop - it probably has "Eject" above
> the silly "Safely remove drive" when you right-click it. It has for
> me.

Assuming you're using GNOME3, for USB sticks the "Eject" button unmounts 
the volume, but it leaves the /dev/ files in place.  "Safely remove 
drive" unmounts the volume and also does what udisks calls "detach".  It 
sends a start/stop unit SCSI command (same as CDROMEJECT and the eject 
command) and then unplugs the device via sysfs.  To be precise, it also 
sends a cache flush, just in case the disk reported itself as 
writethrough but wasn't, so it is actually safer than eject.

But udisks never sends SCSI commands to partition devices, because 
udisks knows the distinction between disk and partition.  Even on an 
unpatched kernel, "udisks --detach /dev/sdf1" fails with "Detach failed: 
Device is not a drive".  In general, I looked at the source code for 
this throughout the GNOME stack (both GNOME2 and GNOME3), and everywhere 
they distinguish between "drive" and "volume" objects.  Eject maps to a 
method on the volume's eject method, while "safely remove drive" maps to 
the drive's stop method.

So, no, nothing I tried to do on USB sticks from a recent desktop 
environment will break with the patch.

Before udisks, HAL had "unmount" and "eject".  With or without the 
patch, on USB sticks they act the same.  They unmount the volume and 
leave /dev/ files in place.

> So yes, I claim that "eject" is actually the *natural* thing to do
> before you physically remove the medium, because it works across
> different media.

If you talk about ejecting the disk, yeah, "safely remove drive" is just 
paranoia.  But if you talk about ejecting a partition from the 
command-line, then no, not at all.

Paolo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] block: fail SCSI passthrough ioctls on partition devices
  2011-12-23 14:15                         ` Paolo Bonzini
@ 2011-12-23 22:46                           ` Linus Torvalds
  2012-01-05 13:18                             ` Paolo Bonzini
  0 siblings, 1 reply; 27+ messages in thread
From: Linus Torvalds @ 2011-12-23 22:46 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Willy Tarreau, linux-kernel, security, pmatouse, agk, jbottomley,
	mchristi, msnitzer

On Fri, Dec 23, 2011 at 6:15 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> But does it actually do anything?  For me, "eject /dev/sdf1" does nothing
> beyond unmounting the disk.  Yes, it sends a CDROMEJECT ioctl which becomes
> a start/stop unit SCSI command, but it has no effect on the USB stick I
> tried.

Guys, you are TOTALLY MISSING THE POINT.

The point is that the patch-series is damned dangerous. You have no
clue at all what it will do on various different distributions and
with various different hardware.

What's so hard to understand about this? Applying it at this point in
the release cycle with basically *zero* testing would be crazy. I gave
you just one example of where real people do ioctl's on a partition,
and where the behavior of the kernel clearly changes as a result.

The fact is, partitions are what most user interactions see. Suddenly
totally changing things and saying "you can't do that on a partition"
when clearly people *have* been doing that on partitions isn't
something we can do without serious testing.

No amount of "it didn't change in the one situation I tested" is going
to change that. What about somebody running their own distro (there's
this Russian distro maker who regularly finds regressions that nobody
else has ever seen)? What about somebody upgrading kernels on a
five-year-old user space (== pretty much any of the "enterprise"
distros)? What about other random hardware than just the USB disk
example that I gave?

It sounds like people didn't even *think* of the potential issues this
patch can bring. I'd absolutely be insane to apply them for -rc7.

             Linus

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] block: fail SCSI passthrough ioctls on partition devices
  2011-12-23  6:26                     ` Willy Tarreau
  2011-12-23  9:22                       ` Linus Torvalds
@ 2011-12-26  1:41                       ` Daniel Barkalow
  1 sibling, 0 replies; 27+ messages in thread
From: Daniel Barkalow @ 2011-12-26  1:41 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Linus Torvalds, Paolo Bonzini, linux-kernel, security, pmatouse,
	agk, jbottomley, mchristi, msnitzer

On Fri, 23 Dec 2011, Willy Tarreau wrote:

> On Thu, Dec 22, 2011 at 04:07:46PM -0800, Linus Torvalds wrote:
> > For example, I just traced it, and "eject /dev/sdb1" does a CDROMEJECT
> > ioctl when used as the root user. I haven't tested the patch, but just
> > reading it, I'd expect it to break that.
> > 
> > And that's the *natural* way to eject a mounted device. Look at the
> > USB memory sticks you have. They are almost all partitioned to have
> > one partition, and that one partition doesn't cover the whole device.
> > And it's that one partition you use to interact with it - it's what
> > you mount, and what you eject.
> 
> Call me dumb, but why would someone use "eject" on a non-physically
> ejectable device such as a memory stick ? I use it on CDs, I've used
> it on Sun floppy drives, but USB memory stick ??? After the umount,
> I just have to pull it from the plug and that's all. I don't catch
> what an eject command can bring me on top of that :-/

I use "eject" on my (old) ipod in order to get it to stop telling me not 
to unplug it. The device doesn't actually have any problems if it just 
gets yanked while it's neither mounted nor ejected, but it acts unhappy 
through its UI, since it doesn't know the computer's state. (And I pass 
"eject" the mountpoint, because that's short and tab-completes and 
"eject" translates the mountpoint into a device node with fstab so it 
works.)

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] block: fail SCSI passthrough ioctls on partition devices
  2011-12-23 22:46                           ` Linus Torvalds
@ 2012-01-05 13:18                             ` Paolo Bonzini
  2012-01-05 16:16                               ` Linus Torvalds
  2012-01-05 23:49                               ` Linus Torvalds
  0 siblings, 2 replies; 27+ messages in thread
From: Paolo Bonzini @ 2012-01-05 13:18 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Willy Tarreau, linux-kernel, security, pmatouse, agk, jbottomley,
	mchristi, msnitzer, Christoph Hellwig

On 12/23/2011 11:46 PM, Linus Torvalds wrote:
> It sounds like people didn't even*think*  of the potential issues this
> patch can bring.  I'd absolutely be insane to apply them for -rc7.

Fair enough, I obviously cannot say they aren't intrusive.

Anyway, I set to change the patches to use ENOIOCTLCMD.  I did some
research and found the following commit:

commit d9ecdea7ed7467db32ec160f4eca46c279255606
Author: Christoph Hellwig <hch@lst.de>
Date:   Sat Jun 20 21:29:41 2009 +0200

    virtio_blk: ioctl return value fix
    
    Block driver ioctl methods must return ENOTTY and not -ENOIOCTLCMD if
    they expect the block layer to handle generic ioctls.
    
    This triggered a BLKROSET failure in xfsqa #200.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

which indeed matches the current code in block/ioctl.c:

	case BLKROSET:
		ret = __blkdev_driver_ioctl(bdev, mode, cmd, arg);
		/* -EINVAL to handle old uncorrected drivers */
		if (ret != -EINVAL && ret != -ENOTTY)
			return ret;
		if (!capable(CAP_SYS_ADMIN))
			return -EACCES;
		if (get_user(n, (int __user *)(arg)))
			return -EFAULT;
		set_device_ro(bdev, n);
		return 0;

Hence, changing scsi_verify_blk_ioctl to return ENOIOCTLCMD is not
really possible.  I can make it return a boolean value, but I do not
like it: does true mean "pass this ioctl" or "forbid this ioctl"?

Would you apply the patches as they are or do you want me to squash in
something like this?

diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index a6bedfe..bb94c88 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -710,6 +710,14 @@ int scsi_verify_blk_ioctl(struct block_device *bd, unsigned int cmd)
 	case SG_SET_RESERVED_SIZE:
 	case SG_EMULATED_HOST:
 		return 0;
+
+	case CDROMEJECT:
+		/* This is also unsafe for partition devices, but
+		 * "eject /mnt/usb-drive" invokes it.  Warn about it
+		 * and keep backwards compatibility.  */
+		printk_ratelimited(KERN_WARNING
+				   "sending CDROMEJECT ioctl to a partition\n");
+		return 0;
 	default:
 		break;
 	}

... perhaps allowing it only for CAP_SYS_RAWIO?

Paolo

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] block: fail SCSI passthrough ioctls on partition devices
  2012-01-05 13:18                             ` Paolo Bonzini
@ 2012-01-05 16:16                               ` Linus Torvalds
  2012-01-05 16:40                                 ` Paolo Bonzini
  2012-01-05 23:49                               ` Linus Torvalds
  1 sibling, 1 reply; 27+ messages in thread
From: Linus Torvalds @ 2012-01-05 16:16 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Willy Tarreau, linux-kernel, security, pmatouse, agk, jbottomley,
	mchristi, msnitzer, Christoph Hellwig

On Thu, Jan 5, 2012 at 5:18 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> Hence, changing scsi_verify_blk_ioctl to return ENOIOCTLCMD is not
> really possible.

What?

"We have a bug in the block IO layer, so we cannot possible fix
another problem?"

Whjat the f*ck is the logic there?

Just fix the *obvious* breakage in BLKROSET. It's clearly what the
code *intends* to do, it just didn't check for ENOIOCTLCMD.

In fact, fixing that BLKROSET case would make it much more obvious
*why* it has those error tests at all. And we really should try to
just remove the crazy EINVAL case, because it really is wrong - and
then fix any broken drivers that exposes.

It's a DISEASE how you seem to think that "we have ugly mistakes in
the kernel, SO LET'S ADD ANOTHER ONE".

That's not how we do things. We *fix* things, instead of making things
even *worse*.

Stop this insanity from spreading, instead of making it spread more!

                           Linus

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] block: fail SCSI passthrough ioctls on partition devices
  2012-01-05 16:16                               ` Linus Torvalds
@ 2012-01-05 16:40                                 ` Paolo Bonzini
  2012-01-05 17:04                                   ` Linus Torvalds
  0 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2012-01-05 16:40 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Willy Tarreau, linux-kernel, security, pmatouse, agk, jbottomley,
	mchristi, msnitzer, Christoph Hellwig

On 01/05/2012 05:16 PM, Linus Torvalds wrote:
> On Thu, Jan 5, 2012 at 5:18 AM, Paolo Bonzini<pbonzini@redhat.com>  wrote:
>>
>> Hence, changing scsi_verify_blk_ioctl to return ENOIOCTLCMD is not
>> really possible.
>
> What?
>
> "We have a bug in the block IO layer, so we cannot possible fix
> another problem?"
>
> Whjat the f*ck is the logic there?
>
> Just fix the *obvious* breakage in BLKROSET. It's clearly what the
> code *intends* to do, it just didn't check for ENOIOCTLCMD.

Aha, so this is clear and obvious.  And who knows that something else 
won't break?  Such as the 32-on-64 logic that already uses ENOIOCTLCMD 
for something else?

If the block maintainers want to fix that, fine.  "git blame 
block/ioctl.c" shows that it's been like this for 6 years and in general 
the file has hardly seen changes.  That's enough to make me steer away 
from that code.

Foolish me who found a bug, and an exploitable one for that matter, and 
even tried to fix it.  Looks like security by obscurity would have 
served users better.

Paolo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] block: fail SCSI passthrough ioctls on partition devices
  2012-01-05 16:40                                 ` Paolo Bonzini
@ 2012-01-05 17:04                                   ` Linus Torvalds
  2012-01-05 17:26                                     ` Paolo Bonzini
  0 siblings, 1 reply; 27+ messages in thread
From: Linus Torvalds @ 2012-01-05 17:04 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Willy Tarreau, linux-kernel, security, pmatouse, agk, jbottomley,
	mchristi, msnitzer, Christoph Hellwig

On Thu, Jan 5, 2012 at 8:40 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> Aha, so this is clear and obvious.  And who knows that something else won't
> break?  Such as the 32-on-64 logic that already uses ENOIOCTLCMD for
> something else?

Exactly. Which is why we can try to do this in the merge window, and
fix up the fallout.

> Foolish me who found a bug, and an exploitable one for that matter, and even
> tried to fix it.  Looks like security by obscurity would have served users better.

Umm. I just sent out what I think is what we *should* be doing.

You are the one who seems to just want to add hack upon hack to
things. THAT is what I really hate. It's not only in bad taste, it
*will* come back and bite us some day.

If you think that "security" is about adding new special cases and
hacks, you're so out to lunch that it isn't even funny. It is
absolutely the *last* thing you want.

                    Linus

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] block: fail SCSI passthrough ioctls on partition devices
  2012-01-05 17:04                                   ` Linus Torvalds
@ 2012-01-05 17:26                                     ` Paolo Bonzini
  0 siblings, 0 replies; 27+ messages in thread
From: Paolo Bonzini @ 2012-01-05 17:26 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Willy Tarreau, linux-kernel, security, pmatouse, agk, jbottomley,
	mchristi, msnitzer, Christoph Hellwig

On 01/05/2012 06:04 PM, Linus Torvalds wrote:
>
>> >  Foolish me who found a bug, and an exploitable one for that matter, and even
>> >  tried to fix it.  Looks like security by obscurity would have served users better.
> Umm. I just sent out what I think is what we*should*  be doing.
>
> You are the one who seems to just want to add hack upon hack to
> things. THAT is what I really hate. It's not only in bad taste, it
> *will*  come back and bite us some day.

I could have just written

+	ret = scsi_verify_blk_ioctl(bdev, cmd);
+	if (ret < 0)
+		return -ENOIOCTLCMD;

It wouldn't have been any less hacky, but it would have looked quite 
normal and perhaps it would have escaped review.  I knew I was working 
around messy code, and I made that clear.  In that, I succeeded. :)

> If you think that "security" is about adding new special cases and
> hacks, you're so out to lunch that it isn't even funny. It is
> absolutely the*last*  thing you want.

Thanks for the tip, :) and thanks for picking up the cleanup.  I'll keep 
an eye and resubmit when the dust settles.

Paolo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] block: fail SCSI passthrough ioctls on partition devices
  2012-01-05 13:18                             ` Paolo Bonzini
  2012-01-05 16:16                               ` Linus Torvalds
@ 2012-01-05 23:49                               ` Linus Torvalds
  1 sibling, 0 replies; 27+ messages in thread
From: Linus Torvalds @ 2012-01-05 23:49 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Willy Tarreau, linux-kernel, security, pmatouse, agk, jbottomley,
	mchristi, msnitzer, Christoph Hellwig

On Thu, Jan 5, 2012 at 5:18 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> Would you apply the patches as they are or do you want me to squash in
> something like this?
>
> diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
> index a6bedfe..bb94c88 100644
> --- a/block/scsi_ioctl.c
> +++ b/block/scsi_ioctl.c
> @@ -710,6 +710,14 @@ int scsi_verify_blk_ioctl(struct block_device *bd, unsigned int cmd)
>        case SG_SET_RESERVED_SIZE:
>        case SG_EMULATED_HOST:
>                return 0;
> +
> +       case CDROMEJECT:
> +               /* This is also unsafe for partition devices, but
> +                * "eject /mnt/usb-drive" invokes it.  Warn about it
> +                * and keep backwards compatibility.  */
> +               printk_ratelimited(KERN_WARNING
> +                                  "sending CDROMEJECT ioctl to a partition\n");
> +               return 0;
>        default:
>                break;
>        }

I think that right now the right thing to do woult probably to

 (a) print that warning - naming the actual ioctl number - for *every*
ioctl this disallows.

 (b) after warning, let them through for CAP_SYS_RAWIO, so that if
there are users of them, we will both know about them, _and_ we will
avoid breaking them if there are no security issues.

Hmm? Can you send such an updated patch, and we can get this in early
in the merge window, and start testing? I committed and pushed out the
ENOIOCTLCMD cleanup, let's see if that causes any problems..

                      Linus

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2012-01-05 23:50 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-12-22 18:02 [PATCH 0/3] possible privilege escalation via SG_IO ioctl (CVE-2011-4127) Paolo Bonzini
2011-12-22 18:02 ` [PATCH 1/3] block: add and use scsi_blk_cmd_ioctl Paolo Bonzini
2011-12-22 18:02 ` [PATCH 2/3] block: fail SCSI passthrough ioctls on partition devices Paolo Bonzini
2011-12-22 18:37   ` Linus Torvalds
2011-12-22 19:11     ` Willy Tarreau
2011-12-22 19:18     ` Paolo Bonzini
2011-12-22 19:44       ` Linus Torvalds
2011-12-22 20:23         ` Paolo Bonzini
2011-12-22 20:52           ` Linus Torvalds
2011-12-22 22:08             ` Paolo Bonzini
2011-12-22 22:25               ` Linus Torvalds
2011-12-22 23:48                 ` Alasdair G Kergon
2011-12-23  0:07                   ` Linus Torvalds
2011-12-23  6:26                     ` Willy Tarreau
2011-12-23  9:22                       ` Linus Torvalds
2011-12-23  9:45                         ` Willy Tarreau
2011-12-23 14:15                         ` Paolo Bonzini
2011-12-23 22:46                           ` Linus Torvalds
2012-01-05 13:18                             ` Paolo Bonzini
2012-01-05 16:16                               ` Linus Torvalds
2012-01-05 16:40                                 ` Paolo Bonzini
2012-01-05 17:04                                   ` Linus Torvalds
2012-01-05 17:26                                     ` Paolo Bonzini
2012-01-05 23:49                               ` Linus Torvalds
2011-12-26  1:41                       ` Daniel Barkalow
2011-12-23  0:17                 ` H. Peter Anvin
2011-12-22 18:02 ` [PATCH 3/3] dm: do not forward ioctls from logical volumes to the underlying device Paolo Bonzini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.