linux-kernel.vger.kernel.org archive mirror
From: Vladislav Bolkhovitin <vst@vlnb.net>
To: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	"Theodore Ts'o" <tytso@mit.edu>, Tejun Heo <tj@kernel.org>,
	"James E.J. Bottomley" <JBottomley@parallels.com>,
	Jens Axboe <axboe@kernel.dk>,
	linux-kernel@vger.kernel.org, linux-scsi@vger.kernel.org
Subject: Re: PING^7 (was Re: [PATCH v2 00/14] Corrections and customization of the SG_IO command whitelist (CVE-2012-4542))
Date: Tue, 28 May 2013 23:12:28 -0700
Message-ID: <51A59C4C.1050005@vlnb.net>
In-Reply-To: <yq17gii3l93.fsf@sermon.lab.mkp.net>

Martin K. Petersen, on 05/28/2013 01:25 PM wrote:
> Vladislav> Linux block layer is purely artificial creature slowly
> Vladislav> reinventing wheel creating more problems, than solving.
> 
> On the contrary. I do think we solve a whole bunch of problems.
> 
> 
> Vladislav> It enforces approach, where often "impossible" means
> Vladislav> "impossible in this interface".
> 
> I agree we have limitations. I do not agree that all limitations are
> bad. Sometimes it's OK to say no.
> 
> 
> Vladislav> For instance, how about copy offload?  How about atomic
> Vladislav> writes?
> 
> I'm actively working on copy offload. Nobody appears to be interested in
> atomic writes. Otherwise I'd work on those as well.
> 
> 
> Vladislav> Why was it needed to create special blk integrity interface
> Vladislav> with the only end user - SCSI?
> 
> Simple. Because we did not want to interleave data and PI 512+8+512+8
> neither in memory, nor at DMA time.

It can similarly be done in a SCSI-like interface without the need for any middleman.

> Furthermore, the ATA EPP proposal
> was still on the table so I also needed to support ATA.
> 
> And finally, NVM Express uses the blk_integrity interface as well.
> 
> 
> Vladislav> The block layer keeps repeating SCSI. So, maybe, after all,
> Vladislav> it's better to acknowledge that direct usage of SCSI without
> Vladislav> any intermediate layers and translations is more productive?
> Vladislav> And for those minors not using SCSI internally, translate
> Vladislav> from SCSI to their internal commands? Creating and filling
> Vladislav> CDB fields for most cases isn't anyhow harder, than creating
> Vladislav> and filling bio fields.
> 
> This is quite possibly the worst idea I have heard all week.
> 
> As it stands it's a headache for the disk ULD driver to figure out which
> of the bazillion READ/WRITE variants to send to a SCSI/ATA device. What
> makes you think that an application or filesystem would be better
> equipped to make that call?
> 
> See also: WRITE SAME w/ zeroes vs. WRITE SAME w/ UNMAP vs. UNMAP 
> 
> See also: EXTENDED COPY vs. the PROXY command set
> 
> See also: USB-ATA bridge chips
> 
> You make it sound like all the block layer does is filling out
> CDBs. Which it doesn't in fact have anything to do with at all.
> 
> When you are talking about CDBs we're down in the SBC/SSC territory.
> Which is such a tiny bit of what's going on. We have transports, we have
> SAM, we have HBA controller DMA constraints, system DMA constraints,
> buffer bouncing, etc. There's a ton of stuff that needs to happen before
> the CDB and the data physically reach the storage.
> 
> You seem to be advocating that everything up to the point where the
> device receives the command is in the way. Well, by all means. Why limit
> ourselves to the confines of SCSI? Why not get rid of POSIX
> read()/write(), page cache, filesystems and let applications speak
> ST-506 directly?
> 
> I know we're doing different things. My job is to make a general purpose
> operating system with interfaces that make sense to normal applications.
> That does not preclude special cases where it may make sense to poke at
> the device directly. For testing purposes, for instance. But I consider
> it a failure when we start having applications that know about hardware
> intricacies, cylinders/heads/sectors, etc. That road leads straight to
> the 1980s...

What you say is true, but my point is that this abstraction is better done in the
SCSI, i.e. SAM, manner. There is no need to write fields inside CDBs directly; that would
be pretty inconvenient ;). But CDB fields can be fields in some scsi_io structure. Exact
opcodes can easily be abstracted away and filled in at the last stage, where the final CDB
is constructed from those fields.
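
To illustrate, here is a minimal sketch of that idea. It assumes a hypothetical
struct scsi_io and helper scsi_io_build_cdb(); neither exists in the kernel, and the
names and fields are made up for this example. The caller fills in opcode-independent
fields, and the READ/WRITE (10) vs. (16) variant is chosen only when the CDB is built:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical opcode-independent request; not an existing kernel structure. */
enum scsi_io_op { SCSI_IO_READ, SCSI_IO_WRITE };

struct scsi_io {
    enum scsi_io_op op;
    uint64_t lba;      /* starting logical block address */
    uint32_t nblocks;  /* transfer length in logical blocks */
};

/* SBC opcodes for the 10- and 16-byte READ/WRITE variants. */
#define SBC_READ_10  0x28
#define SBC_WRITE_10 0x2a
#define SBC_READ_16  0x88
#define SBC_WRITE_16 0x8a

/* Store the low n bytes of v at p in big-endian order. */
static void put_be(uint8_t *p, uint64_t v, int n)
{
    for (int i = n - 1; i >= 0; i--) {
        p[i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}

/*
 * Last stage: pick the CDB variant and serialize the fields.
 * cdb must have room for 16 bytes. Returns the CDB length in bytes.
 */
static size_t scsi_io_build_cdb(const struct scsi_io *io, uint8_t cdb[16])
{
    assert(io != NULL && cdb != NULL);
    memset(cdb, 0, 16);

    if (io->lba <= 0xffffffffu && io->nblocks <= 0xffffu) {
        /* Fits the 10-byte form: 4-byte LBA, 2-byte transfer length. */
        cdb[0] = (io->op == SCSI_IO_READ) ? SBC_READ_10 : SBC_WRITE_10;
        put_be(&cdb[2], io->lba, 4);
        put_be(&cdb[7], io->nblocks, 2);
        return 10;
    }

    /* Otherwise use the 16-byte form: 8-byte LBA, 4-byte transfer length. */
    cdb[0] = (io->op == SCSI_IO_READ) ? SBC_READ_16 : SBC_WRITE_16;
    put_be(&cdb[2], io->lba, 8);
    put_be(&cdb[10], io->nblocks, 4);
    return 16;
}
```

The sketch covers only two variants and ignores flags such as FUA or protection
information; a real implementation would also consult what the device reports as
supported before choosing an opcode.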

The problem with the block abstraction is that it is the least common denominator of all
block device capabilities, so advanced capabilities that are available only in some class
of devices automatically become "impossible". Hence, it would be more productive to use
the most capable abstraction instead, which is SAM. In this abstraction there's no need to
reinvent complex interfaces and write complex middleman code for every advanced
capability. All advanced capabilities are available there by definition, if supported by
the underlying hardware. That's my point.

POSIX is for simple applications, for which read()/write() calls are sufficient. They
are outside the scope of this discussion. But advanced applications need more. I know
plenty of applications issuing direct SCSI commands, but how many applications using the
block interface (bsg) can you name? I can recall only one fairly widely used
Linux-specific library. That's all. Applications do not demand this interface.

Vlad
