From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S1756025Ab3E1UZE (ORCPT ); Tue, 28 May 2013 16:25:04 -0400
Received: from aserp1040.oracle.com ([141.146.126.69]:29561 "EHLO
    aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
    with ESMTP id S1755587Ab3E1UZC (ORCPT ); Tue, 28 May 2013 16:25:02 -0400
To: Vladislav Bolkhovitin
Cc: "Martin K. Petersen" , Paolo Bonzini , "Theodore Ts'o" , Tejun Heo ,
    "James E.J. Bottomley" , Jens Axboe , linux-kernel@vger.kernel.org,
    linux-scsi@vger.kernel.org
Subject: Re: PING^7 (was Re: [PATCH v2 00/14] Corrections and customization
    of the SG_IO command whitelist (CVE-2012-4542))
From: "Martin K. Petersen"
Organization: Oracle Corporation
References: <1360163761-8541-1-git-send-email-pbonzini@redhat.com>
    <519C674A.50700@redhat.com> <20130522093249.GC3466@mtj.dyndns.org>
    <519C959A.3090100@redhat.com> <20130522100212.GE3466@mtj.dyndns.org>
    <519C9CBC.3050003@redhat.com> <20130522134134.GA15189@mtj.dyndns.org>
    <519CD234.40608@redhat.com> <20130522150335.GC2777@thunk.org>
    <519CE9FE.2030007@redhat.com> <51A035FA.6070409@vlnb.net>
Date: Tue, 28 May 2013 16:25:12 -0400
In-Reply-To: <51A035FA.6070409@vlnb.net> (Vladislav Bolkhovitin's message
    of "Fri, 24 May 2013 20:54:34 -0700")
Message-ID:
User-Agent: Gnus/5.130005 (Ma Gnus v0.5) Emacs/24.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-Source-IP: ucsinet21.oracle.com [156.151.31.93]
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

>>>>> "Vladislav" == Vladislav Bolkhovitin writes:

Vladislav> Linux block layer is purely artificial creature slowly
Vladislav> reinventing wheel creating more problems, than solving.

On the contrary. I do think we solve a whole bunch of problems.

Vladislav> It enforces approach, where often "impossible" means
Vladislav> "impossible in this interface".

I agree we have limitations.
I do not agree that all limitations are bad. Sometimes it's OK to say no.

Vladislav> For instance, how about copy offload? How about atomic
Vladislav> writes?

I'm actively working on copy offload. Nobody appears to be interested in
atomic writes. Otherwise I'd work on those as well.

Vladislav> Why was it needed to create special blk integrity interface
Vladislav> with the only end user - SCSI?

Simple. Because we did not want to interleave data and PI (512+8+512+8)
either in memory or at DMA time. Furthermore, the ATA EPP proposal was
still on the table so I also needed to support ATA. And finally, NVM
Express uses the blk_integrity interface as well.

Vladislav> The block layer keeps repeating SCSI. So, maybe, after all,
Vladislav> it's better to acknowledge that direct usage of SCSI without
Vladislav> any intermediate layers and translations is more productive?
Vladislav> And for those minors not using SCSI internally, translate
Vladislav> from SCSI to their internal commands? Creating and filling
Vladislav> CDB fields for most cases isn't anyhow harder, than creating
Vladislav> and feeling bio fields.

This is quite possibly the worst idea I have heard all week.

As it stands it's a headache for the disk ULD driver to figure out which
of the bazillion READ/WRITE variants to send to a SCSI/ATA device. What
makes you think that an application or filesystem would be better
equipped to make that call?

See also: WRITE SAME w/ zeroes vs. WRITE SAME w/ UNMAP vs. UNMAP

See also: EXTENDED COPY vs. the PROXY command set

See also: USB-ATA bridge chips

You make it sound like all the block layer does is filling out CDBs.
Which it doesn't in fact have anything to do with at all.

When you are talking about CDBs we're down in the SBC/SSC territory.
Which is such a tiny bit of what's going on. We have transports, we have
SAM, we have HBA controller DMA constraints, system DMA constraints,
buffer bouncing, etc.
There's a ton of stuff that needs to happen before the CDB and the data
physically reach the storage.

You seem to be advocating that everything up to the point where the
device receives the command is in the way. Well, by all means. Why limit
ourselves to the confines of SCSI? Why not get rid of POSIX
read()/write(), page cache, filesystems and let applications speak
ST-506 directly?

I know we're doing different things. My job is to make a general purpose
operating system with interfaces that make sense to normal applications.
That does not preclude special cases where it may make sense to poke at
the device directly. For testing purposes, for instance.

But I consider it a failure when we start having applications that know
about hardware intricacies, cylinders/heads/sectors, etc. That road
leads straight to the 1980s...

-- 
Martin K. Petersen	Oracle Linux Engineering