From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753096AbbDRCBA (ORCPT ); Fri, 17 Apr 2015 22:01:00 -0400 Received: from mail-pd0-f177.google.com ([209.85.192.177]:35402 "EHLO mail-pd0-f177.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751834AbbDRCA4 (ORCPT ); Fri, 17 Apr 2015 22:00:56 -0400 Message-ID: <5531BAD5.4030104@kernel.dk> Date: Fri, 17 Apr 2015 20:00:53 -0600 From: Jens Axboe User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0 MIME-Version: 1.0 To: Dave Chinner , Jens Axboe CC: Ming Lin , lkml , linux-fsdevel@vger.kernel.org, ming.l@ssi.samsung.com, "Kwan (Hingkwan) Huen-SSI" Subject: Re: [PATCH 3/6] direct-io: add support for write stream IDs References: <1427210823-5283-1-git-send-email-axboe@fb.com> <1427210823-5283-4-git-send-email-axboe@fb.com> <20150325024328.GF31342@dastard> <5512C588.2050805@kernel.dk> <20150411115922.GO13731@dastard> <20150417230651.GF21261@dastard> <5531932C.8030505@fb.com> <20150417235142.GH15810@dastard> In-Reply-To: <20150417235142.GH15810@dastard> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 04/17/2015 05:51 PM, Dave Chinner wrote: > On Fri, Apr 17, 2015 at 05:11:40PM -0600, Jens Axboe wrote: >> On 04/17/2015 05:06 PM, Dave Chinner wrote: >>> On Thu, Apr 16, 2015 at 11:20:45PM -0700, Ming Lin wrote: >>>> On Sat, Apr 11, 2015 at 4:59 AM, Dave Chinner wrote: >>>>> On Fri, Apr 10, 2015 at 04:50:05PM -0700, Ming Lin wrote: >>>>>> On Wed, Mar 25, 2015 at 7:26 AM, Jens Axboe wrote: >>>>>>>> If iocb->ki_filp->f_streamid is not set, then it should fall back to >>>>>>>> whatever is set on the inode->i_streamid. >>>>>> >>>>>> Why should do the fall back? >>>>> >>>>> Because then you have a method of using streams with applications >>>>> that aren't aware of streams. >>>>> >>>>> Or perhaps you have a file you know has different access patterns to >>>>> the rest of the files in a directory, and you don't want to have to >>>>> set the stream on every process that opens and uses that file. e.g. >>>>> database writeahead log files (sequential write, never read) vs >>>>> database index/table files (random read/write)..... >>>>> >>>>>>> Good point, agree. Will make that change. >>>>>> >>>>>> That change causes problem for direct IO, for example >>>>>> >>>>>> process 1: >>>>>> fd = open("/dev/nvme0n1", O_DIRECT...); >>>>>> //set stream_id 1 >>>>>> fadvise(fd, 1, 0, POSIX_FADV_STREAMID); >>>>>> pwrite(fd, ....); >>>>>> >>>>>> process 2: >>>>>> fd = open("/dev/nvme0n1", O_DIRECT...); >>>>>> //should be legacy stream_id 0 >>>>>> pwrite(fd, ....); >>>>>> >>>>>> But now process 2 also see stream_id 1, which is wrong. >>>>> >>>>> It's not wrong, your behaviour model is just different You have >>>>> defined a process/fd based stream model and not considered >>>>> considered that admins and applications might want to use a file >>>>> based stream model instead, so applications don't need to even be >>>>> aware that write streams are in use... >>>> >>>> The stream must be opened, otherwise device will return error if application >>>> write to a not-opened stream. >>> >>> That's an extremely device specific *implementation* of a write >>> stream. The *concept* of a write stream being passed from userspace to >>> the block layer doesn't have such constraints, and I get realy >>> concerned when implementations of a generic concept are so tightly >>> focussed around one type of hardware implementation of the >>> concept... >> >> Indeed, which is why the implementation posted cares ONLY about the >> stream ID itself, and passing that through. >> >> But the point about fallback is valid, however, for some use cases >> that will not be what you want. But we have to make some sort of >> decision, and falling back to the inode set value (if one is set) is >> probably the right thing to do in most use cases. > > Right, the question is then whether fadvise should set the value on > the inode at all, because then the effect of setting it on a fd also > changes the fallback. Perhaps we need to a distinction between > "setting the stream for this fd" which lasts as long as the fd is > active, and "setting the default inode stream" which is potentially > a persistent operation if the filesystem stores it on disk... Yes, that might be a good compromise. The easiest would be to define a second fadvise advice, where the stronger advice would be file + inode. Another option would be changing the file approach to use fcntl(), and keeping the fadvise for the inode. I'll be happy to take input on what people would prefer here. >>>> Device has limited number of streams, for example, 16 streams. >>>> There are 2 APIs to open/close the stream. >>> >>> What's to stop me writing something for DM-thinp that understands >>> write streams in bios and uses it to separate out the write streams >>> into different regions of the thinp device to improve locality of >>> it's data placement and hence reduce fragmentation? >> >> Absolutely nothing, in fact that's one of the use cases that I had >> in mind. Or for for caching software. > > *nod*. We are on the same page, then :) Yes completely, basically just wanted to clarify that. -- Jens Axboe