From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 38378C433EF for ; Sat, 21 May 2022 22:14:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245400AbiEUWOM (ORCPT ); Sat, 21 May 2022 18:14:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39474 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229961AbiEUWOL (ORCPT ); Sat, 21 May 2022 18:14:11 -0400 Received: from mail-pj1-x1034.google.com (mail-pj1-x1034.google.com [IPv6:2607:f8b0:4864:20::1034]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1D33745079 for ; Sat, 21 May 2022 15:14:10 -0700 (PDT) Received: by mail-pj1-x1034.google.com with SMTP id w17-20020a17090a529100b001db302efed6so10528012pjh.4 for ; Sat, 21 May 2022 15:14:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20210112.gappssmtp.com; s=20210112; h=message-id:date:mime-version:user-agent:subject:content-language :from:to:cc:references:in-reply-to:content-transfer-encoding; bh=z/8rB5FIqtxmcp3jhB1yi+D49jbUOOvCYqJH+Yf/SFA=; b=Ku41e7+1Dc3CJvqM5QD4JYfzpFWGBlw13PPdDA8S9bpz/ZALt16WO5PEZ6z4asMs4X TW/i0ve+c0rFSQCVSMqhfhMZJ3MQQBqDdXcPfIORprX37Pd+qvOXRYVNDgVxiIu96YWs xu87UXP9Ocnmh6EjMWQ0q1yY7yywbl4xeYYF/0HJScDTFisBqAGKfouPo51GK1S3J2II dB9ah9CeqvJtsJj6+tiMaUZpF30Pi8fNnxmOtgNF/gf2N2XuTelflNiMQGJ2AK8DWSzH QEjLSmAMNvE2VT50xRsnahbaVjrnampIm6+zJFCnxFHSuIIgSvekV37xXkvtBk2ncdO/ 6oQA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:date:mime-version:user-agent:subject :content-language:from:to:cc:references:in-reply-to :content-transfer-encoding; bh=z/8rB5FIqtxmcp3jhB1yi+D49jbUOOvCYqJH+Yf/SFA=; b=briZZPzWhl4TiLjzV+FEz91sL3tg0n8jfpV8YzXNo6fPxnXqRqqJ5CziR31cmx40Jm WOuOZwWJF/wj8yMZ1K1Gwu+18qPdk4Oq+0B6fQii20JDf/MEDTKR+t+k3dboAGd/tDOs iU0xNoAkLX7cLaWc29Qe6bWX4UdVeLZbBxU1HHiAtEZJ05dfdCjBsgOUmzADu/lj8lrn LBiAQRUQ0hsesgtleXuwnDRBAaznVRsZcN+gO90hlNPR4vbwxGoI1hd+XnYYJCjIjPkr xsLT6cA+YmNfNA+lsdcoXVpVBJxeicbout8Xy4h9mzgSPFOrY0Uz9bNx+kzUtiEgVL1q 6n2A== X-Gm-Message-State: AOAM533MKsoLVJN6wTLZqH0LgzgURo889UtFSpywkbEfH2vdwdVVCKB2 PcOk3CcT+ACnG8s00MOqowHxpZZLJErJAw== X-Google-Smtp-Source: ABdhPJxv2AtAFDDU4WzRln1Gml6rAtF32FNyBATc8i4s+IG9jkYQ3gKr/xajMSPoLfP/+z/gTGEIfg== X-Received: by 2002:a17:903:31d1:b0:159:804:e852 with SMTP id v17-20020a17090331d100b001590804e852mr15969465ple.19.1653171249436; Sat, 21 May 2022 15:14:09 -0700 (PDT) Received: from [192.168.1.100] ([198.8.77.157]) by smtp.gmail.com with ESMTPSA id t12-20020a170902e84c00b0015e8d4eb244sm1615428plg.142.2022.05.21.15.14.07 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sat, 21 May 2022 15:14:08 -0700 (PDT) Message-ID: Date: Sat, 21 May 2022 16:14:07 -0600 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux aarch64; rv:91.0) Gecko/20100101 Thunderbird/91.8.1 Subject: Re: [RFC] what to do with IOCB_DSYNC? Content-Language: en-US From: Jens Axboe To: Al Viro , Christoph Hellwig Cc: Matthew Wilcox , linux-fsdevel@vger.kernel.org References: <20210621135958.GA1013@lst.de> <20210621140956.GA1887@lst.de> <20210621142235.GA2391@lst.de> <20210621143501.GA3789@lst.de> <70b5e4a8-1daa-dc75-af58-9d82a732a6be@kernel.dk> In-Reply-To: <70b5e4a8-1daa-dc75-af58-9d82a732a6be@kernel.dk> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org On 5/21/22 1:03 PM, Jens Axboe wrote: > On 5/21/22 11:48 AM, Al Viro wrote: >> [resurrecting an old thread] >> >> On Mon, Jun 21, 2021 at 04:35:01PM +0200, Christoph Hellwig wrote: >>> On Mon, Jun 21, 2021 at 02:32:46PM +0000, Al Viro wrote: >>>> I'd rather have a single helper for those checks, rather than >>>> open-coding IS_SYNC() + IOCB_DSYNC in each, for obvious reasons... >>> >>> Yes, I think something like: >>> >>> static inline bool iocb_is_sync(struct kiocb *iocb) >>> { >>> return (iocb->ki_flags & IOCB_DSYNC) || >>> S_SYNC(iocb->ki_filp->f_mapping->host); >>> } >>> >>> should do the job. >> >> There's a problem with that variant. Take a look at btrfs_direct_write(): >> const bool is_sync_write = (iocb->ki_flags & IOCB_DSYNC); >> ... >> /* >> * We remove IOCB_DSYNC so that we don't deadlock when iomap_dio_rw() >> * calls generic_write_sync() (through iomap_dio_complete()), because >> * that results in calling fsync (btrfs_sync_file()) which will try to >> * lock the inode in exclusive/write mode. >> */ >> if (is_sync_write) >> iocb->ki_flags &= ~IOCB_DSYNC; >> ... >> >> /* >> * Add back IOCB_DSYNC. Our caller, btrfs_file_write_iter(), will do >> * the fsync (call generic_write_sync()). >> */ >> if (is_sync_write) >> iocb->ki_flags |= IOCB_DSYNC; >> >> will run into trouble. How about this (completely untested): >> >> Precalculate iocb_flags() >> >> Store the value in file->f_i_flags; calculate at open time (do_dentry_open() >> for opens, alloc_file() for pipe(2)/socket(2)/etc.), update at FCNTL_SETFL >> time. >> >> IOCB_DSYNC is... special in that respect; replace checks for it with >> an inlined helper (iocb_is_dsync()), leave the checks of underlying fs >> mounted with -o sync and/or inode being marked sync until then. >> To avoid breaking btrfs deadlock avoidance, add an explicit "no dsync" flag >> that would suppress IOCB_DSYNC; that way btrfs_direct_write() can set it >> for the duration of work where it wants to avoid generic_write_sync() >> triggering. >> >> That ought to reduce the overhead of new_sync_{read,write}() quite a bit. >> >> NEEDS TESTING; NOT FOR MERGE IN THAT FORM. > > Definitely generates better code here, unsurprisingly. Unfortunately it > doesn't seem to help a whole lot for a direct comparison, though it does > nudge us in the right direction. > > The main issue here, using urandom and 4k reads (iter reads done, > non-iter writes), the main difference seems to be that with the non-iter > reads, here's our copy overhead: > > + 1.80% dd [kernel.kallsyms] [k] __arch_copy_to_user > + 0.74% dd [kernel.kallsyms] [k] _copy_to_user > > and with the iter variant, for the same workload, it looks like this: > > + 4.13% dd [kernel.kallsyms] [k] _copy_to_iter > + 0.88% dd [kernel.kallsyms] [k] __arch_copy_to_user > + 0.72% dd [kernel.kallsyms] [k] copyout > > and about 1% just doing the iov_iter advance. Since this test case is a > single iovec read, ran a quick test where we simply use this helper: > > static size_t random_copy(void *src, int size, struct iov_iter *to) > { > const struct iovec *iov = to->iov; > size_t to_copy = min_t(size_t, size, iov->iov_len); > > if (copy_to_user(iov->iov_base, src, to_copy)) > return 0; > > to->count -= to_copy; > return to_copy; > } > > rather than copy_to_iter(). That brings us within 0.8% of the non-iter > read performance: > > non-iter: 408MB/sec > iter: 395MB/sec -3.2% > iter+al+hack 406MB/sec -0.8% > > for 32-byte reads: > > non-iter 73.1MB/sec > iter: 72.1MB/sec -1.4% > iter+al+hack 72.5MB/sec -0.8% > > which as mentioned in my initial email is closer than the 4k, but still > sees a nice reduction in overhead. Experimented a bit more - if we special case nr_segs == 1 && is_iovec in _copy_to_iter(): _copy_to_iter() { ... if (iter_is_iovec(i)) { might_fault(); if (i->nr_segs == 1) { /* copy */ iov_iter_advance(i, copied); return copied; } } iterate_and_advance(i, bytes, base, len, off, ... return bytes; } Then we're almost on par, and it looks like we just need to special case iov_iter_advance() for the nr_segs == 1 as well to be on par. This is on top of your patch as well, fwiw. It might make sense to special case the single segment cases, for both setup, iteration, and advancing. With that, I think we'll be where we want to be, and there will be no discernable difference between the iter paths and the old style paths. -- Jens Axboe