Date: Wed, 18 May 2022 19:08:11 -0700
From: Eric Biggers
To: Keith Busch
Cc: Keith Busch, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
	axboe@kernel.dk, Kernel Team, hch@lst.de,
	bvanassche@acm.org, damien.lemoal@opensource.wdc.com
Subject: Re: [PATCHv2 3/3] block: relax direct io memory alignment
References: <20220518171131.3525293-1-kbusch@fb.com>
	<20220518171131.3525293-4-kbusch@fb.com>
X-Mailing-List: linux-block@vger.kernel.org

On Wed, May 18, 2022 at 07:59:36PM -0600, Keith Busch wrote:
> On Wed, May 18, 2022 at 06:53:11PM -0700, Eric Biggers wrote:
> > On Wed, May 18, 2022 at 07:00:39PM -0600, Keith Busch wrote:
> > > On Wed, May 18, 2022 at 05:14:49PM -0700, Eric Biggers wrote:
> > > > On Wed, May 18, 2022 at 10:11:31AM -0700, Keith Busch wrote:
> > > > > diff --git a/block/fops.c b/block/fops.c
> > > > > index b9b83030e0df..d8537c29602f 100644
> > > > > --- a/block/fops.c
> > > > > +++ b/block/fops.c
> > > > > @@ -54,8 +54,9 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
> > > > >  	struct bio bio;
> > > > >  	ssize_t ret;
> > > > >  
> > > > > -	if ((pos | iov_iter_alignment(iter)) &
> > > > > -	    (bdev_logical_block_size(bdev) - 1))
> > > > > +	if ((pos | iov_iter_count(iter)) & (bdev_logical_block_size(bdev) - 1))
> > > > > +		return -EINVAL;
> > > > > +	if (iov_iter_alignment(iter) & bdev_dma_alignment(bdev))
> > > > >  		return -EINVAL;
> > > >
> > > > The block layer makes a lot of assumptions that bios can be split at any bvec
> > > > boundary.  With this patch, bios whose length isn't a multiple of the logical
> > > > block size can be generated by splitting, which isn't valid.
> > >
> > > How? This patch ensures every segment is block size aligned.
> >
> > No, it doesn't.  It ensures that the *total* length of each bio is logical block
> > size aligned.  It doesn't ensure that for the individual bvecs.  By decreasing
> > the required memory alignment to below the logical block size, you're allowing
> > logical blocks to span a page boundary.
> > Whenever the two pages involved aren't physically contiguous, the data of the
> > block will be split across two bvecs.
>
> I'm aware that spanning pages can cause bad splits on the bi_max_vecs
> condition, but I believe it's well handled here. Unless I'm terribly confused,
> which is certainly possible, I think you may have missed this part of the
> patch:
>
> @@ -1223,6 +1224,8 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
>  		pages += entries_left * (PAGE_PTRS_PER_BVEC - 1);
>  
>  	size = iov_iter_get_pages(iter, pages, LONG_MAX, nr_pages, &offset);
> +	if (size > 0)
> +		size = ALIGN_DOWN(size, queue_logical_block_size(q));
>  	if (unlikely(size <= 0))
>  		return size ? size : -EFAULT;
>

That makes the total length of each "batch" of pages be a multiple of the
logical block size, but individual logical blocks within that batch can still
be divided into multiple bvecs in the loop just below it:

	for (left = size, i = 0; left > 0; left -= len, i++) {
		struct page *page = pages[i];

		len = min_t(size_t, PAGE_SIZE - offset, left);

		if (__bio_try_merge_page(bio, page, len, offset, &same_page)) {
			if (same_page)
				put_page(page);
		} else {
			if (WARN_ON_ONCE(bio_full(bio, len))) {
				bio_put_pages(pages + i, left, offset);
				return -EINVAL;
			}
			__bio_add_page(bio, page, len, offset);
		}
		offset = 0;
	}

- Eric