linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ming Lei <ming.lei@redhat.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Ming Lei <tom.leiming@gmail.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Dave Chinner <david@fromorbit.com>,
	"Darrick J . Wong" <darrick.wong@oracle.com>,
	"open list:XFS FILESYSTEM" <linux-xfs@vger.kernel.org>,
	Jens Axboe <axboe@kernel.dk>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Dave Chinner <dchinner@redhat.com>,
	Christoph Hellwig <hch@lst.de>,
	Alexander Duyck <alexander.h.duyck@linux.intel.com>,
	Aaron Lu <aaron.lu@intel.com>, Christopher Lameter <cl@linux.com>,
	Linux FS Devel <linux-fsdevel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	linux-block <linux-block@vger.kernel.org>,
	Pekka Enberg <penberg@kernel.org>,
	David Rientjes <rientjes@google.com>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>
Subject: Re: [PATCH] xfs: allocate sector sized IO buffer via page_frag_alloc
Date: Tue, 26 Feb 2019 21:42:48 +0800	[thread overview]
Message-ID: <20190226134247.GA30942@ming.t460p> (raw)
In-Reply-To: <20190226130230.GD11592@bombadil.infradead.org>

On Tue, Feb 26, 2019 at 05:02:30AM -0800, Matthew Wilcox wrote:
> On Tue, Feb 26, 2019 at 08:35:46PM +0800, Ming Lei wrote:
> > On Tue, Feb 26, 2019 at 04:12:09AM -0800, Matthew Wilcox wrote:
> > > On Tue, Feb 26, 2019 at 07:12:49PM +0800, Ming Lei wrote:
> > > > The buffer needs to be device block size aligned for dio, and now the block
> > > > size can be 512, 1024, 2048 and 4096.
> > > 
> > > Why does the block size make a difference?  This requirement is due to
> > > some storage devices having shoddy DMA controllers.  Are you saying there
> > > are devices which can't even do 512-byte aligned I/O?
> > 
> > Direct IO requires that, see do_blockdev_direct_IO().
> > 
> > This issue can be triggered when running xfs over loop/dio. We could
> > fallback to buffered IO under this situation, but not sure it is the
> > only case.
> 
> Wait, we're imposing a ridiculous amount of complexity on XFS for no
> reason at all?  We should just change this to 512-byte alignment.  Tying
> it to the blocksize of the device never made any sense.

OK, that is fine since we can fallback to buffered IO for loop in case of
unaligned dio.

Then something like the following patch should work for all fs, could
anyone comment on this approach?

--

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 5f2c429d4378..76f09f23a410 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -405,3 +405,44 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 	return ret;
 }
 EXPORT_SYMBOL(blkdev_issue_zeroout);
+
+static struct kmem_cache *sector_buf_slabs[(PAGE_SIZE >> 9) - 1];
+
+void *blk_alloc_sec_buf(unsigned size, gfp_t flags)
+{
+	int idx;
+
+	size = round_up(size, 512);
+	if (size >= PAGE_SIZE)
+		return NULL;
+
+	idx = (size >> 9) - 1;
+	if (!sector_buf_slabs[idx])
+		return NULL;
+	return kmem_cache_alloc(sector_buf_slabs[idx], flags);
+}
+EXPORT_SYMBOL_GPL(blk_alloc_sec_buf);
+
+void blk_free_sec_buf(void *buf, int size)
+{
+	size = round_up(size, 512);
+	if (size >= PAGE_SIZE)
+		return;
+
+	return kmem_cache_free(sector_buf_slabs[(size >> 9) - 1], buf);
+}
+EXPORT_SYMBOL_GPL(blk_free_sec_buf);
+
+void __init blk_sector_buf_init(void)
+{
+	unsigned size;
+
+	for (size = 512; size < PAGE_SIZE; size += 512) {
+		char name[16];
+		int idx = (size >> 9) - 1;
+
+		snprintf(name, 16, "blk_sec_buf-%u", size);
+		sector_buf_slabs[idx] = kmem_cache_create(name, size, 512,
+							  SLAB_PANIC, NULL);
+	}
+}
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index faed9d9eb84c..a4117e526715 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1657,6 +1657,9 @@ extern int bdev_read_page(struct block_device *, sector_t, struct page *);
 extern int bdev_write_page(struct block_device *, sector_t, struct page *,
 						struct writeback_control *);
 
+extern void *blk_alloc_sec_buf(unsigned size, gfp_t flags);
+extern void blk_free_sec_buf(void *buf, int size);
+
 #ifdef CONFIG_BLK_DEV_ZONED
 bool blk_req_needs_zone_write_lock(struct request *rq);
 void __blk_req_zone_write_lock(struct request *rq);
@@ -1755,6 +1758,15 @@ static inline int blkdev_issue_flush(struct block_device *bdev, gfp_t gfp_mask,
 	return 0;
 }
 
+static inline void *blk_alloc_sec_buf(unsigned size, gfp_t flags)
+{
+	return NULL;
+}
+
+static inline void blk_free_sec_buf(void *buf, int size)
+{
+}
+
 #endif /* CONFIG_BLOCK */
 
 static inline void blk_wake_io_task(struct task_struct *waiter)

Thanks,
Ming

  reply	other threads:[~2019-02-26 13:43 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-25  4:09 [PATCH] xfs: allocate sector sized IO buffer via page_frag_alloc Ming Lei
2019-02-25  4:36 ` Dave Chinner
2019-02-25  8:46   ` Ming Lei
2019-02-25 10:03     ` Ming Lei
2019-02-25 20:11     ` Dave Chinner
2019-02-25 13:15   ` Vlastimil Babka
2019-02-25 20:26     ` Dave Chinner
2019-02-26  2:22       ` Ming Lei
2019-02-26  3:02         ` Dave Chinner
2019-02-26  3:27           ` Matthew Wilcox
2019-02-26  4:58             ` Dave Chinner
2019-02-26  9:33               ` Ming Lei
2019-02-26 10:06                 ` Vlastimil Babka
2019-02-26 11:12                   ` Ming Lei
2019-02-26 12:12                     ` Matthew Wilcox
2019-02-26 12:35                       ` Ming Lei
2019-02-26 13:02                         ` Matthew Wilcox
2019-02-26 13:42                           ` Ming Lei [this message]
2019-02-26 14:04                             ` Matthew Wilcox
2019-02-26 16:14                               ` Darrick J. Wong
2019-02-26 16:19                                 ` Matthew Wilcox
2019-02-27  1:41                                   ` Ming Lei
2019-02-27  7:07                                   ` Vlastimil Babka
2019-03-08  8:18                                     ` Christoph Hellwig
2019-02-27 21:38                                 ` Dave Chinner
2019-02-26 15:30                             ` Christopher Lameter
2019-02-26 20:45                 ` Dave Chinner
2019-02-27  1:50                   ` Ming Lei
2019-02-27  3:41                     ` Dave Chinner
2019-02-26 15:20     ` Christopher Lameter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190226134247.GA30942@ming.t460p \
    --to=ming.lei@redhat.com \
    --cc=aaron.lu@intel.com \
    --cc=alexander.h.duyck@linux.intel.com \
    --cc=axboe@kernel.dk \
    --cc=cl@linux.com \
    --cc=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=dchinner@redhat.com \
    --cc=hch@lst.de \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=penberg@kernel.org \
    --cc=rientjes@google.com \
    --cc=tom.leiming@gmail.com \
    --cc=vbabka@suse.cz \
    --cc=vkuznets@redhat.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).