From: Vlastimil Babka <vbabka@suse.cz>
To: Ming Lei <ming.lei@redhat.com>, Dave Chinner <david@fromorbit.com>
Cc: Matthew Wilcox <willy@infradead.org>, "Darrick J . Wong" <darrick.wong@oracle.com>, linux-xfs@vger.kernel.org, Jens Axboe <axboe@kernel.dk>, Vitaly Kuznetsov <vkuznets@redhat.com>, Dave Chinner <dchinner@redhat.com>, Christoph Hellwig <hch@lst.de>, Alexander Duyck <alexander.h.duyck@linux.intel.com>, Aaron Lu <aaron.lu@intel.com>, Christopher Lameter <cl@linux.com>, Linux FS Devel <linux-fsdevel@vger.kernel.org>, linux-mm@kvack.org, linux-block@vger.kernel.org, Christoph Lameter <cl@linux.com>, Pekka Enberg <penberg@kernel.org>, David Rientjes <rientjes@google.com>, Joonsoo Kim <iamjoonsoo.kim@lge.com>
Subject: Re: [PATCH] xfs: allocate sector sized IO buffer via page_frag_alloc
Date: Tue, 26 Feb 2019 11:06:12 +0100
Message-ID: <a641feb8-ceb2-2dac-27aa-7b1df10f5ae5@suse.cz>
In-Reply-To: <20190226093302.GA24879@ming.t460p>

On 2/26/19 10:33 AM, Ming Lei wrote:
> On Tue, Feb 26, 2019 at 03:58:26PM +1100, Dave Chinner wrote:
>> On Mon, Feb 25, 2019 at 07:27:37PM -0800, Matthew Wilcox wrote:
>>> On Tue, Feb 26, 2019 at 02:02:14PM +1100, Dave Chinner wrote:
>>>>> Or what is the exact size of sub-page IO in xfs most of time? For
>>>>
>>>> Determined by mkfs parameters. Any power of 2 between 512 bytes and
>>>> 64kB needs to be supported. e.g:
>>>>
>>>> # mkfs.xfs -s size=512 -b size=1k -i size=2k -n size=8k ....
>>>>
>>>> will have metadata that is sector sized (512 bytes), filesystem
>>>> block sized (1k), directory block sized (8k) and inode cluster sized
>>>> (32k), and will use all of them in large quantities.
>>>
>>> If XFS is going to use each of these in large quantities, then it doesn't
>>> seem unreasonable for XFS to create a slab for each type of metadata?
>>
>>
>> Well, that is the question, isn't it? How many other filesystems
>> will want to make similar "don't use entire pages just for 4k of
>> metadata" optimisations as 64k page size machines become more
>> common? There are others that have the same "use slab for sector
>> aligned IO" which will fall foul of the same problem that has been
>> reported for XFS....
>>
>> If nobody else cares/wants it, then it can be XFS only. But it's
>> only fair we address the "will it be useful to others" question
>> first.....
>
> This kind of slab cache should have been global, just like interface of
> kmalloc(size).
>
> However, the alignment requirement depends on block device's block size,
> then it becomes hard to implement as general interface, for example:
>
> block size: 512, 1024, 2048, 4096
> slab size: 512*N, 0 < N < PAGE_SIZE/512
>
> For 4k page size, 28(7*4) slabs need to be created, and 64k page size
> needs to create 127*4 slabs.
>

Where does the '*4' multiplier come from?

So I wonder how hard it would actually be (+CC slab maintainers) to just
guarantee generic kmalloc() alignment for power-of-two sizes. If we can
do that for kmem_cache_create(), then the code should already be there.
AFAIK the alignment happens anyway (albeit not guaranteed) in the
non-debug cases, and if guaranteeing alignment for certain debugging
configurations (that need some space before the object) means larger
memory overhead, then the cost should still be bearable, since it's a
non-standard configuration where the point is to catch bugs, not to
have peak performance?

> But, specific file system may only use some of them, and it depends
> on meta data size.
>
> Thanks,
> Ming
>