* bcache on XFS: metadata I/O (dirent I/O?) not getting cached at all?
@ 2019-02-06 22:11 Nix
  2019-02-06 23:43 ` Dave Chinner
  2019-02-07  8:16 ` Coly Li
  0 siblings, 2 replies; 14+ messages in thread
From: Nix @ 2019-02-06 22:11 UTC (permalink / raw)
  To: linux-bcache, linux-xfs; +Cc: linux-kernel

So I just upgraded to 4.20 and revived my long-turned-off bcache now
that the metadata corruption leading to mount failure on dirty close may
have been identified (applying Tang Junhui's patch to do so)... and I
spotted something a bit disturbing. It appears that XFS directory and
metadata I/O is going more or less entirely uncached.

Here's some bcache stats before and after a git status of a *huge*
uncached tree (Chromium) on my no-writeback readaround cache. It takes
many minutes and pounds the disk with massively seeky metadata I/O in
the process:

Before:

stats_total/bypassed: 48.3G
stats_total/cache_bypass_hits: 7942
stats_total/cache_bypass_misses: 861045
stats_total/cache_hit_ratio: 3
stats_total/cache_hits: 16286
stats_total/cache_miss_collisions: 25
stats_total/cache_misses: 411575
stats_total/cache_readaheads: 0

After:

stats_total/bypassed: 49.3G
stats_total/cache_bypass_hits: 7942
stats_total/cache_bypass_misses: 1154887
stats_total/cache_hit_ratio: 3
stats_total/cache_hits: 16291
stats_total/cache_miss_collisions: 25
stats_total/cache_misses: 411625
stats_total/cache_readaheads: 0

Huge increase in bypassed reads, essentially no new cached reads. This
is... basically the optimum case for bcache, and it's not caching it!

From my reading of xfs_dir2_leaf_readbuf(), it looks like essentially
all directory reads in XFS appear to bcache as a single non-readahead
followed by a pile of readahead I/O: bcache bypasses readahead bios, so
all directory reads (or perhaps all directory reads larger than a single
block) are going to be bypassed out of hand.

This seems... suboptimal, but so does filling up the cache with
read-ahead blocks (particularly for non-metadata) that are never used.
Anyone got any ideas, 'cos I'm currently at a loss: XFS doesn't appear
to let us distinguish between "read-ahead just in case but almost
certain to be accessed" (like directory blocks) and "read ahead on the
offchance because someone did a single-block file read and what the hell
let's suck in a bunch more".

As it is, this seems to render bcache more or less useless with XFS,
since bcache's primary raison d'etre is precisely to cache seeky stuff
like metadata. :(

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: bcache on XFS: metadata I/O (dirent I/O?) not getting cached at all? 2019-02-06 22:11 bcache on XFS: metadata I/O (dirent I/O?) not getting cached at all? Nix @ 2019-02-06 23:43 ` Dave Chinner 2019-02-07 0:24 ` Andre Noll 2019-02-07 8:16 ` Coly Li 1 sibling, 1 reply; 14+ messages in thread From: Dave Chinner @ 2019-02-06 23:43 UTC (permalink / raw) To: Nix; +Cc: linux-bcache, linux-xfs, linux-kernel On Wed, Feb 06, 2019 at 10:11:21PM +0000, Nix wrote: > So I just upgraded to 4.20 and revived my long-turned-off bcache now > that the metadata corruption leading to mount failure on dirty close may > have been identified (applying Tang Junhui's patch to do so)... and I > spotted something a bit disturbing. It appears that XFS directory and > metadata I/O is going more or less entirely uncached. > > Here's some bcache stats before and after a git status of a *huge* > uncached tree (Chromium) on my no-writeback readaround cache. It takes > many minutes and pounds the disk with massively seeky metadata I/O in > the process: > > Before: > > stats_total/bypassed: 48.3G > stats_total/cache_bypass_hits: 7942 > stats_total/cache_bypass_misses: 861045 > stats_total/cache_hit_ratio: 3 > stats_total/cache_hits: 16286 > stats_total/cache_miss_collisions: 25 > stats_total/cache_misses: 411575 > stats_total/cache_readaheads: 0 > > After: > stats_total/bypassed: 49.3G > stats_total/cache_bypass_hits: 7942 > stats_total/cache_bypass_misses: 1154887 > stats_total/cache_hit_ratio: 3 > stats_total/cache_hits: 16291 > stats_total/cache_miss_collisions: 25 > stats_total/cache_misses: 411625 > stats_total/cache_readaheads: 0 > > Huge increase in bypassed reads, essentially no new cached reads. This > is... basically the optimum case for bcache, and it's not caching it! 
>
> From my reading of xfs_dir2_leaf_readbuf(), it looks like essentially
> all directory reads in XFS appear to bcache as a single non-readahead
> followed by a pile of readahead I/O: bcache bypasses readahead bios, so
> all directory reads (or perhaps all directory reads larger than a single
> block) are going to be bypassed out of hand.

That's a bcache problem, not an XFS problem. XFS does extensive
amounts of metadata readahead (btree traversals, directory access,
etc), and always has. If bcache considers readahead as "not worth
caching" then that has nothing to do with XFS.

> This seems... suboptimal, but so does filling up the cache with
> read-ahead blocks (particularly for non-metadata) that are never used.

Which is not the case for XFS. We do readahead when we know we are
going to need a block in the near future. It is rarely unnecessary;
it's a mechanism to reduce access latency when we do need to access
the metadata.

> Anyone got any ideas, 'cos I'm currently at a loss: XFS doesn't appear
> to let us distinguish between "read-ahead just in case but almost
> certain to be accessed" (like directory blocks) and "read ahead on the
> offchance because someone did a single-block file read and what the hell
> let's suck in a bunch more".

File data readahead:	REQ_RAHEAD
Metadata readahead:	REQ_META | REQ_RAHEAD

drivers/md/bcache/request.c::check_should_bypass():

	/*
	 * Flag for bypass if the IO is for read-ahead or background,
	 * unless the read-ahead request is for metadata (eg, for gfs2).
	 */
	if (bio->bi_opf & (REQ_RAHEAD|REQ_BACKGROUND) &&
	    !(bio->bi_opf & REQ_PRIO))
		goto skip;

bcache needs fixing - it thinks REQ_PRIO means metadata IO. That's
wrong - REQ_META means it's metadata IO, and so this is a bcache bug.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 14+ messages in thread
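[Dave's diagnosis can be modelled in isolation. A minimal sketch of the quoted check — the flag bit positions below are illustrative stand-ins, not the kernel's actual blk_types.h values:]

```c
#include <stdbool.h>

/* Illustrative stand-ins for the kernel's request flags; the real bit
 * positions live in include/linux/blk_types.h. */
enum {
	REQ_RAHEAD     = 1u << 0,  /* read-ahead request */
	REQ_BACKGROUND = 1u << 1,  /* background I/O (e.g. writeback) */
	REQ_META       = 1u << 2,  /* metadata I/O */
	REQ_PRIO       = 1u << 3,  /* expedite dispatch */
};

/* The 4.20-era bcache test quoted above: skip the cache for any
 * readahead/background bio that does not carry REQ_PRIO. */
static bool check_should_bypass(unsigned int bi_opf)
{
	return (bi_opf & (REQ_RAHEAD | REQ_BACKGROUND)) &&
	       !(bi_opf & REQ_PRIO);
}

/* XFS tags metadata readahead REQ_META | REQ_RAHEAD and never sets
 * REQ_PRIO, so every such bio takes the bypass path and is never
 * cached -- exactly the behaviour Nix observed. */
static bool xfs_meta_readahead_bypassed(void)
{
	return check_should_bypass(REQ_META | REQ_RAHEAD);
}
```

[With these definitions, `xfs_meta_readahead_bypassed()` returns true, while a plain synchronous metadata read (`REQ_META` alone) is not bypassed — the cache only ever sees the one non-readahead block per directory read.]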
* Re: bcache on XFS: metadata I/O (dirent I/O?) not getting cached at all? 2019-02-06 23:43 ` Dave Chinner @ 2019-02-07 0:24 ` Andre Noll 2019-02-07 2:26 ` Dave Chinner 2019-02-07 2:27 ` Coly Li 0 siblings, 2 replies; 14+ messages in thread From: Andre Noll @ 2019-02-07 0:24 UTC (permalink / raw) To: Dave Chinner; +Cc: Nix, linux-bcache, linux-xfs, linux-kernel, Coly Li [-- Attachment #1: Type: text/plain, Size: 893 bytes --] On Thu, Feb 07, 10:43, Dave Chinner wrote > File data readahead: REQ_RAHEAD > Metadata readahead: REQ_META | REQ_RAHEAD > > drivers/md/bcache/request.c::check_should_bypass(): > > /* > * Flag for bypass if the IO is for read-ahead or background, > * unless the read-ahead request is for metadata (eg, for gfs2). > */ > if (bio->bi_opf & (REQ_RAHEAD|REQ_BACKGROUND) && > !(bio->bi_opf & REQ_PRIO)) > goto skip; > > bcache needs fixing - it thinks REQ_PRIO means metadata IO. That's > wrong - REQ_META means it's metadata IO, and so this is a bcache > bug. Do you think 752f66a75abad is bad (ha!) and should be reverted? Thanks Andre -- Max Planck Institute for Developmental Biology Max-Planck-Ring 5, 72076 Tübingen, Germany. Phone: (+49) 7071 601 829 http://people.tuebingen.mpg.de/maan/ [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 195 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: bcache on XFS: metadata I/O (dirent I/O?) not getting cached at all? 2019-02-07 0:24 ` Andre Noll @ 2019-02-07 2:26 ` Dave Chinner 2019-02-07 2:38 ` Coly Li 2019-02-07 2:27 ` Coly Li 1 sibling, 1 reply; 14+ messages in thread From: Dave Chinner @ 2019-02-07 2:26 UTC (permalink / raw) To: Andre Noll; +Cc: Nix, linux-bcache, linux-xfs, linux-kernel, Coly Li On Thu, Feb 07, 2019 at 01:24:25AM +0100, Andre Noll wrote: > On Thu, Feb 07, 10:43, Dave Chinner wrote > > File data readahead: REQ_RAHEAD > > Metadata readahead: REQ_META | REQ_RAHEAD > > > > drivers/md/bcache/request.c::check_should_bypass(): > > > > /* > > * Flag for bypass if the IO is for read-ahead or background, > > * unless the read-ahead request is for metadata (eg, for gfs2). > > */ > > if (bio->bi_opf & (REQ_RAHEAD|REQ_BACKGROUND) && > > !(bio->bi_opf & REQ_PRIO)) > > goto skip; > > > > bcache needs fixing - it thinks REQ_PRIO means metadata IO. That's > > wrong - REQ_META means it's metadata IO, and so this is a bcache > > bug. > > Do you think 752f66a75abad is bad (ha!) and should be reverted? Yes, that change is just broken. From include/linux/blk_types.h: __REQ_META, /* metadata io request */ __REQ_PRIO, /* boost priority in cfq */ i.e. REQ_META means that it is a metadata request, REQ_PRIO means it is a "high priority" request. Two completely different things, often combined, but not interchangeable. So, yeah, that needs to be reverted if you want bcache to function properly for metadata caching. Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: bcache on XFS: metadata I/O (dirent I/O?) not getting cached at all?
  2019-02-07 2:26 ` Dave Chinner
@ 2019-02-07 2:38   ` Coly Li
  2019-02-07 3:10     ` Dave Chinner
  2019-02-07 13:10    ` Nix
  0 siblings, 2 replies; 14+ messages in thread
From: Coly Li @ 2019-02-07 2:38 UTC (permalink / raw)
  To: Dave Chinner, Andre Noll
  Cc: Nix, linux-bcache, linux-xfs, linux-kernel, Christoph Hellwig, axboe

On 2019/2/7 10:26 上午, Dave Chinner wrote:
> On Thu, Feb 07, 2019 at 01:24:25AM +0100, Andre Noll wrote:
>> On Thu, Feb 07, 10:43, Dave Chinner wrote
>>> File data readahead: REQ_RAHEAD
>>> Metadata readahead: REQ_META | REQ_RAHEAD
>>>
>>> drivers/md/bcache/request.c::check_should_bypass():
>>>
>>>	/*
>>>	 * Flag for bypass if the IO is for read-ahead or background,
>>>	 * unless the read-ahead request is for metadata (eg, for gfs2).
>>>	 */
>>>	if (bio->bi_opf & (REQ_RAHEAD|REQ_BACKGROUND) &&
>>>	    !(bio->bi_opf & REQ_PRIO))
>>>		goto skip;
>>>
>>> bcache needs fixing - it thinks REQ_PRIO means metadata IO. That's
>>> wrong - REQ_META means it's metadata IO, and so this is a bcache
>>> bug.
>>
>> Do you think 752f66a75abad is bad (ha!) and should be reverted?
>
> Yes, that change is just broken. From include/linux/blk_types.h:
>
>	__REQ_META,	/* metadata io request */
>	__REQ_PRIO,	/* boost priority in cfq */

Hi Dave,

> i.e. REQ_META means that it is a metadata request, REQ_PRIO means it
> is a "high priority" request. Two completely different things, often
> combined, but not interchangeable.

I found that in file system metadata IO, REQ_META and REQ_PRIO are most
of the time tagged together on the bio, but XFS seems not to use
REQ_PRIO. Is there any basic principle for when these tags should be
used or not? E.g. if REQ_META is enough for metadata I/O, why is
REQ_PRIO used too? And if REQ_PRIO is necessary, why is it not used in
the fs/xfs/ code?

> So, yeah, that needs to be reverted if you want bcache to function
> properly for metadata caching.
Sure, I will fix this once it is clear to me. Thanks for the hint.

--
Coly Li

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: bcache on XFS: metadata I/O (dirent I/O?) not getting cached at all? 2019-02-07 2:38 ` Coly Li @ 2019-02-07 3:10 ` Dave Chinner 2019-02-07 8:18 ` Coly Li 2019-02-07 13:10 ` Nix 1 sibling, 1 reply; 14+ messages in thread From: Dave Chinner @ 2019-02-07 3:10 UTC (permalink / raw) To: Coly Li Cc: Andre Noll, Nix, linux-bcache, linux-xfs, linux-kernel, Christoph Hellwig, axboe On Thu, Feb 07, 2019 at 10:38:58AM +0800, Coly Li wrote: > On 2019/2/7 10:26 上午, Dave Chinner wrote: > > On Thu, Feb 07, 2019 at 01:24:25AM +0100, Andre Noll wrote: > >> On Thu, Feb 07, 10:43, Dave Chinner wrote > >>> File data readahead: REQ_RAHEAD > >>> Metadata readahead: REQ_META | REQ_RAHEAD > >>> > >>> drivers/md/bcache/request.c::check_should_bypass(): > >>> > >>> /* > >>> * Flag for bypass if the IO is for read-ahead or background, > >>> * unless the read-ahead request is for metadata (eg, for gfs2). > >>> */ > >>> if (bio->bi_opf & (REQ_RAHEAD|REQ_BACKGROUND) && > >>> !(bio->bi_opf & REQ_PRIO)) > >>> goto skip; > >>> > >>> bcache needs fixing - it thinks REQ_PRIO means metadata IO. That's > >>> wrong - REQ_META means it's metadata IO, and so this is a bcache > >>> bug. > >> > >> Do you think 752f66a75abad is bad (ha!) and should be reverted? > > > > Yes, that change is just broken. From include/linux/blk_types.h: > > > > __REQ_META, /* metadata io request */ > > __REQ_PRIO, /* boost priority in cfq */ > > > > > > Hi Dave, > > > i.e. REQ_META means that it is a metadata request, REQ_PRIO means it > > is a "high priority" request. Two completely different things, often > > combined, but not interchangeable. > > I found in file system metadata IO, most of time REQ_META and REQ_PRIO > are tagged together for bio, but XFS seems not use REQ_PRIO. Yes, that's correct, because we don't specifically prioritise metadata IO over data IO. > Is there any basic principle for when should these tags to be used or > not ? Yes. > e.g. 
If REQ_META is enough for metadata I/O, why is REQ_PRIO used
> too?

REQ_META is used for metadata. REQ_PRIO is used to communicate to
the lower layers that the submitter considers this IO to be more
important than non-REQ_PRIO IO and so its dispatch should be
expedited.

IOWs, if the filesystem considers metadata IO to be more important
than user data IO, then it will use REQ_PRIO | REQ_META rather than
just REQ_META.

Historically speaking, REQ_PRIO was a hack for CFQ to get it to
dispatch journal IO from a different thread without waiting for a
time slice to expire. In the XFS world, we just said "don't use CFQ,
it's fundamentally broken for highly concurrent applications" and
didn't bother trying to hack around the limitations of CFQ.

These days REQ_PRIO is only used by the block layer writeback
throttle, but unlike bcache it considers both REQ_META and REQ_PRIO
to mean the same thing.

REQ_META, OTOH, is used by BFQ and blk-cgroup to detect metadata IO,
and they don't look at REQ_PRIO at all. So, really, REQ_META is for
metadata, not REQ_PRIO. REQ_PRIO looks like it should just go away.

> And if REQ_PRIO is necessary, why is it not used in the fs/xfs/ code?

It's not necessary, it's just an /optimisation/ that some
filesystems make and some IO schedulers used to pay attention to. It
looks completely redundant now.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 14+ messages in thread
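[Dave's distinction can be restated as two independent predicates over the flag word. A sketch — the bit values are illustrative, not the kernel's:]

```c
#include <stdbool.h>

/* Illustrative flag bits; the real ones are in include/linux/blk_types.h. */
enum {
	REQ_RAHEAD = 1u << 0,  /* read-ahead */
	REQ_META   = 1u << 2,  /* this IS metadata */
	REQ_PRIO   = 1u << 3,  /* expedite dispatch of this bio */
};

/* "Is this metadata?" must test REQ_META, and nothing else. */
static bool bio_is_metadata(unsigned int opf)
{
	return opf & REQ_META;
}

/* "Does the submitter want it expedited?" is a separate question. */
static bool bio_wants_priority(unsigned int opf)
{
	return opf & REQ_PRIO;
}

/* Typical taggings from the discussion above:
 *   XFS metadata readahead:            REQ_META | REQ_RAHEAD
 *   metadata the fs also prioritises:  REQ_META | REQ_PRIO
 * The two flags are often combined but never interchangeable: a bio
 * can be metadata without being high priority, and vice versa. */
```

[Under this model, an XFS metadata readahead bio is metadata but not priority, which is exactly the case bcache's REQ_PRIO-only test misclassifies.]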
* Re: bcache on XFS: metadata I/O (dirent I/O?) not getting cached at all? 2019-02-07 3:10 ` Dave Chinner @ 2019-02-07 8:18 ` Coly Li 0 siblings, 0 replies; 14+ messages in thread From: Coly Li @ 2019-02-07 8:18 UTC (permalink / raw) To: Dave Chinner Cc: Andre Noll, Nix, linux-bcache, linux-xfs, linux-kernel, Christoph Hellwig, axboe On 2019/2/7 11:10 上午, Dave Chinner wrote: > On Thu, Feb 07, 2019 at 10:38:58AM +0800, Coly Li wrote: >> On 2019/2/7 10:26 上午, Dave Chinner wrote: >>> On Thu, Feb 07, 2019 at 01:24:25AM +0100, Andre Noll wrote: >>>> On Thu, Feb 07, 10:43, Dave Chinner wrote >>>>> File data readahead: REQ_RAHEAD >>>>> Metadata readahead: REQ_META | REQ_RAHEAD >>>>> >>>>> drivers/md/bcache/request.c::check_should_bypass(): >>>>> >>>>> /* >>>>> * Flag for bypass if the IO is for read-ahead or background, >>>>> * unless the read-ahead request is for metadata (eg, for gfs2). >>>>> */ >>>>> if (bio->bi_opf & (REQ_RAHEAD|REQ_BACKGROUND) && >>>>> !(bio->bi_opf & REQ_PRIO)) >>>>> goto skip; >>>>> >>>>> bcache needs fixing - it thinks REQ_PRIO means metadata IO. That's >>>>> wrong - REQ_META means it's metadata IO, and so this is a bcache >>>>> bug. >>>> >>>> Do you think 752f66a75abad is bad (ha!) and should be reverted? >>> >>> Yes, that change is just broken. From include/linux/blk_types.h: >>> >>> __REQ_META, /* metadata io request */ >>> __REQ_PRIO, /* boost priority in cfq */ >>> >>> >> >> Hi Dave, >> >>> i.e. REQ_META means that it is a metadata request, REQ_PRIO means it >>> is a "high priority" request. Two completely different things, often >>> combined, but not interchangeable. >> >> I found in file system metadata IO, most of time REQ_META and REQ_PRIO >> are tagged together for bio, but XFS seems not use REQ_PRIO. > > Yes, that's correct, because we don't specifically prioritise > metadata IO over data IO. > >> Is there any basic principle for when should these tags to be used or >> not ? > > Yes. > >> e.g. 
If REQ_META is enough for metadata I/O, why is REQ_PRIO used
>> too?
>
> REQ_META is used for metadata. REQ_PRIO is used to communicate to
> the lower layers that the submitter considers this IO to be more
> important than non-REQ_PRIO IO and so its dispatch should be
> expedited.
>
> IOWs, if the filesystem considers metadata IO to be more important
> than user data IO, then it will use REQ_PRIO | REQ_META rather than
> just REQ_META.
>
> Historically speaking, REQ_PRIO was a hack for CFQ to get it to
> dispatch journal IO from a different thread without waiting for a
> time slice to expire. In the XFS world, we just said "don't use CFQ,
> it's fundamentally broken for highly concurrent applications" and
> didn't bother trying to hack around the limitations of CFQ.
>
> These days REQ_PRIO is only used by the block layer writeback
> throttle, but unlike bcache it considers both REQ_META and REQ_PRIO
> to mean the same thing.
>
> REQ_META, OTOH, is used by BFQ and blk-cgroup to detect metadata IO,
> and they don't look at REQ_PRIO at all. So, really, REQ_META is for
> metadata, not REQ_PRIO. REQ_PRIO looks like it should just go away.
>
>> And if REQ_PRIO is necessary, why is it not used in the fs/xfs/ code?
>
> It's not necessary, it's just an /optimisation/ that some
> filesystems make and some IO schedulers used to pay attention to. It
> looks completely redundant now.

Hi Dave,

Thanks for your detailed explanation. This is a great hint from the
view of a file system developer :-) I just composed a fix; I hope it
makes things better.

--
Coly Li

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: bcache on XFS: metadata I/O (dirent I/O?) not getting cached at all? 2019-02-07 2:38 ` Coly Li 2019-02-07 3:10 ` Dave Chinner @ 2019-02-07 13:10 ` Nix 1 sibling, 0 replies; 14+ messages in thread From: Nix @ 2019-02-07 13:10 UTC (permalink / raw) To: Coly Li Cc: Dave Chinner, Andre Noll, linux-bcache, linux-xfs, linux-kernel, Christoph Hellwig, axboe On 7 Feb 2019, Coly Li stated: > On 2019/2/7 10:26 上午, Dave Chinner wrote: >> So, yeah, that needs to be reverted if you want bcache to function >> properly for metadata caching. > > Sure, I will fix this, once I make it clear to me. I'll give it a test :) The meaning of these flags was somewhat opaque to me, too (mostly due to novelty: I've never really looked at anything in the block layer before). -- NULL && (void) ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: bcache on XFS: metadata I/O (dirent I/O?) not getting cached at all?
  2019-02-07 0:24 ` Andre Noll
  2019-02-07 2:26   ` Dave Chinner
@ 2019-02-07 2:27   ` Coly Li
  2019-02-07 9:28     ` Andre Noll
  1 sibling, 1 reply; 14+ messages in thread
From: Coly Li @ 2019-02-07 2:27 UTC (permalink / raw)
  To: Andre Noll, Dave Chinner
  Cc: Nix, linux-bcache, linux-xfs, linux-kernel, Christoph Hellwig, axboe

On 2019/2/7 8:24 上午, Andre Noll wrote:
> On Thu, Feb 07, 10:43, Dave Chinner wrote
>> File data readahead: REQ_RAHEAD
>> Metadata readahead: REQ_META | REQ_RAHEAD
>>
>> drivers/md/bcache/request.c::check_should_bypass():
>>
>>	/*
>>	 * Flag for bypass if the IO is for read-ahead or background,
>>	 * unless the read-ahead request is for metadata (eg, for gfs2).
>>	 */
>>	if (bio->bi_opf & (REQ_RAHEAD|REQ_BACKGROUND) &&
>>	    !(bio->bi_opf & REQ_PRIO))
>>		goto skip;
>>
>> bcache needs fixing - it thinks REQ_PRIO means metadata IO.
>> That's wrong - REQ_META means it's metadata IO, and so this is a
>> bcache bug.
>
> Do you think 752f66a75abad is bad (ha!) and should be reverted?

Hi Dave and Andre,

Correct me if I am wrong: REQ_META is used by blktrace to tag metadata
IO, and REQ_PRIO is used by the block layer to handle metadata IO. I
discussed this topic with Christoph Hellwig quite a long time ago and
came to the above conclusion.

If different file systems handle the metadata flags in a unified way,
it is OK with me to change the code to:
!(bio->bi_opf & (REQ_META|REQ_PRIO)).

Thanks in advance.

--
Coly Li

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: bcache on XFS: metadata I/O (dirent I/O?) not getting cached at all? 2019-02-07 2:27 ` Coly Li @ 2019-02-07 9:28 ` Andre Noll 0 siblings, 0 replies; 14+ messages in thread From: Andre Noll @ 2019-02-07 9:28 UTC (permalink / raw) To: Coly Li Cc: Dave Chinner, Nix, linux-bcache, linux-xfs, linux-kernel, Christoph Hellwig, axboe [-- Attachment #1: Type: text/plain, Size: 845 bytes --] On Thu, Feb 07, 10:27, Coly Li wrote > If different file system handles metadata flags in unified ways, it is > OK to me to change the code to: !(bio->bi_opf & (REQ_META |REQ_PRIO)). Yes, that's the smallest fix that should also go into 4.19-stable. In the long run, we should try to get rid of the 45 instances of REQ_PRIO. Most users specify REQ_META | REQ_PRIO anyway, which leaves only a few other instances to look at. I think the one in submit_bh_wbc() of fs/buffer.c can just be removed while block/cfq-iosched.c does not use REQ_META at all, so the simple s/REQ_PRIO/REQ_META should be OK. drivers/staging/erofs/data.c is also easy to fix. Best Andre -- Max Planck Institute for Developmental Biology Max-Planck-Ring 5, 72076 Tübingen, Germany. Phone: (+49) 7071 601 829 http://people.tuebingen.mpg.de/maan/ [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 195 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: bcache on XFS: metadata I/O (dirent I/O?) not getting cached at all? 2019-02-06 22:11 bcache on XFS: metadata I/O (dirent I/O?) not getting cached at all? Nix 2019-02-06 23:43 ` Dave Chinner @ 2019-02-07 8:16 ` Coly Li 2019-02-07 9:41 ` Andre Noll 2019-02-07 20:51 ` Nix 1 sibling, 2 replies; 14+ messages in thread From: Coly Li @ 2019-02-07 8:16 UTC (permalink / raw) To: Nix; +Cc: linux-bcache, linux-xfs, linux-kernel, Dave Chinner, Andre Noll [-- Attachment #1: Type: text/plain, Size: 2464 bytes --] On 2019/2/7 6:11 上午, Nix wrote: > So I just upgraded to 4.20 and revived my long-turned-off bcache now > that the metadata corruption leading to mount failure on dirty close may > have been identified (applying Tang Junhui's patch to do so)... and I > spotted something a bit disturbing. It appears that XFS directory and > metadata I/O is going more or less entirely uncached. > > Here's some bcache stats before and after a git status of a *huge* > uncached tree (Chromium) on my no-writeback readaround cache. It takes > many minutes and pounds the disk with massively seeky metadata I/O in > the process: > > Before: > > stats_total/bypassed: 48.3G > stats_total/cache_bypass_hits: 7942 > stats_total/cache_bypass_misses: 861045 > stats_total/cache_hit_ratio: 3 > stats_total/cache_hits: 16286 > stats_total/cache_miss_collisions: 25 > stats_total/cache_misses: 411575 > stats_total/cache_readaheads: 0 > > After: > stats_total/bypassed: 49.3G > stats_total/cache_bypass_hits: 7942 > stats_total/cache_bypass_misses: 1154887 > stats_total/cache_hit_ratio: 3 > stats_total/cache_hits: 16291 > stats_total/cache_miss_collisions: 25 > stats_total/cache_misses: 411625 > stats_total/cache_readaheads: 0 > > Huge increase in bypassed reads, essentially no new cached reads. This > is... basically the optimum case for bcache, and it's not caching it! 
>
> From my reading of xfs_dir2_leaf_readbuf(), it looks like essentially
> all directory reads in XFS appear to bcache as a single non-readahead
> followed by a pile of readahead I/O: bcache bypasses readahead bios, so
> all directory reads (or perhaps all directory reads larger than a single
> block) are going to be bypassed out of hand.
>
> This seems... suboptimal, but so does filling up the cache with
> read-ahead blocks (particularly for non-metadata) that are never used.
> Anyone got any ideas, 'cos I'm currently at a loss: XFS doesn't appear
> to let us distinguish between "read-ahead just in case but almost
> certain to be accessed" (like directory blocks) and "read ahead on the
> offchance because someone did a single-block file read and what the hell
> let's suck in a bunch more".
>
> As it is, this seems to render bcache more or less useless with XFS,
> since bcache's primary raison d'etre is precisely to cache seeky stuff
> like metadata. :(

Hi Nix,

Could you please try whether the attached patch makes things better?
Thanks in advance for your help.

--
Coly Li

[-- Attachment #2: 0001-bcache-use-REQ_META-REQ_PRIO-to-indicate-bio-for-met.patch --]
[-- Type: text/plain, Size: 2704 bytes --]

From 7c27e26017f6297a6bc6a8075732f69d3edcc52e Mon Sep 17 00:00:00 2001
From: Coly Li <colyli@suse.de>
Date: Thu, 7 Feb 2019 15:54:24 +0800
Subject: [PATCH] bcache: use (REQ_META|REQ_PRIO) to indicate bio for metadata

In 'commit 752f66a75aba ("bcache: use REQ_PRIO to indicate bio for
metadata")' REQ_META is replaced by REQ_PRIO to indicate metadata bio.
This assumption is not always correct, e.g. XFS uses REQ_META to mark
metadata bio other than REQ_PRIO. This is why Nix reports a regression
that bcache does not cache metadata for XFS after the above commit.

Thanks to Dave Chinner, he explains the difference between REQ_META
and REQ_PRIO from view of file system developer. Here I quote part of
his explanation from mailing list,

   REQ_META is used for metadata. REQ_PRIO is used to communicate to
   the lower layers that the submitter considers this IO to be more
   important that non REQ_PRIO IO and so dispatch should be expedited.

   IOWs, if the filesystem considers metadata IO to be more important
   that user data IO, then it will use REQ_PRIO | REQ_META rather than
   just REQ_META.

Then it seems bios with REQ_META or REQ_PRIO should both be cached for
performance optimisation, because they are all probably low I/O latency
demand by upper layer (e.g. file system).

So in this patch, when we want to check whether a bio is metadata
related, REQ_META and REQ_PRIO are both checked. Then both metadata and
high priority I/O requests will be handled properly.

Reported-by: Nix <nix@esperi.org.uk>
Signed-off-by: Coly Li <colyli@suse.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Andre Noll <maan@tuebingen.mpg.de>
Cc: Christoph Hellwig <hch@lst.de>
---
 drivers/md/bcache/request.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 3bf35914bb57..62bda90a38dc 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -395,7 +395,7 @@ static bool check_should_bypass(struct cached_dev *dc, struct bio *bio)
 	 * unless the read-ahead request is for metadata (eg, for gfs2 or xfs).
 	 */
 	if (bio->bi_opf & (REQ_RAHEAD|REQ_BACKGROUND) &&
-	    !(bio->bi_opf & REQ_PRIO))
+	    !(bio->bi_opf & (REQ_META|REQ_PRIO)))
 		goto skip;
 
 	if (bio->bi_iter.bi_sector & (c->sb.block_size - 1) ||
@@ -877,7 +877,7 @@ static int cached_dev_cache_miss(struct btree *b, struct search *s,
 	}
 
 	if (!(bio->bi_opf & REQ_RAHEAD) &&
-	    !(bio->bi_opf & REQ_PRIO) &&
+	    !(bio->bi_opf & (REQ_META|REQ_PRIO)) &&
 	    s->iop.c->gc_stats.in_use < CUTOFF_CACHE_READA)
 		reada = min_t(sector_t, dc->readahead >> 9,
 			      get_capacity(bio->bi_disk) - bio_end_sector(bio));
-- 
2.16.4

^ permalink raw reply related	[flat|nested] 14+ messages in thread
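[The effect of the patch's first hunk can be checked with a before/after model of the bypass predicate. A sketch — flag bit values illustrative, as before, not the kernel's actual definitions:]

```c
#include <stdbool.h>

/* Illustrative flag bits; the real ones are in include/linux/blk_types.h. */
enum {
	REQ_RAHEAD     = 1u << 0,
	REQ_BACKGROUND = 1u << 1,
	REQ_META       = 1u << 2,
	REQ_PRIO       = 1u << 3,
};

/* Pre-patch predicate: only REQ_PRIO rescues a readahead bio. */
static bool bypass_old(unsigned int opf)
{
	return (opf & (REQ_RAHEAD | REQ_BACKGROUND)) && !(opf & REQ_PRIO);
}

/* Patched predicate: either REQ_META or REQ_PRIO rescues it. */
static bool bypass_new(unsigned int opf)
{
	return (opf & (REQ_RAHEAD | REQ_BACKGROUND)) &&
	       !(opf & (REQ_META | REQ_PRIO));
}

/* XFS metadata readahead (REQ_META | REQ_RAHEAD) was bypassed before
 * the patch and is cached after it; plain file-data readahead
 * (REQ_RAHEAD alone) is still bypassed in both versions. */
```

[This is the whole behavioural change: metadata readahead moves from the bypass path to the cacheable path, while data readahead and prioritised I/O are treated exactly as before.]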
* Re: bcache on XFS: metadata I/O (dirent I/O?) not getting cached at all? 2019-02-07 8:16 ` Coly Li @ 2019-02-07 9:41 ` Andre Noll 2019-02-07 10:23 ` Coly Li 2019-02-07 20:51 ` Nix 1 sibling, 1 reply; 14+ messages in thread From: Andre Noll @ 2019-02-07 9:41 UTC (permalink / raw) To: Coly Li; +Cc: Nix, linux-bcache, linux-xfs, linux-kernel, Dave Chinner [-- Attachment #1: Type: text/plain, Size: 1899 bytes --] On Thu, Feb 07, 16:16, Coly Li wrote > From: Coly Li <colyli@suse.de> > Date: Thu, 7 Feb 2019 15:54:24 +0800 > Subject: [PATCH] bcache: use (REQ_META|REQ_PRIO) to indicate bio for metadata > > In 'commit 752f66a75aba ("bcache: use REQ_PRIO to indicate bio for > metadata")' REQ_META is replaced by REQ_PRIO to indicate metadata bio. > This assumption is not always correct, e.g. XFS uses REQ_META to mark > metadata bio other than REQ_PRIO. This is why Nix reports a regression > that bcache does not cache metadata for XFS after the above commit. maybe s/reports a regression/noticed > Thanks to Dave Chinner, he explains the difference between REQ_META and > REQ_PRIO from view of file system developer. Here I quote part of his > explanation from mailing list, > REQ_META is used for metadata. REQ_PRIO is used to communicate to > the lower layers that the submitter considers this IO to be more > important that non REQ_PRIO IO and so dispatch should be expedited. > > IOWs, if the filesystem considers metadata IO to be more important > that user data IO, then it will use REQ_PRIO | REQ_META rather than > just REQ_META. > > Then it seems bios with REQ_META or REQ_PRIO should both be cached for > performance optimation, because they are all probably low I/O latency > demand by upper layer (e.g. file system). > > So in this patch, when we want to check whether a bio is metadata > related, REQ_META and REQ_PRIO are both checked. Then both metadata and > high priority I/O requests will be handled properly. 
s/check whether a bio is metadata related/decide whether to bypass the cache Apart from these two nitpicks, feel free to add my Reviewed-by. Thanks Andre -- Max Planck Institute for Developmental Biology Max-Planck-Ring 5, 72076 Tübingen, Germany. Phone: (+49) 7071 601 829 http://people.tuebingen.mpg.de/maan/ [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 195 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: bcache on XFS: metadata I/O (dirent I/O?) not getting cached at all?
  2019-02-07 9:41 ` Andre Noll
@ 2019-02-07 10:23 ` Coly Li
  0 siblings, 0 replies; 14+ messages in thread
From: Coly Li @ 2019-02-07 10:23 UTC (permalink / raw)
  To: Andre Noll; +Cc: Nix, linux-bcache, linux-xfs, linux-kernel, Dave Chinner

On 2019/2/7 5:41 下午, Andre Noll wrote:
> On Thu, Feb 07, 16:16, Coly Li wrote
>> From: Coly Li <colyli@suse.de>
>> Date: Thu, 7 Feb 2019 15:54:24 +0800
>> Subject: [PATCH] bcache: use (REQ_META|REQ_PRIO) to indicate bio for
>> metadata
>>
>> In 'commit 752f66a75aba ("bcache: use REQ_PRIO to indicate bio
>> for metadata")' REQ_META is replaced by REQ_PRIO to indicate
>> metadata bio. This assumption is not always correct, e.g. XFS
>> uses REQ_META to mark metadata bio other than REQ_PRIO. This is
>> why Nix reports a regression that bcache does not cache metadata
>> for XFS after the above commit.
>
> maybe s/reports a regression/noticed
>
>> Thanks to Dave Chinner, he explains the difference between
>> REQ_META and REQ_PRIO from view of file system developer. Here I
>> quote part of his explanation from mailing list,
>> REQ_META is used for metadata. REQ_PRIO is used to communicate to
>> the lower layers that the submitter considers this IO to be more
>> important that non REQ_PRIO IO and so dispatch should be expedited.
>>
>> IOWs, if the filesystem considers metadata IO to be more
>> important that user data IO, then it will use REQ_PRIO | REQ_META
>> rather than just REQ_META.
>>
>> Then it seems bios with REQ_META or REQ_PRIO should both be
>> cached for performance optimation, because they are all probably
>> low I/O latency demand by upper layer (e.g. file system).
>>
>> So in this patch, when we want to check whether a bio is
>> metadata related, REQ_META and REQ_PRIO are both checked. Then
>> both metadata and high priority I/O requests will be handled
>> properly.
>
> s/check whether a bio is metadata related/decide whether to bypass
> the cache
>
> Apart from these two nitpicks, feel free to add my Reviewed-by.

Hi Andre,

Thanks for your review :-)

--
Coly Li
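The fix Coly describes boils down to a single flag test. Here is a minimal user-space sketch of the idea only; the `REQ_*` values and the function name are illustrative, not the kernel's actual bit layout or bcache's real bypass code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified request-flag bits, standing in for the kernel's REQ_*
 * flags (values here are illustrative, not the kernel's layout). */
#define REQ_META (1u << 0)
#define REQ_PRIO (1u << 1)

/* Before commit 752f66a75aba bcache tested only REQ_META; after it,
 * only REQ_PRIO, which XFS does not set on its metadata bios. The fix
 * is to treat a bio as metadata when EITHER flag is present. */
static bool bio_is_metadata(uint32_t opf)
{
    return (opf & (REQ_META | REQ_PRIO)) != 0;
}
```

With this predicate, XFS metadata (REQ_META only) and ext4-style prioritized metadata (REQ_META|REQ_PRIO) both avoid the bypass path.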
* Re: bcache on XFS: metadata I/O (dirent I/O?) not getting cached at all?
  2019-02-07  8:16 ` Coly Li
  2019-02-07  9:41 ` Andre Noll
@ 2019-02-07 20:51   ` Nix
  1 sibling, 0 replies; 14+ messages in thread
From: Nix @ 2019-02-07 20:51 UTC (permalink / raw)
  To: Coly Li; +Cc: linux-bcache, linux-xfs, linux-kernel, Dave Chinner, Andre Noll

On 7 Feb 2019, Coly Li uttered the following:

> On 2019/2/7 6:11 上午, Nix wrote:
>> As it is, this seems to render bcache more or less useless with XFS,
>> since bcache's primary raison d'etre is precisely to cache seeky stuff
>> like metadata. :(
>
> Hi Nix,
>
> Could you please try whether the attached patch makes things better?

Looks good! Before a cp -al of a huge tree:

loom:~# bcache-stats
stats_total/bypassed: 1.0G
stats_total/cache_bypass_hits: 16
stats_total/cache_bypass_misses: 26436
stats_total/cache_hit_ratio: 46
stats_total/cache_hits: 24349
stats_total/cache_miss_collisions: 8
stats_total/cache_misses: 27898
stats_total/cache_readaheads: 0

After:

stats_total/bypassed: 1.1G
stats_total/cache_bypass_hits: 16
stats_total/cache_bypass_misses: 27176
stats_total/cache_hit_ratio: 43
stats_total/cache_hits: 24443
stats_total/cache_miss_collisions: 9
stats_total/cache_misses: 32152   <----
stats_total/cache_readaheads: 0

So loads of new misses. (A bunch of bypassed misses too. Not sure where
those came from, maybe some larger sequential reads somewhere, but a lot
is getting cached now, and every bit of metadata that gets cached means
things get a bit faster.)

btw I have ported ewheeler's ioprio-based cache hinting patch to 4.20;
I/O below the ioprio threshold bypasses everything, even metadata and
REQ_PRIO stuff. It was trivial, and I was able to spot and fix a tiny
bypass accounting bug in the patch in the process: see
http://www.esperi.org.uk/~nix/bundles/bcache-ioprio.bundle. (I figured
you didn't want almost exactly the same patch series as before posted to
the list, but I can do that if you prefer.)
Put this all together and it seems to work very well: my test massive
compile triggered 500MiB of metadata writes at the start, and then the
actual compile (being entirely sequential reads) hardly wrote anything
out and was almost entirely bypassed; meanwhile a huge git push I ran at
idle priority didn't pollute the cache at all. Excellent!

(I'm also keeping write volumes down by storing transient things like
objdirs, which just sit in the page cache and then get deleted, on a
separate non-bcached, non-journalled ext4 fs at the start of the
spinning rust disk, with subdirs of this fs bind-mounted into various
places as needed. I should make the scripts that do that public, because
they seem likely to be useful to bcache users...)

Semi-unrelated side note: after my most recent reboot, which involved a
bcache journal replay even though my shutdown was clean, stats_total
reset; the cache device's bcache/written and
bcache/set/cache_available_percent also flipped to 0 and 100%
respectively. I suspect this is merely a stats bug of some sort, because
the boot was notably faster than before and cache_hits was about 6000 by
the time it was done. bcache/priority_stats *does* say that the cache is
"only" 98% unused, as it did before. Maybe cache_available_percent
doesn't mean what I thought it did.

--
NULL && (void)
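The ioprio-based hinting Nix describes amounts to a threshold comparison against the submitter's I/O priority. The sketch below is hypothetical and only illustrates the idea; it borrows the kernel's ioprio encoding (a class in the bits above IOPRIO_CLASS_SHIFT, a per-class level below), but the function names and the exact bypass policy in ewheeler's actual patch may differ:

```c
#include <stdbool.h>
#include <stdint.h>

/* The kernel packs an ioprio as (class << 13) | level, where a HIGHER
 * class value means LOWER priority (RT=1, BE=2, IDLE=3). */
#define IOPRIO_CLASS_SHIFT 13

static int ioprio_class(uint16_t ioprio)
{
    return ioprio >> IOPRIO_CLASS_SHIFT;
}

static int ioprio_level(uint16_t ioprio)
{
    return ioprio & ((1 << IOPRIO_CLASS_SHIFT) - 1);
}

/* Hypothetical policy: any I/O whose priority is below the configured
 * threshold skips the cache entirely, even REQ_META/REQ_PRIO bios. */
static bool ioprio_should_bypass(uint16_t ioprio, uint16_t threshold)
{
    if (ioprio_class(ioprio) != ioprio_class(threshold))
        return ioprio_class(ioprio) > ioprio_class(threshold);
    /* Same class: a larger level number is a lower priority. */
    return ioprio_level(ioprio) > ioprio_level(threshold);
}
```

Under this policy an idle-class git push (class 3) falls below a best-effort threshold (class 2) and bypasses the cache, which matches the behaviour Nix reports.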
end of thread, other threads:[~2019-02-07 20:51 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-06 22:11 bcache on XFS: metadata I/O (dirent I/O?) not getting cached at all? Nix
2019-02-06 23:43 ` Dave Chinner
2019-02-07  0:24 ` Andre Noll
2019-02-07  2:26 ` Dave Chinner
2019-02-07  2:38 ` Coly Li
2019-02-07  3:10 ` Dave Chinner
2019-02-07  8:18 ` Coly Li
2019-02-07 13:10 ` Nix
2019-02-07  2:27 ` Coly Li
2019-02-07  9:28 ` Andre Noll
2019-02-07  8:16 ` Coly Li
2019-02-07  9:41 ` Andre Noll
2019-02-07 10:23 ` Coly Li
2019-02-07 20:51 ` Nix