From: Kent Overstreet <kent.overstreet@linux.dev> To: Jan Kara <jack@suse.cz> Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-bcachefs@vger.kernel.org, Kent Overstreet <kent.overstreet@gmail.com>, "Darrick J . Wong" <djwong@kernel.org>, dhowells@redhat.com, Andreas Gruenbacher <agruenba@redhat.com>, cluster-devel@redhat.com, Bob Peterson <rpeterso@redhat.com> Subject: Re: [PATCH 06/32] sched: Add task_struct->faults_disabled_mapping Date: Thu, 25 May 2023 17:36:50 -0400 [thread overview] Message-ID: <ZG/U8pl8Zp3Wei7Z@moria.home.lan> (raw) In-Reply-To: <20230525084731.losrlnarpbqtqzil@quack3> On Thu, May 25, 2023 at 10:47:31AM +0200, Jan Kara wrote: > If we submit direct IO that uses mapped file F at offset O as a buffer for > direct IO from file F, offset O, it will currently livelock in an > indefinite retry loop. It should rather return error or fall back to > buffered IO. But that should be fixable. Andreas? > > But if the buffer and direct IO range does not overlap, it will just > happily work - iomap_dio_rw() invalidates only the range direct IO is done > to. *nod* readahead triggered from the page fault path is another consideration. No idea how that interacts with the gf2s method; IIRC there's a hack in the page fault path that says somewhere "we may be getting called via gup(), don't invoke readahead". We could potentially kill that hack if we lifted this to the VFS layer. > > > What happens if we race with the pages we faulted in being evicted? > > We fault them in again and retry. > > > > Also good that you've written a fstest for this, that is definitely a useful > > > addition, although I suspect GFS2 guys added a test for this not so long > > > ago when testing their stuff. Maybe they have a pointer handy? > > > > More tests more good. > > > > So if we want to lift this scheme to the VFS layer, we'd start by > > replacing the lock you added (grepping for it, the name escapes me) with > > a different type of lock - two_state_shared_lock in my code, it's like a > > rw lock except writers don't exclude other writers. That way the DIO > > path can use it without singlethreading writes to a single file. > > Yes, I've noticed that you are introducing in bcachefs a lock with very > similar semantics to mapping->invalidate_lock, just with this special lock > type. What I'm kind of worried about with two_state_shared_lock as > implemented in bcachefs is the fairness. AFAICS so far if someone is e.g. > heavily faulting pages on a file, direct IO to that file can be starved > indefinitely. That is IMHO not a good thing and I would not like to use > this type of lock in VFS until this problem is resolved. But it should be > fixable e.g. by introducing some kind of deadline for a waiter after which > it will block acquisitions of the other lock state. Yeah, my two_state_shared lock is definitely at the quick and dirty prototype level, the implementation would need work. Lockdep support would be another hard requirement. The deadline might be a good idea, OTOH it'd want tuning. Maybe something like what rwsem does where we block new read acquirerers if there's a writer waiting would work.
WARNING: multiple messages have this Message-ID (diff)
From: Kent Overstreet <kent.overstreet@linux.dev> To: cluster-devel.redhat.com Subject: [Cluster-devel] [PATCH 06/32] sched: Add task_struct->faults_disabled_mapping Date: Thu, 25 May 2023 17:36:50 -0400 [thread overview] Message-ID: <ZG/U8pl8Zp3Wei7Z@moria.home.lan> (raw) In-Reply-To: <20230525084731.losrlnarpbqtqzil@quack3> On Thu, May 25, 2023 at 10:47:31AM +0200, Jan Kara wrote: > If we submit direct IO that uses mapped file F at offset O as a buffer for > direct IO from file F, offset O, it will currently livelock in an > indefinite retry loop. It should rather return error or fall back to > buffered IO. But that should be fixable. Andreas? > > But if the buffer and direct IO range does not overlap, it will just > happily work - iomap_dio_rw() invalidates only the range direct IO is done > to. *nod* readahead triggered from the page fault path is another consideration. No idea how that interacts with the gf2s method; IIRC there's a hack in the page fault path that says somewhere "we may be getting called via gup(), don't invoke readahead". We could potentially kill that hack if we lifted this to the VFS layer. > > > What happens if we race with the pages we faulted in being evicted? > > We fault them in again and retry. > > > > Also good that you've written a fstest for this, that is definitely a useful > > > addition, although I suspect GFS2 guys added a test for this not so long > > > ago when testing their stuff. Maybe they have a pointer handy? > > > > More tests more good. > > > > So if we want to lift this scheme to the VFS layer, we'd start by > > replacing the lock you added (grepping for it, the name escapes me) with > > a different type of lock - two_state_shared_lock in my code, it's like a > > rw lock except writers don't exclude other writers. That way the DIO > > path can use it without singlethreading writes to a single file. > > Yes, I've noticed that you are introducing in bcachefs a lock with very > similar semantics to mapping->invalidate_lock, just with this special lock > type. What I'm kind of worried about with two_state_shared_lock as > implemented in bcachefs is the fairness. AFAICS so far if someone is e.g. > heavily faulting pages on a file, direct IO to that file can be starved > indefinitely. That is IMHO not a good thing and I would not like to use > this type of lock in VFS until this problem is resolved. But it should be > fixable e.g. by introducing some kind of deadline for a waiter after which > it will block acquisitions of the other lock state. Yeah, my two_state_shared lock is definitely at the quick and dirty prototype level, the implementation would need work. Lockdep support would be another hard requirement. The deadline might be a good idea, OTOH it'd want tuning. Maybe something like what rwsem does where we block new read acquirerers if there's a writer waiting would work.
next prev parent reply other threads:[~2023-05-25 21:37 UTC|newest] Thread overview: 207+ messages / expand[flat|nested] mbox.gz Atom feed top 2023-05-09 16:56 [PATCH 00/32] bcachefs - a new COW filesystem Kent Overstreet 2023-05-09 16:56 ` [PATCH 01/32] Compiler Attributes: add __flatten Kent Overstreet 2023-05-09 17:04 ` Miguel Ojeda 2023-05-09 17:24 ` Kent Overstreet 2023-05-09 16:56 ` [PATCH 02/32] locking/lockdep: lock_class_is_held() Kent Overstreet 2023-05-09 19:30 ` Peter Zijlstra 2023-05-09 20:11 ` Kent Overstreet 2023-05-09 16:56 ` [PATCH 03/32] locking/lockdep: lockdep_set_no_check_recursion() Kent Overstreet 2023-05-09 19:31 ` Peter Zijlstra 2023-05-09 19:57 ` Kent Overstreet 2023-05-09 20:18 ` Kent Overstreet 2023-05-09 20:27 ` Waiman Long 2023-05-09 20:35 ` Kent Overstreet 2023-05-09 21:37 ` Waiman Long 2023-05-10 8:59 ` Peter Zijlstra 2023-05-10 20:38 ` Kent Overstreet 2023-05-11 8:25 ` Peter Zijlstra 2023-05-11 9:32 ` Kent Overstreet 2023-05-12 20:49 ` Kent Overstreet 2023-05-09 16:56 ` [PATCH 04/32] locking: SIX locks (shared/intent/exclusive) Kent Overstreet 2023-05-11 12:14 ` Jan Engelhardt 2023-05-12 20:58 ` Kent Overstreet 2023-05-12 22:39 ` Jan Engelhardt 2023-05-12 23:26 ` Kent Overstreet 2023-05-12 23:49 ` Randy Dunlap 2023-05-13 0:17 ` Kent Overstreet 2023-05-13 0:45 ` Eric Biggers 2023-05-13 0:51 ` Kent Overstreet 2023-05-14 12:15 ` Jeff Layton 2023-05-15 2:39 ` Kent Overstreet 2023-05-09 16:56 ` [PATCH 05/32] MAINTAINERS: Add entry for six locks Kent Overstreet 2023-05-09 16:56 ` [PATCH 06/32] sched: Add task_struct->faults_disabled_mapping Kent Overstreet 2023-05-10 1:07 ` Jan Kara 2023-05-10 6:18 ` Kent Overstreet 2023-05-23 13:34 ` Jan Kara 2023-05-23 13:34 ` [Cluster-devel] " Jan Kara 2023-05-23 16:21 ` Christoph Hellwig 2023-05-23 16:21 ` Christoph Hellwig 2023-05-23 16:35 ` Kent Overstreet 2023-05-23 16:35 ` Kent Overstreet 2023-05-24 6:43 ` Christoph Hellwig 2023-05-24 6:43 ` Christoph Hellwig 2023-05-24 8:09 ` Kent Overstreet 2023-05-24 8:09 ` Kent Overstreet 2023-05-25 8:58 ` Christoph Hellwig 2023-05-25 8:58 ` Christoph Hellwig 2023-05-25 20:50 ` Kent Overstreet 2023-05-25 20:50 ` Kent Overstreet 2023-05-26 8:06 ` Christoph Hellwig 2023-05-26 8:06 ` Christoph Hellwig 2023-05-26 8:34 ` Kent Overstreet 2023-05-26 8:34 ` Kent Overstreet 2023-05-25 21:40 ` Kent Overstreet 2023-05-25 21:40 ` Kent Overstreet 2023-05-25 22:25 ` Andreas Grünbacher 2023-05-25 22:25 ` Andreas Grünbacher 2023-05-25 23:20 ` Kent Overstreet 2023-05-25 23:20 ` Kent Overstreet 2023-05-26 0:05 ` Andreas Grünbacher 2023-05-26 0:05 ` Andreas Grünbacher 2023-05-26 0:39 ` Kent Overstreet 2023-05-26 0:39 ` Kent Overstreet 2023-05-26 8:10 ` Christoph Hellwig 2023-05-26 8:10 ` Christoph Hellwig 2023-05-26 8:38 ` Kent Overstreet 2023-05-26 8:38 ` Kent Overstreet 2023-05-23 16:49 ` Kent Overstreet 2023-05-23 16:49 ` [Cluster-devel] " Kent Overstreet 2023-05-25 8:47 ` Jan Kara 2023-05-25 8:47 ` [Cluster-devel] " Jan Kara 2023-05-25 21:36 ` Kent Overstreet [this message] 2023-05-25 21:36 ` Kent Overstreet 2023-05-25 22:45 ` Andreas Grünbacher 2023-05-25 22:45 ` [Cluster-devel] " Andreas Grünbacher 2023-05-25 22:04 ` Andreas Grünbacher 2023-05-25 22:04 ` [Cluster-devel] " Andreas Grünbacher 2023-05-09 16:56 ` [PATCH 07/32] mm: Bring back vmalloc_exec Kent Overstreet 2023-05-09 18:19 ` Lorenzo Stoakes 2023-05-09 20:15 ` Kent Overstreet 2023-05-09 20:46 ` Christoph Hellwig 2023-05-09 21:12 ` Lorenzo Stoakes 2023-05-09 21:29 ` Kent Overstreet 2023-05-10 6:48 ` Eric Biggers 2023-05-12 18:36 ` Kent Overstreet 2023-05-13 1:57 ` Eric Biggers 2023-05-13 19:28 ` Kent Overstreet 2023-05-14 5:45 ` Kent Overstreet 2023-05-14 18:43 ` Eric Biggers 2023-05-15 5:38 ` Kent Overstreet 2023-05-15 6:13 ` Eric Biggers 2023-05-15 6:18 ` Kent Overstreet 2023-05-15 7:13 ` Eric Biggers 2023-05-15 7:26 ` Kent Overstreet 2023-05-21 21:33 ` Eric Biggers 2023-05-21 22:04 ` Kent Overstreet 2023-05-15 10:29 ` David Laight 2023-05-10 11:56 ` David Laight 2023-05-09 21:43 ` Darrick J. Wong 2023-05-09 21:54 ` Kent Overstreet 2023-05-11 5:33 ` Theodore Ts'o 2023-05-11 5:44 ` Kent Overstreet 2023-05-13 13:25 ` Lorenzo Stoakes 2023-05-14 18:39 ` Christophe Leroy 2023-05-14 23:43 ` Kent Overstreet 2023-05-15 4:45 ` Christophe Leroy 2023-05-15 5:02 ` Kent Overstreet 2023-05-10 14:18 ` Christophe Leroy 2023-05-10 15:05 ` Johannes Thumshirn 2023-05-11 22:28 ` Kees Cook 2023-05-12 18:41 ` Kent Overstreet 2023-05-16 21:02 ` Kees Cook 2023-05-16 21:20 ` Kent Overstreet 2023-05-16 21:47 ` Matthew Wilcox 2023-05-16 21:57 ` Kent Overstreet 2023-05-17 5:28 ` Kent Overstreet 2023-05-17 14:04 ` Mike Rapoport 2023-05-17 14:18 ` Kent Overstreet 2023-05-17 15:44 ` Mike Rapoport 2023-05-17 15:59 ` Kent Overstreet 2023-06-17 4:13 ` Andy Lutomirski 2023-06-17 15:34 ` Kent Overstreet 2023-06-17 19:19 ` Andy Lutomirski 2023-06-17 20:08 ` Kent Overstreet 2023-06-17 20:35 ` Andy Lutomirski 2023-06-19 19:45 ` Kees Cook 2023-06-20 0:39 ` Kent Overstreet 2023-06-19 9:19 ` Mark Rutland 2023-06-19 10:47 ` Kent Overstreet 2023-06-19 12:47 ` Mark Rutland 2023-06-19 19:17 ` Kent Overstreet 2023-06-20 17:42 ` Andy Lutomirski 2023-06-20 18:08 ` Kent Overstreet 2023-06-20 18:15 ` Andy Lutomirski 2023-06-20 18:48 ` Dave Hansen 2023-06-20 20:18 ` Kent Overstreet 2023-06-20 20:42 ` Andy Lutomirski 2023-06-20 22:32 ` Andy Lutomirski 2023-06-20 22:43 ` Nadav Amit 2023-06-21 1:27 ` Andy Lutomirski 2023-05-09 16:56 ` [PATCH 08/32] fs: factor out d_mark_tmpfile() Kent Overstreet 2023-05-09 16:56 ` [PATCH 09/32] block: Add some exports for bcachefs Kent Overstreet 2023-05-09 16:56 ` [PATCH 10/32] block: Allow bio_iov_iter_get_pages() with bio->bi_bdev unset Kent Overstreet 2023-05-09 16:56 ` [PATCH 11/32] block: Bring back zero_fill_bio_iter Kent Overstreet 2023-05-09 16:56 ` [PATCH 12/32] block: Rework bio_for_each_segment_all() Kent Overstreet 2023-05-09 16:56 ` [PATCH 13/32] block: Rework bio_for_each_folio_all() Kent Overstreet 2023-05-09 16:56 ` [PATCH 14/32] block: Don't block on s_umount from __invalidate_super() Kent Overstreet 2023-05-09 16:56 ` [PATCH 15/32] bcache: move closures to lib/ Kent Overstreet 2023-05-10 1:10 ` Randy Dunlap 2023-05-09 16:56 ` [PATCH 16/32] MAINTAINERS: Add entry for closures Kent Overstreet 2023-05-09 17:05 ` Coly Li 2023-05-09 21:03 ` Randy Dunlap 2023-05-09 16:56 ` [PATCH 17/32] closures: closure_wait_event() Kent Overstreet 2023-05-09 16:56 ` [PATCH 18/32] closures: closure_nr_remaining() Kent Overstreet 2023-05-09 16:56 ` [PATCH 19/32] closures: Add a missing include Kent Overstreet 2023-05-09 16:56 ` [PATCH 20/32] vfs: factor out inode hash head calculation Kent Overstreet 2023-05-23 9:27 ` (subset) " Christian Brauner 2023-05-23 22:53 ` Dave Chinner 2023-05-24 6:44 ` Christoph Hellwig 2023-05-24 7:35 ` Dave Chinner 2023-05-24 8:31 ` Christian Brauner 2023-05-24 8:41 ` Kent Overstreet 2023-05-09 16:56 ` [PATCH 21/32] hlist-bl: add hlist_bl_fake() Kent Overstreet 2023-05-10 4:48 ` Dave Chinner 2023-05-23 9:27 ` (subset) " Christian Brauner 2023-05-09 16:56 ` [PATCH 22/32] vfs: inode cache conversion to hash-bl Kent Overstreet 2023-05-10 4:45 ` Dave Chinner 2023-05-16 15:45 ` Christian Brauner 2023-05-16 16:17 ` Kent Overstreet 2023-05-16 23:15 ` Dave Chinner 2023-05-22 13:04 ` Christian Brauner 2023-05-23 9:28 ` (subset) " Christian Brauner 2023-10-19 15:30 ` Mateusz Guzik 2023-10-19 15:59 ` Mateusz Guzik 2023-10-20 11:38 ` Dave Chinner 2023-10-20 17:49 ` Mateusz Guzik 2023-10-21 12:13 ` Mateusz Guzik 2023-10-23 5:10 ` Dave Chinner 2023-10-27 17:13 ` Mateusz Guzik 2023-10-27 18:36 ` Darrick J. Wong 2023-10-31 11:02 ` Christian Brauner 2023-10-31 11:31 ` Mateusz Guzik 2023-11-02 2:36 ` Kent Overstreet 2023-11-04 20:51 ` Dave Chinner 2023-05-09 16:56 ` [PATCH 23/32] iov_iter: copy_folio_from_iter_atomic() Kent Overstreet 2023-05-10 2:20 ` kernel test robot 2023-05-11 2:08 ` kernel test robot 2023-05-09 16:56 ` [PATCH 24/32] MAINTAINERS: Add entry for generic-radix-tree Kent Overstreet 2023-05-09 21:03 ` Randy Dunlap 2023-05-09 16:56 ` [PATCH 25/32] lib/generic-radix-tree.c: Don't overflow in peek() Kent Overstreet 2023-05-09 16:56 ` [PATCH 26/32] lib/generic-radix-tree.c: Add a missing include Kent Overstreet 2023-05-09 16:56 ` [PATCH 27/32] lib/generic-radix-tree.c: Add peek_prev() Kent Overstreet 2023-05-09 16:56 ` [PATCH 28/32] stacktrace: Export stack_trace_save_tsk Kent Overstreet 2023-06-19 9:10 ` Mark Rutland 2023-06-19 11:16 ` Kent Overstreet 2023-05-09 16:56 ` [PATCH 29/32] lib/string_helpers: string_get_size() now returns characters wrote Kent Overstreet 2023-07-12 19:58 ` Kees Cook 2023-07-12 20:19 ` Kent Overstreet 2023-07-12 22:38 ` Kees Cook 2023-07-12 23:53 ` Kent Overstreet 2023-07-12 20:23 ` Kent Overstreet 2023-05-09 16:56 ` [PATCH 30/32] lib: Export errname Kent Overstreet 2023-05-09 16:56 ` [PATCH 31/32] lib: add mean and variance module Kent Overstreet 2023-05-09 16:56 ` [PATCH 32/32] MAINTAINERS: Add entry for bcachefs Kent Overstreet 2023-05-09 21:04 ` Randy Dunlap 2023-05-09 21:07 ` Kent Overstreet 2023-06-15 20:41 ` [PATCH 00/32] bcachefs - a new COW filesystem Pavel Machek 2023-06-15 21:26 ` Kent Overstreet
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=ZG/U8pl8Zp3Wei7Z@moria.home.lan \ --to=kent.overstreet@linux.dev \ --cc=agruenba@redhat.com \ --cc=cluster-devel@redhat.com \ --cc=dhowells@redhat.com \ --cc=djwong@kernel.org \ --cc=jack@suse.cz \ --cc=kent.overstreet@gmail.com \ --cc=linux-bcachefs@vger.kernel.org \ --cc=linux-fsdevel@vger.kernel.org \ --cc=linux-kernel@vger.kernel.org \ --cc=rpeterso@redhat.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.