From: Kent Overstreet <kent.overstreet@linux.dev> To: "Andreas Grünbacher" <andreas.gruenbacher@gmail.com> Cc: Christoph Hellwig <hch@infradead.org>, Jan Kara <jack@suse.cz>, cluster-devel@redhat.com, "Darrick J . Wong" <djwong@kernel.org>, linux-kernel@vger.kernel.org, dhowells@redhat.com, linux-bcachefs@vger.kernel.org, linux-fsdevel@vger.kernel.org, Kent Overstreet <kent.overstreet@gmail.com> Subject: Re: [Cluster-devel] [PATCH 06/32] sched: Add task_struct->faults_disabled_mapping Date: Thu, 25 May 2023 19:20:46 -0400 [thread overview] Message-ID: <ZG/tTorh8G2919Jz@moria.home.lan> (raw) In-Reply-To: <CAHpGcML0CZ1RGkOf26iYt_tK0Ux=cfdW8d3bjMVbjXLr91cs+g@mail.gmail.com> On Fri, May 26, 2023 at 12:25:31AM +0200, Andreas Grünbacher wrote: > Am Di., 23. Mai 2023 um 18:28 Uhr schrieb Christoph Hellwig <hch@infradead.org>: > > On Tue, May 23, 2023 at 03:34:31PM +0200, Jan Kara wrote: > > > I've checked the code and AFAICT it is all indeed handled. BTW, I've now > > > remembered that GFS2 has dealt with the same deadlocks - b01b2d72da25 > > > ("gfs2: Fix mmap + page fault deadlocks for direct I/O") - in a different > > > way (by prefaulting pages from the iter before grabbing the problematic > > > lock and then disabling page faults for the iomap_dio_rw() call). I guess > > > we should somehow unify these schemes so that we don't have two mechanisms > > > for avoiding exactly the same deadlock. Adding GFS2 guys to CC. > > > > > > Also good that you've written a fstest for this, that is definitely a useful > > > addition, although I suspect GFS2 guys added a test for this not so long > > > ago when testing their stuff. Maybe they have a pointer handy? > > > > generic/708 is the btrfs version of this. > > > > But I think all of the file systems that have this deadlock are actually > > fundamentally broken because they have a mess up locking hierarchy > > where page faults take the same lock that is held over the the direct I/ > > operation. And the right thing is to fix this. I have work in progress > > for btrfs, and something similar should apply to gfs2, with the added > > complication that it probably means a revision to their network > > protocol. > > We do disable page faults, and there can be deadlocks in page fault > handlers while no page faults are allowed. > > I'm roughly aware of the locking hierarchy that other filesystems use, > and that's something we want to avoid because of two reasons: (1) it > would be an incompatible change, and (2) we want to avoid cluster-wide > locking operations as much as possible because they are very slow. > > These kinds of locking conflicts are so rare in practice that the > theoretical inefficiency of having to retry the operation doesn't > matter. Would you be willing to expand on that? I'm wondering if this would simplify things for gfs2, but you mention locking heirarchy being an incompatible change - how does that work? > > > I'm absolutely not in favour to add workarounds for thes kind of locking > > problems to the core kernel. I already feel bad for allowing the > > small workaround in iomap for btrfs, as just fixing the locking back > > then would have avoid massive ratholing. > > Please let me know when those btrfs changes are in a presentable shape ... I would also be curious to know what btrfs needs and what the approach is there.
WARNING: multiple messages have this Message-ID (diff)
From: Kent Overstreet <kent.overstreet@linux.dev> To: cluster-devel.redhat.com Subject: [Cluster-devel] [PATCH 06/32] sched: Add task_struct->faults_disabled_mapping Date: Thu, 25 May 2023 19:20:46 -0400 [thread overview] Message-ID: <ZG/tTorh8G2919Jz@moria.home.lan> (raw) In-Reply-To: <CAHpGcML0CZ1RGkOf26iYt_tK0Ux=cfdW8d3bjMVbjXLr91cs+g@mail.gmail.com> On Fri, May 26, 2023 at 12:25:31AM +0200, Andreas Gr?nbacher wrote: > Am Di., 23. Mai 2023 um 18:28 Uhr schrieb Christoph Hellwig <hch@infradead.org>: > > On Tue, May 23, 2023 at 03:34:31PM +0200, Jan Kara wrote: > > > I've checked the code and AFAICT it is all indeed handled. BTW, I've now > > > remembered that GFS2 has dealt with the same deadlocks - b01b2d72da25 > > > ("gfs2: Fix mmap + page fault deadlocks for direct I/O") - in a different > > > way (by prefaulting pages from the iter before grabbing the problematic > > > lock and then disabling page faults for the iomap_dio_rw() call). I guess > > > we should somehow unify these schemes so that we don't have two mechanisms > > > for avoiding exactly the same deadlock. Adding GFS2 guys to CC. > > > > > > Also good that you've written a fstest for this, that is definitely a useful > > > addition, although I suspect GFS2 guys added a test for this not so long > > > ago when testing their stuff. Maybe they have a pointer handy? > > > > generic/708 is the btrfs version of this. > > > > But I think all of the file systems that have this deadlock are actually > > fundamentally broken because they have a mess up locking hierarchy > > where page faults take the same lock that is held over the the direct I/ > > operation. And the right thing is to fix this. I have work in progress > > for btrfs, and something similar should apply to gfs2, with the added > > complication that it probably means a revision to their network > > protocol. > > We do disable page faults, and there can be deadlocks in page fault > handlers while no page faults are allowed. > > I'm roughly aware of the locking hierarchy that other filesystems use, > and that's something we want to avoid because of two reasons: (1) it > would be an incompatible change, and (2) we want to avoid cluster-wide > locking operations as much as possible because they are very slow. > > These kinds of locking conflicts are so rare in practice that the > theoretical inefficiency of having to retry the operation doesn't > matter. Would you be willing to expand on that? I'm wondering if this would simplify things for gfs2, but you mention locking heirarchy being an incompatible change - how does that work? > > > I'm absolutely not in favour to add workarounds for thes kind of locking > > problems to the core kernel. I already feel bad for allowing the > > small workaround in iomap for btrfs, as just fixing the locking back > > then would have avoid massive ratholing. > > Please let me know when those btrfs changes are in a presentable shape ... I would also be curious to know what btrfs needs and what the approach is there.
next prev parent reply other threads:[~2023-05-25 23:20 UTC|newest] Thread overview: 207+ messages / expand[flat|nested] mbox.gz Atom feed top 2023-05-09 16:56 [PATCH 00/32] bcachefs - a new COW filesystem Kent Overstreet 2023-05-09 16:56 ` [PATCH 01/32] Compiler Attributes: add __flatten Kent Overstreet 2023-05-09 17:04 ` Miguel Ojeda 2023-05-09 17:24 ` Kent Overstreet 2023-05-09 16:56 ` [PATCH 02/32] locking/lockdep: lock_class_is_held() Kent Overstreet 2023-05-09 19:30 ` Peter Zijlstra 2023-05-09 20:11 ` Kent Overstreet 2023-05-09 16:56 ` [PATCH 03/32] locking/lockdep: lockdep_set_no_check_recursion() Kent Overstreet 2023-05-09 19:31 ` Peter Zijlstra 2023-05-09 19:57 ` Kent Overstreet 2023-05-09 20:18 ` Kent Overstreet 2023-05-09 20:27 ` Waiman Long 2023-05-09 20:35 ` Kent Overstreet 2023-05-09 21:37 ` Waiman Long 2023-05-10 8:59 ` Peter Zijlstra 2023-05-10 20:38 ` Kent Overstreet 2023-05-11 8:25 ` Peter Zijlstra 2023-05-11 9:32 ` Kent Overstreet 2023-05-12 20:49 ` Kent Overstreet 2023-05-09 16:56 ` [PATCH 04/32] locking: SIX locks (shared/intent/exclusive) Kent Overstreet 2023-05-11 12:14 ` Jan Engelhardt 2023-05-12 20:58 ` Kent Overstreet 2023-05-12 22:39 ` Jan Engelhardt 2023-05-12 23:26 ` Kent Overstreet 2023-05-12 23:49 ` Randy Dunlap 2023-05-13 0:17 ` Kent Overstreet 2023-05-13 0:45 ` Eric Biggers 2023-05-13 0:51 ` Kent Overstreet 2023-05-14 12:15 ` Jeff Layton 2023-05-15 2:39 ` Kent Overstreet 2023-05-09 16:56 ` [PATCH 05/32] MAINTAINERS: Add entry for six locks Kent Overstreet 2023-05-09 16:56 ` [PATCH 06/32] sched: Add task_struct->faults_disabled_mapping Kent Overstreet 2023-05-10 1:07 ` Jan Kara 2023-05-10 6:18 ` Kent Overstreet 2023-05-23 13:34 ` Jan Kara 2023-05-23 13:34 ` [Cluster-devel] " Jan Kara 2023-05-23 16:21 ` Christoph Hellwig 2023-05-23 16:21 ` Christoph Hellwig 2023-05-23 16:35 ` Kent Overstreet 2023-05-23 16:35 ` Kent Overstreet 2023-05-24 6:43 ` Christoph Hellwig 2023-05-24 6:43 ` Christoph Hellwig 2023-05-24 8:09 ` Kent Overstreet 2023-05-24 8:09 ` Kent Overstreet 2023-05-25 8:58 ` Christoph Hellwig 2023-05-25 8:58 ` Christoph Hellwig 2023-05-25 20:50 ` Kent Overstreet 2023-05-25 20:50 ` Kent Overstreet 2023-05-26 8:06 ` Christoph Hellwig 2023-05-26 8:06 ` Christoph Hellwig 2023-05-26 8:34 ` Kent Overstreet 2023-05-26 8:34 ` Kent Overstreet 2023-05-25 21:40 ` Kent Overstreet 2023-05-25 21:40 ` Kent Overstreet 2023-05-25 22:25 ` Andreas Grünbacher 2023-05-25 22:25 ` Andreas Grünbacher 2023-05-25 23:20 ` Kent Overstreet [this message] 2023-05-25 23:20 ` Kent Overstreet 2023-05-26 0:05 ` Andreas Grünbacher 2023-05-26 0:05 ` Andreas Grünbacher 2023-05-26 0:39 ` Kent Overstreet 2023-05-26 0:39 ` Kent Overstreet 2023-05-26 8:10 ` Christoph Hellwig 2023-05-26 8:10 ` Christoph Hellwig 2023-05-26 8:38 ` Kent Overstreet 2023-05-26 8:38 ` Kent Overstreet 2023-05-23 16:49 ` Kent Overstreet 2023-05-23 16:49 ` [Cluster-devel] " Kent Overstreet 2023-05-25 8:47 ` Jan Kara 2023-05-25 8:47 ` [Cluster-devel] " Jan Kara 2023-05-25 21:36 ` Kent Overstreet 2023-05-25 21:36 ` [Cluster-devel] " Kent Overstreet 2023-05-25 22:45 ` Andreas Grünbacher 2023-05-25 22:45 ` [Cluster-devel] " Andreas Grünbacher 2023-05-25 22:04 ` Andreas Grünbacher 2023-05-25 22:04 ` [Cluster-devel] " Andreas Grünbacher 2023-05-09 16:56 ` [PATCH 07/32] mm: Bring back vmalloc_exec Kent Overstreet 2023-05-09 18:19 ` Lorenzo Stoakes 2023-05-09 20:15 ` Kent Overstreet 2023-05-09 20:46 ` Christoph Hellwig 2023-05-09 21:12 ` Lorenzo Stoakes 2023-05-09 21:29 ` Kent Overstreet 2023-05-10 6:48 ` Eric Biggers 2023-05-12 18:36 ` Kent Overstreet 2023-05-13 1:57 ` Eric Biggers 2023-05-13 19:28 ` Kent Overstreet 2023-05-14 5:45 ` Kent Overstreet 2023-05-14 18:43 ` Eric Biggers 2023-05-15 5:38 ` Kent Overstreet 2023-05-15 6:13 ` Eric Biggers 2023-05-15 6:18 ` Kent Overstreet 2023-05-15 7:13 ` Eric Biggers 2023-05-15 7:26 ` Kent Overstreet 2023-05-21 21:33 ` Eric Biggers 2023-05-21 22:04 ` Kent Overstreet 2023-05-15 10:29 ` David Laight 2023-05-10 11:56 ` David Laight 2023-05-09 21:43 ` Darrick J. Wong 2023-05-09 21:54 ` Kent Overstreet 2023-05-11 5:33 ` Theodore Ts'o 2023-05-11 5:44 ` Kent Overstreet 2023-05-13 13:25 ` Lorenzo Stoakes 2023-05-14 18:39 ` Christophe Leroy 2023-05-14 23:43 ` Kent Overstreet 2023-05-15 4:45 ` Christophe Leroy 2023-05-15 5:02 ` Kent Overstreet 2023-05-10 14:18 ` Christophe Leroy 2023-05-10 15:05 ` Johannes Thumshirn 2023-05-11 22:28 ` Kees Cook 2023-05-12 18:41 ` Kent Overstreet 2023-05-16 21:02 ` Kees Cook 2023-05-16 21:20 ` Kent Overstreet 2023-05-16 21:47 ` Matthew Wilcox 2023-05-16 21:57 ` Kent Overstreet 2023-05-17 5:28 ` Kent Overstreet 2023-05-17 14:04 ` Mike Rapoport 2023-05-17 14:18 ` Kent Overstreet 2023-05-17 15:44 ` Mike Rapoport 2023-05-17 15:59 ` Kent Overstreet 2023-06-17 4:13 ` Andy Lutomirski 2023-06-17 15:34 ` Kent Overstreet 2023-06-17 19:19 ` Andy Lutomirski 2023-06-17 20:08 ` Kent Overstreet 2023-06-17 20:35 ` Andy Lutomirski 2023-06-19 19:45 ` Kees Cook 2023-06-20 0:39 ` Kent Overstreet 2023-06-19 9:19 ` Mark Rutland 2023-06-19 10:47 ` Kent Overstreet 2023-06-19 12:47 ` Mark Rutland 2023-06-19 19:17 ` Kent Overstreet 2023-06-20 17:42 ` Andy Lutomirski 2023-06-20 18:08 ` Kent Overstreet 2023-06-20 18:15 ` Andy Lutomirski 2023-06-20 18:48 ` Dave Hansen 2023-06-20 20:18 ` Kent Overstreet 2023-06-20 20:42 ` Andy Lutomirski 2023-06-20 22:32 ` Andy Lutomirski 2023-06-20 22:43 ` Nadav Amit 2023-06-21 1:27 ` Andy Lutomirski 2023-05-09 16:56 ` [PATCH 08/32] fs: factor out d_mark_tmpfile() Kent Overstreet 2023-05-09 16:56 ` [PATCH 09/32] block: Add some exports for bcachefs Kent Overstreet 2023-05-09 16:56 ` [PATCH 10/32] block: Allow bio_iov_iter_get_pages() with bio->bi_bdev unset Kent Overstreet 2023-05-09 16:56 ` [PATCH 11/32] block: Bring back zero_fill_bio_iter Kent Overstreet 2023-05-09 16:56 ` [PATCH 12/32] block: Rework bio_for_each_segment_all() Kent Overstreet 2023-05-09 16:56 ` [PATCH 13/32] block: Rework bio_for_each_folio_all() Kent Overstreet 2023-05-09 16:56 ` [PATCH 14/32] block: Don't block on s_umount from __invalidate_super() Kent Overstreet 2023-05-09 16:56 ` [PATCH 15/32] bcache: move closures to lib/ Kent Overstreet 2023-05-10 1:10 ` Randy Dunlap 2023-05-09 16:56 ` [PATCH 16/32] MAINTAINERS: Add entry for closures Kent Overstreet 2023-05-09 17:05 ` Coly Li 2023-05-09 21:03 ` Randy Dunlap 2023-05-09 16:56 ` [PATCH 17/32] closures: closure_wait_event() Kent Overstreet 2023-05-09 16:56 ` [PATCH 18/32] closures: closure_nr_remaining() Kent Overstreet 2023-05-09 16:56 ` [PATCH 19/32] closures: Add a missing include Kent Overstreet 2023-05-09 16:56 ` [PATCH 20/32] vfs: factor out inode hash head calculation Kent Overstreet 2023-05-23 9:27 ` (subset) " Christian Brauner 2023-05-23 22:53 ` Dave Chinner 2023-05-24 6:44 ` Christoph Hellwig 2023-05-24 7:35 ` Dave Chinner 2023-05-24 8:31 ` Christian Brauner 2023-05-24 8:41 ` Kent Overstreet 2023-05-09 16:56 ` [PATCH 21/32] hlist-bl: add hlist_bl_fake() Kent Overstreet 2023-05-10 4:48 ` Dave Chinner 2023-05-23 9:27 ` (subset) " Christian Brauner 2023-05-09 16:56 ` [PATCH 22/32] vfs: inode cache conversion to hash-bl Kent Overstreet 2023-05-10 4:45 ` Dave Chinner 2023-05-16 15:45 ` Christian Brauner 2023-05-16 16:17 ` Kent Overstreet 2023-05-16 23:15 ` Dave Chinner 2023-05-22 13:04 ` Christian Brauner 2023-05-23 9:28 ` (subset) " Christian Brauner 2023-10-19 15:30 ` Mateusz Guzik 2023-10-19 15:59 ` Mateusz Guzik 2023-10-20 11:38 ` Dave Chinner 2023-10-20 17:49 ` Mateusz Guzik 2023-10-21 12:13 ` Mateusz Guzik 2023-10-23 5:10 ` Dave Chinner 2023-10-27 17:13 ` Mateusz Guzik 2023-10-27 18:36 ` Darrick J. Wong 2023-10-31 11:02 ` Christian Brauner 2023-10-31 11:31 ` Mateusz Guzik 2023-11-02 2:36 ` Kent Overstreet 2023-11-04 20:51 ` Dave Chinner 2023-05-09 16:56 ` [PATCH 23/32] iov_iter: copy_folio_from_iter_atomic() Kent Overstreet 2023-05-10 2:20 ` kernel test robot 2023-05-11 2:08 ` kernel test robot 2023-05-09 16:56 ` [PATCH 24/32] MAINTAINERS: Add entry for generic-radix-tree Kent Overstreet 2023-05-09 21:03 ` Randy Dunlap 2023-05-09 16:56 ` [PATCH 25/32] lib/generic-radix-tree.c: Don't overflow in peek() Kent Overstreet 2023-05-09 16:56 ` [PATCH 26/32] lib/generic-radix-tree.c: Add a missing include Kent Overstreet 2023-05-09 16:56 ` [PATCH 27/32] lib/generic-radix-tree.c: Add peek_prev() Kent Overstreet 2023-05-09 16:56 ` [PATCH 28/32] stacktrace: Export stack_trace_save_tsk Kent Overstreet 2023-06-19 9:10 ` Mark Rutland 2023-06-19 11:16 ` Kent Overstreet 2023-05-09 16:56 ` [PATCH 29/32] lib/string_helpers: string_get_size() now returns characters wrote Kent Overstreet 2023-07-12 19:58 ` Kees Cook 2023-07-12 20:19 ` Kent Overstreet 2023-07-12 22:38 ` Kees Cook 2023-07-12 23:53 ` Kent Overstreet 2023-07-12 20:23 ` Kent Overstreet 2023-05-09 16:56 ` [PATCH 30/32] lib: Export errname Kent Overstreet 2023-05-09 16:56 ` [PATCH 31/32] lib: add mean and variance module Kent Overstreet 2023-05-09 16:56 ` [PATCH 32/32] MAINTAINERS: Add entry for bcachefs Kent Overstreet 2023-05-09 21:04 ` Randy Dunlap 2023-05-09 21:07 ` Kent Overstreet 2023-06-15 20:41 ` [PATCH 00/32] bcachefs - a new COW filesystem Pavel Machek 2023-06-15 21:26 ` Kent Overstreet
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=ZG/tTorh8G2919Jz@moria.home.lan \ --to=kent.overstreet@linux.dev \ --cc=andreas.gruenbacher@gmail.com \ --cc=cluster-devel@redhat.com \ --cc=dhowells@redhat.com \ --cc=djwong@kernel.org \ --cc=hch@infradead.org \ --cc=jack@suse.cz \ --cc=kent.overstreet@gmail.com \ --cc=linux-bcachefs@vger.kernel.org \ --cc=linux-fsdevel@vger.kernel.org \ --cc=linux-kernel@vger.kernel.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.