From: Nick Piggin <npiggin@kernel.dk>
To: Dave Chinner <david@fromorbit.com>
Cc: Nick Piggin <npiggin@kernel.dk>, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 02/46] fs: d_validate fixes
Date: Thu, 9 Dec 2010 15:50:17 +1100
Message-ID: <20101209045017.GC3139@amd>
In-Reply-To: <20101209005029.GC32766@dastard>

On Thu, Dec 09, 2010 at 11:50:29AM +1100, Dave Chinner wrote:
> On Wed, Dec 08, 2010 at 05:59:55PM +1100, Nick Piggin wrote:
> > On Wed, Dec 08, 2010 at 12:53:44PM +1100, Dave Chinner wrote:
> > > On Sat, Nov 27, 2010 at 08:44:32PM +1100, Nick Piggin wrote:
> > > > d_validate has been broken for a long time.
> > > >
> > > > kmem_ptr_validate does not guarantee that a pointer can be dereferenced
> > > > if it can go away at any time. Even rcu_read_lock doesn't help, because
> > > > the pointer might be queued in RCU callbacks but not executed yet.
> > > >
> > > > So the parent cannot be checked, nor the name hashed. The dentry pointer
> > > > can not be touched until it can be verified under lock. Hashing simply
> > > > cannot be used.
> > > >
> > > > Instead, verify the parent/child relationship by traversing parent's
> > > > d_child list. It's slow, but only ncpfs and the destaged smbfs care
> > > > about it, at this point.
> > >
> > > I'd drop the previous revert patch and just convert the RCU hash
> > > traversal straight to the d_child traversal code you introduce here.
> > > This is a much better explanation of why the d_validate mechanism
> > > needs to be changed, and the revert is really an unnecessary extra
> > > step...
> >
> > Has to be backported, though.
>
> Backported where? The d_validate() change only got included in .37-rc1.

Backported to stable/distro kernels I suppose. I'm not sure what your
point is?

> > Patch that is to be reverted obviously
> > adds more brokenness and is a good example that you cannot dget() under
> > rcu read protection even if the rest of the surrounding function is
> > bugfree. I wouldn't have thought it's a big deal.
>
> Reverting something broken to something already broken just to fix
> to the less broken version seems like an unnecessary step. Just
> fix the brokenness in a single patch - no need to indirect the real
> fix through a revert. One less patch to worry about.

OK but I disagree.

Firstly, reverting that patch gives a good record of that particular
pattern of bug (that Christoph and Al both missed). With more RCU going
into the vfs, people need to be pretty clear about the pitfalls.

Secondly, as I said, reverting means that I can use exact same patch
for upstream and stable kernels.

And finally, it gives better bisectability. If somebody hits a bug in
my patch, I would rather have them bisect into the well-worn (if buggy)
version of the code than bisect into a different type of brokenness.

It isn't indirecting the real fix through a revert, they are broken in
different ways. My fix is for the bug that it doesn't guarantee the
persistence of *memory* we are using, and the revert is for the bug
that it doesn't guarantee the persistence/validity of the *object*,
and which is actually more likely to be a problem if you think about
it, because the window is much larger.

Git has no problem with lots of patches, so I don't see any advantage
to doing one patch, and you lose the advantages above.
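[Editorial sketch, not part of the original message.] The approach described in the quoted patch text, checking that a candidate pointer really is a child of a given parent by walking the parent's child list under a lock, can be illustrated roughly as follows. This is a userspace model only: `struct node`, `tree_lock`, `node_add_child` and `node_validate` are hypothetical names invented for illustration, a pthread mutex stands in for whatever lock protects the real lists, and none of this is the actual kernel code from the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a dentry; not the kernel's struct. */
struct node {
	struct node *parent;
	struct node *first_child;   /* head of this node's child list */
	struct node *next_sibling;  /* link in the parent's child list */
	int refcount;
};

/* A single lock standing in for whatever protects the child lists. */
static pthread_mutex_t tree_lock = PTHREAD_MUTEX_INITIALIZER;

/* Link child at the head of parent's child list. */
static void node_add_child(struct node *parent, struct node *child)
{
	pthread_mutex_lock(&tree_lock);
	child->parent = parent;
	child->next_sibling = parent->first_child;
	parent->first_child = child;
	pthread_mutex_unlock(&tree_lock);
}

/*
 * Return 1 and take a reference if candidate is currently on parent's
 * child list, otherwise return 0. The candidate pointer is compared by
 * address only; it is not dereferenced unless it was found on the list
 * while the lock is held.
 */
static int node_validate(struct node *candidate, struct node *parent)
{
	struct node *c;
	int found = 0;

	assert(parent != NULL);
	pthread_mutex_lock(&tree_lock);
	for (c = parent->first_child; c != NULL; c = c->next_sibling) {
		if (c == candidate) {
			c->refcount++;  /* safe: c is known to be live here */
			found = 1;
			break;
		}
	}
	pthread_mutex_unlock(&tree_lock);
	return found;
}
```

The walk is linear in the number of children, which matches the "it's slow" remark in the quoted text; the trade-off is that the untrusted pointer is never touched before it has been matched against a list that is known to be valid.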