* [git pull] vfs.git pile 3 - dcache @ 2022-08-03 18:39 Al Viro 2022-08-03 18:57 ` Linus Torvalds 2022-08-03 19:00 ` pr-tracker-bot 0 siblings, 2 replies; 18+ messages in thread From: Al Viro @ 2022-08-03 18:39 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel, linux-fsdevel The following changes since commit b13baccc3850ca8b8cccbf8ed9912dbaa0fdf7f3: Linux 5.19-rc2 (2022-06-12 16:11:37 -0700) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git tags/pull-work.dcache for you to fetch changes up to 50417d22d0efbb1be76c3cb66b2329f83741c9c7: fs/dcache: Move wakeup out of i_seq_dir write held region. (2022-07-30 00:38:16 -0400) ---------------------------------------------------------------- Main part here is making parallel lookups safe for RT - making sure preemption is disabled in start_dir_add()/ end_dir_add() sections (on non-RT it's automatic, on RT it needs to to be done explicitly) and moving wakeups from __d_lookup_done() inside of such to the end of those sections. Wakeups can be safely delayed for as long as ->d_lock on in-lookup dentry is held; proving that has caught a bug in d_add_ci() that allows memory corruption when sufficiently bogus ntfs (or case-insensitive xfs) image is mounted. Easily fixed, fortunately. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> ---------------------------------------------------------------- Al Viro (1): d_add_ci(): make sure we don't miss d_lookup_done() Sebastian Andrzej Siewior (3): fs/dcache: Disable preemption on i_dir_seq write side on PREEMPT_RT fs/dcache: Move the wakeup from __d_lookup_done() to the caller. fs/dcache: Move wakeup out of i_seq_dir write held region. fs/dcache.c | 54 ++++++++++++++++++++++++++++++++++++++++---------- include/linux/dcache.h | 9 +++------ 2 files changed, 46 insertions(+), 17 deletions(-) ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [git pull] vfs.git pile 3 - dcache 2022-08-03 18:39 [git pull] vfs.git pile 3 - dcache Al Viro @ 2022-08-03 18:57 ` Linus Torvalds 2022-08-03 19:49 ` Al Viro 2022-08-03 21:54 ` Steven Rostedt 2022-08-03 19:00 ` pr-tracker-bot 1 sibling, 2 replies; 18+ messages in thread From: Linus Torvalds @ 2022-08-03 18:57 UTC (permalink / raw) To: Al Viro, Sebastian Andrzej Siewior, Thomas Gleixner Cc: linux-kernel, linux-fsdevel On Wed, Aug 3, 2022 at 11:39 AM Al Viro <viro@zeniv.linux.org.uk> wrote: > > Main part here is making parallel lookups safe for RT - making > sure preemption is disabled in start_dir_add()/ end_dir_add() sections (on > non-RT it's automatic, on RT it needs to to be done explicitly) and moving > wakeups from __d_lookup_done() inside of such to the end of those sections. Ugh. I really dislike this pattern: if (IS_ENABLED(CONFIG_PREEMPT_RT)) preempt_disable(); ... if (IS_ENABLED(CONFIG_PREEMPT_RT)) preempt_enable(); and while the new comment explains *why* it exists, it's still very ugly indeed. We have it in a couple of other places, and we also end up having another variation on the theme that is about "migrate_{dis,en}able()", except it is written as if (IS_ENABLED(CONFIG_PREEMPT_RT)) migrate_disable(); else preempt_disable(); because on non-PREEMPT_RT obviously preempt_disable() is the better and simpler thing. Can we please just introduce helper functions? At least that if (IS_ENABLED(CONFIG_PREEMPT_RT)) preempt_disable(); ... pattern could be much more naturally expressed as preempt_disable_under_spinlock(); ... which would make the code really explain what is going on. I would still encourage that *comment* about it, but I think we really should strive for code that makes sense even without a comment. The fact that then without PREEMPT_RT, the whole "preempt_disable_under_spinlock()" becomes a no-op is then an implementation detail - and not so different from how a regular preempt_disable() becomes a no-op when on UP (or with PREEMPT_NONE). And that "preempt_disable_under_spinlock()" really documents what is going on, and I feel would make that code easier to understand? The fact that PREEMPT_RT has different rules about preemption is not something that the dentry code should care about. The dentry code could just say "I want to disable preemption, and I already hold a spinlock, so do what is best". So then "preempt_disable_under_spinlock()" precisely documents what the dentry code really wants. No? Anyway, I have pulled this, but I really would like fewer of these random PREEMPT_RT turds around, and more "this code makes sense" code. Linus ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [git pull] vfs.git pile 3 - dcache 2022-08-03 18:57 ` Linus Torvalds @ 2022-08-03 19:49 ` Al Viro 2022-08-03 21:54 ` Steven Rostedt 1 sibling, 0 replies; 18+ messages in thread From: Al Viro @ 2022-08-03 19:49 UTC (permalink / raw) To: Linus Torvalds Cc: Sebastian Andrzej Siewior, Thomas Gleixner, linux-kernel, linux-fsdevel On Wed, Aug 03, 2022 at 11:57:27AM -0700, Linus Torvalds wrote: > On Wed, Aug 3, 2022 at 11:39 AM Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > Main part here is making parallel lookups safe for RT - making > > sure preemption is disabled in start_dir_add()/ end_dir_add() sections (on > > non-RT it's automatic, on RT it needs to to be done explicitly) and moving > > wakeups from __d_lookup_done() inside of such to the end of those sections. > > Ugh. > > I really dislike this pattern: > > if (IS_ENABLED(CONFIG_PREEMPT_RT)) > preempt_disable(); > ... > if (IS_ENABLED(CONFIG_PREEMPT_RT)) > preempt_enable(); > > and while the new comment explains *why* it exists, it's still very ugly indeed. > > We have it in a couple of other places, and we also end up having > another variation on the theme that is about "migrate_{dis,en}able()", > except it is written as > > if (IS_ENABLED(CONFIG_PREEMPT_RT)) > migrate_disable(); > else > preempt_disable(); > > because on non-PREEMPT_RT obviously preempt_disable() is the better > and simpler thing. > > Can we please just introduce helper functions? > > At least that > > if (IS_ENABLED(CONFIG_PREEMPT_RT)) > preempt_disable(); > ... > > pattern could be much more naturally expressed as > > preempt_disable_under_spinlock(); > ... > > which would make the code really explain what is going on. I would > still encourage that *comment* about it, but I think we really should > strive for code that makes sense even without a comment. > > The fact that then without PREEMPT_RT, the whole > "preempt_disable_under_spinlock()" becomes a no-op is then an > implementation detail - and not so different from how a regular > preempt_disable() becomes a no-op when on UP (or with PREEMPT_NONE). > > And that "preempt_disable_under_spinlock()" really documents what is > going on, and I feel would make that code easier to understand? The > fact that PREEMPT_RT has different rules about preemption is not > something that the dentry code should care about. > > The dentry code could just say "I want to disable preemption, and I > already hold a spinlock, so do what is best". > > So then "preempt_disable_under_spinlock()" precisely documents what > the dentry code really wants. > > No? Fine by me, but I think that this is better dealt with by the rt folks; I've no objections to replacing that open-coded stuff in dcache.c with better documented primitives, so when such patches materialize... ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [git pull] vfs.git pile 3 - dcache 2022-08-03 18:57 ` Linus Torvalds 2022-08-03 19:49 ` Al Viro @ 2022-08-03 21:54 ` Steven Rostedt 2022-08-03 22:09 ` Linus Torvalds 1 sibling, 1 reply; 18+ messages in thread From: Steven Rostedt @ 2022-08-03 21:54 UTC (permalink / raw) To: Linus Torvalds Cc: Al Viro, Sebastian Andrzej Siewior, Thomas Gleixner, linux-kernel, linux-fsdevel On Wed, Aug 03, 2022 at 11:57:27AM -0700, Linus Torvalds wrote: > > I really dislike this pattern: > > if (IS_ENABLED(CONFIG_PREEMPT_RT)) > preempt_disable(); > ... > if (IS_ENABLED(CONFIG_PREEMPT_RT)) > preempt_enable(); > > and while the new comment explains *why* it exists, it's still very ugly indeed. > > We have it in a couple of other places, and we also end up having > another variation on the theme that is about "migrate_{dis,en}able()", > except it is written as > > if (IS_ENABLED(CONFIG_PREEMPT_RT)) > migrate_disable(); > else > preempt_disable(); > > because on non-PREEMPT_RT obviously preempt_disable() is the better > and simpler thing. > > Can we please just introduce helper functions? > > At least that > > if (IS_ENABLED(CONFIG_PREEMPT_RT)) > preempt_disable(); > ... > > pattern could be much more naturally expressed as > > preempt_disable_under_spinlock(); > ... > The original patch years ago use to have: preempt_disable_rt() preempt_enable_rt() That did exactly that, but an effort was made to get rid of it. But your more descriptive "preempt_enable/disable_under_spinlock()" may make more sense. -- Steve ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [git pull] vfs.git pile 3 - dcache 2022-08-03 21:54 ` Steven Rostedt @ 2022-08-03 22:09 ` Linus Torvalds 2022-08-03 22:59 ` Steven Rostedt 0 siblings, 1 reply; 18+ messages in thread From: Linus Torvalds @ 2022-08-03 22:09 UTC (permalink / raw) To: Steven Rostedt Cc: Al Viro, Sebastian Andrzej Siewior, Thomas Gleixner, linux-kernel, linux-fsdevel On Wed, Aug 3, 2022 at 2:55 PM Steven Rostedt <rostedt@goodmis.org> wrote: > > The original patch years ago use to have: > > preempt_disable_rt() > > preempt_enable_rt() That may be visually simpler, but I dislike how it's named for some implementation detail, rather than for the semantic meaning. Admittedly I think "preempt_enable_under_spinlock()" may be a bit *too* cumbersome as a name. It does explain what is going on - and both the implementation and the use end up being fairly clear (and the non-RT case could have some debug version that actually tests that preemption has already been disabled). But it is also a ridiculously long name, no question about that. I still feel is less cumbersome than having that "IS_ENABLED(CONFIG_PREEMPT_RT)" test that also then pretty much requires a comment to explain what is going on. Linus ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [git pull] vfs.git pile 3 - dcache 2022-08-03 22:09 ` Linus Torvalds @ 2022-08-03 22:59 ` Steven Rostedt 2022-08-03 23:24 ` Matthew Wilcox 0 siblings, 1 reply; 18+ messages in thread From: Steven Rostedt @ 2022-08-03 22:59 UTC (permalink / raw) To: Linus Torvalds Cc: Al Viro, Sebastian Andrzej Siewior, Thomas Gleixner, linux-kernel, linux-fsdevel On Wed, 3 Aug 2022 15:09:23 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote: > Admittedly I think "preempt_enable_under_spinlock()" may be a bit > *too* cumbersome as a name. It does explain what is going on - and > both the implementation and the use end up being fairly clear (and the > non-RT case could have some debug version that actually tests that > preemption has already been disabled). preempt_disable_inlock() ? -- Steve ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [git pull] vfs.git pile 3 - dcache 2022-08-03 22:59 ` Steven Rostedt @ 2022-08-03 23:24 ` Matthew Wilcox 2022-08-03 23:42 ` Linus Torvalds 0 siblings, 1 reply; 18+ messages in thread From: Matthew Wilcox @ 2022-08-03 23:24 UTC (permalink / raw) To: Steven Rostedt Cc: Linus Torvalds, Al Viro, Sebastian Andrzej Siewior, Thomas Gleixner, linux-kernel, linux-fsdevel On Wed, Aug 03, 2022 at 06:59:36PM -0400, Steven Rostedt wrote: > On Wed, 3 Aug 2022 15:09:23 -0700 > Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > Admittedly I think "preempt_enable_under_spinlock()" may be a bit > > *too* cumbersome as a name. It does explain what is going on - and > > both the implementation and the use end up being fairly clear (and the > > non-RT case could have some debug version that actually tests that > > preemption has already been disabled). > > preempt_disable_inlock() ? preempt_disable_locked()? ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [git pull] vfs.git pile 3 - dcache 2022-08-03 23:24 ` Matthew Wilcox @ 2022-08-03 23:42 ` Linus Torvalds 2022-08-04 0:42 ` Matthew Wilcox 2022-08-08 22:06 ` Thomas Gleixner 0 siblings, 2 replies; 18+ messages in thread From: Linus Torvalds @ 2022-08-03 23:42 UTC (permalink / raw) To: Matthew Wilcox Cc: Steven Rostedt, Al Viro, Sebastian Andrzej Siewior, Thomas Gleixner, linux-kernel, linux-fsdevel On Wed, Aug 3, 2022 at 4:24 PM Matthew Wilcox <willy@infradead.org> wrote: > > On Wed, Aug 03, 2022 at 06:59:36PM -0400, Steven Rostedt wrote: > > > > preempt_disable_inlock() ? > > preempt_disable_locked()? Heh. Shed painting in full glory. Let's try just "preempt_enable_under_spinlock()" and see. It's a bit long, but it's still shorter than the existing usage pattern. And we don't have "inlock" anywhere else, and while "locked" is a real pattern we have, it tends to be about other things (ie "I hold the lock that you need, so don't take it"). And this is _explicitly_ only about spinning locks, because sleeping locks don't do the preemption disable even without RT. So let's make it verbose and clear and unambiguous. It's not like I expect to see a _lot_ of those. Knock wood. We had a handful of those things before (in mm/vmstat, and now added another case to the dentry code. So it has become a pattern, but I really really hope it's not exactly a common pattern. And so because it's not common, typing a bit more is a good idea - and making it really clear is probably also a good idea. Linus ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [git pull] vfs.git pile 3 - dcache 2022-08-03 23:42 ` Linus Torvalds @ 2022-08-04 0:42 ` Matthew Wilcox 2022-08-04 1:32 ` Steven Rostedt 2022-08-08 22:06 ` Thomas Gleixner 1 sibling, 1 reply; 18+ messages in thread From: Matthew Wilcox @ 2022-08-04 0:42 UTC (permalink / raw) To: Linus Torvalds Cc: Steven Rostedt, Al Viro, Sebastian Andrzej Siewior, Thomas Gleixner, linux-kernel, linux-fsdevel On Wed, Aug 03, 2022 at 04:42:43PM -0700, Linus Torvalds wrote: > On Wed, Aug 3, 2022 at 4:24 PM Matthew Wilcox <willy@infradead.org> wrote: > > > > On Wed, Aug 03, 2022 at 06:59:36PM -0400, Steven Rostedt wrote: > > > > > > preempt_disable_inlock() ? > > > > preempt_disable_locked()? > > Heh. Shed painting in full glory. > > Let's try just "preempt_enable_under_spinlock()" and see. > > It's a bit long, but it's still shorter than the existing usage pattern. > > And we don't have "inlock" anywhere else, and while "locked" is a real > pattern we have, it tends to be about other things (ie "I hold the > lock that you need, so don't take it"). > > And this is _explicitly_ only about spinning locks, because sleeping > locks don't do the preemption disable even without RT. > > So let's make it verbose and clear and unambiguous. It's not like I > expect to see a _lot_ of those. Knock wood. Should we have it take a spinlock_t pointer? We could have lockdep check it is actually held. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [git pull] vfs.git pile 3 - dcache 2022-08-04 0:42 ` Matthew Wilcox @ 2022-08-04 1:32 ` Steven Rostedt 2022-08-04 2:16 ` Linus Torvalds 0 siblings, 1 reply; 18+ messages in thread From: Steven Rostedt @ 2022-08-04 1:32 UTC (permalink / raw) To: Matthew Wilcox Cc: Linus Torvalds, Al Viro, Sebastian Andrzej Siewior, Thomas Gleixner, linux-kernel, linux-fsdevel On Thu, 4 Aug 2022 01:42:25 +0100 Matthew Wilcox <willy@infradead.org> wrote: > > So let's make it verbose and clear and unambiguous. It's not like I > > expect to see a _lot_ of those. Knock wood. > > Should we have it take a spinlock_t pointer? We could have lockdep > check it is actually held. We don't care if the lock is held or not. The point of the matter is that spinlocks in RT do not disable preemption. The code that the preempt_disable_under_spinlock() is inside, can not be preempted. If it is, bad things can happen. Currently this code assumes that spinlocks disable preemption, so there's no need to disable preemption here. But in RT, just holding the spinlock is not enough to disable preemption, hence we need to explicitly call it here. As Linus's name suggests, the "preempt_enable_under_spinlock" is to make sure preemption is disabled regardless if it's under a normal spinlock that disables preemption, or a RT spinlock that does not. I wonder if raw_preempt_disable() would be another name to use? We have raw_spin_lock() to denote that it's a real spinlock even under PREEMPT_RT. We could say that "raw_preempt_disable()" makes sure the location really has preemption disabled regardless of PREEMPT_RT. -- Steve ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [git pull] vfs.git pile 3 - dcache 2022-08-04 1:32 ` Steven Rostedt @ 2022-08-04 2:16 ` Linus Torvalds 2022-08-04 10:52 ` Steven Rostedt 0 siblings, 1 reply; 18+ messages in thread From: Linus Torvalds @ 2022-08-04 2:16 UTC (permalink / raw) To: Steven Rostedt Cc: Matthew Wilcox, Al Viro, Sebastian Andrzej Siewior, Thomas Gleixner, linux-kernel, linux-fsdevel On Wed, Aug 3, 2022 at 6:33 PM Steven Rostedt <rostedt@goodmis.org> wrote: > > We don't care if the lock is held or not. The point of the matter is that > spinlocks in RT do not disable preemption. The code that the > preempt_disable_under_spinlock() is inside, can not be preempted. If it is, > bad things can happen. I think you're missing Willy's point - the use would be to verify that the spinlock really *is* held, because that's what disables preemption on non-RT. But no, I don't think it's worth the pain to have to specify which spinlock is held, because the spinlock might have been taken by the caller and we don't even have access to it - or care - we just know somebody did take it. If we want extra debuggingm it might be something like just verifying that yes, the preempt count (on a non-RT preemptible kernel) really is elevated already. > I wonder if raw_preempt_disable() would be another name to use? NO! The point is that normal non-RT code does *not* disable preemption at all, because it is already disabled thanks to the earlier spinlock. So we definitely do *not* want to call this "raw_preempt_disable()", because it's actually not supposed to normally disable anything at all. Only for RT, where the spinlock code doesn't do it. Linus ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [git pull] vfs.git pile 3 - dcache 2022-08-04 2:16 ` Linus Torvalds @ 2022-08-04 10:52 ` Steven Rostedt 0 siblings, 0 replies; 18+ messages in thread From: Steven Rostedt @ 2022-08-04 10:52 UTC (permalink / raw) To: Linus Torvalds Cc: Matthew Wilcox, Al Viro, Sebastian Andrzej Siewior, Thomas Gleixner, linux-kernel, linux-fsdevel On Wed, 3 Aug 2022 19:16:12 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote: > > I wonder if raw_preempt_disable() would be another name to use? > > NO! > > The point is that normal non-RT code does *not* disable preemption at > all, because it is already disabled thanks to the earlier spinlock. > > So we definitely do *not* want to call this "raw_preempt_disable()", > because it's actually not supposed to normally disable anything at > all. Only for RT, where the spinlock code doesn't do it. Yeah, I'm just brainstorming ideas on what we could use to make that name a little shorter, and I'm not coming up with much. OK, I'm becoming colorblind with this shed. -- Steve ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [git pull] vfs.git pile 3 - dcache 2022-08-03 23:42 ` Linus Torvalds 2022-08-04 0:42 ` Matthew Wilcox @ 2022-08-08 22:06 ` Thomas Gleixner 2022-08-08 22:43 ` Linus Torvalds 1 sibling, 1 reply; 18+ messages in thread From: Thomas Gleixner @ 2022-08-08 22:06 UTC (permalink / raw) To: Linus Torvalds, Matthew Wilcox Cc: Steven Rostedt, Al Viro, Sebastian Andrzej Siewior, linux-kernel, linux-fsdevel On Wed, Aug 03 2022 at 16:42, Linus Torvalds wrote: > On Wed, Aug 3, 2022 at 4:24 PM Matthew Wilcox <willy@infradead.org> wrote: >> On Wed, Aug 03, 2022 at 06:59:36PM -0400, Steven Rostedt wrote: >> > >> > preempt_disable_inlock() ? >> >> preempt_disable_locked()? > > Heh. Shed painting in full glory. > > Let's try just "preempt_enable_under_spinlock()" and see. > > It's a bit long, but it's still shorter than the existing usage pattern. > > And we don't have "inlock" anywhere else, and while "locked" is a real > pattern we have, it tends to be about other things (ie "I hold the > lock that you need, so don't take it"). > > And this is _explicitly_ only about spinning locks, because sleeping > locks don't do the preemption disable even without RT. > > So let's make it verbose and clear and unambiguous. It's not like I > expect to see a _lot_ of those. Knock wood. > > We had a handful of those things before (in mm/vmstat, and now added > another case to the dentry code. So it has become a pattern, but I > really really hope it's not exactly a common pattern. > > And so because it's not common, typing a bit more is a good idea - and > making it really clear is probably also a good idea. Sebastian and me looked over it. The use cases in mm/vmstat are not really all under spinlocks. That code gets called also just from plain local_irq or even just preempt disabled regions (depending on the stats item), which makes the proposed name less accurate than you describe. A worse case is the u64_stat code which is an ifdef maze (only partially due to RT). Those stats updates can also be called from various contexts where no spinlock is involved. That code is extra convoluted due to irqsave variants and "optimizations" for 32bit UP. Removing the latter would make a cleanup with write_seqcount_...() wrappers pretty simple. Aside of that we have RT conditional preempt related code in page_alloc() and kmap_atomic(). Both care only about the task staying pinned on a CPU. In page_alloc() using preempt_disable() on !RT is more lightweight than migrate_disable(). So something like task_[un]pin_temporary() might work and be descriptive enough. For kmap_atomic() it was decided back then when we introduced kmap_local() that we don't do a wholesale conversion and leave it to the maintainers/developers to look at it on a case by case basis as that has quite some cleanup potential at the call sites. 18 month later we still have 435 of the back then 527 call sites. Sadly enough there are 21 new instances vs. 71 removed and about 20 related cleanup patches ignored. So either we come up with something generic or we just resort to different wrappers for those use cases. I'll have another look with Sebastian tomorrow. Thoughts? Thanks, tglx ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [git pull] vfs.git pile 3 - dcache 2022-08-08 22:06 ` Thomas Gleixner @ 2022-08-08 22:43 ` Linus Torvalds 2022-08-09 16:00 ` Thomas Gleixner 0 siblings, 1 reply; 18+ messages in thread From: Linus Torvalds @ 2022-08-08 22:43 UTC (permalink / raw) To: Thomas Gleixner Cc: Matthew Wilcox, Steven Rostedt, Al Viro, Sebastian Andrzej Siewior, linux-kernel, linux-fsdevel On Mon, Aug 8, 2022 at 3:06 PM Thomas Gleixner <tglx@linutronix.de> wrote: > > The use cases in mm/vmstat are not really all under spinlocks. That code > gets called also just from plain local_irq or even just preempt disabled > regions (depending on the stats item), which makes the proposed name > less accurate than you describe. Augh. How about "preempt_disable_nested()" with a big comment about how some operations normally disable preemption (interrupts off, spinlocks, anything else?) but not on PREEMPT_RT? > A worse case is the u64_stat code which is an ifdef maze (only partially > due to RT). Those stats updates can also be called from various contexts > where no spinlock is involved. That code is extra convoluted due to > irqsave variants and "optimizations" for 32bit UP. Removing the latter > would make a cleanup with write_seqcount_...() wrappers pretty simple. I think we most definitely can start removing optimisations for 32-bit UP by now. Let's not do them without any reason, but any time you hit a code that makes you go "this makes it harder to do better", feel free to go all Alexander the Great on the 32-bit UP code and just cut through the problem by removing it. > Aside of that we have RT conditional preempt related code in > page_alloc() and kmap_atomic(). Both care only about the task staying > pinned on a CPU. In page_alloc() using preempt_disable() on !RT is more > lightweight than migrate_disable(). So something like > task_[un]pin_temporary() might work and be descriptive enough. Yeah, that was the other odd pattern. I'm not sure "temporary" is all that relevant, but yes, if we end up having more of those, some kind of "thread_{un]pin_cpu()" would probably be worth it. But the kmap code may be so special that nothing else has _that_ particular issue. Linus ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [git pull] vfs.git pile 3 - dcache 2022-08-08 22:43 ` Linus Torvalds @ 2022-08-09 16:00 ` Thomas Gleixner 2022-08-09 16:15 ` Matthew Wilcox 0 siblings, 1 reply; 18+ messages in thread From: Thomas Gleixner @ 2022-08-09 16:00 UTC (permalink / raw) To: Linus Torvalds Cc: Matthew Wilcox, Steven Rostedt, Al Viro, Sebastian Andrzej Siewior, linux-kernel, linux-fsdevel Linus, On Mon, Aug 08 2022 at 15:43, Linus Torvalds wrote: > On Mon, Aug 8, 2022 at 3:06 PM Thomas Gleixner <tglx@linutronix.de> wrote: >> The use cases in mm/vmstat are not really all under spinlocks. That code >> gets called also just from plain local_irq or even just preempt disabled >> regions (depending on the stats item), which makes the proposed name >> less accurate than you describe. > > Augh. > > How about "preempt_disable_nested()" with a big comment about how some > operations normally disable preemption (interrupts off, spinlocks, > anything else?) but not on PREEMPT_RT? Let me do that. >> A worse case is the u64_stat code which is an ifdef maze (only partially >> due to RT). Those stats updates can also be called from various contexts >> where no spinlock is involved. That code is extra convoluted due to >> irqsave variants and "optimizations" for 32bit UP. Removing the latter >> would make a cleanup with write_seqcount_...() wrappers pretty simple. > > I think we most definitely can start removing optimisations for 32-bit > UP by now. > > Let's not do them without any reason, but any time you hit a code that > makes you go "this makes it harder to do better", feel free to go all > Alexander the Great on the 32-bit UP code and just cut through the > problem by removing it. With that mopped up: 1 file changed, 42 insertions(+), 84 deletions(-) plus a followup cleanup of the then not longer required _irqsave/restore() variants: 8 files changed, 33 insertions(+), 62 deletions(-) is not a Great conquest, but makes the code definitely readable. The fetch/retry_irq() variants are then obsolete as well, but that's just a rename in 70 files and the removal of the two helpers. >> Aside of that we have RT conditional preempt related code in >> page_alloc() and kmap_atomic(). Both care only about the task staying >> pinned on a CPU. In page_alloc() using preempt_disable() on !RT is more >> lightweight than migrate_disable(). So something like >> task_[un]pin_temporary() might work and be descriptive enough. > > Yeah, that was the other odd pattern. I'm not sure "temporary" is all > that relevant, but yes, if we end up having more of those, some kind > of "thread_{un]pin_cpu()" would probably be worth it. > > But the kmap code may be so special that nothing else has _that_ > particular issue. We just want to get rid of kmap_atomic() completely. I'll go and find minions. Thanks, tglx ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [git pull] vfs.git pile 3 - dcache 2022-08-09 16:00 ` Thomas Gleixner @ 2022-08-09 16:15 ` Matthew Wilcox 2022-08-09 17:58 ` Thomas Gleixner 0 siblings, 1 reply; 18+ messages in thread From: Matthew Wilcox @ 2022-08-09 16:15 UTC (permalink / raw) To: Thomas Gleixner Cc: Linus Torvalds, Steven Rostedt, Al Viro, Sebastian Andrzej Siewior, linux-kernel, linux-fsdevel, Ira Weiny, Fabio M. De Francesco On Tue, Aug 09, 2022 at 06:00:17PM +0200, Thomas Gleixner wrote: > On Mon, Aug 08 2022 at 15:43, Linus Torvalds wrote: > > But the kmap code may be so special that nothing else has _that_ > > particular issue. > > We just want to get rid of kmap_atomic() completely. I'll go and find > minions. Be sure to coordinate with Ira & Fabio who are working on this. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [git pull] vfs.git pile 3 - dcache 2022-08-09 16:15 ` Matthew Wilcox @ 2022-08-09 17:58 ` Thomas Gleixner 0 siblings, 0 replies; 18+ messages in thread From: Thomas Gleixner @ 2022-08-09 17:58 UTC (permalink / raw) To: Matthew Wilcox Cc: Linus Torvalds, Steven Rostedt, Al Viro, Sebastian Andrzej Siewior, linux-kernel, linux-fsdevel, Ira Weiny, Fabio M. De Francesco On Tue, Aug 09 2022 at 17:15, Matthew Wilcox wrote: > On Tue, Aug 09, 2022 at 06:00:17PM +0200, Thomas Gleixner wrote: >> On Mon, Aug 08 2022 at 15:43, Linus Torvalds wrote: >> > But the kmap code may be so special that nothing else has _that_ >> > particular issue. >> >> We just want to get rid of kmap_atomic() completely. I'll go and find >> minions. > > Be sure to coordinate with Ira & Fabio who are working on this. Will do. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [git pull] vfs.git pile 3 - dcache 2022-08-03 18:39 [git pull] vfs.git pile 3 - dcache Al Viro 2022-08-03 18:57 ` Linus Torvalds @ 2022-08-03 19:00 ` pr-tracker-bot 1 sibling, 0 replies; 18+ messages in thread From: pr-tracker-bot @ 2022-08-03 19:00 UTC (permalink / raw) To: Al Viro; +Cc: Linus Torvalds, linux-kernel, linux-fsdevel The pull request you sent on Wed, 3 Aug 2022 19:39:25 +0100: > git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git tags/pull-work.dcache has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/200e340f2196d7fd427a5810d06e893b932f145a Thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/prtracker.html ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2022-08-09 17:58 UTC | newest] Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-08-03 18:39 [git pull] vfs.git pile 3 - dcache Al Viro 2022-08-03 18:57 ` Linus Torvalds 2022-08-03 19:49 ` Al Viro 2022-08-03 21:54 ` Steven Rostedt 2022-08-03 22:09 ` Linus Torvalds 2022-08-03 22:59 ` Steven Rostedt 2022-08-03 23:24 ` Matthew Wilcox 2022-08-03 23:42 ` Linus Torvalds 2022-08-04 0:42 ` Matthew Wilcox 2022-08-04 1:32 ` Steven Rostedt 2022-08-04 2:16 ` Linus Torvalds 2022-08-04 10:52 ` Steven Rostedt 2022-08-08 22:06 ` Thomas Gleixner 2022-08-08 22:43 ` Linus Torvalds 2022-08-09 16:00 ` Thomas Gleixner 2022-08-09 16:15 ` Matthew Wilcox 2022-08-09 17:58 ` Thomas Gleixner 2022-08-03 19:00 ` pr-tracker-bot
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.