From: Johannes Weiner <firstname.lastname@example.org>
To: Dave Chinner <email@example.com>
Cc: Andrew Morton <firstname.lastname@example.org>,
	Roman Gushchin <email@example.com>,
	Tejun Heo <firstname.lastname@example.org>,
	email@example.com, firstname.lastname@example.org,
	email@example.com, firstname.lastname@example.org
Subject: Re: [PATCH 4/4] vfs: keep inodes with page cache off the inode shrinker LRU
Date: Fri, 18 Jun 2021 12:45:14 -0400
Message-ID: <YMzNmpaFurN3email@example.com>
In-Reply-To: <20210617004942.GF2419729@dread.disaster.area>

On Thu, Jun 17, 2021 at 10:49:42AM +1000, Dave Chinner wrote:
> On Wed, Jun 16, 2021 at 12:54:15AM -0400, Johannes Weiner wrote:
> > On Wed, Jun 16, 2021 at 11:20:08AM +1000, Dave Chinner wrote:
> > > On Tue, Jun 15, 2021 at 02:50:09PM -0400, Johannes Weiner wrote:
> > > > On Tue, Jun 15, 2021 at 04:26:40PM +1000, Dave Chinner wrote:
> > > > > On Mon, Jun 14, 2021 at 05:19:04PM -0400, Johannes Weiner wrote:
> > > > > > @@ -1123,6 +1125,9 @@ static int __remove_mapping(struct address_space *mapping, struct page *page,
> > > > > >  		shadow = workingset_eviction(page, target_memcg);
> > > > > >  	__delete_from_page_cache(page, shadow);
> > > > > >  	xa_unlock_irq(&mapping->i_pages);
> > > > > > +	if (mapping_shrinkable(mapping))
> > > > > > +		inode_add_lru(mapping->host);
> > > > > > +	spin_unlock(&mapping->host->i_lock);
> > > > >
> > > > > No. Inode locks have absolutely no place serialising core vmscan
> > > > > algorithms.
> > > >
> > > > What if, and hear me out on this one, core vmscan algorithms change
> > > > the state of the inode?
> > >
> > > Then the core vmscan algorithm has a layering violation.
> >
> > You're just playing a word game here.
>
> No, I've given you plenty of constructive justification and ways to
> restructure your patches to acheive what you say needs to be done.
>
> You're the one that is rejecting any proposal I make outright and
> making unjustified claims that "I don't understand this code".

Hey, come on now. The argument I was making is that page cache state
is already used to update the inode LRU, and you incorrectly claimed
that this wasn't the case. My statement was a direct response to that
impasse, not a way to weasel out of your feedback.

> I haven't disagreed at all with what you are trying to do, nor do I
> think that being more selective about how we track inodes on the
> LRUs is a bad thing.

That's what it sounded like to me, but I'll chalk that up as a
misunderstanding then.

> What I'm commenting on is that the proposed changes are *really bad
> code*.

I'm not in love with it either, I can tell you that. But it also
depends on the alternatives. I don't want to fix one bug and
introduce a scalability issue. Or reintroduce subtle, unforeseen
shrinker issues like the ones that had you revert the previous fix.
A revert, I might add, that could have been the incremental fix you
proposed here. Instead you glossed over Roman's root-causing and
reintroduced the original bug. Now we're here, almost three years
later, still on square one.

So yeah, my priority is to get the behavior right first, and then
worry about architectural beauty within those constraints.

> If you can work out a *clean* way to move inodes onto the LRU when
> they are dirty then I'm all for it. But sprinkling inode->i_lock all
> over the mm/ subsystem and then adding seemling randomly placed
> inode lru manipulations isn't the way to do it.
>
> You should consider centralising all the work involved marking a
> mapping clean somewhere inside the mm/ code. Then add a single
> callout that does the inode LRU work, similar to how the
> fs-writeback.c code does it when the inode is marked clean by
> inode_sync_complete().

Yes, I'd prefer that as well. Let's look at the options.

The main source of complication is that the page cache cannot hold a
direct reference on the inode; holding the xa_lock or the i_lock is
the only thing that keeps the inode alive after we remove the page.
So our options are either overlapping the lock sections, or taking
the RCU read lock on the page cache side to bridge the deletion and
the inode callback - which then has to deal with the possibility that
the inode may have already been destroyed by the time it's called.

I would put the RCU thing aside for now: it sounds just a bit too
hairy, and requires too low-level an insight into the inode lifetime
from the mapping side. The xa_lock is also dropped in several outer
entry functions, so while it would clean up the fs side a bit, we
wouldn't reduce the blast radius on the MM side.

When we overlap lock sections, there are two options:

a) This patch, with the page cache lock nesting inside the i_lock. Or,

b) the way we handle dirty state: when we call set_page_dirty() ->
   mark_inode_dirty(), we hold the lock that serializes the page
   cache state while locking and updating the inode state. The
   hierarchy is:

	lock_page(page)			# MM
	  spin_lock(&inode->i_lock)	# FS

   The equivalent for this patch would be to have page_cache_delete()
   call mark_inode_empty() (or whatever name works for everybody),
   again under the lock that serializes the page cache state:

	xa_lock_irq(&mapping->i_pages)	# MM
	  spin_lock(&inode->i_lock)	# FS

There would be one central place calling into the fs through an API
function, encapsulating i_lock handling in fs/inode.c. Great.

The major caveat here is that i_lock would need to become IRQ-safe in
order to nest inside the xa_lock. It's not that the semantic layering
of the code would be new in any way, it's simply the lock type.

As far as I can see, i_lock hold times are quite short - it's a
spinlock, after all. But I haven't reviewed all the sites yet, and
there are a lot of them; they would all need to be updated. Likewise,
list_lru locking needs to be made irq-safe. However, an irq-safe
spinlock is sort of the inevitable fate of any lock embedded in a
data structure API, so I'm less concerned about that. AFAICS nothing
else nests under i_lock.

If FS folks are fine with that, I would give that conversion a shot.

Lock type dependency aside, this would retain full modularity and a
clear delineation between mapping and inode properties. It would also
be a fully locked scheme, so none of the subtleties of the current
patch. The end result seems clean and maintainable.
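For illustration, a rough sketch of what that fs/inode.c callout for
option b) could look like. mark_inode_empty() is just the placeholder
name from above, the mapping argument and the exact shape are
assumptions rather than a finished proposal, and it leans on
inode_add_lru() already refusing dirty, pinned, or freeing inodes:

/*
 * Sketch only. Called by page_cache_delete() with the xa_lock of
 * mapping->i_pages held, so i_lock (and the list_lru lock taken by
 * inode_add_lru()) must be converted to irq-safe locking first.
 */
void mark_inode_empty(struct address_space *mapping)
{
	struct inode *inode = mapping->host;

	/* Page cache state is stable under the caller's xa_lock */
	if (!mapping_shrinkable(mapping))
		return;

	spin_lock(&inode->i_lock);
	/* No-op if the inode is dirty, referenced, or being freed */
	inode_add_lru(inode);
	spin_unlock(&inode->i_lock);
}

That keeps all of the i_lock handling and the LRU decision in one
place on the fs side, with the MM side only reporting the state
change, analogous to the mark_inode_dirty() path.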