On Wed, Mar 17, 2021 at 3:23 PM Nicholas Piggin wrote:
>
> Might take some time to get a system and run tests. We actually had
> difficulty recreating it before this patch too, so it's kind of
> hard to say _that_ was the exact case that previously ran badly and
> is now fixed. We thought just the statistical nature of collisions
> and page / lock contention made things occasionally line up and
> tank.

Yeah, it probably depends a lot on the exact page allocation patterns and just bad luck.

Plus the actual spinlocks will end up with false sharing anyway, so you might need to have both contention on multiple pages on the same hash queue _and_ perhaps some action on other pages that just hash close by, so that they make the cache coherency protocol have to work harder..

And then 99% of the time it all just works out fine, because we only need to look at the hashed spinlocks when we have any contention on the page lock in the first place, and that generally should be something we should avoid for other reasons.

But it is perhaps also a sign that while 256 entries is "ridiculously small", it might be the case that it's not necessarily much of a real problem in the end, and I get the feeling that growing it by several orders of magnitude is overkill.

In fact, the problems we have had with the page wait queue have often been because we did too much page locking in the first place.

I actually have a couple of patches in my "still thinking about it" tree (they have been there for six months by now, so maybe I'm not thinking too _actively_ about it) which remove some overly stupid page locking.

For example, look at do_shared_fault(). Currently __do_fault() always locks the result page. Then, if you have a page_mkwrite() function, we immediately unlock it again. And then we lock it again in do_page_mkwrite().

Does that particular case matter? I really have no idea. But we basically lock it twice for what looks like absolutely zero reason other than calling conventions.

Bah.
I'll just attach my three "not very actively thinking about it" patches here in case anybody else wants to not actively think about them.

I've actually been running these on my own machine since October, since the kernel I boot on my main desktop always includes my "thinking about it" patches.

The first two patches should fix the double-lock thing I mentioned.

The third patch should be a no-op right now, but the thinking is outlined in the commit message: why do we even lock pages in filemap_fault? I'm not actually convinced we need to. I think we could return an uptodate page unlocked, and instead do the checking at pte install time (I had some discussions with Kirill about doing it at pte install time, depending entirely on memory ordering constraints wrt page truncation, which *does* take the page lock, and then does various other checks - including the page mapcount and taking the ptl lock - under that lock).

Anyway, I don't mind the "make the hash table larger" regardless of this, but I do want it to be documented a bit more.

               Linus