On Thu, Jul 23, 2020 at 4:11 PM Hugh Dickins wrote:
>
> On Thu, 23 Jul 2020, Linus Torvalds wrote:
> >
> > I'll send a new version after I actually test it.
>
> I'll give it a try when you're happy with it.

Ok, what I described is what I've been running for a while now. But I
don't put much stress on my system with my normal workload, so..

> I did try yesterday's
> with my swapping loads on home machines (3 of 4 survived 16 hours),
> and with some google stresstests on work machines (0 of 10 survived).
>
> I've not spent long analyzing the crashes, all of them in or below
> __wake_up_common() called from __wake_up_locked_key_bookmark():
> sometimes gets to run the curr->func() and crashes on something
> inside there (often list_del's lib/list_debug.c:53!), sometimes
> cannot get that far. Looks like the wait queue entries on the list
> were not entirely safe with that patch.

Hmm. The bug Oleg pointed out should be pretty theoretical.

But I think the new approach with WQ_FLAG_WOKEN was much better
anyway, despite me missing that one spot in the first version of the
patch.

So here are two patches - the first one does that wake_page_function()
conversion, and the second one just does the memory ordering cleanup I
mentioned.

I don't think the second one should matter on x86, but who knows.

I don't enable list debugging, but I find list corruption surprising.
All of _that_ should be inside the page waitqueue lock; the only
unlocked part was the "list_empty_careful()" part.

But I'll walk over my patch mentally one more time. Here's the current
version, anyway.

                Linus
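As a rough illustration of why memory ordering matters when a waiter
checks a "woken" flag without holding the waitqueue lock, here is a
minimal user-space C11 sketch of a release/acquire handoff. It is not
the patch from this mail and not the kernel's wake_page_function();
SKETCH_FLAG_WOKEN, struct sketch_wait_entry and all sketch_* functions
are hypothetical, and C11 atomics stand in for kernel primitives.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical flag bit, loosely modeled on the idea of WQ_FLAG_WOKEN. */
#define SKETCH_FLAG_WOKEN 0x1u

/* Hypothetical wait entry. It is placed on the waiter's stack below,
 * which is why its lifetime after wakeup matters. */
struct sketch_wait_entry {
    _Atomic unsigned int flags;
    int payload; /* plain (non-atomic) data handed over by the waker */
};

/* Waker: write the payload, then publish WOKEN with release ordering,
 * so the payload write is ordered before the flag becomes visible.
 * After this the waker must not touch *w again: a waiter that sees
 * WOKEN is free to return and let the entry go out of scope. */
static void sketch_wake(struct sketch_wait_entry *w, int value)
{
    w->payload = value;
    atomic_fetch_or_explicit(&w->flags, SKETCH_FLAG_WOKEN,
                             memory_order_release);
}

/* Waiter: poll the flag without any lock. The acquire load pairs with
 * the release above, so once WOKEN is seen the payload is visible. */
static int sketch_wait(struct sketch_wait_entry *w)
{
    while (!(atomic_load_explicit(&w->flags, memory_order_acquire) &
             SKETCH_FLAG_WOKEN))
        ; /* busy-wait for brevity; a real waiter would sleep */
    return w->payload;
}

struct sketch_wake_arg {
    struct sketch_wait_entry *w;
    int value;
};

/* Thread entry that performs the wakeup from another thread. */
static void *sketch_waker_thread(void *p)
{
    struct sketch_wake_arg *a = p;
    sketch_wake(a->w, a->value);
    return NULL;
}

/* Round trip with the entry on the waiter's stack. The join only reaps
 * the thread; the payload handoff relies on the release/acquire pair. */
static int sketch_roundtrip(int value)
{
    struct sketch_wait_entry w;
    atomic_init(&w.flags, 0u);
    w.payload = 0;

    struct sketch_wake_arg arg = { &w, value };
    pthread_t t;
    int rc = pthread_create(&t, NULL, sketch_waker_thread, &arg);
    assert(rc == 0);

    int got = sketch_wait(&w);

    rc = pthread_join(t, NULL);
    assert(rc == 0);
    (void)rc;
    return got;
}
```

For example, sketch_roundtrip(42) hands 42 from the waker thread to the
waiter. On x86, ordinary loads and stores already behave as acquire and
release at the hardware level, so ordering fixes of this kind tend to
show up only on weakly ordered architectures such as arm64.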