On Sat, Oct 29, 2022 at 11:05 AM Nadav Amit wrote: > > Anyhow, I am not sure whether the solution that you propose would work. > Please let me know if my understanding makes sense. Let me include my (UNTESTED! PROBABLY GARBAGE!) patch here just as a "I meant something like this". Note that it's untested, but I tried to make it intentionally use different names and an opaque type in the 'mmu_gather' data structure so that at least these bit games end up being type-safe and more likely to be correct if it compiles. And I will say that it does compile for me - but only in my normal x86-64 config. I haven't tested anything else, and I guarantee it won't build on s390, for example, because s390 has its own mmu_gather functions. > Let’s assume that we do not call set_page_dirty() before we remove the rmap > but only after we invalidate the page [*]. Let’s assume that > shrink_page_list() is called after the page’s rmap is removed and the page > is no longer mapped, but before set_page_dirty() was actually called. > > In such a case, shrink_page_list() would consider the page clean, and would > indeed keep the page (since __remove_mapping() would find elevated page > refcount), which appears to give us a chance to mark the page as dirty > later. Right. That is not different to any other function (like "write()" having looked up the page. > However, IIUC, in this case shrink_page_list() might still call > filemap_release_folio() and release the buffers, so calling set_page_dirty() > afterwards - after the actual TLB invalidation took place - would fail. I'm not seeing why. That would imply that any "look up page, do set_page_dirty()" is broken. They don't have rmap either. And we have a number of them all over (eg think "GUP users" etc). But hey, you were right about the stale TLB case, so I may just be missing something. I *think* the important thing is just that we need to mark the page dirtty from the page tables _after_ we've flushed the TLB, just to make sure that there can be no subsequent dirtiers that then get lost. Anyway, I think the best documentation for "this is what I meant" is simply the patch. Does this affect your PoC on your setup? I tried to run your program on my machine (WITHOUT this patch - I have compiled this patch, but I haven't even booted it - it really could be horrible broken). But it doesn't fail for me even without the patch and I just get "Finished without an error" over and over again - but you said it had to be run on a RAM block device, which I didn't do, so my testing is very suspect,. Again: THIS PATCH IS UNTESTED. I feel like it might actually work, if only because I tried to be so careful with the type system. But that "might actually work" is probably me being wildly optimistic, and also depends on the whole concept being ok in the first place. So realistically, think of this patch more as a "document in code what Linus meant with his incoherent ramblings" rather than anything else. Hmm? Linus