On Thu, Sep 17, 2020 at 12:51:49PM -0700, Linus Torvalds wrote: > On Thu, Sep 17, 2020 at 12:38 PM Jason Gunthorpe wrote: > > > > Looking for awhile, this now looks reasonable and > > doable. page_maybe_dma_pinned() was created for exactly this kind of > > case. > > > > I've attached a dumb sketch for the pte level (surely wrong! I have > > never looked at this part of the mm before!) at the end of this > > message. > > This looks conceptually fine to me. > > But as mentioned, I think I'd be even happier if we added a "thsi vma > has seen a page pin event" flag to the vma flags, and didn't rely > _just_ on the page_maybe_dma_pinned() check, which migth be triggered > by those fork-happy loads. > > Side note: I wonder if that COW mapping check could be entirely within > that vm_normal_page() path. > > Because how could a non-normal page be a COW page and not already > write-protected? I feel like we still need to keep those to cover !page case for PFNMAP. I tried to draft a patch, however I do see some other issues, so I'd like to discuss here before mindlessly going on with it. Firstly in the draft patch mm->has_pinned is introduced and it's written to 1 as long as FOLL_GUP is called once. It's never reset after set. One more thing (I think) we need to do is to pass the new vma from copy_page_range() down into the end because if we want to start cow during fork() then we need to operate on that new vma too when new page linked to it rather than the parent's. One issue is when we charge for cgroup we probably can't do that onto the new mm/task, since copy_namespaces() is called after copy_mm(). I don't know enough about cgroup, I thought the child will inherit the parent's, but I'm not sure. Or, can we change that order of copy_namespaces() && copy_mm()? I don't see a problem so far but I'd like to ask first.. The other thing is on how to fail. E.g., when COW failed due to either charging of cgroup or ENOMEM, ideally we should fail fork() too. Though that might need more changes - current patch silently kept the shared page for simplicity. The draft patch is attached for reference. It should cover the small pages, but at least it needs to cover huge pages and also a well written commit log. Thanks, -- Peter Xu