On Wed, Oct 1, 2014 at 9:18 AM, Linus Torvalds wrote:
>
> So I'd really suggest we do exactly that. Get rid of "pte_numa()"
> entirely, get rid of "_PAGE_[BIT_]NUMA" entirely, and instead add a
> "pte_protnone()" helper to check for the "protnone" case (which on x86
> is testing the _PAGE_PROTNONE bit, and on most other architectures is
> just testing that the page has no access rights).
>
> Then we throw away "pte_mknuma()" and "pte_mknonnuma()" entirely,
> because they are brainless sh*t, and we just use
>
>     ptent = ptep_modify_prot_start(mm, addr, pte);
>     ptent = pte_modify(ptent, newprot);
>     ptep_modify_prot_commit(mm, addr, pte, ptent);
>
> reliably instead (where for the mknuma case "newprot" is PROT_NONE,
> and for mknonnuma() it is vma->vm_page_prot. Yes, that means that you
> have to pass in the vma to those functions, but that just makes sense
> anyway.
>
> And if that means that we lose the numa flag on mprotect etc, nobody sane cares.

So here is a *COMPLETELY UNTESTED* and probably seriously buggy first version of such a patch. It doesn't do the powerpc conversion, so somebody would need to check that eventually, but aside from that obvious issue, can people fix this up? Or comment on why it doesn't work.

Now, I like this because it gets rid of the horrible PAGE_NUMA special cases, but it really seems to simplify things in general. Lookie here:

    13 files changed, 74 insertions(+), 268 deletions(-)

that's really mainly just the removal of odd and broken numa pte/pmd helper functions from that aren't needed any more because the normal "change protections" functions just DTRT automatically. Although there are actually a few other cases that got simpler too, so it's not *just* removal of those _PAGE_NUMA-specific helpers.

One thing this does *not* remove is the special pte locking rule in the "change_*_range()" functions: they still take that broken "prot_numa" argument.
HOWEVER, it isn't actually used for any page table modifications: the only reason for it existing is the hacky locking issue (see lock_pte_protection(), and the comment about races with the transhuge accesses).

Now, I'll be honest: this patch *might* just work, but I expect it to have some stupid problem. It compiles. I haven't even dared boot it, much less try any numa benchmarks that wouldn't show anything sane on my machine anyway.

So I'm really sending this patch out in the hope that it will get comments, fixup and possibly even testing by people who actually know the NUMA balancing code. Rik? Anybody?

                Linus
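
For anyone who wants to poke at the idea outside the kernel, here is a rough userspace sketch of the "just use the normal protection-change path" approach described above. It is only a toy model, not the patch and not kernel code: every toy_* name, the bit layout, and the two protection constants are assumptions made up for illustration, and the start/commit pair is reduced to a plain load and store.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy userspace model, NOT kernel code. Types, names and bit positions
 * are illustrative assumptions, not any architecture's real layout. */
typedef struct { uint64_t val; } toy_pte_t;
typedef struct { uint64_t bits; } toy_pgprot_t;

#define TOY_PRESENT   (1ULL << 0)
#define TOY_RW        (1ULL << 1)
#define TOY_USER      (1ULL << 2)
#define TOY_PROTNONE  (1ULL << 8)
#define TOY_PROT_MASK (TOY_PRESENT | TOY_RW | TOY_USER | TOY_PROTNONE)

/* Hypothetical stand-ins for a no-access protection and a normal
 * vma->vm_page_prot value. */
static const toy_pgprot_t TOY_PAGE_NONE   = { TOY_PROTNONE };
static const toy_pgprot_t TOY_PAGE_SHARED = { TOY_PRESENT | TOY_RW | TOY_USER };

/* Replace only the protection bits; the frame-number bits are kept. */
static inline toy_pte_t toy_pte_modify(toy_pte_t pte, toy_pgprot_t newprot)
{
    pte.val = (pte.val & ~TOY_PROT_MASK) | (newprot.bits & TOY_PROT_MASK);
    return pte;
}

/* "protnone" here means: PROTNONE set and PRESENT clear. */
static inline bool toy_pte_protnone(toy_pte_t pte)
{
    return (pte.val & (TOY_PROTNONE | TOY_PRESENT)) == TOY_PROTNONE;
}

/* One helper covers both directions; the caller picks newprot. */
static inline void toy_change_prot(toy_pte_t *ptep, toy_pgprot_t newprot)
{
    assert(ptep != NULL);
    toy_pte_t ptent = *ptep;                /* models the "start" step  */
    ptent = toy_pte_modify(ptent, newprot);
    *ptep = ptent;                          /* models the "commit" step */
}
```

In this model the "mknuma" direction is toy_change_prot(&pte, TOY_PAGE_NONE) and the "mknonnuma" direction is toy_change_prot(&pte, TOY_PAGE_SHARED), so no NUMA-specific helper or extra pte bit is needed to express either one.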