This is a followup to my post of last week (Aug 12) about remap_file_pages protection support. I've improved and consolidated the patches and updated them against 2.6.13-rc6/rc7 (the same patches apply against both versions).

I'm sending the full patch series only to akpm, mingo and LKML. I've also reduced the series to only 18 patches and made the splitting more meaningful. I'm not resending all the patches for foreign architectures, because they're almost unchanged since last time (there's just a trivial reject on ppc32, because one change has already been made after -rc4).

I'm working on this to provide support for UML, which currently easily creates more than 64K VMAs (the default limit) for a single process; in fact, it needs one VMA per page. This patch series, together with specific UML support (which Ingo wrote and which I'm porting to recent UMLs), is meant to address that.

Some highlights:

* The first 2 patches modify the PTE encoding macros and start preparing the VM for the new situation, i.e. VMAs with variable protections, which are marked VM_NONUNIFORM (I dropped the earlier VM_MANYPROTS name). Patch 2 will require fixing up all arches, like in 2.6.4-rc2-mm1, to provide the new PTE encoding macros.

* Patch 5 allows the syscall to actually create such VMAs. Before that, there's no difference in behaviour from the current kernel (except that there's less space for file offset encoding in PTEs). And even here, the new operations are only enabled for arches explicitly supporting them (see patch #7).

* Patches 8 and 9 change the page fault handling path, since the permission check on nonuniform VMAs cannot be done until the PTE has been read. This is the most intrusive part, but a) arches are not required to adapt to this immediately, and b) it isn't so difficult in practice.

* Patch 11 is a big simplification. Since we must encode the PTEs on swapout like in VM_NONLINEAR VMAs, the simplest way to reuse the existing code is to make sure that VM_NONUNIFORM VMAs are also marked as VM_NONLINEAR. It is possible to avoid this, as in patch #18, but it's just a bit scary.

Then there are 4 optimization patches and 3 fixups for some odd cases that we maybe won't support. The odd cases are:

*) VMAs with default PROT_NONE protection (I actually feel we're going to support this; the only patch which has problems is an optimization).

*) MAP_POPULATE on a private VMA (no problem with this) and, consequently, remap_file_pages on a private VMA to install linear uniform mappings (since MAP_POPULATE is implemented in terms of remap_file_pages): there's a patch to stop this from truncating COW pages away, but I don't think it's worth it.

*) Linear nonuniform VMAs. I initially created them because there's no relation between being nonlinear and being nonuniform, but it later turned out that supporting them is intrusive.

I have improved the patches further, understood some of Ingo's changes which I hadn't understood last time, and fixed their bugs.

I hope these changes can be reviewed and included in -mm, even if they'll conflict with the pagefault scalability patches (I think the conflicts are not difficult to solve).

Still, the patch is IMHO in better shape, in many ways, than when it was in -mm last time. To properly handle all possibilities it has become a bit more intrusive. The original one was designed to handle only the simpler needs of UML (an mmap with PROT_NONE followed by nonlinear and nonuniform remappings), but it still failed in some cases.

I've taken Ingo's original test program and significantly extended it; it's attached to this patch.

I'll appreciate any comments.
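To make two of the highlights above more concrete (less room for the file offset once per-page protections live in the PTE, and permission checks that can only happen after the PTE has been read), here is a minimal sketch in C. The bit layout, the nl_ names and the field widths are all made up for illustration; they are not the macros from the patches, nor the layout of any real architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * HYPOTHETICAL 32-bit layout for a non-present "file" PTE:
 *   bit  0      present bit (always 0 for these entries)
 *   bit  1      marks the entry as a file PTE
 *   bits 2-4    per-page protection (read, write, exec)
 *   bits 5-31   file page offset (27 bits; it would be 30 bits
 *               if no protection bits had to be stored)
 */
#define NL_PTE_PRESENT  (1u << 0)
#define NL_PTE_FILE     (1u << 1)
#define NL_PROT_SHIFT   2
#define NL_PROT_MASK    0x7u
#define NL_PGOFF_SHIFT  5
#define NL_PGOFF_BITS   27
#define NL_PGOFF_MAX    ((1u << NL_PGOFF_BITS) - 1)

#define NL_PROT_READ    0x1u
#define NL_PROT_WRITE   0x2u
#define NL_PROT_EXEC    0x4u

typedef uint32_t nl_pte_t;

/* Pack a file page offset and a protection into a non-present PTE. */
static inline nl_pte_t nl_pgoff_prot_to_pte(uint32_t pgoff, uint32_t prot)
{
    assert(pgoff <= NL_PGOFF_MAX);          /* offset must fit in 27 bits */
    assert((prot & ~NL_PROT_MASK) == 0);    /* only r/w/x bits allowed */
    return (pgoff << NL_PGOFF_SHIFT) | (prot << NL_PROT_SHIFT) | NL_PTE_FILE;
}

/* True if the entry is a non-present file PTE in this made-up layout. */
static inline bool nl_pte_is_file(nl_pte_t pte)
{
    return (pte & NL_PTE_PRESENT) == 0 && (pte & NL_PTE_FILE) != 0;
}

static inline uint32_t nl_pte_to_pgoff(nl_pte_t pte)
{
    return pte >> NL_PGOFF_SHIFT;
}

static inline uint32_t nl_pte_to_prot(nl_pte_t pte)
{
    return (pte >> NL_PROT_SHIFT) & NL_PROT_MASK;
}

/*
 * Fault-time check for a nonuniform mapping: the answer comes from the
 * protection stored in the PTE, so the PTE must be read first.
 */
static inline bool nl_access_ok(nl_pte_t pte, bool write)
{
    uint32_t prot = nl_pte_to_prot(pte);
    return write ? (prot & NL_PROT_WRITE) != 0
                 : (prot & NL_PROT_READ) != 0;
}
```

For example, a page remapped read-only at file offset 12345 would be encoded with nl_pgoff_prot_to_pte(12345, NL_PROT_READ); a later write fault on it decodes the PTE and is refused by nl_access_ok(pte, true), whatever the VMA-wide protection says.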
==============
Changes from 2.6.5-mm1/dropped version of the patches:
==============

*) Actually implemented _real_ and _strict_ protection support, safe against swapout; programs *always* get SIGSEGV when they should. I've used the attached test program (an improved version of Ingo's one) to check that. I tested only up to patch 25, on UML. The subsequent ones are either patches for foreign archs or proposed ones.

*) Fixed up many changes present in the patches.

*) Fixed UML bits.

*) Added some headaches for arch ports. I've also included some patches which reduce this.

*) No more usage of a new syscall slot: to use the new interface, applications will use the new MAP_NOINHERIT flag I've added. I still have the patches to use the old -mm ABI, if there's any reason they're needed.

*) Fixed a regression wrt using mprotect() against a remapped area (see patch 15).

======
Changes from my last patch-bomb of the patches:
======

*) Fixed mprotect vs. remap_file_pages(MAP_NOINHERIT) interaction.

*) Fixed truncation (with madvise_dontneed or truncate()) of nonuniform but linear VMAs, either with patch 11, by removing "nonuniform but linear VMAs", or with patch 18.

======
Still todo
======

*) ->populate flushes each TLB entry individually, instead of using mmu_gathers as it should; this was suggested even by Ingo when sending the patch, but it seems he didn't get the time to finish it. And I'm now wondering how that would relate to I/O... at each I/O point we should finish and regather the mmu_gather, as in zap_page_range. But here we are reading pages, not the reverse! It seems rewriting the kernel locking is quite a time-consuming task!

--
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade