[RFC] [patch 0/18] remap_file_pages protection support (for UML), try 3

* [RFC] [patch 0/18] remap_file_pages protection support (for UML), try 3
@ 2005-08-26 18:23 Blaisorblade
  2005-08-26 19:11 ` Hugh Dickins
  2005-09-02 21:02 ` Hugh Dickins
  0 siblings, 2 replies; 16+ messages in thread
From: Blaisorblade @ 2005-08-26 18:23 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ingo Molnar, Hugh Dickins, Andi Kleen, LKML, Jeff Dike,
	Bodo Stroesser, user-mode-linux-devel

[-- Attachment #1: Type: text/plain, Size: 5522 bytes --]

This is a followup to my post of last week (Aug 12) about remap_file_pages 
protection support. I've improved and consolidated the patches and updated 
them against 2.6.13-rc6/rc7 (the same patches apply against both versions).
I'm sending the full patch series only to akpm, mingo and LKML.

I've also reduced them to only 18, and made the split more meaningful. 
I'm not resending all the patches for foreign architectures, because they're 
almost unchanged since last time (there's just a trivial reject from ppc32, 
because one change has already been done after -rc4).

I'm working on this to provide support for UML, which currently easily creates 
more than 64K (the default limit) VMAs for a single process; in fact, it 
needs one VMA per page. This patch series, together with the UML-specific 
support which Ingo wrote and which I'm porting to recent UMLs, is meant to 
avoid that.
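
To illustrate the "one VMA per page" problem on a stock kernel, here is a 
minimal userspace sketch (not part of the patch series). It assumes Linux 
with procfs mounted; the helper names count_vmas() and 
vmas_added_by_alternating_prot() are made up for this example. Giving 
neighbouring pages different protections with mprotect() forces the kernel to 
split the mapping into one VMA per page:

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count the lines of /proc/self/maps, i.e. the VMAs of this process.
 * Returns -1 if the file cannot be read (e.g. procfs is not mounted). */
static int count_vmas(void)
{
	FILE *f = fopen("/proc/self/maps", "r");
	int c, lines = 0;

	if (!f)
		return -1;
	while ((c = fgetc(f)) != EOF)
		if (c == '\n')
			lines++;
	fclose(f);
	return lines;
}

/* Map npages anonymous pages as PROT_NONE, then make every odd page
 * PROT_READ.  Returns how many VMAs this added, or -1 on error. */
static int vmas_added_by_alternating_prot(int npages)
{
	long psz = sysconf(_SC_PAGESIZE);
	int before, after, i;
	char *base;

	(void)count_vmas();	/* warm-up: stdio may grow the heap on first use */
	before = count_vmas();
	if (before < 0)
		return -1;

	base = mmap(NULL, (size_t)npages * psz, PROT_NONE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return -1;

	for (i = 1; i < npages; i += 2) {
		if (mprotect(base + (size_t)i * psz, psz, PROT_READ) != 0) {
			munmap(base, (size_t)npages * psz);
			return -1;
		}
	}

	after = count_vmas();
	munmap(base, (size_t)npages * psz);
	return after < 0 ? -1 : after - before;
}
```

Calling vmas_added_by_alternating_prot(64) should report roughly 64 new VMAs 
(the exact number can differ slightly if the kernel merges an edge page with 
a neighbouring mapping). Scale that up to the address space of a UML guest 
process and the 64K limit is reached quickly.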

Some highlights:

* The first 2 patches modify the PTE encoding macros and start preparing the 
VM for the new situation (i.e. VMAs which have variable protections, which are 
marked VM_NONUNIFORM; I dropped the earlier VM_MANYPROTS name).

Patch number 2 will require fixing up all arches like in 2.6.4-rc2-mm1, to 
provide the new PTE encoding macros.

* Patch 5 allows the syscall to actually create such VMAs. Before that, 
there's no difference in behaviour from the current kernel (except that 
there's less space for file offset encoding in PTEs). And even here, the new 
operations are only enabled for archs explicitly supporting them (see patch #7).
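
To make the "less space for file offset encoding" point concrete, here is a 
toy sketch of packing a file page offset together with protection bits into a 
non-present PTE. The bit layout and all toy_* names are hypothetical, chosen 
only for illustration; they do not match any real architecture or the macros 
in the patches:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit layout of a non-present "file" PTE (no real arch):
 *   bit  0     present bit (always 0 here)
 *   bit  1     "this is a file PTE" marker
 *   bits 2-4   per-page protection (read/write/exec)
 *   bits 5-31  file page offset (27 bits; 30 bits would fit without prot)
 */
#define TOY_PTE_PRESENT	0x1u
#define TOY_PTE_FILE	0x2u
#define TOY_PROT_SHIFT	2
#define TOY_PROT_MASK	0x7u
#define TOY_PGOFF_SHIFT	5
#define TOY_PGOFF_BITS	27
#define TOY_PGOFF_MAX	((1u << TOY_PGOFF_BITS) - 1)

#define TOY_PROT_READ	0x1u
#define TOY_PROT_WRITE	0x2u
#define TOY_PROT_EXEC	0x4u

/* Encode offset and protection into one PTE value. */
static uint32_t toy_pgoff_prot_to_pte(uint32_t pgoff, uint32_t prot)
{
	assert(pgoff <= TOY_PGOFF_MAX);
	assert(prot <= TOY_PROT_MASK);
	return (pgoff << TOY_PGOFF_SHIFT) | (prot << TOY_PROT_SHIFT) |
	       TOY_PTE_FILE;
}

/* True if the PTE is non-present and carries the file marker. */
static int toy_pte_is_file(uint32_t pte)
{
	return (pte & (TOY_PTE_PRESENT | TOY_PTE_FILE)) == TOY_PTE_FILE;
}

static uint32_t toy_pte_to_pgoff(uint32_t pte)
{
	return pte >> TOY_PGOFF_SHIFT;
}

static uint32_t toy_pte_to_prot(uint32_t pte)
{
	return (pte >> TOY_PROT_SHIFT) & TOY_PROT_MASK;
}
```

In this toy layout the three protection bits cost three bits of offset, so 
the largest file offset that can be encoded shrinks by a factor of 8. Real 
architectures have different spare bits, which is why each one needs its own 
version of the macros.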

* Patches 8 and 9 change the path for handling page faults, since the permission 
checking on nonuniform VMAs cannot be done until the PTE has been read.

This is the most intrusive part, but
a) archs are not required to adapt to this immediately
b) it isn't so difficult in practice.
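
As a rough model of why the check has to move, consider the following toy 
sketch. It is plain userspace C, not kernel code; the toy_* types and flags 
are hypothetical, and falling back to the VMA's default protection for pages 
that were never remapped is an assumption of this sketch:

```c
#include <stdbool.h>

#define TOY_READ		0x1u
#define TOY_WRITE		0x2u
#define TOY_EXEC		0x4u
#define TOY_VM_NONUNIFORM	0x100u

struct toy_vma {
	unsigned flags;		/* may contain TOY_VM_NONUNIFORM */
	unsigned prot;		/* default protection of the whole VMA */
};

struct toy_pte {
	bool has_prot;		/* PTE carries a per-page protection */
	unsigned prot;		/* valid only if has_prot is true */
};

/* Returns true if the access is allowed, false if it should end in SIGSEGV. */
static bool toy_access_allowed(const struct toy_vma *vma,
			       const struct toy_pte *pte, unsigned access)
{
	/* Uniform VMA: the decision needs only the VMA, so it can be
	 * taken early, before the page tables are looked at. */
	if (!(vma->flags & TOY_VM_NONUNIFORM))
		return (vma->prot & access) == access;

	/* Nonuniform VMA: the per-page protection lives in the PTE,
	 * so the PTE must be read before anything can be decided. */
	if (pte->has_prot)
		return (pte->prot & access) == access;

	/* Assumption: pages never remapped keep the VMA default. */
	return (vma->prot & access) == access;
}
```

The first branch is what a fault handler can do today with just the VMA in 
hand; the second branch is the reason the check has to be deferred for 
nonuniform VMAs.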

* Patch 11 is a big simplification. Since we must encode the PTE's on swapout 
like in VM_NONLINEAR vmas, the simplest way to reuse the existing code is to 
make sure that VM_NONUNIFORM vmas are also marked as VM_NONLINEAR.

It is possible to avoid this, as in patch #18, but it's just a bit scary, and 

Then there are 4 optimization patches and 3 fixups for some odd cases that we 
may not end up supporting, namely:
*) vmas with default PROT_NONE protection (I actually feel we're going to 
support this, the only patch which has problems is an optimization)

*) MAP_POPULATE on private VMA (no problem on this) and consequently 
remap_file_pages on private VMA to install linear uniform mappings (since 
MAP_POPULATE is implemented in terms of remap_file_pages): there's a patch to 
stop this from truncating COW pages away, but I don't think it's worth it.

*) linear nonuniform vmas. I initially created them because there's no 
relation between being nonlinear and nonuniform, but it later turned out 
supporting them is intrusive.

I have improved the patches further, understood some of Ingo's changes which 
I hadn't understood last time, and fixed their bugs.

I hope these changes can be reviewed and included in -mm, even though they'll 
conflict with the pagefault scalability patches (I think the conflicts 
are not difficult to resolve).

Still, the patch is IMHO in better shape, in many ways, than when it was in 
-mm last time. To handle properly all possibilities it has become a bit more 
intrusive.

The original one was designed to handle only the simpler needs of 
UML (an mmap'ing with PROT_NONE followed by nonlinear and nonuniform 
remappings), but it still failed in some cases. I've taken Ingo's original 
test program and significantly extended it; it's attached to this mail.
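
For readers unfamiliar with the syscall, here is a minimal sketch of a plain 
nonlinear remapping with the existing remap_file_pages() interface. It is not 
taken from the attached test program. It assumes a glibc system and a 
writable /tmp, and the function name nonlinear_remap_demo() is made up for 
this example. The mainline syscall requires prot to be 0, so the per-page 
protections this series adds are not exercised here:

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if, after the remap, the first virtual page shows file page 1,
 * 0 if it does not, and -1 if any step failed (including kernels or
 * sandboxes where remap_file_pages() is unavailable). */
static int nonlinear_remap_demo(void)
{
	char path[] = "/tmp/fremap-demo-XXXXXX";
	long psz = sysconf(_SC_PAGESIZE);
	int fd, result = -1;
	char *buf, *map;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);		/* the file lives on until fd is closed */

	/* File page 0 is filled with 'A', file page 1 with 'B'. */
	buf = malloc(2 * (size_t)psz);
	if (!buf)
		goto out_close;
	memset(buf, 'A', psz);
	memset(buf + psz, 'B', psz);
	if (write(fd, buf, 2 * (size_t)psz) != 2 * psz)
		goto out_free;

	/* Linear shared mapping: virtual page N shows file page N. */
	map = mmap(NULL, 2 * (size_t)psz, PROT_READ, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto out_free;

	/* Make virtual page 0 show file page 1 (prot must be 0 here). */
	if (remap_file_pages(map, psz, 0, 1, 0) == 0)
		result = (map[0] == 'B' && map[psz] == 'B');

	munmap(map, 2 * (size_t)psz);
out_free:
	free(buf);
out_close:
	close(fd);
	return result;
}
```

After the call both virtual pages show file page 1, all within what started 
as a single mapping. What the existing interface cannot do is give each 
remapped page its own protection, which is the gap these patches address.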

I'll appreciate any comments.

==============
Changes from 2.6.5-mm1/dropped version of the patches:
==============
*) Actually implemented _real_ and _anal_ protection support, safe against 
swapout; programs get SIGSEGV *always* when they should. I've used the 
attached test program (an improved version of Ingo's one) to check that.
I tested just until patch 25, onto UML. The subsequent ones are either patches 
for foreign archs or proposed

*) Fixed many changes present in the patches.
*) Fixed UML bits
*) Added some headaches for arch ports. I've also included some patches 
which reduce them.

*) No more usage of a new syscall slot: to use the new interface, applications 
will use the new MAP_NOINHERIT flag I've added. I still have the patches to use 
the old -mm ABI, if there's any reason they're needed.

*) Fixed a regression wrt using mprotect() against a remapped area (see patch 
15)

======
Changes from my last patch-bomb of the patches:
======
*) fixed mprotect VS remap_file_pages(MAP_NOINHERIT) interaction

*) fixed truncation (with madvise_dontneed or truncate()) of nonuniform but 
linear vmas. Either with patch 11, by removing "nonuniform but linear VMAs", 
or with patch 18.

======
Still todo
======
*) ->populate flushes each TLB entry individually, instead of using mmu_gathers 
as it should; this was suggested by Ingo himself when sending the patch, but it 
seems he didn't get the time to finish this. And I'm now wondering how 
that would relate to I/O... at each I/O point we should finish and regather the 
mmu_gather, as in zap_page_range. But here we are reading pages, not the 
reverse!

Seems rewriting the kernel locking is a quite time-consuming task!
-- 
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade

[-- Attachment #2: fremap-test-complete.c.bz2 --]
[-- Type: application/x-bzip2, Size: 5377 bytes --]
