Hi,

Andi Kleen wrote:
> castet.matthieu@free.fr writes:
>
>> Hi,
>>
>> I am wondering why we can't set the KERNEL_DS data segment to not contain the
>> first page, ie changing it from R/W flat model to R/W expand down from
>> 0xffffffff to 4096.
>
> As Alan pointed out setting segment limits/bases has large penalties.
>
> This has been already addressed by the mmap limit defaults
> on the VM level by disallowing to place something on the zero page.
>
> In fact a lot of systems should already run with that default.

Yes, but lots of systems still run with access to the zero page enabled.

The mmap limit was added nearly 2 years ago, yet this summer lots of
machines were still vulnerable to NULL dereference exploits. Why?

Maybe because the kernel still allows it (mmap_min_addr is 0 by default).
OpenBSD enforces it.

There are lots of ways to bypass it (root, the RAW_IO capability,
personality, ...).

Also, some distros don't enable it because it breaks some applications.
For example, with the limit enabled, dosemu and wine can't use vm86.

I attach a basic (slow and probably buggy) protection using segments.
It works with wine and dosemu, and catches kernel accesses to page 0.

I believe a better solution would be to implement a new vm86 syscall.
This syscall would allow running code in virtual 8086 mode without that
code having to live in low pages. For that, an extra argument pointing
to the code region could be added.

On syscall entry, the kernel would:
- duplicate the memory mapping of the calling thread
- map the code to run at low pages (the zero page and more)
- switch to this mapping
- enter vm86 mode
  ...
- exit vm86 mode
- switch back to the original mapping (without page 0)
- return to user space

With that new syscall, fewer programs should need a page 0 mapping.

Matthieu

>
>> PS : why x86_64 segment got access bit set and x86_32 doesn't ?
>
> It's a extremely minor optimization, but the CPU sets it on the first
> access anyways.

Setting it for x86_32 too would allow merging them outside the ifdef.
Not that it is important...
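
For reference, the expand-down KERNEL_DS idea from the quoted question at
the top can be made concrete with a small sketch. It is not taken from the
attached patch: gdt_entry(), kernel_ds_flat(), kernel_ds_no_page0() and
expand_down_lowest_offset() are hypothetical names, and the encoding assumes
the standard 32-bit x86 code/data descriptor layout. As I read the limit
rules, with G=1 and a limit field of 0 the effective limit is 0xfff, so the
lowest valid offset of an expand-down segment is 4096.

```c
#include <assert.h>
#include <stdint.h>

/* Access-byte bits of an x86 code/data segment descriptor. */
#define SEG_ACCESSED    0x01u  /* A: set by the CPU when the segment is loaded */
#define SEG_WRITABLE    0x02u  /* W: data segment is writable */
#define SEG_EXPAND_DOWN 0x04u  /* E: the limit becomes a lower bound */
#define SEG_CODE_DATA   0x10u  /* S: code/data (not system) descriptor */
#define SEG_PRESENT     0x80u  /* P: segment present */

/* Flag nibble (descriptor bits 52..55). */
#define SEG_DB_32BIT    0x4u   /* D/B: 32-bit segment, upper bound 0xffffffff */
#define SEG_GRAN_4K     0x8u   /* G: limit counted in 4 KiB units */

/* Pack base, 20-bit limit, access byte and flag nibble into a descriptor. */
uint64_t gdt_entry(uint32_t base, uint32_t limit, uint8_t access, uint8_t flags)
{
    assert(limit <= 0xFFFFFu && flags <= 0xFu);

    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFFu);             /* limit[15:0]  */
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;      /* base[23:0]   */
    d |= (uint64_t)access << 40;                  /* access byte  */
    d |= (uint64_t)((limit >> 16) & 0xFu) << 48;  /* limit[19:16] */
    d |= (uint64_t)(flags & 0xFu) << 52;          /* AVL, L, D/B, G */
    d |= (uint64_t)(base >> 24) << 56;            /* base[31:24]  */
    return d;
}

/* Flat R/W data segment: offsets 0 .. 0xffffffff, accessed bit clear. */
uint64_t kernel_ds_flat(void)
{
    return gdt_entry(0, 0xFFFFFu,
                     SEG_PRESENT | SEG_CODE_DATA | SEG_WRITABLE,
                     SEG_GRAN_4K | SEG_DB_32BIT);
}

/* Hypothetical R/W expand-down segment that excludes the first page. */
uint64_t kernel_ds_no_page0(void)
{
    return gdt_entry(0, 0,
                     SEG_PRESENT | SEG_CODE_DATA | SEG_WRITABLE | SEG_EXPAND_DOWN,
                     SEG_GRAN_4K | SEG_DB_32BIT);
}

/*
 * Lowest offset that passes the limit check of an expand-down segment.
 * Meant for small limit values; a limit of 0xfffff with G=1 wraps to 0,
 * which would mean the segment has no valid offsets at all.
 */
uint32_t expand_down_lowest_offset(uint32_t limit, int gran_4k)
{
    uint32_t effective = gran_4k ? ((limit << 12) | 0xFFFu) : limit;
    return effective + 1u;
}
```

kernel_ds_flat() leaves the accessed bit clear; OR-ing SEG_ACCESSED into
the access byte gives the variant asked about in the PS. Whether the
non-flat limit costs too much at run time, as Alan and Andi warn, is not
something this sketch can show.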