From: Robin Getz <rgetz@blackfin.uclinux.org>
To: David Howells <dhowells@redhat.com>
Cc: Hugh Dickins <hugh@veritas.com>,
bryan.wu@analog.com, Robin Holt <holt@sgi.com>,
"Kawai, Hidehiro" <hidehiro.kawai.ez@hitachi.com>,
Andrew Morton <akpm@osdl.org>,
kernel list <linux-kernel@vger.kernel.org>,
Pavel Machek <pavel@ucw.cz>, Alan Cox <alan@lxorguk.ukuu.org.uk>,
Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>,
sugita <yumiko.sugita.yf@hitachi.com>,
Satoshi OSHIMA <soshima@redhat.com>,
haoki@redhat.com, Bernd Schmidt <bernds_cb1@t-online.de>
Subject: Re: Move to unshared VMAs in NOMMU mode?
Date: Mon, 12 Mar 2007 16:50:34 -0400
Message-ID: <200703121650.35699.rgetz@blackfin.uclinux.org>
In-Reply-To: <12852.1173449522@redhat.com>

On Fri 9 Mar 2007 09:12, David Howells pondered:
> I've been considering how to deal with the SYSV SHM problem, and I think we
> may have to move to unshared VMAs in NOMMU mode to deal with this.

Thanks for putting some good thoughts down.

> Currently, what we have is each mm_struct has in its arch-specific context
> argument a list of VMLs. Take the FRV context for example:
>
> [include/asm-frv/mmu.h]
> typedef struct {
> #ifdef CONFIG_MMU
>     ...
> #else
>     struct vm_list_struct *vmlist;
>     unsigned long end_brk;
>
> #endif
>     ...
> } mm_context_t;
>
> Each VML struct contains a pointer to a systemwide VMA and the next VML in
> the list:
>
> struct vm_list_struct {
>     struct vm_list_struct *next;
>     struct vm_area_struct *vma;
> };
>
> The VMAs themselves are kept in an rb-tree in mm/nommu.c:
>
> /* list of shareable VMAs */
> struct rb_root nommu_vma_tree = RB_ROOT;
>
> which can then be displayed through /proc/maps.
>
> There are some restrictions of this system, mainly due to the NOMMU
> constraints:
>
> (*) mmap() may not be used to overlay one mapping upon another.
>
> (*) mmap() may not be used with MAP_FIXED.
>
> (*) mmap()'s of the same part of the same file will result in multiple
> mappings returning the same base address, assuming the maps are
> shareable. If they aren't shareable, they'll be at different base
> addresses.
>
> (*) for normal shareable file mappings, two mappings will only be shared
> if they precisely match offset, size and protection, otherwise a new
> mapping will be created (this is because VMAs will be shared). Splitting
> VMAs would relax this restriction, though subsequent mappings would
> have to be bounded by the first mapping, but wouldn't have to be the same
> size.
>
> (*) munmap() may only unmap a precise match amongst the mappings made; it
> may not be used to cut down or punch a hole in an existing mapping.
>
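The exact-match sharing rule in the list above can be sketched in plain userspace C. This is only an illustration under assumptions: `demo_mapping`, `demo_can_share()` and `demo_find_shareable()` are hypothetical names, not kernel code, and the real lookup goes through the rb-tree in mm/nommu.c rather than a flat array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a shareable NOMMU file mapping. */
struct demo_mapping {
    const void   *file;   /* identity of the backing file */
    unsigned long pgoff;  /* offset into the file, in pages */
    unsigned long len;    /* length of the mapping, in bytes */
    unsigned long prot;   /* protection bits */
};

/* Two mappings are shared only if file, offset, size and protection all match. */
static bool demo_can_share(const struct demo_mapping *existing,
                           const struct demo_mapping *req)
{
    return existing->file == req->file &&
           existing->pgoff == req->pgoff &&
           existing->len == req->len &&
           existing->prot == req->prot;
}

/* Return the existing mapping that can be reused, or NULL if a new one is needed. */
static const struct demo_mapping *
demo_find_shareable(const struct demo_mapping *tab, size_t n,
                    const struct demo_mapping *req)
{
    assert(req != NULL);
    for (size_t i = 0; i < n; i++)
        if (demo_can_share(&tab[i], req))
            return &tab[i];
    return NULL;
}
```

With this rule, a request for a shorter window of an already-mapped file does not match and would get a fresh mapping, which is the restriction that splitting VMAs would relax.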
> The VMAs for private file mappings, private blockdev mappings and anonymous
> mappings, be they shared[*] or unshared, hold a pointer to the kmalloc()'d
> region of memory in which the mapping contents reside. This region is
> discarded when the VMA is deleted. When a region can be shared the VMA is
> also shared, and so no reference counting need take place on the mapping
> contents as that is implied by the VMA.
>
> [*] MAP_PRIVATE+!PROT_WRITE+!PT_PTRACED regions may be shared
>
> Note that for mappable chardevs with special BDI capability flags, extra
> VMAs may be allocated because (a) they may need to overlap non-exactly, and
> (b) the chardev itself pins the backing storage, if the backing storage is
> potentially transient.
>
>
> If VMAs are not shared for shared memory regions then some other means of
> retaining the actual allocated memory region must be found. The obvious
> way to do this is to have the VMA point to a shared, refcounted record that
> keeps track of the region:
>
> struct vm_region {
>     /* the first parameters define the region as for the VMA */
>     pgprot_t vm_page_prot;
>     unsigned long vm_start;
>     unsigned long vm_end;
>     unsigned long vm_pgoff;
>     struct file *vm_file;
>
>     atomic_t vm_usage; /* region usage count */
>     struct rb_node vm_rb; /* region tree */
> };
>
> The VMA itself would then have to be modified to include a pointer to this,
> but wouldn't then need its own refcount. VMAs would belong, once again, to
> the mm_struct, the VML struct would vanish, and the VML list rooted in
> mm_context_t would vanish.
>
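A minimal userspace sketch of the refcounted-region idea, assuming hypothetical names (`region_sketch`, `vma_sketch`, `region_sketch_alloc()`, `vma_sketch_attach()`, `vma_sketch_detach()`): each per-mm VMA holds a pointer to a shared region record, and the backing memory is freed only when the last VMA lets go. A plain `int` stands in for `atomic_t`, so this is not safe for concurrent use.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical shared record tracking one allocated memory region. */
struct region_sketch {
    void  *mem;    /* backing storage for the mapping contents */
    size_t len;    /* size of the backing storage in bytes */
    int    usage;  /* number of VMAs pointing here (atomic_t in a kernel) */
};

/* Hypothetical per-mm VMA: unshared, but pointing at a shared region. */
struct vma_sketch {
    struct region_sketch *region;
};

/* Allocate a region with no users yet; returns NULL on allocation failure. */
static struct region_sketch *region_sketch_alloc(size_t len)
{
    struct region_sketch *r = malloc(sizeof(*r));
    if (!r)
        return NULL;
    r->mem = malloc(len);
    if (!r->mem) {
        free(r);
        return NULL;
    }
    r->len = len;
    r->usage = 0;
    return r;
}

/* Point a VMA at the region and take a reference on it. */
static void vma_sketch_attach(struct vma_sketch *vma, struct region_sketch *r)
{
    r->usage++;
    vma->region = r;
}

/* Drop the VMA's reference; returns 1 if this freed the region, else 0. */
static int vma_sketch_detach(struct vma_sketch *vma)
{
    struct region_sketch *r = vma->region;

    assert(r != NULL && r->usage > 0);
    vma->region = NULL;
    if (--r->usage > 0)
        return 0;
    free(r->mem);
    free(r);
    return 1;
}
```

The memory cost discussed below shows up here as one `vma_sketch` per attaching process, in addition to the single shared `region_sketch`.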
> For R/O shareable file mappings, it might be possible to actually use the
> target file's pagecache for the mapping. I do something of that sort for
> shared-writable mappings on ramfs files (to support POSIX SHM and SYSV
> SHM).
>
> The downside of allocating all these extra VMAs is that, of course, it
> takes up more memory, though that may not be too bad, especially if it's at
> the gain of additional consistency with the MM code.

I guess I don't see consistency with the MM code as the primary goal, but
rather consistency in operation with the MM code from a user-space
perspective - hopefully the two goals are not divergent.

> However, consistency isn't for the most part a real issue. As I see it,
> drivers and filesystems should not concern themselves with anything other
> than the VMA they're given, and so it doesn't matter if these are shared or
> not.
>
> That brings us on to the problem with SYSV SHM which keeps an attachment
> count that the VMA mmap(), open() and release() ops manipulate. This means
> that the nattch count comes out wrong on NOMMU systems. Note that on MMU
> systems, doing a munmap() in the middle of an attached region will *also*
> break the nattch count, though this is self-correcting.
>
> Another way of dealing with the nattch count on NOMMU systems is to do it
> through the VML list, but that then needs more special casing in the SHM
> driver and perhaps others.
We (noMMU) folks need to have special code anyway - so why not put it there,
and try not to increase memory footprint?
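To make that concrete, here is a rough userspace sketch of deriving an attach count from the existing VML lists instead of from per-VMA open/close calls. It is an assumption-laden illustration only: `nattch_vma`, `nattch_vml`, `nattch_mm` and `nattch_count()` are made-up names, and a real implementation would need locking and a way to enumerate the mm_structs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the shared VMA, the VML entry and the mm. */
struct nattch_vma {
    int id;                    /* placeholder payload */
};

struct nattch_vml {
    struct nattch_vml *next;   /* next entry in this mm's list */
    struct nattch_vma *vma;    /* systemwide (shared) VMA */
};

struct nattch_mm {
    struct nattch_vml *vmlist; /* head of this mm's VML list */
};

/* Count how many VML entries, across all given mms, refer to the shared VMA. */
static unsigned int nattch_count(const struct nattch_mm *mms, size_t n,
                                 const struct nattch_vma *shm)
{
    unsigned int count = 0;

    assert(shm != NULL);
    for (size_t i = 0; i < n; i++)
        for (const struct nattch_vml *v = mms[i].vmlist; v; v = v->next)
            if (v->vma == shm)
                count++;
    return count;
}
```

The VMA stays shared, so the only extra memory per attach is the VML entry that already exists today.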
-Robin