linux-kernel.vger.kernel.org archive mirror
From: Robin Getz <rgetz@blackfin.uclinux.org>
To: David Howells <dhowells@redhat.com>
Cc: Hugh Dickins <hugh@veritas.com>,
	bryan.wu@analog.com, Robin Holt <holt@sgi.com>,
	"Kawai, Hidehiro" <hidehiro.kawai.ez@hitachi.com>,
	Andrew Morton <akpm@osdl.org>,
	kernel list <linux-kernel@vger.kernel.org>,
	Pavel Machek <pavel@ucw.cz>, Alan Cox <alan@lxorguk.ukuu.org.uk>,
	Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>,
	sugita <yumiko.sugita.yf@hitachi.com>,
	Satoshi OSHIMA <soshima@redhat.com>,
	haoki@redhat.com, Bernd Schmidt <bernds_cb1@t-online.de>
Subject: Re: Move to unshared VMAs in NOMMU mode?
Date: Mon, 12 Mar 2007 16:50:34 -0400
Message-ID: <200703121650.35699.rgetz@blackfin.uclinux.org>
In-Reply-To: <12852.1173449522@redhat.com>

On Fri 9 Mar 2007 09:12, David Howells pondered:
> I've been considering how to deal with the SYSV SHM problem, and I think we
> may have to move to unshared VMAs in NOMMU mode to deal with this. 

Thanks for putting some good thoughts down.

> Currently, what we have is that each mm_struct has, in its arch-specific
> context member, a list of VMLs.  Take the FRV context for example:
>
> 	[include/asm-frv/mmu.h]
> 	typedef struct {
> 	#ifdef CONFIG_MMU
> 	...
> 		struct vm_list_struct	*vmlist;
> 		unsigned long		end_brk;
>
> 	#endif
> 	...
> 	} mm_context_t;
>
> Each VML struct contains a pointer to a systemwide VMA and the next VML in
> the list:
>
> 	struct vm_list_struct {
> 		struct vm_list_struct	*next;
> 		struct vm_area_struct	*vma;
> 	};
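The VML indirection can be sketched in userspace C as follows. This is an illustrative model, not kernel code: the structures are trimmed to the fields the lookup needs, and `vml_find_vma()` is a hypothetical helper showing how a per-process list of VMLs resolves an address to a (possibly shared) systemwide VMA.

```c
#include <assert.h>
#include <stddef.h>

/* Trimmed stand-ins for the kernel structures quoted above. */
struct vm_area_struct {
	unsigned long	vm_start;	/* first byte of the mapping */
	unsigned long	vm_end;		/* one past the last byte */
};

struct vm_list_struct {
	struct vm_list_struct	*next;
	struct vm_area_struct	*vma;
};

/* Hypothetical helper: walk one process's VML list and return the VMA
 * covering addr, or NULL if no mapping in this process covers it. */
static struct vm_area_struct *vml_find_vma(struct vm_list_struct *vml,
					   unsigned long addr)
{
	for (; vml; vml = vml->next) {
		struct vm_area_struct *vma = vml->vma;

		assert(vma != NULL);	/* every VML node references a VMA */
		if (addr >= vma->vm_start && addr < vma->vm_end)
			return vma;
	}
	return NULL;
}
```

Because each node only points at a VMA, nodes in two different processes' lists can reference the same `vm_area_struct`, which is what makes the VMA shareable in this scheme.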
>
> The VMAs themselves are kept in an rb-tree in mm/nommu.c:
>
> 	/* list of shareable VMAs */
> 	struct rb_root nommu_vma_tree = RB_ROOT;
>
> which can then be displayed through /proc/maps.
>
> There are some restrictions on this system, mainly due to the NOMMU
> constraints:
>
>  (*) mmap() may not be used to overlay one mapping upon another.
>
>  (*) mmap() may not be used with MAP_FIXED.
>
>  (*) mmap() calls on the same part of the same file will result in multiple
>      mappings returning the same base address, assuming the maps are
>      shareable.  If they aren't shareable, they'll be at different base
>      addresses.
>
>  (*) For normal shareable file mappings, two mappings will only be shared
>      if they precisely match offset, size and protection; otherwise a new
>      mapping will be created (this is because VMAs will be shared).
>      Splitting VMAs would reduce this restriction, though subsequent
>      mappings would have to be bounded by the first mapping, but wouldn't
>      have to be the same size.
>
>  (*) munmap() may only unmap a precise match amongst the mappings made; it
>      may not be used to cut down or punch a hole in an existing mapping.
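The sharing and unmapping restrictions listed above can be expressed as simple predicates. This is a hypothetical sketch: `struct map_desc`, the helper names and the 4096-byte page size are assumptions for illustration, and `can_share_bounded()` assumes that protection would still have to match once VMAs can be split.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size */

/* Hypothetical description of an existing mapping or a new request. */
struct map_desc {
	const void	*file;		/* identity of the backing file */
	unsigned long	pgoff;		/* offset into the file, in pages */
	unsigned long	len;		/* length in bytes */
	unsigned long	prot;		/* PROT_* bits */
	bool		shareable;
};

/* Current rule: reuse an existing mapping only on an exact match of
 * file, offset, size and protection. */
static bool can_share_exact(const struct map_desc *old,
			    const struct map_desc *req)
{
	return old->shareable && req->shareable &&
	       old->file == req->file && old->pgoff == req->pgoff &&
	       old->len == req->len && old->prot == req->prot;
}

/* With splittable VMAs: the request must lie within the first mapping's
 * bounds, but need not be the same size (protection assumed to match). */
static bool can_share_bounded(const struct map_desc *old,
			      const struct map_desc *req)
{
	unsigned long old_start = old->pgoff * SKETCH_PAGE_SIZE;
	unsigned long req_start = req->pgoff * SKETCH_PAGE_SIZE;

	return old->shareable && req->shareable &&
	       old->file == req->file && old->prot == req->prot &&
	       req_start >= old_start &&
	       req_start + req->len <= old_start + old->len;
}

/* munmap() rule: only an exact match of a previous mapping is accepted,
 * so trimming or punching holes is refused. */
static bool munmap_allowed(unsigned long map_start, unsigned long map_len,
			   unsigned long addr, unsigned long len)
{
	return addr == map_start && len == map_len;
}
```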
>
> The VMAs for private file mappings, private blockdev mappings and anonymous
> mappings, be they shared[*] or unshared, hold a pointer to the kmalloc()'d
> region of memory in which the mapping contents reside.  This region is
> discarded when the VMA is deleted.  When a region can be shared, the VMA is
> also shared, and so no reference counting needs to take place on the mapping
> contents, as that is implied by the VMA.
>
> [*] MAP_PRIVATE+!PROT_WRITE+!PT_PTRACED regions may be shared
>
> Note that for mappable chardevs with special BDI capability flags, extra
> VMAs may be allocated because (a) they may need to overlap non-exactly, and
> (b) the chardev itself pins the backing storage, if the backing storage is
> potentially transient.
>
>
> If VMAs are not shared for shared memory regions then some other means of
> retaining the actual allocated memory region must be found.  The obvious
> way to do this is to have the VMA point to a shared, refcounted record that
> keeps track of the region:
>
> 	struct vm_region {
> 		/* the first parameters define the region as for the VMA */
> 		pgprot_t	vm_page_prot;
> 		unsigned long	vm_start;
> 		unsigned long	vm_end;
> 		unsigned long	vm_pgoff;
> 		struct file	*vm_file;
>
> 		atomic_t	vm_usage;	/* region usage count */
> 		struct rb_node	vm_rb;		/* region tree */
> 	};
>
> The VMA itself would then have to be modified to include a pointer to this,
> but wouldn't then need its own refcount.  VMAs would belong, once again, to
> the mm_struct, the VML struct would vanish, and the VML list rooted in
> mm_context_t would vanish.
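The proposed region scheme can be sketched in userspace C, assuming C11 `atomic_int` in place of the kernel's `atomic_t` and `malloc()`/`calloc()` in place of `kmalloc()`. The helpers `region_alloc()`, `vma_attach()` and `vma_detach()` are hypothetical; the point is that each VMA holds one reference on the shared region and the backing memory is released only when the last VMA lets go.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Model of the proposed shared, refcounted region record; the rb-tree
 * node, page protection and file pointer are omitted for brevity. */
struct vm_region {
	unsigned long	vm_start;
	unsigned long	vm_end;
	void		*base;		/* allocated backing memory */
	atomic_int	vm_usage;	/* number of VMAs using this region */
};

/* Per-process VMA: points at the region and has no refcount of its own. */
struct vm_area_struct {
	unsigned long		vm_start;
	unsigned long		vm_end;
	struct vm_region	*vm_region;
};

/* Hypothetical: allocate a region with no users yet. */
static struct vm_region *region_alloc(size_t len)
{
	struct vm_region *r = malloc(sizeof(*r));

	if (!r)
		return NULL;
	r->base = calloc(1, len);
	if (!r->base) {
		free(r);
		return NULL;
	}
	r->vm_start = (unsigned long)r->base;
	r->vm_end = r->vm_start + len;
	atomic_init(&r->vm_usage, 0);
	return r;
}

/* Hypothetical: point a VMA at a region, taking a reference. */
static void vma_attach(struct vm_area_struct *vma, struct vm_region *r)
{
	atomic_fetch_add(&r->vm_usage, 1);
	vma->vm_region = r;
	vma->vm_start = r->vm_start;
	vma->vm_end = r->vm_end;
}

/* Hypothetical: drop the VMA's reference; returns 1 if that was the last
 * one, in which case the region and its backing memory were freed. */
static int vma_detach(struct vm_area_struct *vma)
{
	struct vm_region *r = vma->vm_region;
	int old = atomic_fetch_sub(&r->vm_usage, 1);

	assert(old > 0);
	vma->vm_region = NULL;
	if (old == 1) {
		free(r->base);
		free(r);
		return 1;
	}
	return 0;
}
```

Two processes mapping the same shared region would each get their own `vm_area_struct` attached to one `vm_region`; the extra memory cost mentioned below is the per-process VMA.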
>
> For R/O shareable file mappings, it might be possible to actually use the
> target file's pagecache for the mapping.  I do something of that sort for
> shared-writable mappings on ramfs files (to support POSIX SHM and SYSV
> SHM).
>
> The downside of allocating all these extra VMAs is that, of course, it
> takes up more memory, though that may not be too bad, especially if it's in
> exchange for additional consistency with the MM code.

I guess I don't see consistency with the MM code as the primary request, but 
rather consistency in operation with the MM code from a user-space 
perspective; hopefully the two goals do not diverge.

> However, consistency isn't for the most part a real issue.  As I see it,
> drivers and filesystems should not concern themselves with anything other
> than the VMA they're given, and so it doesn't matter if these are shared or
> not.
>
> That brings us on to the problem with SYSV SHM which keeps an attachment
> count that the VMA mmap(), open() and release() ops manipulate.  This means
> that the nattch count comes out wrong on NOMMU systems.  Note that on MMU
> systems, doing a munmap() in the middle of an attached region will *also*
> break the nattch count, though this is self-correcting.
>
> Another way of dealing with the nattch count on NOMMU systems is to do it
> through the VML list, but that then needs more special casing in the SHM
> driver and perhaps others.
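The nattch problem, and the VML-list alternative, can be modelled with a toy example. It assumes that on NOMMU a second attach of the same segment reuses the existing shared VMA without invoking its open() op; all names here are hypothetical stand-ins, not the SHM driver's code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model; all names are hypothetical. */
struct shm_seg {
	int nattch;		/* the attachment count SYSV SHM reports */
};

struct vma {
	struct shm_seg	*seg;
	int		usage;	/* number of VML entries pointing here */
};

/* Stand-ins for the VMA open()/release() ops that SHM relies on. */
static void shm_vma_open(struct vma *v)    { v->seg->nattch++; }
static void shm_vma_release(struct vma *v) { v->seg->nattch--; }

/* Shared-VMA attach: open() runs only for the first attachment; later
 * attachments just bump the VMA's usage count, so nattch undercounts. */
static void attach_shared(struct vma *v)
{
	if (v->usage++ == 0)
		shm_vma_open(v);
}

static void detach_shared(struct vma *v)
{
	if (--v->usage == 0)
		shm_vma_release(v);
}

/* Alternative: derive the count from the per-process VML lists by
 * counting every node that references a VMA of this segment. */
struct vml {
	struct vml	*next;
	struct vma	*vma;
};

static int nattch_from_vmls(struct vml *const *lists, size_t nlists,
			    const struct shm_seg *seg)
{
	int n = 0;

	for (size_t i = 0; i < nlists; i++)
		for (const struct vml *l = lists[i]; l; l = l->next)
			if (l->vma->seg == seg)
				n++;
	return n;
}
```

In this model, two attachments leave `nattch` at 1, while counting VML entries gives 2, at the cost of the special-case walk described above.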

We (noMMU) folks need to have special code anyway, so why not put it there, 
and try not to increase the memory footprint?

-Robin


Thread overview: 43+ messages
2007-02-16 13:34 [PATCH 0/4] coredump: core dump masking support v3 Kawai, Hidehiro
2007-02-16 13:39 ` [PATCH 1/4] coredump: add an interface to control the core dump routine Kawai, Hidehiro
2007-02-16 13:40 ` [PATCH 2/4] coredump: ELF: enable to omit anonymous shared memory Kawai, Hidehiro
2007-02-16 13:41 ` [PATCH 3/4] coredump: ELF-FDPIC: " Kawai, Hidehiro
2007-02-16 13:42 ` [PATCH 4/4] coredump: documentation for proc entry Kawai, Hidehiro
2007-02-16 15:05 ` [PATCH 3/4] coredump: ELF-FDPIC: enable to omit anonymous shared memory David Howells
2007-02-16 16:50   ` Robin Holt
2007-02-16 20:09   ` David Howells
2007-03-02 16:55     ` Hugh Dickins
2007-03-03 14:10     ` David Howells
2007-03-05 19:04       ` Hugh Dickins
2007-03-06 18:13       ` David Howells
2007-03-09 14:12       ` Move to unshared VMAs in NOMMU mode? David Howells
2007-03-12 20:50         ` Robin Getz [this message]
2007-03-13 10:14         ` David Howells
2007-03-15 21:20         ` Hugh Dickins
2007-03-15 22:47         ` David Howells
2007-03-19 19:23           ` Eric W. Biederman
2007-03-20 11:06           ` David Howells
2007-03-20 16:48             ` Eric W. Biederman
2007-03-20 19:12             ` David Howells
2007-03-20 19:51             ` David Howells
2007-03-21 16:11             ` David Howells
2007-03-03 14:25     ` [PATCH] NOMMU: Hide vm_mm in NOMMU mode David Howells
2007-02-20  9:45   ` [PATCH 3/4] coredump: ELF-FDPIC: enable to omit anonymous shared memory Kawai, Hidehiro
2007-02-20 10:58   ` David Howells
2007-02-20 12:56     ` Robin Holt
2007-02-21 10:00     ` Kawai, Hidehiro
2007-02-21 11:33     ` David Howells
2007-02-21 11:54       ` Robin Holt
2007-02-22  5:33         ` Kawai, Hidehiro
2007-02-22 11:47         ` David Howells
2007-02-16 15:08 ` [PATCH 0/4] coredump: core dump masking support v3 David Howells
2007-02-20  9:48   ` Kawai, Hidehiro
2007-02-24  3:32 ` Markus Gutschke
2007-02-24 11:39   ` Pavel Machek
2007-03-01 12:35   ` Kawai, Hidehiro
2007-03-01 18:16     ` Markus Gutschke
2007-02-24 10:02 ` David Howells
2007-02-24 20:01   ` Markus Gutschke
2007-02-26 11:49   ` David Howells
2007-02-26 12:01     ` Pavel Machek
2007-02-26 12:42     ` David Howells
