From: Matt Helsley <matthltc@us.ibm.com>
To: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Matt Helsley <matthltc@us.ibm.com>,
	Cyrill Gorcunov <gorcunov@openvz.org>,
	Oleg Nesterov <oleg@redhat.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Eric Paris <eparis@redhat.com>,
	"linux-security-module@vger.kernel.org" 
	<linux-security-module@vger.kernel.org>,
	"oprofile-list@lists.sf.net" <oprofile-list@lists.sf.net>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Al Viro <viro@zeniv.linux.org.uk>
Subject: Re: [PATCH 6/7] mm: kill vma flag VM_EXECUTABLE
Date: Thu, 5 Apr 2012 14:44:47 -0700
Message-ID: <20120405214447.GC7761@count0.beaverton.ibm.com>
In-Reply-To: <4F7E08EB.5070600@openvz.org>

On Fri, Apr 06, 2012 at 01:04:43AM +0400, Konstantin Khlebnikov wrote:
> Matt Helsley wrote:
> >On Tue, Apr 03, 2012 at 11:32:04PM +0400, Cyrill Gorcunov wrote:
> >>On Tue, Apr 03, 2012 at 11:16:31AM -0700, Matt Helsley wrote:
> >>>On Tue, Apr 03, 2012 at 09:10:20AM +0400, Konstantin Khlebnikov wrote:
> >>>>Matt Helsley wrote:
> >>>>>On Sat, Mar 31, 2012 at 10:13:24PM +0200, Oleg Nesterov wrote:
> >>>>>>On 03/31, Konstantin Khlebnikov wrote:
> >>>>>>>
> >>>>>>>comment from v2.6.25-6245-g925d1c4 ("procfs task exe symlink"),
> >>>>>>>where all this stuff was introduced:
> >>>>>>>
> >>>>>>>>...
> >>>>>>>>This avoids pinning the mounted filesystem.
> >>>>>>>
> >>>>>>>So, this logic is hooked into every file mmap/munmap and vma split/merge just to
> >>>>>>>fix a hypothetical case where the fs is pinned against umount by an mm that has
> >>>>>>>already unmapped all its executable files but is still alive. Does anyone know of
> >>>>>>>a real-world example?
> >>>>>>
> >>>>>>This is the question to Matt.
> >>>>>
> >>>>>This is where I got the scenario:
> >>>>>
> >>>>>https://lkml.org/lkml/2007/7/12/398
> >>>>
> >>>>Cyrill Gorcunov's patch "c/r: prctl: add ability to set new mm_struct::exe_file"
> >>>>gives userspace ability to unpin vfsmount explicitly.
> >>>
> >>>Doesn't that break the semantics of the kernel ABI?
> >>
> >>Which one? exe_file can be changed iff there is no MAP_EXECUTABLE left.
> >>Still, once assigned (via this prctl) the mm_struct::exe_file can't be changed
> >>again, until program exit.
> >
> >The prctl() interface itself is fine as it stands now.
> >
> >As far as I can tell Konstantin is proposing that we remove the unusual
> >counter that tracks the number of mappings of the exe_file and require
> >userspace use the prctl() to drop the last reference. That's what I think
> >will break the ABI because after that change you *must* change userspace
> >code to use the prctl(). It's an ABI change because the same sequence of
> >system calls with the same input bits produces different behavior.
> 
> But common software does not require this at all. I did not find any real examples,
> only a hypothesis by Al Viro: https://lkml.org/lkml/2007/7/12/398
> libhugetlbfs isn't a good example either: man proc says /proc/[pid]/exe stays alive as
> long as the main thread is alive, but with libhugetlbfs /proc/[pid]/exe disappears too early.

*shrug*

Where did you look for real examples? chroot? pivot_root? various initrd
systems? Which versions?

This sort of argument brings up classic questions. How do we know when
to stop looking given the incredible amount of obscure code that's out
there -- most of which we're unlikely to even be aware of? Even if we
only look at "popular" distros how far back do we go? etc.

Perhaps before going through all that effort it would be better to
verify that removing that code impacts performance enough to care. Do
you have numbers? If the numbers aren't there then why bother with
exhaustive and exhausting code searches?

>
> Also, I would not call it an ABI: this corner case isn't documented, and I'm afraid only
> a few people in the world know about it =)

I don't think the definition of an ABI is whether there's documentation
for it. It's whether the interface is used or not. At least that's the
impression I've gotten from reading Linus' rants over the years.

I think of the ABI as bits input versus behavior (including bits) out. If
the input bits remain the same the qualitative behavior should remain the
same unless there is a bug. Here, roughly speaking, the input bits are the
arguments passed to a sequence of one or more munmap() calls followed by a
umount(). The output is a 0 return value from the umount. Your proposal
would change that output value to -1 -- different bits and different
behavior.
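
To make the "same bits in, different bits out" point concrete, here is a
toy user-space model. It is not kernel code, and every name in it
(struct toy_mm, toy_exec(), toy_munmap_exe_vma(), toy_release_exe_file(),
toy_umount()) is invented for illustration. It assumes the only thing
pinning the filesystem is the mm's reference to its exe file, and it
models the explicit release step (the prctl() or exit) as a single call.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only. All names here are hypothetical. */
struct toy_mm {
	bool exe_file_pinned;	/* mm still holds a reference to its exe file */
	int  exe_vma_count;	/* mappings of the exe file still present */
	bool count_mappings;	/* true: counter behavior, false: proposed behavior */
};

/* Model exec: the exe file is pinned and mapped nr_exe_vmas times. */
void toy_exec(struct toy_mm *mm, bool count_mappings, int nr_exe_vmas)
{
	mm->exe_file_pinned = true;
	mm->exe_vma_count = nr_exe_vmas;
	mm->count_mappings = count_mappings;
}

/* Model munmap() of one mapping of the executable. */
void toy_munmap_exe_vma(struct toy_mm *mm)
{
	assert(mm->exe_vma_count > 0);
	mm->exe_vma_count--;
	/* With the counter, the last unmap drops the exe file reference. */
	if (mm->count_mappings && mm->exe_vma_count == 0)
		mm->exe_file_pinned = false;
}

/* Model an explicit release by userspace, or process exit. */
void toy_release_exe_file(struct toy_mm *mm)
{
	mm->exe_file_pinned = false;
}

/* Model umount(): 0 on success, -1 while the filesystem is still pinned. */
int toy_umount(const struct toy_mm *mm)
{
	return mm->exe_file_pinned ? -1 : 0;
}
```

With count_mappings set, two toy_munmap_exe_vma() calls on an mm created
with two mappings make toy_umount() return 0. With it clear, the same
calls leave toy_umount() returning -1 until toy_release_exe_file() is
called. Same inputs, different output.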

Cheers,
	-Matt Helsley

