From: Yury Norov <yury.norov@gmail.com>
To: Michael Ellerman <mpe@ellerman.id.au>
Cc: Rafael Aquini <aquini@redhat.com>,
Joel Savitz <jsavitz@redhat.com>,
linux-kernel@vger.kernel.org,
Alexey Dobriyan <adobriyan@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
Vlastimil Babka <vbabka@suse.cz>,
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
Ram Pai <linuxram@us.ibm.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Huang Ying <ying.huang@intel.com>,
Sandeep Patil <sspatil@android.com>,
linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v3] fs/proc: add VmTaskSize field to /proc/$$/status
Date: Sun, 19 May 2019 14:31:29 -0700 [thread overview]
Message-ID: <20190519213129.GA32053@yury-thinkpad> (raw)
In-Reply-To: <87k1ettv91.fsf@concordia.ellerman.id.au>
On Tue, May 14, 2019 at 04:17:46PM +1000, Michael Ellerman wrote:
> Yury Norov <yury.norov@gmail.com> writes:
> > On Fri, May 10, 2019 at 01:32:22PM +1000, Michael Ellerman wrote:
> >> Yury Norov <yury.norov@gmail.com> writes:
> >> > On Tue, May 07, 2019 at 08:54:31AM -0400, Rafael Aquini wrote:
> >> >> On Mon, May 06, 2019 at 11:53:43AM -0400, Joel Savitz wrote:
> >> >> > There is currently no easy and architecture-independent way to find the
> >> >> > lowest unusable virtual address available to a process without
> >> >> > brute-force calculation. This patch allows a user to easily retrieve
> >> >> > this value via /proc/<pid>/status.
> >> >> >
> >> >> > Using this patch, any program that previously needed to waste cpu cycles
> >> >> > recalculating a non-sensitive process-dependent value already known to
> >> >> > the kernel can now be optimized to use this mechanism.
> >> >> >
> >> >> > Signed-off-by: Joel Savitz <jsavitz@redhat.com>
> >> >> > ---
> >> >> > Documentation/filesystems/proc.txt | 2 ++
> >> >> > fs/proc/task_mmu.c | 2 ++
> >> >> > 2 files changed, 4 insertions(+)
> >> >> >
> >> >> > diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
> >> >> > index 66cad5c86171..1c6a912e3975 100644
> >> >> > --- a/Documentation/filesystems/proc.txt
> >> >> > +++ b/Documentation/filesystems/proc.txt
> >> >> > @@ -187,6 +187,7 @@ read the file /proc/PID/status:
> >> >> > VmLib: 1412 kB
> >> >> > VmPTE: 20 kb
> >> >> > VmSwap: 0 kB
> >> >> > + VmTaskSize: 137438953468 kB
> >> >> > HugetlbPages: 0 kB
> >> >> > CoreDumping: 0
> >> >> > THP_enabled: 1
> >> >> > @@ -263,6 +264,7 @@ Table 1-2: Contents of the status files (as of 4.19)
> >> >> > VmPTE size of page table entries
> >> >> > VmSwap amount of swap used by anonymous private data
> >> >> > (shmem swap usage is not included)
> >> >> > + VmTaskSize lowest unusable address in process virtual memory
> >> >>
> >> >> Can we change this help text to "size of process' virtual address space memory" ?
> >> >
> >> > Agree. Or go in other direction and make it VmEnd
> >>
> >> Yeah I think VmEnd would be clearer to folks who aren't familiar with
> >> the kernel's usage of the TASK_SIZE terminology.
> >>
> >> >> > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> >> >> > index 95ca1fe7283c..0af7081f7b19 100644
> >> >> > --- a/fs/proc/task_mmu.c
> >> >> > +++ b/fs/proc/task_mmu.c
> >> >> > @@ -74,6 +74,8 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
> >> >> > seq_put_decimal_ull_width(m,
> >> >> > " kB\nVmPTE:\t", mm_pgtables_bytes(mm) >> 10, 8);
> >> >> > SEQ_PUT_DEC(" kB\nVmSwap:\t", swap);
> >> >> > + seq_put_decimal_ull_width(m,
> >> >> > + " kB\nVmTaskSize:\t", mm->task_size >> 10, 8);
> >> >> > seq_puts(m, " kB\n");
> >> >> > hugetlb_report_usage(m, mm);
> >> >> > }
> >> >
> >> > I'm OK with technical part, but I still have questions not answered
> >> > (or wrongly answered) in v1 and v2. Below is the very detailed
> >> > description of the concerns I have.
> >> >
> >> > 1. What is the exact reason for it? Original version tells about some
> >> > test that takes so much time that you were able to drink a cup of
> >> > coffee before it was done. The test as you said implements linear
> >> > search to find the last page and so is of O(n). If it's only for some
> >> > random test, I think the kernel can survive without it. Do you have a
> >> > real example of useful programs that suffer without this information?
> >> >
> >> >
> >> > 2. I have nothing against taking breaks and see nothing weird if
> >> > ineffective algorithms take time. On my system (x86, Ubuntu) the last
> >> > mapped region according to /proc/<pid>/maps is:
> >> > ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
> >> > So to find the required address, we have to inspect 2559 pages. With a
> >> > binary search it would take 12 iterations at max. If my calculation is
> >> > wrong or your environment is completely different - please elaborate.
> >>
> >> I agree it should not be hard to calculate, but at the same time it's
> >> trivial for the kernel to export the information so I don't see why the
> >> kernel shouldn't.
> >
> > Kernel shouldn't do it unless there will be real users of the feature.
> > Otherwise it's pure bloating.
>
> A single line or two of code to print a value that's useful information
> for userspace is hardly "bloat".
>
> I agree it's good to have users for things, but this seems like it's so
> trivial that we should just add it and someone will find a use for it.
Little bloat is still bloat. Trivial useless code is still useless.
If someone finds a use of VmEnd, it should be thoroughly reviewed for
better alternatives.
> > One possible user of it that I can imagine is mmap(MAP_FIXED). The
> > documentation is very clear about it:
> >
> > Furthermore, this option is extremely hazardous (when used on its own),
> > because it forcibly removes preexisting mappings, making it easy for a
> > multithreaded process to corrupt its own address space.
> >
> > VmEnd provided by kernel may encourage people to solve their problems
> > by using MAP_FIXED which is potentially dangerous.
>
> There's MAP_FIXED_NOREPLACE now which is not dangerous.
MAP_FIXED_NOREPLACE is still not supported by glibc and not
documented. (Glibc doesn't use mman-common.h that comes from kernel,
and defines all mmap-related stuff in its own bits/mman.h). Therefore
from the point of view of 99% users MAP_FIXED_NOREPLACE doesn't exist.
Bionic defines MAP_FIXED_NOREPLACE but does not document it and doesn't
use.
> Using MAX_FIXED_NOREPLACE and VmEnd would make it relatively easy to do
> a userspace ASLR implementation, so that actually is an argument in
> favour IMHO.
Kernel-supported ASLR works well since 2.6.12. Do you see any
downside of using it?
MAP_RANDOM would be even more handy for userspace ASLR.
VmEnd in current form would break certain userspace programs that
has DEFAULT_MAP_WINDOW != TASK_SIZE. This is the case for 48-bit VA
programs running on 52-bits VA ARM kernel. See 363524d2b1227
(arm64: mm: Introduce DEFAULT_MAP_WINDOW).
> > Another scenario of VmEnd is to understand how many top bits of address will
> > be always zero to allocate them for user's purpose, like smart pointers. It
> > worth to discuss this usecase with compiler people. If they have interest,
> > I think it's more straightforward to give them something like:
> > int preserve_top_bits(int nbits);
>
> You mean a syscall?
>
> With things like hardware pointer tagging / colouring coming along I
> think you're right that using VmEnd and assuming the top bits are never
> used is a bad idea, an explicit interface would be better.
>
> cheers
prev parent reply other threads:[~2019-05-19 21:31 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-05-06 15:53 [PATCH v3] fs/proc: add VmTaskSize field to /proc/$$/status Joel Savitz
2019-05-07 12:54 ` Rafael Aquini
2019-05-08 6:37 ` Yury Norov
2019-05-08 13:47 ` Rafael Aquini
2019-05-10 3:32 ` Michael Ellerman
2019-05-10 7:25 ` Yury Norov
2019-05-14 6:17 ` Michael Ellerman
2019-05-19 21:31 ` Yury Norov [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190519213129.GA32053@yury-thinkpad \
--to=yury.norov@gmail.com \
--cc=aarcange@redhat.com \
--cc=adobriyan@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=aneesh.kumar@linux.ibm.com \
--cc=aquini@redhat.com \
--cc=jsavitz@redhat.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linuxram@us.ibm.com \
--cc=mpe@ellerman.id.au \
--cc=sspatil@android.com \
--cc=vbabka@suse.cz \
--cc=ying.huang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).