From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
To: Dave Hansen <dave.hansen@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Konstantin Khlebnikov <koct9i@gmail.com>,
	Wu Fengguang <fengguang.wu@intel.com>,
	Arnaldo Carvalho de Melo <acme@redhat.com>,
	Borislav Petkov <bp@alien8.de>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Rusty Russell <rusty@rustcorp.com.au>,
	David Miller <davem@davemloft.net>,
	Andres Freund <andres@2ndquadrant.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Christoph Hellwig <hch@infradead.org>,
	Dave Chinner <david@fromorbit.com>,
	Michael Kerrisk <mtk.manpages@gmail.com>,
	Linux API <linux-api@vger.kernel.org>,
	Naoya Horiguchi <nao.horiguchi@gmail.com>,
	Kees Cook <kees@outflux.net>
Subject: Re: [PATCH v3 1/3] mm: introduce fincore()
Date: Mon, 7 Jul 2014 17:48:20 -0400
Message-ID: <20140707214820.GA13596@nhori.bos.redhat.com>
In-Reply-To: <53BB0673.8020604@intel.com>

On Mon, Jul 07, 2014 at 01:43:31PM -0700, Dave Hansen wrote:
> On 07/07/2014 01:21 PM, Naoya Horiguchi wrote:
> > On Mon, Jul 07, 2014 at 12:01:41PM -0700, Dave Hansen wrote:
> >> But, is this trying to do too many things at once?  Do we have solid use
> >> cases spelled out for each of these modes?  Have we thought out how they
> >> will be used in practice?
> > 
> > tools/vm/page-types.c will be an in-kernel user after this base code is
> > accepted. The idea of doing fincore() thing comes up during the discussion
> > with Konstantin over file cache mode of this tool.
> > pfn and page flag are needed there, so I think it's one clear usecase.
> 
> I'm going to take that as a no. :)

As for other use cases, database developers may well want physical
addresses (especially the NUMA node?) or page flags (especially the
ones related to page reclaim or writeback).
But I'm not a database expert, so I can't say exactly how they would
use them, sorry.

> The whole FINCORE_PGOFF vs. FINCORE_BMAP issue is something that will
> come up in practice.  We just don't have the interfaces for an end user
> to pick which one they want to use.
> 
> >> Is it really right to say this is going to be 8 bytes?  Would we want it
> >> to share types with something else, like be an loff_t?
> > 
> > Could you elaborate it more?
> 
> We specify file offsets in other system calls, like the lseek family.  I
> was just thinking that this type should match up with those calls since
> they are expressing the same data type with the same ranges and limitations.

The 2nd parameter is already loff_t, so don't we already do this?
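
To make the 8-bytes-per-entry question concrete, here is a rough sketch
of turning a mincore-style residency vector (one byte per page) into a
list of 64-bit page offsets. It assumes a PGOFF-like mode reports only
in-core pages, each as an 8-byte page offset; that layout and the helper
name resident_pgoffs() are assumptions for illustration, not taken from
the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): walk a residency vector with
 * one byte per page (bit 0 set = page is in core) and store the page
 * offset of every resident page as a 64-bit value.
 *
 * first_pgoff is the page offset of vec[0], i.e. the starting byte
 * offset divided by the page size.
 *
 * Returns the number of resident pages found. At most out_cap entries
 * are written, so the return value may exceed out_cap.
 */
size_t resident_pgoffs(const unsigned char *vec, size_t npages,
                       uint64_t first_pgoff, uint64_t *out, size_t out_cap)
{
    size_t n = 0;

    assert(vec != NULL && (out != NULL || out_cap == 0));

    for (size_t i = 0; i < npages; i++) {
        if (!(vec[i] & 1))
            continue;               /* page not in core: no entry */
        if (n < out_cap)
            out[n] = first_pgoff + i;
        n++;
    }
    return n;
}
```

Under these assumptions the dense form costs one byte per page and the
offset list costs 8 bytes per resident page, so the list is only smaller
when fewer than one page in eight is resident.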

> >>> + * - FINCORE_PFN:
> >>> + *     stores pfn, using 8 bytes.
> >>
> >> These are all an unprivileged operations from what I can tell.  I know
> >> we're going to a lot of trouble to hide kernel addresses from being seen
> >> in userspace.  This seems like it would be undesirable for the folks
> >> that care about not leaking kernel addresses, especially for
> >> unprivileged users.
> >>
> >> This would essentially tell userspace where in the kernel's address
> >> space some user-controlled data will be.
> > 
> > OK, so this and FINCORE_PAGEFLAGS will be limited for privileged users.

Sorry, this statement of mine might be a bit short-sighted, and I'd like
to retract it.
I think that some page flags and/or NUMA info should be useful outside
a debugging environment, and safe to expose to userspace. So limiting
unprivileged users to the bitmap mode only is too strict.

> Then I'd just question their usefulness outside of a debugging
> environment, especially when you can get at them in other (more
> roundabout) ways in a debugging environment.
> 
> This is really looking to me like two system calls.  The bitmap-based
> one, and another more extensible one.  I don't think there's any harm in
> having two system calls, especially when they're trying to glue together
> two disparate interfaces.

I think that if we separate the syscall into two, splitting it into one
for privileged users and one for unprivileged users might be fine (rather
than a bitmap-based one and an extensible one).
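
For reference, the bitmap-style residency information can already be
approximated from userspace with mmap() plus mincore(2), at the cost of
mapping the file first. A minimal sketch, assuming Linux with glibc;
count_resident_pages() is a hypothetical helper name and not part of the
patch:

```c
#define _DEFAULT_SOURCE         /* exposes mincore() on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Count how many pages of the file at `path` are resident in the page
 * cache. Returns the number of resident pages, or -1 on error (including
 * an empty file, which cannot be mapped). If total_pages is non-NULL and
 * the file was mapped, it receives the file size in pages.
 */
long count_resident_pages(const char *path, long *total_pages)
{
    long page = sysconf(_SC_PAGESIZE);
    long resident = -1;
    struct stat st;
    int fd;

    assert(page > 0);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    size_t len = (size_t)st.st_size;
    size_t npages = (len + (size_t)page - 1) / (size_t)page;

    /* The mapping stays valid after the descriptor is closed. */
    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (map == MAP_FAILED)
        return -1;

    /* mincore() fills one byte per page; bit 0 set means "in core". */
    unsigned char *vec = malloc(npages);
    if (vec != NULL && mincore(map, len, vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < npages; i++)
            resident += vec[i] & 1;
    }

    free(vec);
    munmap(map, len);
    if (total_pages != NULL)
        *total_pages = (long)npages;
    return resident;
}
```

A caller would compare the return value against *total_pages to see how
much of the file is cached. The need to map the file first is the
overhead a file-descriptor-based interface would avoid.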

Thanks,
Naoya Horiguchi
