From: Dave Hansen <dave.hansen@intel.com>
To: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Konstantin Khlebnikov <koct9i@gmail.com>,
	Wu Fengguang <fengguang.wu@intel.com>,
	Arnaldo Carvalho de Melo <acme@redhat.com>,
	Borislav Petkov <bp@alien8.de>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Rusty Russell <rusty@rustcorp.com.au>,
	David Miller <davem@davemloft.net>,
	Andres Freund <andres@2ndquadrant.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Christoph Hellwig <hch@infradead.org>,
	Dave Chinner <david@fromorbit.com>,
	Michael Kerrisk <mtk.manpages@gmail.com>,
	Linux API <linux-api@vger.kernel.org>,
	Naoya Horiguchi <nao.horiguchi@gmail.com>,
	Kees Cook <kees@outflux.net>
Subject: Re: [PATCH v3 1/3] mm: introduce fincore()
Date: Mon, 07 Jul 2014 15:44:22 -0700
Message-ID: <53BB22C6.2020502@intel.com>
In-Reply-To: <20140707214820.GA13596@nhori.bos.redhat.com>

On 07/07/2014 02:48 PM, Naoya Horiguchi wrote:
> On Mon, Jul 07, 2014 at 01:43:31PM -0700, Dave Hansen wrote:
>> The whole FINCORE_PGOFF vs. FINCORE_BMAP issue is something that will
>> come up in practice.  We just don't have the interfaces for an end user
>> to pick which one they want to use.
>>
>>>> Is it really right to say this is going to be 8 bytes?  Would we want it
>>>> to share types with something else, like be an loff_t?
>>>
>>> Could you elaborate on this?
>>
>> We specify file offsets in other system calls, like the lseek family.  I
>> was just thinking that this type should match up with those calls since
>> they are expressing the same data type with the same ranges and limitations.
> 
> The 2nd parameter is loff_t, do we already do this?

I mean the fields in the buffer, like:

> +Any of the following flags are to be set to add an 8 byte field in each entry.
> +You can set any of these flags at the same time, although you can't set
> +FINCORE_BMAP combined with these 8 byte field flags.
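
To make the quoted rule concrete, here is a minimal sketch of how the
per-entry size would follow from the flags. The flag names come from this
thread, but their numeric values, the FINCORE_FIELD_FLAGS mask, and the
helper fincore_entry_size() are assumptions made up for illustration; they
are not taken from the patch. Only the two field flags named in this
message are modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values: this message names the flags but not their values. */
#define FINCORE_BMAP      (1u << 0)  /* bitmap mode, no per-entry fields */
#define FINCORE_PGOFF     (1u << 1)  /* adds an 8 byte field to each entry */
#define FINCORE_PAGEFLAGS (1u << 2)  /* adds an 8 byte field to each entry */

/* Hypothetical mask of the flags that each add one field per entry. */
#define FINCORE_FIELD_FLAGS (FINCORE_PGOFF | FINCORE_PAGEFLAGS)

/* The quoted text fixes each field at 8 bytes, independent of loff_t. */
static_assert(sizeof(uint64_t) == 8, "each per-entry field is 8 bytes");

/*
 * Return the number of bytes per entry for a flag combination:
 *   0  for bitmap mode (pages are reported as bits, not entries),
 *   8 * (number of field flags set) for entry mode,
 *  -1  if no mode is selected or FINCORE_BMAP is mixed with field flags.
 */
int fincore_entry_size(unsigned int flags)
{
	int size = 0;

	if (flags & FINCORE_BMAP)
		return (flags & FINCORE_FIELD_FLAGS) ? -1 : 0;

	if (flags & FINCORE_PGOFF)
		size += (int)sizeof(uint64_t);
	if (flags & FINCORE_PAGEFLAGS)
		size += (int)sizeof(uint64_t);

	return size ? size : -1;
}
```

With these assumed values, FINCORE_PGOFF | FINCORE_PAGEFLAGS gives 16 bytes
per entry, and FINCORE_BMAP | FINCORE_PGOFF is rejected. The sketch hard-codes
the field width as uint64_t, which is exactly the choice being questioned
above: a field that carries a file offset could instead share its type with
the offsets used by the lseek family.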


>>>> This would essentially tell userspace where in the kernel's address
>>>> space some user-controlled data will be.
>>>
>>> OK, so this and FINCORE_PAGEFLAGS will be limited to privileged users.
> 
> Sorry, this statement of mine might be a bit short-sighted, and I'd like
> to revoke it.
> I think that some page flags and/or numa info should be useful outside
> the debugging environment, and safe to expose to userspace. So limiting
> to bitmap-one for unprivileged users is too strict.

The PFN is not the same as NUMA information, and the PFN is insufficient
to describe the NUMA node on all systems that Linux supports.

Trying to get NUMA information back out is a good goal, but doing it
with PFNs is a bad idea since they have so many consequences.

I'm also bummed that exporting NUMA information was a design goal of these
patches, but it wasn't mentioned in any of the patch descriptions.

>> Then I'd just question their usefulness outside of a debugging
>> environment, especially when you can get at them in other (more
>> roundabout) ways in a debugging environment.
>>
>> This is really looking to me like two system calls.  The bitmap-based
>> one, and another more extensible one.  I don't think there's any harm in
>> having two system calls, especially when they're trying to glue together
>> two disparate interfaces.
> 
> I think that separating the syscall into two, one for privileged users
> and one for unprivileged users, might be fine (rather than a bitmap-based
> one and an extensible one.)

The problem as I see it is shoehorning two interfaces into the same
syscall.  If there are privileged and unprivileged operations that use
the same _interfaces_ I think they should share a syscall.

