* [PATCH 0/1] mm/gup_benchmark: fix MAP_HUGETLB case
@ 2019-10-21 21:24 John Hubbard
  2019-10-21 21:24 ` [PATCH 1/1] " John Hubbard
  0 siblings, 1 reply; 7+ messages in thread
From: John Hubbard @ 2019-10-21 21:24 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Keith Busch, LKML, linux-mm, linux-kselftest, John Hubbard

Hi,

Here's another gup_benchmark.c fix, which I ran into while adding
support for the upcoming FOLL_PIN work. Anyway, the problem is
clearly described in the patch commit description, and the fix seems
like the best way to me, but the fix is not *completely* black and
white.

This fix forces MAP_ANONYMOUS for the MAP_HUGETLB case. However,
another way to do it might be to mmap() against a valid hugetlb
page file, instead of /dev/zero. But that seems like a lot of
trouble and if I'm reading the intent correctly, MAP_ANONYMOUS
is what's desired anyway.
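
For anyone who wants to see the failure outside of gup_benchmark, here
is a minimal sketch (not part of the patch). It assumes a 2 MB huge page
size, and try_map_dev_zero() and SKETCH_LEN are hypothetical names made
up purely for illustration:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumption: 2 MB huge pages. The real size depends on the system. */
#define SKETCH_LEN (2UL * 1024 * 1024)

/*
 * Hypothetical helper: open /dev/zero and mmap() it with MAP_PRIVATE plus
 * the caller's extra flags. Returns 0 on success, or the errno value
 * reported by open() or mmap() on failure.
 */
static int try_map_dev_zero(int extra_flags)
{
	int fd = open("/dev/zero", O_RDWR);
	if (fd < 0)
		return errno;

	void *p = mmap(NULL, SKETCH_LEN, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | extra_flags, fd, 0);
	/* Capture errno before any other call can overwrite it. */
	int err = (p == MAP_FAILED) ? errno : 0;

	if (p != MAP_FAILED)
		munmap(p, SKETCH_LEN);
	close(fd);
	return err;
}
```

On Linux, try_map_dev_zero(MAP_HUGETLB) should return EINVAL, which is
the failure that "-H" hits. try_map_dev_zero(MAP_HUGETLB | MAP_ANONYMOUS)
should return 0 as long as huge pages have been reserved (for example
via /proc/sys/vm/nr_hugepages); without a reservation, expect ENOMEM
instead.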

John Hubbard (1):
  mm/gup_benchmark: fix MAP_HUGETLB case

 tools/testing/selftests/vm/gup_benchmark.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
2.23.0



* [PATCH 1/1] mm/gup_benchmark: fix MAP_HUGETLB case
  2019-10-21 21:24 [PATCH 0/1] mm/gup_benchmark: fix MAP_HUGETLB case John Hubbard
@ 2019-10-21 21:24 ` John Hubbard
  2019-10-22 17:14   ` Jerome Glisse
  0 siblings, 1 reply; 7+ messages in thread
From: John Hubbard @ 2019-10-21 21:24 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Keith Busch, LKML, linux-mm, linux-kselftest, John Hubbard

The MAP_HUGETLB ("-H" option) of gup_benchmark fails:

$ sudo ./gup_benchmark -H
mmap: Invalid argument

This is because gup_benchmark.c is passing in a file descriptor to
mmap(), but the fd came from opening up the /dev/zero file. This
confuses the mmap syscall implementation, which thinks that, if the
caller did not specify MAP_ANONYMOUS, then the file must be a huge
page file. So it attempts to verify that the file really is a huge
page file, as you can see here:

ksys_mmap_pgoff()
{
    if (!(flags & MAP_ANONYMOUS)) {
        retval = -EINVAL;
        if (unlikely(flags & MAP_HUGETLB && !is_file_hugepages(file)))
            goto out_fput; /* THIS IS WHERE WE END UP */

    } else if (flags & MAP_HUGETLB) {
        ...proceed normally, /dev/zero is ok here...

...and of course is_file_hugepages() returns "false" for the /dev/zero
file.

The problem is that the user space program, gup_benchmark.c, really just
wants anonymous memory here. The simplest way to get that is to pass
MAP_ANONYMOUS whenever MAP_HUGETLB is specified, so that's what this
patch does.

Cc: Keith Busch <keith.busch@intel.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
 tools/testing/selftests/vm/gup_benchmark.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/vm/gup_benchmark.c b/tools/testing/selftests/vm/gup_benchmark.c
index cb3fc09645c4..485cf06ef013 100644
--- a/tools/testing/selftests/vm/gup_benchmark.c
+++ b/tools/testing/selftests/vm/gup_benchmark.c
@@ -71,7 +71,7 @@ int main(int argc, char **argv)
 			flags |= MAP_SHARED;
 			break;
 		case 'H':
-			flags |= MAP_HUGETLB;
+			flags |= (MAP_HUGETLB | MAP_ANONYMOUS);
 			break;
 		default:
 			return -1;
-- 
2.23.0



* Re: [PATCH 1/1] mm/gup_benchmark: fix MAP_HUGETLB case
  2019-10-21 21:24 ` [PATCH 1/1] " John Hubbard
@ 2019-10-22 17:14   ` Jerome Glisse
  2019-10-22 18:41     ` John Hubbard
  0 siblings, 1 reply; 7+ messages in thread
From: Jerome Glisse @ 2019-10-22 17:14 UTC (permalink / raw)
  To: John Hubbard; +Cc: Andrew Morton, Keith Busch, LKML, linux-mm, linux-kselftest

On Mon, Oct 21, 2019 at 02:24:35PM -0700, John Hubbard wrote:
> The MAP_HUGETLB ("-H" option) of gup_benchmark fails:
> 
> $ sudo ./gup_benchmark -H
> mmap: Invalid argument
> 
> This is because gup_benchmark.c is passing in a file descriptor to
> mmap(), but the fd came from opening up the /dev/zero file. This
> confuses the mmap syscall implementation, which thinks that, if the
> caller did not specify MAP_ANONYMOUS, then the file must be a huge
> page file. So it attempts to verify that the file really is a huge
> page file, as you can see here:
> 
> ksys_mmap_pgoff()
> {
>     if (!(flags & MAP_ANONYMOUS)) {
>         retval = -EINVAL;
>         if (unlikely(flags & MAP_HUGETLB && !is_file_hugepages(file)))
>             goto out_fput; /* THIS IS WHERE WE END UP */
> 
>     } else if (flags & MAP_HUGETLB) {
>         ...proceed normally, /dev/zero is ok here...
> 
> ...and of course is_file_hugepages() returns "false" for the /dev/zero
> file.
> 
> The problem is that the user space program, gup_benchmark.c, really just
> wants anonymous memory here. The simplest way to get that is to pass
> MAP_ANONYMOUS whenever MAP_HUGETLB is specified, so that's what this
> patch does.

This looks wrong: MAP_HUGETLB should only be used to create a vma
for hugetlbfs. If you want an anonymous private vma, do not set
MAP_HUGETLB. If you want huge pages inside your anonymous vma,
there is nothing to do at mmap time; that is the job of the
transparent huge page (THP) code.

NAK as misleading

Cheers,
Jérôme



* Re: [PATCH 1/1] mm/gup_benchmark: fix MAP_HUGETLB case
  2019-10-22 17:14   ` Jerome Glisse
@ 2019-10-22 18:41     ` John Hubbard
  2019-10-22 18:57       ` Jerome Glisse
  0 siblings, 1 reply; 7+ messages in thread
From: John Hubbard @ 2019-10-22 18:41 UTC (permalink / raw)
  To: Jerome Glisse; +Cc: Andrew Morton, Keith Busch, LKML, linux-mm, linux-kselftest

On 10/22/19 10:14 AM, Jerome Glisse wrote:
> This looks wrong: MAP_HUGETLB should only be used to create a vma
> for hugetlbfs. If you want an anonymous private vma, do not set
> MAP_HUGETLB. If you want huge pages inside your anonymous vma,
> there is nothing to do at mmap time; that is the job of the
> transparent huge page (THP) code.
> 

Not the point. Please look more closely at ksys_mmap_pgoff(). You'll 
see that, since 2009 (and probably earlier; 2009 is just when Hugh Dickins
moved it over from util.c), this routine has had full support for using
hugetlbfs automatically, via mmap.

It does that via hugetlb_file_setup():

unsigned long ksys_mmap_pgoff(unsigned long addr, unsigned long len,
			      unsigned long prot, unsigned long flags,
			      unsigned long fd, unsigned long pgoff)
{
...
	if (!(flags & MAP_ANONYMOUS)) {
...
	} else if (flags & MAP_HUGETLB) {
		struct user_struct *user = NULL;
		struct hstate *hs;

		hs = hstate_sizelog((flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK);
		if (!hs)
			return -EINVAL;

		len = ALIGN(len, huge_page_size(hs));
		/*
		 * VM_NORESERVE is used because the reservations will be
		 * taken when vm_ops->mmap() is called
		 * A dummy user value is used because we are not locking
		 * memory so no accounting is necessary
		 */
		file = hugetlb_file_setup(HUGETLB_ANON_FILE, len,
				VM_NORESERVE,
				&user, HUGETLB_ANONHUGE_INODE,
				(flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK);
		if (IS_ERR(file))
			return PTR_ERR(file);
	}
...
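
As a side note on the (flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK
expression in that excerpt: user space can ask for a specific huge page
size by placing log2 of the size into those flag bits. Here is a small
sketch of the encoding. The helper names are hypothetical, and the
fallback values 26 and 0x3f are what I believe the Linux uapi headers
define:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sys/mman.h>

/* Fallbacks, used only if the libc headers do not expose these. */
#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT 26
#endif
#ifndef MAP_HUGE_MASK
#define MAP_HUGE_MASK 0x3f
#endif

/*
 * Hypothetical helper: build the mmap() flag bits that request a huge
 * page size of 2^log2_size bytes.
 */
static unsigned long encode_huge_size(unsigned int log2_size)
{
	return ((unsigned long)log2_size & MAP_HUGE_MASK) << MAP_HUGE_SHIFT;
}

/*
 * Hypothetical helper: the same extraction that the excerpt passes to
 * hstate_sizelog(), applied to a set of mmap() flags.
 */
static unsigned int decode_huge_size_log(unsigned long flags)
{
	return (unsigned int)((flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK);
}
```

For example, encode_huge_size(21) requests 2 MB pages (2^21 bytes).
Decoding plain MAP_HUGETLB | MAP_ANONYMOUS yields 0, which, as far as I
can tell, makes hstate_sizelog() fall back to the default huge page size.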


Also, there are 14 (!) other pre-existing examples of passing
MAP_HUGETLB | MAP_ANONYMOUS to mmap, so I'm not exactly the first one
to reach this understanding.


> NAK as misleading

Ouch. But I think I'm actually leading correctly, rather than misleading.
Can you prove me wrong? :)


thanks,

John Hubbard
NVIDIA


* Re: [PATCH 1/1] mm/gup_benchmark: fix MAP_HUGETLB case
  2019-10-22 18:41     ` John Hubbard
@ 2019-10-22 18:57       ` Jerome Glisse
  2019-10-22 19:39         ` John Hubbard
  0 siblings, 1 reply; 7+ messages in thread
From: Jerome Glisse @ 2019-10-22 18:57 UTC (permalink / raw)
  To: John Hubbard; +Cc: Andrew Morton, Keith Busch, LKML, linux-mm, linux-kselftest

On Tue, Oct 22, 2019 at 11:41:57AM -0700, John Hubbard wrote:
> On 10/22/19 10:14 AM, Jerome Glisse wrote:
> > This looks wrong: MAP_HUGETLB should only be used to create a vma
> > for hugetlbfs. If you want an anonymous private vma, do not set
> > MAP_HUGETLB. If you want huge pages inside your anonymous vma,
> > there is nothing to do at mmap time; that is the job of the
> > transparent huge page (THP) code.
> > 
> 
> Not the point. Please look more closely at ksys_mmap_pgoff(). You'll 
> see that, since 2009 (and probably earlier; 2009 is just when Hugh Dickins
> moved it over from util.c), this routine has had full support for using
> hugetlbfs automatically, via mmap.
> 
> It does that via hugetlb_file_setup():
> 
> unsigned long ksys_mmap_pgoff(unsigned long addr, unsigned long len,
> 			      unsigned long prot, unsigned long flags,
> 			      unsigned long fd, unsigned long pgoff)
> {
> ...
> 	if (!(flags & MAP_ANONYMOUS)) {
> ...
> 	} else if (flags & MAP_HUGETLB) {
> 		struct user_struct *user = NULL;
> 		struct hstate *hs;
> 
> 		hs = hstate_sizelog((flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK);
> 		if (!hs)
> 			return -EINVAL;
> 
> 		len = ALIGN(len, huge_page_size(hs));
> 		/*
> 		 * VM_NORESERVE is used because the reservations will be
> 		 * taken when vm_ops->mmap() is called
> 		 * A dummy user value is used because we are not locking
> 		 * memory so no accounting is necessary
> 		 */
> 		file = hugetlb_file_setup(HUGETLB_ANON_FILE, len,
> 				VM_NORESERVE,
> 				&user, HUGETLB_ANONHUGE_INODE,
> 				(flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK);
> 		if (IS_ERR(file))
> 			return PTR_ERR(file);
> 	}
> ...
> 
> 
> Also, there are 14 (!) other pre-existing examples of passing
> MAP_HUGETLB | MAP_ANONYMOUS to mmap, so I'm not exactly the first one
> to reach this understanding.
> 
> 
> > NAK as misleading
> 
> Ouch. But I think I'm actually leading correctly, rather than misleading.
> Can you prove me wrong? :)

So I was misled by the file descriptor; passing a file descriptor and
asking for anonymous memory always bugs me. But yeah, the _linux_ kernel
is happy to ignore the file argument if you set the anonymous flag. I
guess the rule of passing -1 for fd when anonymous is just engraved in
my brain.

Also, I thought that the file was an argument of the test and thus that
for huge pages you needed to pass a hugetlbfs file.

Anyway, my mistake; you are right: you can pass a file and ask for
anonymous and hugetlb at the same time.

Reviewed-by: Jérôme Glisse <jglisse@redhat.com>



* Re: [PATCH 1/1] mm/gup_benchmark: fix MAP_HUGETLB case
  2019-10-22 18:57       ` Jerome Glisse
@ 2019-10-22 19:39         ` John Hubbard
  2019-10-22 21:34           ` Andrew Morton
  0 siblings, 1 reply; 7+ messages in thread
From: John Hubbard @ 2019-10-22 19:39 UTC (permalink / raw)
  To: Jerome Glisse; +Cc: Andrew Morton, Keith Busch, LKML, linux-mm, linux-kselftest

On 10/22/19 11:57 AM, Jerome Glisse wrote:
> On Tue, Oct 22, 2019 at 11:41:57AM -0700, John Hubbard wrote:
>> On 10/22/19 10:14 AM, Jerome Glisse wrote:
>>> On Mon, Oct 21, 2019 at 02:24:35PM -0700, John Hubbard wrote:
>>>> The MAP_HUGETLB ("-H" option) of gup_benchmark fails:
>> ...
> So I was misled by the file descriptor; passing a file descriptor and
> asking for anonymous memory always bugs me. But yeah, the _linux_ kernel
> is happy to ignore the file argument if you set the anonymous flag. I
> guess the rule of passing -1 for fd when anonymous is just engraved in
> my brain.
> 

Yeah, I definitely get that. In fact, I initially considered further changing 
the test code so as to pass -1 for fd in this case, but because it's pure 
Linux-only test code, it didn't really seem worth the (small) additional
change.
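
For reference, here is roughly what the two variants look like side by
side, using ordinary (non-huge) anonymous memory so that it runs without
any huge page reservation. This is only a sketch: map_anon_with_fd() and
anon_ignores_fd() are hypothetical helpers, not anything from
gup_benchmark.c:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map len bytes of private anonymous memory, passing
 * fd straight through to mmap(). Returns NULL on failure.
 */
static void *map_anon_with_fd(int fd, size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, fd, 0);

	return p == MAP_FAILED ? NULL : p;
}

/*
 * Hypothetical helper: returns 1 if both the portable form (fd == -1) and
 * the form that passes a /dev/zero fd succeed, 0 otherwise.
 */
static int anon_ignores_fd(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	int fd = open("/dev/zero", O_RDWR);
	if (fd < 0)
		return 0;

	void *a = map_anon_with_fd(-1, len);	/* portable form */
	void *b = map_anon_with_fd(fd, len);	/* fd is ignored on Linux */
	int ok = (a != NULL) && (b != NULL);

	if (a)
		munmap(a, len);
	if (b)
		munmap(b, len);
	close(fd);
	return ok;
}
```

Passing -1 is the form to prefer in code that has to run on other
systems, since some implementations require it with MAP_ANONYMOUS.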

> Also, I thought that the file was an argument of the test and thus that
> for huge pages you needed to pass a hugetlbfs file.
> 
> Anyway, my mistake; you are right: you can pass a file and ask for
> anonymous and hugetlb at the same time.
> 
> Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
> 

Thanks for the review! 

Admin note: this already went into mmotm, so I'm hoping Andrew will notice this
email and add the Reviewed-by tag, please?

thanks,

John Hubbard
NVIDIA


* Re: [PATCH 1/1] mm/gup_benchmark: fix MAP_HUGETLB case
  2019-10-22 19:39         ` John Hubbard
@ 2019-10-22 21:34           ` Andrew Morton
  0 siblings, 0 replies; 7+ messages in thread
From: Andrew Morton @ 2019-10-22 21:34 UTC (permalink / raw)
  To: John Hubbard; +Cc: Jerome Glisse, Keith Busch, LKML, linux-mm, linux-kselftest

On Tue, 22 Oct 2019 12:39:53 -0700 John Hubbard <jhubbard@nvidia.com> wrote:

> Admin note: this already went into mmotm, so I'm hoping Andrew will notice this
> email and add the Reviewed-by tag, please?

Always. (Well, almost ;))

