From: John Hubbard <jhubbard@nvidia.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	David Rientjes <rientjes@google.com>,
	Mel Gorman <mgorman@suse.de>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Al Viro <viro@zeniv.linux.org.uk>, <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Anatoly Stepanov <astepanov@cloudlinux.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Mike Snitzer <snitzer@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Theodore Ts'o" <tytso@mit.edu>
Subject: Re: [PATCH 1/6] mm: introduce kv[mz]alloc helpers
Date: Mon, 16 Jan 2017 11:09:37 -0800
Message-ID: <0ca8a212-c651-7915-af25-23925e1c1cc3@nvidia.com>
In-Reply-To: <20170116084717.GA13641@dhcp22.suse.cz>



On 01/16/2017 12:47 AM, Michal Hocko wrote:
> On Sun 15-01-17 20:34:13, John Hubbard wrote:
>>
>>
>> On 01/12/2017 07:37 AM, Michal Hocko wrote:
> [...]
>>> diff --git a/mm/util.c b/mm/util.c
>>> index 3cb2164f4099..7e0c240b5760 100644
>>> --- a/mm/util.c
>>> +++ b/mm/util.c
>>> @@ -324,6 +324,48 @@ unsigned long vm_mmap(struct file *file, unsigned long addr,
>>>  }
>>>  EXPORT_SYMBOL(vm_mmap);
>>>
>>> +/**
>>> + * kvmalloc_node - allocate contiguous memory from SLAB with vmalloc fallback
>>
>> Hi Michal,
>>
>> How about this wording instead:
>>
>> kvmalloc_node - attempt to allocate physically contiguous memory, but upon
>> failure, fall back to non-contiguous (vmalloc) allocation.
>
> OK, why not.
>
>>> + * @size: size of the request.
>>> + * @flags: gfp mask for the allocation - must be compatible (superset) with GFP_KERNEL.
>>> + * @node: numa node to allocate from
>>> + *
>>> + * Uses kmalloc to get the memory but if the allocation fails then falls back
>>> + * to the vmalloc allocator. Use kvfree for freeing the memory.
>>> + *
>>> + * Reclaim modifiers - __GFP_NORETRY, __GFP_REPEAT and __GFP_NOFAIL are not supported
>>
>> Is that "Reclaim modifiers" line still true, or is it a leftover from an
>> earlier approach? I am having trouble reconciling it with the rest of the
>> patchset, because:
>>
>> a) the flags argument below is effectively passed on to either kmalloc_node
>> (possibly adding, but not removing flags), or to __vmalloc_node_flags.
>
> The above only says those are _unsupported_ - in other words the behavior
> is not defined. Even if flags are passed down to kmalloc or vmalloc,
> it doesn't mean they are used that way. Remember that vmalloc uses
> some hardcoded GFP_KERNEL allocations. So while I could be really
> strict about this and mask away these flags, I doubt this is worth the
> additional code.

I do wonder about passing those flags through to kmalloc. Maybe it is worth stripping out
__GFP_NORETRY and __GFP_NOFAIL after all. That would insulate callers from any future changes to
the implementation of kmalloc, and it would also make the documentation more believable.
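A minimal user-space sketch of that suggestion, assuming hypothetical flag values (the MODEL_GFP_* constants are stand-ins, not the real gfp_t bits from include/linux/gfp.h) and hypothetical helper names. It combines the WARN_ON_ONCE() compatibility check and the size > PAGE_SIZE adjustment from the quoted patch with the proposed stripping of caller-supplied __GFP_NORETRY and __GFP_NOFAIL:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * User-space model only. These bit values are hypothetical stand-ins;
 * the real gfp_t bits are defined in include/linux/gfp.h and differ.
 */
typedef unsigned int model_gfp_t;

#define MODEL_GFP_IO        0x01u
#define MODEL_GFP_FS        0x02u
#define MODEL_GFP_RECLAIM   0x04u
#define MODEL_GFP_NOWARN    0x08u
#define MODEL_GFP_NORETRY   0x10u
#define MODEL_GFP_NOFAIL    0x20u
#define MODEL_GFP_KERNEL    (MODEL_GFP_RECLAIM | MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_PAGE_SIZE     4096u

/* Same test as the patch's WARN_ON_ONCE(): flags must include all of GFP_KERNEL. */
bool model_gfp_compatible(model_gfp_t flags)
{
	return (flags & MODEL_GFP_KERNEL) == MODEL_GFP_KERNEL;
}

/*
 * Hypothetical helper computing the mask for the kmalloc attempt:
 * caller-supplied NORETRY/NOFAIL are stripped first (the suggestion),
 * then requests larger than a page get NORETRY|NOWARN as in the patch,
 * because a vmalloc fallback exists.
 */
model_gfp_t model_kmalloc_flags(size_t size, model_gfp_t flags)
{
	model_gfp_t kmalloc_flags = flags & ~(MODEL_GFP_NORETRY | MODEL_GFP_NOFAIL);

	if (size > MODEL_PAGE_SIZE)
		kmalloc_flags |= MODEL_GFP_NORETRY | MODEL_GFP_NOWARN;
	return kmalloc_flags;
}
```

With a caller mask of GFP_KERNEL | __GFP_NOFAIL, a small request would reach kmalloc with plain GFP_KERNEL, and a large one with GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN; in neither case does __GFP_NOFAIL leak through.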

>
>> b) In patch 6/6, you are in fact passing in __GFP_REPEAT to the wrappers
>> (kvzalloc, for example), and again, only adding, not removing flags.
>
> Patch 2 adds support for __GFP_REPEAT and updates the above line as
> well.

OK, I see.

>
>>> + */
>>> +void *kvmalloc_node(size_t size, gfp_t flags, int node)
>>> +{
>>> +	gfp_t kmalloc_flags = flags;
>>> +	void *ret;
>>> +
>>> +	/*
>>> +	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g. page tables)
>>> +	 * so the given set of flags has to be compatible.
>>> +	 */
>>> +	WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL);
>>> +
>>> +	/*
>>> +	 * Make sure that larger requests are not too disruptive - no OOM
>>> +	 * killer and no allocation failure warnings as we have a fallback
>>> +	 */
>>> +	if (size > PAGE_SIZE)
>>> +		kmalloc_flags |= __GFP_NORETRY | __GFP_NOWARN;
>>> +
>>> +	ret = kmalloc_node(size, kmalloc_flags, node);
>>
>> Along those lines (dealing with larger requests), is there any value in
>> picking some threshold value, and going straight to vmalloc if size is
>> greater than that threshold?
>
> I am not a fan of thresholds. PAGE_ALLOC_COSTLY_ORDER, which is
> internally used by the page allocator, has turned out to be a major pain.
> I do not want to repeat the same mistake here. Besides that, you
> could hardly find a "one size fits all" value, so it would have to be part
> of the API. If we ever grow users who would really like to do something
> like that, then a specialized API should be added.

Thanks for explaining, and the note about the pain of dealing with PAGE_ALLOC_COSTLY_ORDER is 
especially interesting. Sounds good, then.
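To make the no-threshold design concrete, here is a minimal user-space sketch, assuming hypothetical stand-ins (model_kmalloc(), model_vmalloc(), model_kvmalloc(), model_kvfree(), all backed by malloc()) rather than the kernel functions. Every request tries the contiguous allocator first, and only that allocator's current state, simulated by the hypothetical model_fragmentation_limit, decides whether the fallback is used:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * User-space model of "try kmalloc, fall back to vmalloc, free with kvfree".
 * All names here are hypothetical; both "allocators" use malloc().
 */
enum model_origin { MODEL_FROM_KMALLOC, MODEL_FROM_VMALLOC };

struct model_alloc {
	enum model_origin origin;
	void *mem;
};

/*
 * Simulates runtime fragmentation: the largest contiguous block the model
 * allocator can currently satisfy. Callers never see or pass this value.
 */
size_t model_fragmentation_limit = 32 * 1024;

void *model_kmalloc(size_t size)
{
	return size <= model_fragmentation_limit ? malloc(size) : NULL;
}

void *model_vmalloc(size_t size)
{
	return malloc(size);
}

/* No caller-visible size threshold: every request tries kmalloc first. */
bool model_kvmalloc(size_t size, struct model_alloc *out)
{
	out->origin = MODEL_FROM_KMALLOC;
	out->mem = model_kmalloc(size);
	if (!out->mem) {
		out->origin = MODEL_FROM_VMALLOC;
		out->mem = model_vmalloc(size);
	}
	return out->mem != NULL;
}

/*
 * Analogue of kvfree(). In the kernel, kvfree() picks vfree() or kfree()
 * via is_vmalloc_addr(); in this model both paths end in free().
 */
void model_kvfree(struct model_alloc *a)
{
	free(a->mem);
	a->mem = NULL;
}
```

With the default limit, a 128-byte request is served by model_kmalloc() and a 1 MiB request falls back to model_vmalloc(). Lowering the limit changes the outcome without any change at the call site, so the caller never has to pick a cut-off, which matches Michal's preference for keeping thresholds out of the API.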

thanks
john h

>
>> It's less flexible and might even require
>> occasional maintenance over the years, but it would save some time on *some*
>> systems in some cases...OK, I think I just talked myself out of the whole
>> idea. But I still want to put the question out there, because I think others
>> may also ask it, and I'd like to hear a more experienced opinion.
>
>
> --
> Michal Hocko
> SUSE Labs
>
