* Re: [PATCH 0/6 v3] kvmalloc
@ 2017-01-25 18:14 ` Alexei Starovoitov
0 siblings, 0 replies; 49+ messages in thread
From: Alexei Starovoitov @ 2017-01-25 18:14 UTC (permalink / raw)
To: Michal Hocko
Cc: Andrew Morton, Vlastimil Babka, Mel Gorman, Johannes Weiner,
linux-mm, LKML, Daniel Borkmann, netdev
On Wed, Jan 25, 2017 at 5:21 AM, Michal Hocko <mhocko@kernel.org> wrote:
> On Wed 25-01-17 14:10:06, Michal Hocko wrote:
>> On Tue 24-01-17 11:17:21, Alexei Starovoitov wrote:
>> > On Tue, Jan 24, 2017 at 04:17:52PM +0100, Michal Hocko wrote:
>> > > On Thu 12-01-17 16:37:11, Michal Hocko wrote:
>> > > > Hi,
>> > > > this has been previously posted as a single patch [1] but later on more
>> > > > was built on top. It turned out that there are users who would like to have
>> > > > __GFP_REPEAT semantic. This is currently implemented for costly >64kB
>> > > > requests. Doing the same for smaller requests would require redefining the
>> > > > __GFP_REPEAT semantic in the page allocator which is out of scope of
>> > > > this series.
>> > > >
>> > > > There are many open coded kmalloc with vmalloc fallback instances in
>> > > > the tree. Most of them are not careful enough or simply do not care
>> > > > about the underlying semantic of the kmalloc/page allocator which means
>> > > > that a) some vmalloc fallbacks are basically unreachable because the
>> > > > kmalloc part will keep retrying until it succeeds b) the page allocator
>> > > > can invoke really disruptive steps like the OOM killer to move forward
>> > > > which doesn't sound appropriate when we consider that the vmalloc
>> > > > fallback is available.
>> > > >
>> > > > As can be seen, implementing kvmalloc requires quite intimate
>> > > > knowledge of the page allocator and the memory reclaim internals, which
>> > > > strongly suggests that a helper should be implemented in the memory
>> > > > subsystem proper.
>> > > >
>> > > > Most callers I could find have been converted to use the helper instead.
>> > > > This is patch 5. There are some more relying on __GFP_REPEAT in the
>> > > > networking stack which I have converted as well but considering we do
>> > > > not have support for __GFP_REPEAT for requests smaller than 64kB I
>> > > > have marked it RFC.
>> > >
>> > > Are there any more comments? I would really appreciate to hear from
>> > > networking folks before I resubmit the series.
>> >
>> > while this patchset was baking the bpf side switched to use bpf_map_area_alloc()
>> > which fixes the issue with missing __GFP_NORETRY that we had to fix quickly.
>> > See commit d407bd25a204 ("bpf: don't trigger OOM killer under pressure with map alloc")
>> > it covers all kmalloc/vmalloc pairs instead of just one place as in this set.
>> > So please rebase and switch bpf_map_area_alloc() to use kvmalloc().
>>
>> OK, will do. Thanks for the heads up.
>
> Just for the record, I will fold the following into the patch 1
> ---
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 19b6129eab23..8697f43cf93c 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -53,21 +53,7 @@ void bpf_register_map_type(struct bpf_map_type_list *tl)
>
> void *bpf_map_area_alloc(size_t size)
> {
> - /* We definitely need __GFP_NORETRY, so OOM killer doesn't
> - * trigger under memory pressure as we really just want to
> - * fail instead.
> - */
> - const gfp_t flags = __GFP_NOWARN | __GFP_NORETRY | __GFP_ZERO;
> - void *area;
> -
> - if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
> - area = kmalloc(size, GFP_USER | flags);
> - if (area != NULL)
> - return area;
> - }
> -
> - return __vmalloc(size, GFP_KERNEL | __GFP_HIGHMEM | flags,
> - PAGE_KERNEL);
> + return kvzalloc(size, GFP_USER);
> }
>
> void bpf_map_area_free(void *area)
Looks fine by me.
Daniel, thoughts?
^ permalink raw reply [flat|nested] 49+ messages in thread
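A minimal userspace sketch of the fallback order in the open-coded body that the diff above removes: try the physically contiguous allocator only up to PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER, otherwise (or on failure) go to the virtually contiguous one. Everything prefixed sketch_ is hypothetical and not kernel API; the allocator hooks stand in for kmalloc() and __vmalloc(), and the 4 KiB page size is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed for illustration: 4 KiB pages and a costly order of 3. */
#define SKETCH_PAGE_SIZE    4096UL
#define SKETCH_COSTLY_ORDER 3

enum sketch_source { SKETCH_FAILED, SKETCH_CONTIGUOUS, SKETCH_FALLBACK };

/* Hypothetical hooks standing in for kmalloc() and __vmalloc(). */
typedef void *(*sketch_alloc_fn)(size_t size);

static void *sketch_alloc_ok(size_t size)   { return calloc(1, size); }
static void *sketch_alloc_fail(size_t size) { (void)size; return NULL; }

/*
 * Mirrors the control flow of the removed function: small requests try the
 * contiguous allocator first, everything else goes straight to the fallback.
 * *source records which path served the request.
 */
static void *sketch_area_alloc(size_t size, sketch_alloc_fn contiguous,
			       sketch_alloc_fn fallback,
			       enum sketch_source *source)
{
	void *area;

	assert(source != NULL);

	if (size <= (SKETCH_PAGE_SIZE << SKETCH_COSTLY_ORDER)) {
		area = contiguous(size);
		if (area != NULL) {
			*source = SKETCH_CONTIGUOUS;
			return area;
		}
	}

	area = fallback(size);
	*source = area != NULL ? SKETCH_FALLBACK : SKETCH_FAILED;
	return area;
}
```

With these assumed values the threshold is 4096 << 3 = 32768 bytes, so a 64 KiB request never reaches the contiguous hook, while a 1 KiB request reaches the fallback only if the contiguous hook returns NULL.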
* Re: [PATCH 0/6 v3] kvmalloc
2017-01-25 18:14 ` Alexei Starovoitov
@ 2017-01-25 20:16 ` Daniel Borkmann
-1 siblings, 0 replies; 49+ messages in thread
From: Daniel Borkmann @ 2017-01-25 20:16 UTC (permalink / raw)
To: Alexei Starovoitov, Michal Hocko
Cc: Andrew Morton, Vlastimil Babka, Mel Gorman, Johannes Weiner,
linux-mm, LKML, netdev
On 01/25/2017 07:14 PM, Alexei Starovoitov wrote:
> On Wed, Jan 25, 2017 at 5:21 AM, Michal Hocko <mhocko@kernel.org> wrote:
>> On Wed 25-01-17 14:10:06, Michal Hocko wrote:
>>> On Tue 24-01-17 11:17:21, Alexei Starovoitov wrote:
[...]
>>>>> Are there any more comments? I would really appreciate to hear from
>>>>> networking folks before I resubmit the series.
>>>>
>>>> while this patchset was baking the bpf side switched to use bpf_map_area_alloc()
>>>> which fixes the issue with missing __GFP_NORETRY that we had to fix quickly.
>>>> See commit d407bd25a204 ("bpf: don't trigger OOM killer under pressure with map alloc")
>>>> it covers all kmalloc/vmalloc pairs instead of just one place as in this set.
>>>> So please rebase and switch bpf_map_area_alloc() to use kvmalloc().
>>>
>>> OK, will do. Thanks for the heads up.
>>
>> Just for the record, I will fold the following into the patch 1
>> ---
>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index 19b6129eab23..8697f43cf93c 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -53,21 +53,7 @@ void bpf_register_map_type(struct bpf_map_type_list *tl)
>>
>> void *bpf_map_area_alloc(size_t size)
>> {
>> - /* We definitely need __GFP_NORETRY, so OOM killer doesn't
>> - * trigger under memory pressure as we really just want to
>> - * fail instead.
>> - */
>> - const gfp_t flags = __GFP_NOWARN | __GFP_NORETRY | __GFP_ZERO;
>> - void *area;
>> -
>> - if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
>> - area = kmalloc(size, GFP_USER | flags);
>> - if (area != NULL)
>> - return area;
>> - }
>> -
>> - return __vmalloc(size, GFP_KERNEL | __GFP_HIGHMEM | flags,
>> - PAGE_KERNEL);
>> + return kvzalloc(size, GFP_USER);
>> }
>>
>> void bpf_map_area_free(void *area)
>
> Looks fine by me.
> Daniel, thoughts?

I assume that kvzalloc() is still the same from [1], right? If so, then
it would unfortunately (partially) reintroduce the issue that was fixed.
If you look above at flags, they're also passed to __vmalloc() to not
trigger OOM in these situations I've experienced. This is effectively the
same requirement as in other networking areas f.e. that 5bad87348c70
("netfilter: x_tables: avoid warn and OOM killer on vmalloc call") has.
In your comment in kvzalloc() you eventually say that some of the above
modifiers are not supported. So there would be two options, i) just leave
out the kvzalloc() chunk for BPF area to avoid the merge conflict and tackle
it later (along with similar code from 5bad87348c70), or ii) implement
support for these modifiers as well to your original set. I guess it's not
too urgent, so we could also proceed with i) if that is easier for you to
proceed (I don't mind either way).

Thanks a lot,
Daniel

[1] https://lkml.org/lkml/2017/1/12/442
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH 0/6 v3] kvmalloc
2017-01-25 20:16 ` Daniel Borkmann
@ 2017-01-26 7:43 ` Michal Hocko
-1 siblings, 0 replies; 49+ messages in thread
From: Michal Hocko @ 2017-01-26 7:43 UTC (permalink / raw)
To: Daniel Borkmann
Cc: Alexei Starovoitov, Andrew Morton, Vlastimil Babka, Mel Gorman,
Johannes Weiner, linux-mm, LKML, netdev
On Wed 25-01-17 21:16:42, Daniel Borkmann wrote:
> On 01/25/2017 07:14 PM, Alexei Starovoitov wrote:
> > On Wed, Jan 25, 2017 at 5:21 AM, Michal Hocko <mhocko@kernel.org> wrote:
> > > On Wed 25-01-17 14:10:06, Michal Hocko wrote:
> > > > On Tue 24-01-17 11:17:21, Alexei Starovoitov wrote:
> [...]
> > > > > > Are there any more comments? I would really appreciate to hear from
> > > > > > networking folks before I resubmit the series.
> > > > >
> > > > > while this patchset was baking the bpf side switched to use bpf_map_area_alloc()
> > > > > which fixes the issue with missing __GFP_NORETRY that we had to fix quickly.
> > > > > See commit d407bd25a204 ("bpf: don't trigger OOM killer under pressure with map alloc")
> > > > > it covers all kmalloc/vmalloc pairs instead of just one place as in this set.
> > > > > So please rebase and switch bpf_map_area_alloc() to use kvmalloc().
> > > >
> > > > OK, will do. Thanks for the heads up.
> > >
> > > Just for the record, I will fold the following into the patch 1
> > > ---
> > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > > index 19b6129eab23..8697f43cf93c 100644
> > > --- a/kernel/bpf/syscall.c
> > > +++ b/kernel/bpf/syscall.c
> > > @@ -53,21 +53,7 @@ void bpf_register_map_type(struct bpf_map_type_list *tl)
> > >
> > > void *bpf_map_area_alloc(size_t size)
> > > {
> > > - /* We definitely need __GFP_NORETRY, so OOM killer doesn't
> > > - * trigger under memory pressure as we really just want to
> > > - * fail instead.
> > > - */
> > > - const gfp_t flags = __GFP_NOWARN | __GFP_NORETRY | __GFP_ZERO;
> > > - void *area;
> > > -
> > > - if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
> > > - area = kmalloc(size, GFP_USER | flags);
> > > - if (area != NULL)
> > > - return area;
> > > - }
> > > -
> > > - return __vmalloc(size, GFP_KERNEL | __GFP_HIGHMEM | flags,
> > > - PAGE_KERNEL);
> > > + return kvzalloc(size, GFP_USER);
> > > }
> > >
> > > void bpf_map_area_free(void *area)
> >
> > Looks fine by me.
> > Daniel, thoughts?
>
> I assume that kvzalloc() is still the same from [1], right? If so, then
> it would unfortunately (partially) reintroduce the issue that was fixed.
> If you look above at flags, they're also passed to __vmalloc() to not
> trigger OOM in these situations I've experienced.

Pushing __GFP_NORETRY to __vmalloc doesn't have the effect you might
think it would. It can still trigger the OOM killer because the flags
are not propagated all the way down to all allocation requests (e.g.
page tables). This is the same reason why GFP_NOFS is not supported in
vmalloc.

> This is effectively the
> same requirement as in other networking areas f.e. that 5bad87348c70
> ("netfilter: x_tables: avoid warn and OOM killer on vmalloc call") has.
> In your comment in kvzalloc() you eventually say that some of the above
> modifiers are not supported. So there would be two options, i) just leave
> out the kvzalloc() chunk for BPF area to avoid the merge conflict and tackle
> it later (along with similar code from 5bad87348c70), or ii) implement
> support for these modifiers as well to your original set. I guess it's not
> too urgent, so we could also proceed with i) if that is easier for you to
> proceed (I don't mind either way).

Could you clarify why the oom killer in vmalloc matters actually?
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 49+ messages in thread
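A toy userspace model of the propagation gap described in the reply above: the caller's flags reach the requests for the data pages, but a nested helper (here a page-table allocation) uses its own fixed flags, so a no-retry hint from the caller never arrives there. All names and flag values are hypothetical and chosen for illustration; this is not the kernel's vmalloc implementation or its gfp_t encoding.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits for this sketch; not the kernel's gfp_t values. */
#define SKETCH_GFP_KERNEL  0x1u
#define SKETCH_GFP_NORETRY 0x2u

#define SKETCH_MAX_REQS 8

/* Records the flags seen by every nested allocation request. */
struct sketch_trace {
	unsigned int flags[SKETCH_MAX_REQS];
	size_t count;
};

static void sketch_record(struct sketch_trace *t, unsigned int flags)
{
	assert(t->count < SKETCH_MAX_REQS);
	t->flags[t->count++] = flags;
}

/* Nested helper with hard-coded flags: the caller's flags never get here. */
static void sketch_alloc_page_table(struct sketch_trace *t)
{
	sketch_record(t, SKETCH_GFP_KERNEL);
}

/* Data pages do see the caller's flags. */
static void sketch_alloc_data_page(struct sketch_trace *t,
				   unsigned int caller_flags)
{
	sketch_record(t, caller_flags);
}

/* Allocates nr_pages data pages, then one page table to map them. */
static void sketch_vmalloc_like(struct sketch_trace *t,
				unsigned int caller_flags, size_t nr_pages)
{
	size_t i;

	for (i = 0; i < nr_pages; i++)
		sketch_alloc_data_page(t, caller_flags);
	sketch_alloc_page_table(t);
}

/* Returns 1 if at least one nested request was made without NORETRY. */
static int sketch_any_request_may_retry(const struct sketch_trace *t)
{
	size_t i;

	for (i = 0; i < t->count; i++)
		if (!(t->flags[i] & SKETCH_GFP_NORETRY))
			return 1;
	return 0;
}
```

In this model, passing SKETCH_GFP_NORETRY still leaves one request without the bit, which is the shape of the argument that the flag alone cannot rule out the OOM killer.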
* Re: [PATCH 0/6 v3] kvmalloc
2017-01-26 7:43 ` Michal Hocko
@ 2017-01-26 9:36 ` Daniel Borkmann
-1 siblings, 0 replies; 49+ messages in thread
From: Daniel Borkmann @ 2017-01-26 9:36 UTC (permalink / raw)
To: Michal Hocko
Cc: Alexei Starovoitov, Andrew Morton, Vlastimil Babka, Mel Gorman,
Johannes Weiner, linux-mm, LKML, netdev, marcelo.leitner
On 01/26/2017 08:43 AM, Michal Hocko wrote:
> On Wed 25-01-17 21:16:42, Daniel Borkmann wrote:
>> On 01/25/2017 07:14 PM, Alexei Starovoitov wrote:
>>> On Wed, Jan 25, 2017 at 5:21 AM, Michal Hocko <mhocko@kernel.org> wrote:
>>>> On Wed 25-01-17 14:10:06, Michal Hocko wrote:
>>>>> On Tue 24-01-17 11:17:21, Alexei Starovoitov wrote:
>> [...]
>>>>>>> Are there any more comments? I would really appreciate to hear from
>>>>>>> networking folks before I resubmit the series.
>>>>>>
>>>>>> while this patchset was baking the bpf side switched to use bpf_map_area_alloc()
>>>>>> which fixes the issue with missing __GFP_NORETRY that we had to fix quickly.
>>>>>> See commit d407bd25a204 ("bpf: don't trigger OOM killer under pressure with map alloc")
>>>>>> it covers all kmalloc/vmalloc pairs instead of just one place as in this set.
>>>>>> So please rebase and switch bpf_map_area_alloc() to use kvmalloc().
>>>>>
>>>>> OK, will do. Thanks for the heads up.
>>>>
>>>> Just for the record, I will fold the following into the patch 1
>>>> ---
>>>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>>>> index 19b6129eab23..8697f43cf93c 100644
>>>> --- a/kernel/bpf/syscall.c
>>>> +++ b/kernel/bpf/syscall.c
>>>> @@ -53,21 +53,7 @@ void bpf_register_map_type(struct bpf_map_type_list *tl)
>>>>
>>>> void *bpf_map_area_alloc(size_t size)
>>>> {
>>>> - /* We definitely need __GFP_NORETRY, so OOM killer doesn't
>>>> - * trigger under memory pressure as we really just want to
>>>> - * fail instead.
>>>> - */
>>>> - const gfp_t flags = __GFP_NOWARN | __GFP_NORETRY | __GFP_ZERO;
>>>> - void *area;
>>>> -
>>>> - if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
>>>> - area = kmalloc(size, GFP_USER | flags);
>>>> - if (area != NULL)
>>>> - return area;
>>>> - }
>>>> -
>>>> - return __vmalloc(size, GFP_KERNEL | __GFP_HIGHMEM | flags,
>>>> - PAGE_KERNEL);
>>>> + return kvzalloc(size, GFP_USER);
>>>> }
>>>>
>>>> void bpf_map_area_free(void *area)
>>>
>>> Looks fine by me.
>>> Daniel, thoughts?
>>
>> I assume that kvzalloc() is still the same from [1], right? If so, then
>> it would unfortunately (partially) reintroduce the issue that was fixed.
>> If you look above at flags, they're also passed to __vmalloc() to not
>> trigger OOM in these situations I've experienced.
>
> Pushing __GFP_NORETRY to __vmalloc doesn't have the effect you might
> think it would. It can still trigger the OOM killer because the flags
> are not propagated all the way down to all allocation requests (e.g.
> page tables). This is the same reason why GFP_NOFS is not supported in
> vmalloc.

Ok, good to know, is that somewhere clearly documented (like for the
case with kmalloc())? If not, could we do that for non-mm folks, or
at least add a similar WARN_ON_ONCE() as you did for kvmalloc() to make
it obvious to users that a given flag combination is not supported all
the way down?

>> This is effectively the
>> same requirement as in other networking areas f.e. that 5bad87348c70
>> ("netfilter: x_tables: avoid warn and OOM killer on vmalloc call") has.
>> In your comment in kvzalloc() you eventually say that some of the above
>> modifiers are not supported. So there would be two options, i) just leave
>> out the kvzalloc() chunk for BPF area to avoid the merge conflict and tackle
>> it later (along with similar code from 5bad87348c70), or ii) implement
>> support for these modifiers as well to your original set. I guess it's not
>> too urgent, so we could also proceed with i) if that is easier for you to
>> proceed (I don't mind either way).
>
> Could you clarify why the oom killer in vmalloc matters actually?

For both mentioned commits, (privileged) user space can potentially
create large allocation requests, where we thus switch to vmalloc()
flavor eventually and then OOM starts killing processes to try to
satisfy the allocation request. This is bad, because we want the
request to just fail instead as it's non-critical and f.e. not kill
ssh connection et al. Failing is totally fine in this case, whereas
triggering OOM is not.

In my testing, __GFP_NORETRY did satisfy this just fine, but as you
say it seems it's not enough. Given there are multiple places like
these in the kernel, could we instead add an option such as
__GFP_NOOOM, or just make __GFP_NORETRY supported?

Thanks,
Daniel
^ permalink raw reply [flat|nested] 49+ messages in thread
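A minimal userspace sketch of the behaviour asked for above: a non-critical, user-triggered allocation that fails is reported to the caller as -ENOMEM and nothing else is disturbed. The sketch_ names, the map structure, and the allocator hooks are hypothetical stand-ins, not the BPF or netfilter code paths named in the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical hook standing in for the kernel allocation helper. */
typedef void *(*sketch_backing_fn)(size_t size);

struct sketch_map {
	void *area;
	size_t size;
};

static void *sketch_backing_ok(size_t size) { return calloc(1, size); }

/* Models memory pressure: the request simply fails. */
static void *sketch_backing_exhausted(size_t size)
{
	(void)size;
	return NULL;
}

/*
 * Creates a map of the requested size. On allocation failure the map is
 * left empty and -ENOMEM is returned, so the caller can pass the error
 * back to user space.
 */
static int sketch_map_create(struct sketch_map *map, size_t size,
			     sketch_backing_fn alloc)
{
	assert(map != NULL && size > 0);

	map->area = alloc(size);
	if (map->area == NULL) {
		map->size = 0;
		return -ENOMEM;
	}
	map->size = size;
	return 0;
}

static void sketch_map_destroy(struct sketch_map *map)
{
	free(map->area);
	map->area = NULL;
	map->size = 0;
}
```

The point of the shape is that the error path is cheap and local: the caller sees an error code, which matches the statement that failing is fine for these requests.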
* RE: [PATCH 0/6 v3] kvmalloc
2017-01-26 9:36 ` Daniel Borkmann
@ 2017-01-26 9:48 ` David Laight
-1 siblings, 0 replies; 49+ messages in thread
From: David Laight @ 2017-01-26 9:48 UTC (permalink / raw)
To: 'Daniel Borkmann', Michal Hocko
Cc: Alexei Starovoitov, Andrew Morton, Vlastimil Babka, Mel Gorman,
Johannes Weiner, linux-mm, LKML, netdev, marcelo.leitner
From: Daniel Borkmann
> Sent: 26 January 2017 09:37
...
> >> I assume that kvzalloc() is still the same from [1], right? If so, then
> >> it would unfortunately (partially) reintroduce the issue that was fixed.
> >> If you look above at flags, they're also passed to __vmalloc() to not
> >> trigger OOM in these situations I've experienced.
> >
> > Pushing __GFP_NORETRY to __vmalloc doesn't have the effect you might
> > think it would. It can still trigger the OOM killer because the flags
> > are not propagated all the way down to all allocation requests (e.g.
> > page tables). This is the same reason why GFP_NOFS is not supported in
> > vmalloc.
>
> Ok, good to know, is that somewhere clearly documented (like for the
> case with kmalloc())? If not, could we do that for non-mm folks, or
> at least add a similar WARN_ON_ONCE() as you did for kvmalloc() to make
> it obvious to users that a given flag combination is not supported all
> the way down?

ISTM that requests for the relatively small memory blocks needed for
page tables aren't really likely to invoke the OOM killer when it isn't
already being invoked by other actions.
So that isn't really a problem.

More of a problem is that requests that you really don't mind failing
can use the last 'reasonably available' memory.
This will cause the next allocation to fail when it would be better for
the earlier one to fail instead.

David
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH 0/6 v3] kvmalloc
2017-01-26 9:36 ` Daniel Borkmann
@ 2017-01-26 10:08 ` Michal Hocko
-1 siblings, 0 replies; 49+ messages in thread
From: Michal Hocko @ 2017-01-26 10:08 UTC (permalink / raw)
To: Daniel Borkmann
Cc: Alexei Starovoitov, Andrew Morton, Vlastimil Babka, Mel Gorman,
Johannes Weiner, linux-mm, LKML, netdev, marcelo.leitner

On Thu 26-01-17 10:36:49, Daniel Borkmann wrote:
> On 01/26/2017 08:43 AM, Michal Hocko wrote:
> > On Wed 25-01-17 21:16:42, Daniel Borkmann wrote:
[...]
> > > I assume that kvzalloc() is still the same from [1], right? If so, then
> > > it would unfortunately (partially) reintroduce the issue that was fixed.
> > > If you look above at flags, they're also passed to __vmalloc() to not
> > > trigger OOM in these situations I've experienced.
> >
> > Pushing __GFP_NORETRY to __vmalloc doesn't have the effect you might
> > think it would. It can still trigger the OOM killer because the flags
> > are not propagated all the way down to all allocation requests (e.g.
> > page tables). This is the same reason why GFP_NOFS is not supported in
> > vmalloc.
>
> Ok, good to know, is that somewhere clearly documented (like for the
> case with kmalloc())?

I am afraid that we really suck on this front. I will add something.

> If not, could we do that for non-mm folks, or
> at least add a similar WARN_ON_ONCE() as you did for kvmalloc() to make
> it obvious to users that a given flag combination is not supported all
> the way down?

I am not sure that triggering a warning that somebody has used
__GFP_NOWARN is very helpful ;). I also do not think that covering all the
supported flags is really feasible. Most of them will not have bad side
effects. I have added the warning because this API is new and I wanted
to catch new abusers. Old ones would have to die slowly.

> > > This is effectively the
> > > same requirement as in other networking areas f.e. that 5bad87348c70
> > > ("netfilter: x_tables: avoid warn and OOM killer on vmalloc call") has.
> > > In your comment in kvzalloc() you eventually say that some of the above
> > > modifiers are not supported. So there would be two options, i) just leave
> > > out the kvzalloc() chunk for BPF area to avoid the merge conflict and tackle
> > > it later (along with similar code from 5bad87348c70), or ii) implement
> > > support for these modifiers as well to your original set. I guess it's not
> > > too urgent, so we could also proceed with i) if that is easier for you to
> > > proceed (I don't mind either way).
> >
> > Could you clarify why the oom killer in vmalloc matters actually?
>
> For both mentioned commits, (privileged) user space can potentially
> create large allocation requests, where we thus switch to vmalloc()
> flavor eventually and then OOM starts killing processes to try to
> satisfy the allocation request. This is bad, because we want the
> request to just fail instead as it's non-critical and f.e. not kill
> ssh connection et al. Failing is totally fine in this case, whereas
> triggering OOM is not.

I see your intention but does it really make any real difference?
Consider you would back off right before you would have OOMed. Any
parallel request would just hit the OOM for you. You are (almost) never
doing an allocation in isolation.

> In my testing, __GFP_NORETRY did satisfy this
> just fine, but as you say it seems it's not enough.

Yeah, ptes have most probably been populated already.

> Given there are
> multiple places like these in the kernel, could we instead add an
> option such as __GFP_NOOOM, or just make __GFP_NORETRY supported?

As said above I do not really think that suppressing the OOM killer
makes any difference because it might be just somebody else doing that
for you. Also the OOM killer is an MM internal implementation "detail"
users shouldn't really care about. I agree that callers should have a
way to say they do not want to try really hard and that is not that
simple for vmalloc unfortunately. The main problem here is that gfp mask
propagation is not that easy to fix without a lot of code churn as some
of those hardcoded allocation requests are deep in call chains.

I know this sucks and it would be great to support __GFP_NORETRY in
[k]vmalloc and maybe we will get there eventually. But for the meantime
I really think that using kvmalloc wherever possible is much better than
open coded variants with expectations which do not hold sometimes.

If you disagree I can drop the bpf part of course...
--
Michal Hocko
SUSE Labs
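The propagation problem described in this message can be illustrated with a
small userspace toy model. This is only a sketch under stated assumptions: the
flag values and every toy_* name below are made up for illustration and are not
the kernel's definitions, and the "allocator" simply pretends the system is
under memory pressure. It shows that a no-retry flag passed by the caller has
no effect on a nested allocation that hardcodes its own mask.

```c
#include <stdbool.h>

/* Hypothetical flag bits for this toy model; not the kernel's values. */
#define TOY_GFP_KERNEL  0x1u
#define TOY_GFP_NORETRY 0x2u

/* Set when an allocation without NORETRY hits simulated memory pressure. */
static bool toy_oom_invoked;

/* Page-level allocation under simulated memory pressure: always fails. */
static bool toy_alloc_page(unsigned int gfp)
{
	if (!(gfp & TOY_GFP_NORETRY))
		toy_oom_invoked = true; /* would keep trying, up to the OOM killer */
	return false;
}

/* Deep helper that hardcodes its mask, standing in for page-table setup. */
static bool toy_alloc_page_table(void)
{
	return toy_alloc_page(TOY_GFP_KERNEL);
}

/* vmalloc-like path: the caller's mask reaches the data pages only. */
static bool toy_vmalloc_like(unsigned int gfp)
{
	bool data_ok = toy_alloc_page(gfp);
	bool table_ok = toy_alloc_page_table();

	return data_ok && table_ok;
}

/* Returns true if the OOM path was reached although NORETRY was requested. */
static bool toy_demo(void)
{
	toy_oom_invoked = false;
	(void)toy_vmalloc_like(TOY_GFP_KERNEL | TOY_GFP_NORETRY);
	return toy_oom_invoked;
}
```

In this model toy_demo() reports that the OOM path was reached, because
toy_alloc_page_table() never sees the caller's mask. Fixing that would mean
threading the mask through every nested helper, which is the code churn the
message refers to.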
* Re: [PATCH 0/6 v3] kvmalloc
2017-01-26 10:08 ` Michal Hocko
@ 2017-01-26 10:32 ` Michal Hocko
-1 siblings, 0 replies; 49+ messages in thread
From: Michal Hocko @ 2017-01-26 10:32 UTC (permalink / raw)
To: Daniel Borkmann
Cc: Alexei Starovoitov, Andrew Morton, Vlastimil Babka, Mel Gorman,
Johannes Weiner, linux-mm, LKML, netdev, marcelo.leitner

On Thu 26-01-17 11:08:02, Michal Hocko wrote:
> On Thu 26-01-17 10:36:49, Daniel Borkmann wrote:
> > On 01/26/2017 08:43 AM, Michal Hocko wrote:
> > > On Wed 25-01-17 21:16:42, Daniel Borkmann wrote:
> [...]
> > > > I assume that kvzalloc() is still the same from [1], right? If so, then
> > > > it would unfortunately (partially) reintroduce the issue that was fixed.
> > > > If you look above at flags, they're also passed to __vmalloc() to not
> > > > trigger OOM in these situations I've experienced.
> > >
> > > Pushing __GFP_NORETRY to __vmalloc doesn't have the effect you might
> > > think it would. It can still trigger the OOM killer because the flags
> > > are not propagated all the way down to all allocation requests (e.g.
> > > page tables). This is the same reason why GFP_NOFS is not supported in
> > > vmalloc.
> >
> > Ok, good to know, is that somewhere clearly documented (like for the
> > case with kmalloc())?
>
> I am afraid that we really suck on this front. I will add something.

So I have folded the following into patch 1. It is in line with
kvmalloc and hopefully at least tells more than the current code.
---
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index d89034a393f2..6c1aa2c68887 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1741,6 +1741,13 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
  *	Allocate enough pages to cover @size from the page level
  *	allocator with @gfp_mask flags.  Map them into contiguous
  *	kernel virtual space, using a pagetable protection of @prot.
+ *
+ * Reclaim modifiers in @gfp_mask - __GFP_NORETRY, __GFP_REPEAT
+ * and __GFP_NOFAIL are not supported
+ *
+ * Any use of gfp flags outside of GFP_KERNEL should be consulted
+ * with mm people.
+ *
  */
 static void *__vmalloc_node(unsigned long size, unsigned long align,
 			    gfp_t gfp_mask, pgprot_t prot,
--
Michal Hocko
SUSE Labs
* Re: [PATCH 0/6 v3] kvmalloc
2017-01-26 10:32 ` Michal Hocko
@ 2017-01-26 11:04 ` Daniel Borkmann
-1 siblings, 0 replies; 49+ messages in thread
From: Daniel Borkmann @ 2017-01-26 11:04 UTC (permalink / raw)
To: Michal Hocko
Cc: Alexei Starovoitov, Andrew Morton, Vlastimil Babka, Mel Gorman,
Johannes Weiner, linux-mm, LKML, netdev, marcelo.leitner

On 01/26/2017 11:32 AM, Michal Hocko wrote:
> On Thu 26-01-17 11:08:02, Michal Hocko wrote:
>> On Thu 26-01-17 10:36:49, Daniel Borkmann wrote:
>>> On 01/26/2017 08:43 AM, Michal Hocko wrote:
>>>> On Wed 25-01-17 21:16:42, Daniel Borkmann wrote:
>> [...]
>>>>> I assume that kvzalloc() is still the same from [1], right? If so, then
>>>>> it would unfortunately (partially) reintroduce the issue that was fixed.
>>>>> If you look above at flags, they're also passed to __vmalloc() to not
>>>>> trigger OOM in these situations I've experienced.
>>>>
>>>> Pushing __GFP_NORETRY to __vmalloc doesn't have the effect you might
>>>> think it would. It can still trigger the OOM killer because the flags
>>>> are not propagated all the way down to all allocation requests (e.g.
>>>> page tables). This is the same reason why GFP_NOFS is not supported in
>>>> vmalloc.
>>>
>>> Ok, good to know, is that somewhere clearly documented (like for the
>>> case with kmalloc())?
>>
>> I am afraid that we really suck on this front. I will add something.
>
> So I have folded the following into patch 1. It is in line with
> kvmalloc and hopefully at least tells more than the current code.
> ---
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index d89034a393f2..6c1aa2c68887 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1741,6 +1741,13 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
>    *	Allocate enough pages to cover @size from the page level
>    *	allocator with @gfp_mask flags.  Map them into contiguous
>    *	kernel virtual space, using a pagetable protection of @prot.
> + *
> + * Reclaim modifiers in @gfp_mask - __GFP_NORETRY, __GFP_REPEAT
> + * and __GFP_NOFAIL are not supported

We could probably also mention that __GFP_ZERO in @gfp_mask is
supported, though.

> + * Any use of gfp flags outside of GFP_KERNEL should be consulted
> + * with mm people.

Just a question: should that read 'GFP_KERNEL | __GFP_HIGHMEM' as
that is what vmalloc() resp. vzalloc() and others pass as flags?

> + *
>    */

Sounds good otherwise, thanks Michal!

>   static void *__vmalloc_node(unsigned long size, unsigned long align,
>   			    gfp_t gfp_mask, pgprot_t prot,
* Re: [PATCH 0/6 v3] kvmalloc
2017-01-26 11:04 ` Daniel Borkmann
@ 2017-01-26 11:49 ` Michal Hocko
-1 siblings, 0 replies; 49+ messages in thread
From: Michal Hocko @ 2017-01-26 11:49 UTC (permalink / raw)
To: Daniel Borkmann
Cc: Alexei Starovoitov, Andrew Morton, Vlastimil Babka, Mel Gorman,
Johannes Weiner, linux-mm, LKML, netdev, marcelo.leitner

On Thu 26-01-17 12:04:13, Daniel Borkmann wrote:
> On 01/26/2017 11:32 AM, Michal Hocko wrote:
> > On Thu 26-01-17 11:08:02, Michal Hocko wrote:
> > > On Thu 26-01-17 10:36:49, Daniel Borkmann wrote:
> > > > On 01/26/2017 08:43 AM, Michal Hocko wrote:
> > > > > On Wed 25-01-17 21:16:42, Daniel Borkmann wrote:
> > > [...]
> > > > > > I assume that kvzalloc() is still the same from [1], right? If so, then
> > > > > > it would unfortunately (partially) reintroduce the issue that was fixed.
> > > > > > If you look above at flags, they're also passed to __vmalloc() to not
> > > > > > trigger OOM in these situations I've experienced.
> > > > >
> > > > > Pushing __GFP_NORETRY to __vmalloc doesn't have the effect you might
> > > > > think it would. It can still trigger the OOM killer because the flags
> > > > > are not propagated all the way down to all allocation requests (e.g.
> > > > > page tables). This is the same reason why GFP_NOFS is not supported in
> > > > > vmalloc.
> > > >
> > > > Ok, good to know, is that somewhere clearly documented (like for the
> > > > case with kmalloc())?
> > >
> > > I am afraid that we really suck on this front. I will add something.
> >
> > So I have folded the following into patch 1. It is in line with
> > kvmalloc and hopefully at least tells more than the current code.
> > ---
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index d89034a393f2..6c1aa2c68887 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -1741,6 +1741,13 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
> >    *	Allocate enough pages to cover @size from the page level
> >    *	allocator with @gfp_mask flags.  Map them into contiguous
> >    *	kernel virtual space, using a pagetable protection of @prot.
> > + *
> > + * Reclaim modifiers in @gfp_mask - __GFP_NORETRY, __GFP_REPEAT
> > + * and __GFP_NOFAIL are not supported
>
> We could probably also mention that __GFP_ZERO in @gfp_mask is
> supported, though.

There are others which would be supported so I would rather stay with
explicitly listing the unsupported ones.

> > + * Any use of gfp flags outside of GFP_KERNEL should be consulted
> > + * with mm people.
>
> Just a question: should that read 'GFP_KERNEL | __GFP_HIGHMEM' as
> that is what vmalloc() resp. vzalloc() and others pass as flags?

Yes, even though I think that specifying __GFP_HIGHMEM shouldn't be
really necessary. Are there any users who would really insist on vmalloc
pages in lowmem? Anyway this made me recheck the kvmalloc_node
implementation and I am not adding this flag, which would mean a
regression from the current state. Will fix it up.
--
Michal Hocko
SUSE Labs
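The allocation strategy under discussion, including the __GFP_HIGHMEM fix-up
mentioned at the end of this message, can be sketched as a userspace toy
model. It rests on assumptions: the flag values and all toy_* names are
hypothetical, malloc() stands in for both backends, and the no-retry/no-warn
handling of the first attempt reflects the cover letter's stated goal (do not
let the kmalloc attempt be disruptive when a fallback exists) rather than the
actual patch text, which is not quoted in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical values for this toy model; not the kernel's definitions. */
#define TOY_PAGE_SIZE   4096u
#define TOY_GFP_KERNEL  0x01u
#define TOY_GFP_NORETRY 0x02u
#define TOY_GFP_NOWARN  0x04u
#define TOY_GFP_HIGHMEM 0x08u

/* Flags each backend was last called with, recorded for inspection. */
static unsigned int toy_last_kmalloc_flags;
static unsigned int toy_last_vmalloc_flags;

/* Stand-in for kmalloc(): pretend contiguous memory above one page is gone. */
static void *toy_kmalloc(size_t size, unsigned int flags)
{
	toy_last_kmalloc_flags = flags;
	return size > TOY_PAGE_SIZE ? NULL : malloc(size);
}

/* Stand-in for vmalloc(): always succeeds in this model. */
static void *toy_vmalloc(size_t size, unsigned int flags)
{
	toy_last_vmalloc_flags = flags;
	return malloc(size);
}

/* Try the contiguous allocator first, then fall back to vmalloc. */
static void *toy_kvmalloc(size_t size, unsigned int flags)
{
	unsigned int kmalloc_flags = flags;
	void *ret;

	assert(size > 0);

	/* A fallback exists for large requests, so do not try too hard. */
	if (size > TOY_PAGE_SIZE)
		kmalloc_flags |= TOY_GFP_NORETRY | TOY_GFP_NOWARN;

	ret = toy_kmalloc(size, kmalloc_flags);

	/* Falling back makes no sense for sub-page requests. */
	if (ret || size <= TOY_PAGE_SIZE)
		return ret;

	/* The fallback passes the HIGHMEM bit, as vmalloc() itself does. */
	return toy_vmalloc(size, flags | TOY_GFP_HIGHMEM);
}
```

A caller would use toy_kvmalloc(8 * TOY_PAGE_SIZE, TOY_GFP_KERNEL) and release
the result with free(); in the kernel the matching release helper is kvfree().
Note that the reclaim modifiers are added only to the first attempt and are
not passed on to the fallback.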
* Re: [PATCH 0/6 v3] kvmalloc
2017-01-26 10:32 ` Michal Hocko
@ 2017-01-26 12:14 ` Joe Perches
-1 siblings, 0 replies; 49+ messages in thread
From: Joe Perches @ 2017-01-26 12:14 UTC (permalink / raw)
To: Michal Hocko, Daniel Borkmann
Cc: Alexei Starovoitov, Andrew Morton, Vlastimil Babka, Mel Gorman,
Johannes Weiner, linux-mm, LKML, netdev, marcelo.leitner

On Thu, 2017-01-26 at 11:32 +0100, Michal Hocko wrote:
> So I have folded the following into patch 1. It is in line with
> kvmalloc and hopefully at least tells more than the current code.
[]
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
[]
> @@ -1741,6 +1741,13 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
>   *	Allocate enough pages to cover @size from the page level
>   *	allocator with @gfp_mask flags.  Map them into contiguous
>   *	kernel virtual space, using a pagetable protection of @prot.
> + *
> + * Reclaim modifiers in @gfp_mask - __GFP_NORETRY, __GFP_REPEAT
> + * and __GFP_NOFAIL are not supported

Maybe add a BUILD_BUG or a WARN_ON_ONCE to catch new occurrences?
* Re: [PATCH 0/6 v3] kvmalloc
2017-01-26 12:14 ` Joe Perches
@ 2017-01-26 12:27 ` Michal Hocko
-1 siblings, 0 replies; 49+ messages in thread
From: Michal Hocko @ 2017-01-26 12:27 UTC (permalink / raw)
To: Joe Perches
Cc: Daniel Borkmann, Alexei Starovoitov, Andrew Morton, Vlastimil Babka,
Mel Gorman, Johannes Weiner, linux-mm, LKML, netdev, marcelo.leitner

On Thu 26-01-17 04:14:37, Joe Perches wrote:
> On Thu, 2017-01-26 at 11:32 +0100, Michal Hocko wrote:
> > So I have folded the following into patch 1. It is in line with
> > kvmalloc and hopefully at least tells more than the current code.
> []
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> []
> > @@ -1741,6 +1741,13 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
> >   *	Allocate enough pages to cover @size from the page level
> >   *	allocator with @gfp_mask flags.  Map them into contiguous
> >   *	kernel virtual space, using a pagetable protection of @prot.
> > + *
> > + * Reclaim modifiers in @gfp_mask - __GFP_NORETRY, __GFP_REPEAT
> > + * and __GFP_NOFAIL are not supported
>
> Maybe add a BUILD_BUG or a WARN_ON_ONCE to catch new occurrences?

I would really like to not touch vmalloc in this series.
--
Michal Hocko
SUSE Labs
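For readers unfamiliar with the warn-once idea suggested above, here is a
userspace sketch of what such a check might look like. It is purely
illustrative and was not proposed as code in this thread: the flag values and
the toy_* names are hypothetical, and fprintf() stands in for the kernel's
WARN_ON_ONCE() machinery.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical flag bits for this toy model; not the kernel's values. */
#define TOY_GFP_KERNEL  0x01u
#define TOY_GFP_NORETRY 0x02u
#define TOY_GFP_REPEAT  0x10u
#define TOY_GFP_NOFAIL  0x20u
#define TOY_UNSUPPORTED (TOY_GFP_NORETRY | TOY_GFP_REPEAT | TOY_GFP_NOFAIL)

/* Number of warnings actually printed, exposed for inspection. */
static unsigned int toy_warn_count;

/*
 * Returns true if the mask contains no unsupported reclaim modifier.
 * Otherwise returns false and warns, but only on the first offence.
 */
static bool toy_check_vmalloc_flags(unsigned int gfp)
{
	static bool warned;

	if (!(gfp & TOY_UNSUPPORTED))
		return true;

	if (!warned) {
		warned = true;
		toy_warn_count++;
		fprintf(stderr,
			"toy vmalloc: unsupported reclaim modifier in %#x\n",
			gfp);
	}
	return false;
}
```

Every offending call is rejected, but only the first one produces a message,
which keeps logs readable when an existing caller hits the check repeatedly.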
* Re: [PATCH 0/6 v3] kvmalloc
2017-01-26 10:08 ` Michal Hocko
@ 2017-01-26 11:33 ` Daniel Borkmann
-1 siblings, 0 replies; 49+ messages in thread
From: Daniel Borkmann @ 2017-01-26 11:33 UTC (permalink / raw)
To: Michal Hocko
Cc: Alexei Starovoitov, Andrew Morton, Vlastimil Babka, Mel Gorman,
Johannes Weiner, linux-mm, LKML, netdev, marcelo.leitner

On 01/26/2017 11:08 AM, Michal Hocko wrote:
> On Thu 26-01-17 10:36:49, Daniel Borkmann wrote:
>> On 01/26/2017 08:43 AM, Michal Hocko wrote:
>>> On Wed 25-01-17 21:16:42, Daniel Borkmann wrote:
> [...]
>>>> I assume that kvzalloc() is still the same from [1], right? If so, then
>>>> it would unfortunately (partially) reintroduce the issue that was fixed.
>>>> If you look above at flags, they're also passed to __vmalloc() to not
>>>> trigger OOM in these situations I've experienced.
>>>
>>> Pushing __GFP_NORETRY to __vmalloc doesn't have the effect you might
>>> think it would. It can still trigger the OOM killer because the flags
>>> are not propagated all the way down to all allocation requests (e.g.
>>> page tables). This is the same reason why GFP_NOFS is not supported in
>>> vmalloc.
>>
>> Ok, good to know, is that somewhere clearly documented (like for the
>> case with kmalloc())?
>
> I am afraid that we really suck on this front. I will add something.

Thanks for doing that, much appreciated!

>> If not, could we do that for non-mm folks, or
>> at least add a similar WARN_ON_ONCE() as you did for kvmalloc() to make
>> it obvious to users that a given flag combination is not supported all
>> the way down?
>
> I am not sure that triggering a warning that somebody has used
> __GFP_NOWARN is very helpful ;). I also do not think that covering all the
> supported flags is really feasible. Most of them will not have bad side
> effects. I have added the warning because this API is new and I wanted
> to catch new abusers. Old ones would have to die slowly.

Okay, makes sense then. Just the kdoc comment from your other mail
should help fine already.

>>>> This is effectively the
>>>> same requirement as in other networking areas f.e. that 5bad87348c70
>>>> ("netfilter: x_tables: avoid warn and OOM killer on vmalloc call") has.
>>>> In your comment in kvzalloc() you eventually say that some of the above
>>>> modifiers are not supported. So there would be two options, i) just leave
>>>> out the kvzalloc() chunk for BPF area to avoid the merge conflict and tackle
>>>> it later (along with similar code from 5bad87348c70), or ii) implement
>>>> support for these modifiers as well to your original set. I guess it's not
>>>> too urgent, so we could also proceed with i) if that is easier for you to
>>>> proceed (I don't mind either way).
>>>
>>> Could you clarify why the oom killer in vmalloc matters actually?
>>
>> For both mentioned commits, (privileged) user space can potentially
>> create large allocation requests, where we thus switch to vmalloc()
>> flavor eventually and then OOM starts killing processes to try to
>> satisfy the allocation request. This is bad, because we want the
>> request to just fail instead as it's non-critical and f.e. not kill
>> ssh connection et al. Failing is totally fine in this case, whereas
>> triggering OOM is not.
>
> I see your intention but does it really make any real difference?
> Consider you would back off right before you would have OOMed. Any
> parallel request would just hit the OOM for you. You are (almost) never
> doing an allocation in an isolation.
>
>> In my testing, __GFP_NORETRY did satisfy this
>> just fine, but as you say it seems it's not enough.
>
> Yeah, ptes have been most probably populated already.
>
>> Given there are
>> multiple places like these in the kernel, could we instead add an
>> option such as __GFP_NOOOM, or just make __GFP_NORETRY supported?
>
> As said above I do not really think that suppressing the OOM killer
> makes any difference because it might be just somebody else doing that
> for you. Also the OOM killer is the MM internal implementation "detail"
> users shouldn't really care. I agree that callers should have a way to
> say they do not want to try really hard and that is not that simple
> for vmalloc unfortunately. The main problem here is that gfp mask
> propagation is not that easy to fix without a lot of code churn as some
> of those hardcoded allocation requests are deep in call chains.

I see, that's unfortunate. I understand that there are requests in
parallel and that we might end up with OOM eventually if we're unlucky,
but having some way to tell vmalloc to just not try as hard as usual
would be nice.

> I know this sucks and it would be great to support __GFP_NORETRY to
> [k]vmalloc and maybe we will get there eventually. But for the mean time
> I really think that using kvmalloc wherever possible is much better than
> open coded variants with expectations which do not hold sometimes.

I totally agree with you that having kvmalloc() as helper is awesome
and probably long overdue as well. :)

> If you disagree I can drop the bpf part of course...

If we could consolidate these spots with kvmalloc() eventually, I'm
all for it. But even if __GFP_NORETRY is not covered down to all
possible paths, it kind of does have an effect already of saying
'don't try too hard', so would it be harmful to still keep that for
now? If it's not, I'd personally prefer to just leave it as is until
there's some form of support by kvmalloc() and friends.

Thanks for your input, Michal!

Cheers,
Daniel

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH 0/6 v3] kvmalloc
2017-01-26 11:33 ` Daniel Borkmann
@ 2017-01-26 11:58 ` Michal Hocko
-1 siblings, 0 replies; 49+ messages in thread
From: Michal Hocko @ 2017-01-26 11:58 UTC (permalink / raw)
To: Daniel Borkmann
Cc: Alexei Starovoitov, Andrew Morton, Vlastimil Babka, Mel Gorman,
Johannes Weiner, linux-mm, LKML, netdev, marcelo.leitner

On Thu 26-01-17 12:33:55, Daniel Borkmann wrote:
> On 01/26/2017 11:08 AM, Michal Hocko wrote:
[...]
> > If you disagree I can drop the bpf part of course...
>
> If we could consolidate these spots with kvmalloc() eventually, I'm
> all for it. But even if __GFP_NORETRY is not covered down to all
> possible paths, it kind of does have an effect already of saying
> 'don't try too hard', so would it be harmful to still keep that for
> now? If it's not, I'd personally prefer to just leave it as is until
> there's some form of support by kvmalloc() and friends.

Well, you can use kvmalloc(size, GFP_KERNEL|__GFP_NORETRY). It is not
disallowed. It is not _supported_ which means that if it doesn't work as
you expect you are on your own. Which is actually the situation right
now as well. But I still think that this is just not right thing to do.
Even though it might happen to work in some cases it gives a false
impression of a solution. So I would rather go with

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 8697f43cf93c..a6dc4d596f14 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -53,6 +53,11 @@ void bpf_register_map_type(struct bpf_map_type_list *tl)
 
 void *bpf_map_area_alloc(size_t size)
 {
+	/*
+	 * FIXME: we would really like to not trigger the OOM killer and rather
+	 * fail instead. This is not supported right now. Please nag MM people
+	 * if these OOM start bothering people.
+	 */
 	return kvzalloc(size, GFP_USER);
 }
 
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related [flat|nested] 49+ messages in thread
* Re: [PATCH 0/6 v3] kvmalloc
2017-01-26 11:58 ` Michal Hocko
@ 2017-01-26 13:10 ` Daniel Borkmann
-1 siblings, 0 replies; 49+ messages in thread
From: Daniel Borkmann @ 2017-01-26 13:10 UTC (permalink / raw)
To: Michal Hocko
Cc: Alexei Starovoitov, Andrew Morton, Vlastimil Babka, Mel Gorman,
Johannes Weiner, linux-mm, LKML, netdev, marcelo.leitner

On 01/26/2017 12:58 PM, Michal Hocko wrote:
> On Thu 26-01-17 12:33:55, Daniel Borkmann wrote:
>> On 01/26/2017 11:08 AM, Michal Hocko wrote:
> [...]
>>> If you disagree I can drop the bpf part of course...
>>
>> If we could consolidate these spots with kvmalloc() eventually, I'm
>> all for it. But even if __GFP_NORETRY is not covered down to all
>> possible paths, it kind of does have an effect already of saying
>> 'don't try too hard', so would it be harmful to still keep that for
>> now? If it's not, I'd personally prefer to just leave it as is until
>> there's some form of support by kvmalloc() and friends.
>
> Well, you can use kvmalloc(size, GFP_KERNEL|__GFP_NORETRY). It is not
> disallowed. It is not _supported_ which means that if it doesn't work as
> you expect you are on your own. Which is actually the situation right
> now as well. But I still think that this is just not right thing to do.
> Even though it might happen to work in some cases it gives a false
> impression of a solution. So I would rather go with

Hmm. 'On my own' means, we could potentially BUG somewhere down the
vmalloc implementation, etc, presumably? So it might in-fact be
harmful to pass that, right?

> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 8697f43cf93c..a6dc4d596f14 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -53,6 +53,11 @@ void bpf_register_map_type(struct bpf_map_type_list *tl)
>
>  void *bpf_map_area_alloc(size_t size)
>  {
> +	/*
> +	 * FIXME: we would really like to not trigger the OOM killer and rather
> +	 * fail instead. This is not supported right now. Please nag MM people
> +	 * if these OOM start bothering people.
> +	 */

Ok, I know this is out of scope for this series, but since i) this
is _not_ the _only_ spot right now which has such a construct and ii)
I am already kind of nagging a bit ;), my question would be, what
would it take to start supporting it?

>  	return kvzalloc(size, GFP_USER);
>  }

Thanks,
Daniel

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH 0/6 v3] kvmalloc
2017-01-26 13:10 ` Daniel Borkmann
@ 2017-01-26 13:40 ` Michal Hocko
-1 siblings, 0 replies; 49+ messages in thread
From: Michal Hocko @ 2017-01-26 13:40 UTC (permalink / raw)
To: Daniel Borkmann
Cc: Alexei Starovoitov, Andrew Morton, Vlastimil Babka, Mel Gorman,
Johannes Weiner, linux-mm, LKML, netdev, marcelo.leitner

On Thu 26-01-17 14:10:06, Daniel Borkmann wrote:
> On 01/26/2017 12:58 PM, Michal Hocko wrote:
> > On Thu 26-01-17 12:33:55, Daniel Borkmann wrote:
> > > On 01/26/2017 11:08 AM, Michal Hocko wrote:
> > [...]
> > > > If you disagree I can drop the bpf part of course...
> > >
> > > If we could consolidate these spots with kvmalloc() eventually, I'm
> > > all for it. But even if __GFP_NORETRY is not covered down to all
> > > possible paths, it kind of does have an effect already of saying
> > > 'don't try too hard', so would it be harmful to still keep that for
> > > now? If it's not, I'd personally prefer to just leave it as is until
> > > there's some form of support by kvmalloc() and friends.
> >
> > Well, you can use kvmalloc(size, GFP_KERNEL|__GFP_NORETRY). It is not
> > disallowed. It is not _supported_ which means that if it doesn't work as
> > you expect you are on your own. Which is actually the situation right
> > now as well. But I still think that this is just not right thing to do.
> > Even though it might happen to work in some cases it gives a false
> > impression of a solution. So I would rather go with
>
> Hmm. 'On my own' means, we could potentially BUG somewhere down the
> vmalloc implementation, etc, presumably? So it might in-fact be
> harmful to pass that, right?

No, it would mean that it might eventually hit the behavior which you are
trying to avoid - in other words it may invoke the OOM killer even though
__GFP_NORETRY means giving up before any system wide disruptive actions
are taken.

>
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index 8697f43cf93c..a6dc4d596f14 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -53,6 +53,11 @@ void bpf_register_map_type(struct bpf_map_type_list *tl)
> >
> >  void *bpf_map_area_alloc(size_t size)
> >  {
> > +	/*
> > +	 * FIXME: we would really like to not trigger the OOM killer and rather
> > +	 * fail instead. This is not supported right now. Please nag MM people
> > +	 * if these OOM start bothering people.
> > +	 */
>
> Ok, I know this is out of scope for this series, but since i) this
> is _not_ the _only_ spot right now which has such a construct and ii)
> I am already kind of nagging a bit ;), my question would be, what
> would it take to start supporting it?

Propagate the gfp mask all the way down from vmalloc to all places which
might allocate down the path. The page table allocation functions
especially are a PITA because they are really deep. This is a lot of
work...

But realistically, how big is this problem really? Is it really worth
it? You said this is an admin only interface and admin can kill the
machine by OOM and other means already.

Moreover, and I should probably mention it explicitly, your
d407bd25a204b reduced the likelihood of oom for another reason. kmalloc
used GFP_USER previously and with order > 0 && order <=
PAGE_ALLOC_COSTLY_ORDER this could indeed hit the OOM e.g. due to
memory fragmentation. It would be much harder to hit the OOM killer
from vmalloc which doesn't issue higher order allocation requests. Or
have you ever seen the OOM killer pointing to the vmalloc fallback path?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply [flat|nested] 49+ messages in thread
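For readers following the order argument in the message above, here is a minimal userspace sketch of the arithmetic. It is not kernel code: the 4 KiB page size is an assumption, the model_* names are hypothetical, and only the value 3 for PAGE_ALLOC_COSTLY_ORDER is taken from mainline. It computes the page order a request of a given size would need and checks whether that order falls in the range 0 < order <= PAGE_ALLOC_COSTLY_ORDER that the message singles out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed for illustration: 4 KiB pages. */
#define MODEL_PAGE_SHIFT   12
#define MODEL_PAGE_SIZE    ((size_t)1 << MODEL_PAGE_SHIFT)
/* PAGE_ALLOC_COSTLY_ORDER is 3 in mainline. */
#define MODEL_COSTLY_ORDER 3

/* Smallest order such that (1 << order) pages cover @size bytes. */
unsigned int model_get_order(size_t size)
{
	/* Round up to whole pages without overflowing for huge sizes. */
	size_t pages = (size >> MODEL_PAGE_SHIFT) +
		       ((size & (MODEL_PAGE_SIZE - 1)) != 0);
	unsigned int order = 0;

	while (((size_t)1 << order) < pages)
		order++;
	return order;
}

/* True when the request is higher-order but not yet "costly". */
bool model_in_retry_range(size_t size)
{
	unsigned int order = model_get_order(size);

	return order > 0 && order <= MODEL_COSTLY_ORDER;
}
```

Under these assumptions an 8 KiB request is order 1 and a 32 KiB request is order 3, so both fall in the range; a single page (order 0) and anything above 32 KiB (order 4 and up) do not.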
* Re: [PATCH 0/6 v3] kvmalloc 2017-01-26 13:40 ` Michal Hocko (?) @ 2017-01-26 14:13 ` Michal Hocko -1 siblings, 0 replies; 49+ messages in thread From: Michal Hocko @ 2017-01-26 14:13 UTC (permalink / raw) To: Daniel Borkmann Cc: Alexei Starovoitov, Andrew Morton, Vlastimil Babka, Mel Gorman, Johannes Weiner, linux-mm, LKML, netdev, marcelo.leitner On Thu 26-01-17 14:40:04, Michal Hocko wrote: > On Thu 26-01-17 14:10:06, Daniel Borkmann wrote: > > On 01/26/2017 12:58 PM, Michal Hocko wrote: > > > On Thu 26-01-17 12:33:55, Daniel Borkmann wrote: > > > > On 01/26/2017 11:08 AM, Michal Hocko wrote: > > > [...] > > > > > If you disagree I can drop the bpf part of course... > > > > > > > > If we could consolidate these spots with kvmalloc() eventually, I'm > > > > all for it. But even if __GFP_NORETRY is not covered down to all > > > > possible paths, it kind of does have an effect already of saying > > > > 'don't try too hard', so would it be harmful to still keep that for > > > > now? If it's not, I'd personally prefer to just leave it as is until > > > > there's some form of support by kvmalloc() and friends. > > > > > > Well, you can use kvmalloc(size, GFP_KERNEL|__GFP_NORETRY). It is not > > > disallowed. It is not _supported_ which means that if it doesn't work as > > > you expect you are on your own. Which is actually the situation right > > > now as well. But I still think that this is just not right thing to do. > > > Even though it might happen to work in some cases it gives a false > > > impression of a solution. So I would rather go with > > > > Hmm. 'On my own' means, we could potentially BUG somewhere down the > > vmalloc implementation, etc, presumably? So it might in-fact be > > harmful to pass that, right? > > No it would mean that it might eventually hit the behavior which you are > trying to avoid - in other words it may invoke OOM killer even though > __GFP_NORETRY means giving up before any system wide disruptive actions > a re taken. 
I will separate both bpf and netfilter hunks into their own patch with
the clarification. Does the following look better?
---
From ab6b2d724228e4abcc69c44f5ab1ce91009aa91d Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Thu, 26 Jan 2017 14:59:21 +0100
Subject: [PATCH] net, bpf: use kvzalloc helper

Both bpf_map_area_alloc and xt_alloc_table_info try really hard to
play nicely with large memory requests which can be triggered from
userspace (by an admin). See 5bad87348c70 ("netfilter: x_tables: avoid
warn and OOM killer on vmalloc call") and d407bd25a204 ("bpf: don't
trigger OOM killer under pressure with map alloc"), respectively.

The current allocation pattern strongly resembles the kvmalloc helper
except for one thing: kvmalloc does not use __GFP_NORETRY for the
vmalloc fallback. The main reason why kvmalloc doesn't really support
__GFP_NORETRY is that vmalloc doesn't support this flag properly, and
it is far from straightforward to make it understand the flag because
there are some hard-coded GFP_KERNEL allocations deep in the call
chains. This patch simply replaces the open-coded variants with
kvmalloc and adds a note to push MM people to support __GFP_NORETRY in
kvmalloc if this turns out to be really needed, along with an OOM
report pointing at vmalloc.

If there is an immediate need and no full support yet, then
	kvmalloc(size, gfp | __GFP_NORETRY)
will work as well as __vmalloc(gfp | __GFP_NORETRY) - in other words it
might trigger the OOM killer in some cases.

Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 kernel/bpf/syscall.c     | 19 +++++--------------
 net/netfilter/x_tables.c | 16 ++++++----------
 2 files changed, 11 insertions(+), 24 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 19b6129eab23..a6dc4d596f14 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -53,21 +53,12 @@ void bpf_register_map_type(struct bpf_map_type_list *tl)
 
 void *bpf_map_area_alloc(size_t size)
 {
-	/* We definitely need __GFP_NORETRY, so OOM killer doesn't
-	 * trigger under memory pressure as we really just want to
-	 * fail instead.
+	/*
+	 * FIXME: we would really like to not trigger the OOM killer and rather
+	 * fail instead. This is not supported right now. Please nag MM people
+	 * if these OOM start bothering people.
 	 */
-	const gfp_t flags = __GFP_NOWARN | __GFP_NORETRY | __GFP_ZERO;
-	void *area;
-
-	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
-		area = kmalloc(size, GFP_USER | flags);
-		if (area != NULL)
-			return area;
-	}
-
-	return __vmalloc(size, GFP_KERNEL | __GFP_HIGHMEM | flags,
-			 PAGE_KERNEL);
+	return kvzalloc(size, GFP_USER);
 }
 
 void bpf_map_area_free(void *area)
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index d529989f5791..ba8ba633da72 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -995,16 +995,12 @@ struct xt_table_info *xt_alloc_table_info(unsigned int size)
 	if ((SMP_ALIGN(size) >> PAGE_SHIFT) + 2 > totalram_pages)
 		return NULL;
 
-	if (sz <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
-		info = kmalloc(sz, GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY);
-	if (!info) {
-		info = __vmalloc(sz, GFP_KERNEL | __GFP_NOWARN |
-				     __GFP_NORETRY | __GFP_HIGHMEM,
-				 PAGE_KERNEL);
-		if (!info)
-			return NULL;
-	}
-	memset(info, 0, sizeof(*info));
+	/*
+	 * FIXME: we would really like to not trigger the OOM killer and rather
+	 * fail instead. This is not supported right now. Please nag MM people
+	 * if these OOM start bothering people.
+	 */
+	info = kvzalloc(sz, GFP_KERNEL);
 	info->size = size;
 	return info;
 }
-- 
2.11.0

-- 
Michal Hocko
SUSE Labs
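The open-coded pattern that the patch above removes can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions: `small_alloc` and `big_alloc` are hypothetical stand-ins for kmalloc() and vmalloc() (both simply backed by calloc() here), a 4 KiB page size is assumed, and `kv_zalloc_model` is an illustrative name, not the kernel's kvzalloc().

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumed values: 4 KiB pages; PAGE_ALLOC_COSTLY_ORDER is 3 in the kernel. */
#define MODEL_PAGE_SIZE    4096UL
#define MODEL_COSTLY_ORDER 3

/* Which path served the request (hypothetical, for illustration only). */
enum alloc_path { PATH_NONE, PATH_SMALL, PATH_BIG };

/* Largest size for which the physically contiguous path is attempted. */
static size_t costly_limit(void)
{
	return MODEL_PAGE_SIZE << MODEL_COSTLY_ORDER;
}

/* Hypothetical stand-in for kmalloc(): zeroed, may fail. */
static void *small_alloc(size_t size)
{
	return calloc(1, size);
}

/* Hypothetical stand-in for vmalloc(): zeroed, may fail. */
static void *big_alloc(size_t size)
{
	return calloc(1, size);
}

/*
 * Try the "small" allocator for requests up to the costly limit and
 * fall back to the "big" allocator otherwise or on failure.
 * Returns NULL if both paths fail; *path records which one succeeded.
 */
static void *kv_zalloc_model(size_t size, enum alloc_path *path)
{
	void *p;

	*path = PATH_NONE;
	if (size <= costly_limit()) {
		p = small_alloc(size);
		if (p) {
			*path = PATH_SMALL;
			return p;
		}
	}

	p = big_alloc(size);
	if (p)
		*path = PATH_BIG;
	return p;
}
```

With the assumed values the cut-over point is 32 KiB. A caller still has to check the result for NULL before using it; note that the x_tables hunk as posted dereferences `info` directly after the allocation.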
* Re: [PATCH] net, bpf: use kvzalloc helper
2017-01-26 14:13 ` Michal Hocko
(?)
(?)
@ 2017-01-26 14:37 ` kbuild test robot
-1 siblings, 0 replies; 49+ messages in thread
From: kbuild test robot @ 2017-01-26 14:37 UTC (permalink / raw)
To: Michal Hocko
Cc: kbuild-all, Daniel Borkmann, Alexei Starovoitov, Andrew Morton,
Vlastimil Babka, Mel Gorman, Johannes Weiner, linux-mm, LKML, netdev,
marcelo.leitner
[-- Attachment #1: Type: text/plain, Size: 1612 bytes --]

Hi Michal,

[auto build test ERROR on next-20170125]
[cannot apply to linus/master linux/master nf-next/master v4.9-rc8 v4.9-rc7 v4.9-rc6 v4.10-rc5]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Michal-Hocko/net-bpf-use-kvzalloc-helper/20170126-221904
config: x86_64-randconfig-x017-201704 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64

All error/warnings (new ones prefixed by >>):

   net/netfilter/x_tables.c: In function 'xt_alloc_table_info':
>> net/netfilter/x_tables.c:1012:9: error: implicit declaration of function 'kvzalloc' [-Werror=implicit-function-declaration]
     info = kvzalloc(sz, GFP_KERNEL);
            ^~~~~~~~
>> net/netfilter/x_tables.c:1012:7: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
     info = kvzalloc(sz, GFP_KERNEL);
          ^
   cc1: some warnings being treated as errors

vim +/kvzalloc +1012 net/netfilter/x_tables.c

  1006
  1007		/*
  1008		 * FIXME: we would really like to not trigger the OOM killer and rather
  1009		 * fail instead. This is not supported right now. Please nag MM people
  1010		 * if these OOM start bothering people.
  1011		 */
> 1012		info = kvzalloc(sz, GFP_KERNEL);
  1013		info->size = size;
  1014		return info;
  1015	}

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 34703 bytes --]
* Re: [PATCH] net, bpf: use kvzalloc helper
2017-01-26 14:13 ` Michal Hocko
` (2 preceding siblings ...)
(?)
@ 2017-01-26 14:58 ` kbuild test robot
-1 siblings, 0 replies; 49+ messages in thread
From: kbuild test robot @ 2017-01-26 14:58 UTC (permalink / raw)
To: Michal Hocko
Cc: kbuild-all, Daniel Borkmann, Alexei Starovoitov, Andrew Morton,
Vlastimil Babka, Mel Gorman, Johannes Weiner, linux-mm, LKML, netdev,
marcelo.leitner
[-- Attachment #1: Type: text/plain, Size: 1613 bytes --]

Hi Michal,

[auto build test ERROR on next-20170125]
[cannot apply to linus/master linux/master nf-next/master v4.9-rc8 v4.9-rc7 v4.9-rc6 v4.10-rc5]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Michal-Hocko/net-bpf-use-kvzalloc-helper/20170126-221904
config: x86_64-randconfig-x014-201704 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64

All error/warnings (new ones prefixed by >>):

   kernel/bpf/syscall.c: In function 'bpf_map_area_alloc':
>> kernel/bpf/syscall.c:61:9: error: implicit declaration of function 'kvzalloc' [-Werror=implicit-function-declaration]
     return kvzalloc(size, GFP_USER);
            ^~~~~~~~
>> kernel/bpf/syscall.c:61:9: warning: return makes pointer from integer without a cast [-Wint-conversion]
     return kvzalloc(size, GFP_USER);
            ^~~~~~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors

vim +/kvzalloc +61 kernel/bpf/syscall.c

    55	{
    56		/*
    57		 * FIXME: we would really like to not trigger the OOM killer and rather
    58		 * fail instead. This is not supported right now. Please nag MM people
    59		 * if these OOM start bothering people.
    60		 */
  > 61		return kvzalloc(size, GFP_USER);
    62	}
    63
    64	void bpf_map_area_free(void *area)

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 24846 bytes --]
* Re: [PATCH 0/6 v3] kvmalloc
2017-01-26 13:40 ` Michal Hocko
@ 2017-01-26 20:34 ` Daniel Borkmann
-1 siblings, 0 replies; 49+ messages in thread
From: Daniel Borkmann @ 2017-01-26 20:34 UTC (permalink / raw)
To: Michal Hocko
Cc: Alexei Starovoitov, Andrew Morton, Vlastimil Babka, Mel Gorman,
Johannes Weiner, linux-mm, LKML, netdev, marcelo.leitner
On 01/26/2017 02:40 PM, Michal Hocko wrote:
> On Thu 26-01-17 14:10:06, Daniel Borkmann wrote:
>> On 01/26/2017 12:58 PM, Michal Hocko wrote:
>>> On Thu 26-01-17 12:33:55, Daniel Borkmann wrote:
>>>> On 01/26/2017 11:08 AM, Michal Hocko wrote:
>>> [...]
>>>>> If you disagree I can drop the bpf part of course...
>>>>
>>>> If we could consolidate these spots with kvmalloc() eventually, I'm
>>>> all for it. But even if __GFP_NORETRY is not covered down to all
>>>> possible paths, it kind of does have an effect already of saying
>>>> 'don't try too hard', so would it be harmful to still keep that for
>>>> now? If it's not, I'd personally prefer to just leave it as is until
>>>> there's some form of support by kvmalloc() and friends.
>>>
>>> Well, you can use kvmalloc(size, GFP_KERNEL|__GFP_NORETRY). It is not
>>> disallowed. It is not _supported_ which means that if it doesn't work as
>>> you expect you are on your own. Which is actually the situation right
>>> now as well. But I still think that this is just not the right thing to do.
>>> Even though it might happen to work in some cases it gives a false
>>> impression of a solution. So I would rather go with
>>
>> Hmm. 'On my own' means, we could potentially BUG somewhere down the
>> vmalloc implementation, etc, presumably? So it might in fact be
>> harmful to pass that, right?
>
> No, it would mean that it might eventually hit the behavior which you are
> trying to avoid - in other words it may invoke the OOM killer even though
> __GFP_NORETRY means giving up before any system-wide disruptive actions
> are taken.

Ok, thanks for clarifying, more on that further below.

>>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>>> index 8697f43cf93c..a6dc4d596f14 100644
>>> --- a/kernel/bpf/syscall.c
>>> +++ b/kernel/bpf/syscall.c
>>> @@ -53,6 +53,11 @@ void bpf_register_map_type(struct bpf_map_type_list *tl)
>>>
>>>  void *bpf_map_area_alloc(size_t size)
>>>  {
>>> +	/*
>>> +	 * FIXME: we would really like to not trigger the OOM killer and rather
>>> +	 * fail instead. This is not supported right now. Please nag MM people
>>> +	 * if these OOM start bothering people.
>>> +	 */
>>
>> Ok, I know this is out of scope for this series, but since i) this
>> is _not_ the _only_ spot right now which has such a construct and ii)
>> I am already kind of nagging a bit ;), my question would be, what
>> would it take to start supporting it?
>
> Propagate the gfp mask all the way down from vmalloc to all places which
> might allocate down the path; the page table allocation functions
> especially are a PITA because they are really deep. This is a lot of work...
>
> But realistically, how big is this problem really? Is it really worth
> it? You said this is an admin only interface and admin can kill the
> machine by OOM and other means already.
>
> Moreover, and I should probably mention it explicitly, your d407bd25a204b
> reduced the likelihood of OOM for another reason. kmalloc used GFP_USER
> previously and with order > 0 && order <= PAGE_ALLOC_COSTLY_ORDER this
> could indeed hit the OOM e.g. due to memory fragmentation. It would be
> much harder to hit the OOM killer from vmalloc which doesn't issue
> higher order allocation requests. Or have you ever seen the OOM killer
> pointing to the vmalloc fallback path?

The case I was concerned about was from the vmalloc() path, not kmalloc().
That was where the stack trace indicating OOM pointed to. As an example,
there could be really large allocation requests for maps where the map
has pre-allocated memory for its elements. Thus, if we get to the point
where we need to kill others due to shortage of mem for satisfying this,
I'd much rather prefer to just not let vmalloc() work really hard
and fail early on instead. In my (crafted) test case, I was connected
via ssh and it reliably killed my connection each time, which is really
suboptimal.

For example, I could also imagine a buggy or miscalculated map definition
for a prog that is provisioned to multiple places, which then accidentally
triggers this. Or if large on purpose, but we crossed the line, it
could be handled more gracefully, e.g. I could imagine an option of
falling back to a non-pre-allocated map flavor from the application
loading the program. Trade-off for sure, but still allowing it to
operate up to a certain extent. Granted, if vmalloc() succeeded without
trying hard and we then OOM elsewhere, too bad, but we don't have much
control over that one anyway, only over our own request. The reason I
asked above was whether having __GFP_NORETRY in would be fatal
somewhere down the path, but it seems not, as you say.

So to answer your second email with the bpf and netfilter hunks, why
not replace them with kvmalloc() and the __GFP_NORETRY flag and add that
big fat FIXME comment above there, saying explicitly that __GFP_NORETRY
is not harmful though has only /partial/ effect right now and that full
support needs to be implemented in future. That would still be better
than not having it, imo, and the FIXME would make expectations clear
to anyone reading that code.

Thanks,
Daniel
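The graceful fallback described above, retrying with a non-pre-allocated map when the pre-allocated one cannot be created, might look roughly like this from the loading application's side. This is a sketch only: `map_create_fn`, `create_map_with_fallback` and `stub_create` are hypothetical names, the callback stands in for the real map-creation call, and `MODEL_F_NO_PREALLOC` is assumed to mirror the UAPI flag BPF_F_NO_PREALLOC rather than being taken from the kernel headers.

```c
#include <errno.h>

/* Assumption: mirrors BPF_F_NO_PREALLOC; not taken from <linux/bpf.h>. */
#define MODEL_F_NO_PREALLOC (1U << 0)

/* Hypothetical creator: returns an fd >= 0 on success or a negative errno. */
typedef int (*map_create_fn)(unsigned int max_entries, unsigned int flags);

/*
 * Try a pre-allocated map first. If that fails with -ENOMEM (i.e. the
 * kernel gave up early instead of invoking the OOM killer), retry once
 * without pre-allocation. *flags_used reports the flags of the last try.
 */
static int create_map_with_fallback(map_create_fn create,
				    unsigned int max_entries,
				    unsigned int *flags_used)
{
	int fd;

	*flags_used = 0;
	fd = create(max_entries, *flags_used);
	if (fd != -ENOMEM)
		return fd;

	*flags_used = MODEL_F_NO_PREALLOC;
	return create(max_entries, *flags_used);
}

/* Demonstration stub: pre-allocating more than 1024 entries "fails". */
static int stub_create(unsigned int max_entries, unsigned int flags)
{
	if (!(flags & MODEL_F_NO_PREALLOC) && max_entries > 1024)
		return -ENOMEM;
	return 3; /* pretend file descriptor */
}
```

Such a fallback is only reachable if the allocation fails with an error rather than triggering the OOM killer, which is the behavior being asked for with __GFP_NORETRY.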
* Re: [PATCH 0/6 v3] kvmalloc
2017-01-26 20:34 ` Daniel Borkmann
@ 2017-01-27 10:05 ` Michal Hocko
-1 siblings, 0 replies; 49+ messages in thread
From: Michal Hocko @ 2017-01-27 10:05 UTC (permalink / raw)
To: Daniel Borkmann
Cc: Alexei Starovoitov, Andrew Morton, Vlastimil Babka, Mel Gorman,
Johannes Weiner, linux-mm, LKML, netdev, marcelo.leitner
On Thu 26-01-17 21:34:04, Daniel Borkmann wrote:
> On 01/26/2017 02:40 PM, Michal Hocko wrote:
[...]
> > But realistically, how big is this problem really? Is it really worth
> > it? You said this is an admin only interface and admin can kill the
> > machine by OOM and other means already.
> >
> > Moreover, and I should probably mention it explicitly, your d407bd25a204b
> > reduced the likelihood of OOM for another reason. kmalloc used GFP_USER
> > previously and with order > 0 && order <= PAGE_ALLOC_COSTLY_ORDER this
> > could indeed hit the OOM e.g. due to memory fragmentation. It would be
> > much harder to hit the OOM killer from vmalloc which doesn't issue
> > higher order allocation requests. Or have you ever seen the OOM killer
> > pointing to the vmalloc fallback path?
>
> The case I was concerned about was from the vmalloc() path, not kmalloc().
> That was where the stack trace indicating OOM pointed to. As an example,
> there could be really large allocation requests for maps where the map
> has pre-allocated memory for its elements. Thus, if we get to the point
> where we need to kill others due to shortage of mem for satisfying this,
> I'd much rather prefer to just not let vmalloc() work really hard
> and fail early on instead.

I see, but as already mentioned, chances are that by the time you get
close to the OOM somebody else will hit the OOM before the vmalloc path
manages to free the allocated memory.

> In my (crafted) test case, I was connected
> via ssh and it reliably killed my connection each time, which is really
> suboptimal.
>
> For example, I could also imagine a buggy or miscalculated map definition
> for a prog that is provisioned to multiple places, which then accidentally
> triggers this. Or if large on purpose, but we crossed the line, it
> could be handled more gracefully, e.g. I could imagine an option of
> falling back to a non-pre-allocated map flavor from the application
> loading the program. Trade-off for sure, but still allowing it to
> operate up to a certain extent. Granted, if vmalloc() succeeded without
> trying hard and we then OOM elsewhere, too bad, but we don't have much
> control over that one anyway, only over our own request. The reason I
> asked above was whether having __GFP_NORETRY in would be fatal
> somewhere down the path, but it seems not, as you say.
>
> So to answer your second email with the bpf and netfilter hunks, why
> not replace them with kvmalloc() and the __GFP_NORETRY flag and add that
> big fat FIXME comment above there, saying explicitly that __GFP_NORETRY
> is not harmful though has only /partial/ effect right now and that full
> support needs to be implemented in future. That would still be better
> than not having it, imo, and the FIXME would make expectations clear
> to anyone reading that code.

Well, we can do that, I would just like to prevent this (ab)use
if there is no _real_ and _sensible_ usecase for it. Having a real bug
report or the fallback mechanism you are mentioning above would justify
the (ab)use IMHO. But then that abuse would be documented properly and
have a real reason to exist. That sounds like a better approach to me.

But if you absolutely _insist_ I can change that.
-- 
Michal Hocko
SUSE Labs
* Re: [PATCH 0/6 v3] kvmalloc
2017-01-27 10:05 ` Michal Hocko
@ 2017-01-27 20:12 ` Daniel Borkmann
-1 siblings, 0 replies; 49+ messages in thread
From: Daniel Borkmann @ 2017-01-27 20:12 UTC (permalink / raw)
To: Michal Hocko
Cc: Alexei Starovoitov, Andrew Morton, Vlastimil Babka, Mel Gorman,
Johannes Weiner, linux-mm, LKML, netdev, marcelo.leitner

On 01/27/2017 11:05 AM, Michal Hocko wrote:
> On Thu 26-01-17 21:34:04, Daniel Borkmann wrote:
>> On 01/26/2017 02:40 PM, Michal Hocko wrote:
> [...]
>>> But realistically, how big is this problem really? Is it really worth
>>> it? You said this is an admin only interface and admin can kill the
>>> machine by OOM and other means already.
>>>
>>> Moreover and I should probably mention it explicitly, your d407bd25a204b
>>> reduced the likelihood of oom for other reason. kmalloc used GFP_USER
>>> previously and with order > 0 && order <= PAGE_ALLOC_COSTLY_ORDER this
>>> could indeed hit the OOM e.g. due to memory fragmentation. It would be
>>> much harder to hit the OOM killer from vmalloc which doesn't issue
>>> higher order allocation requests. Or have you ever seen the OOM killer
>>> pointing to the vmalloc fallback path?
>>
>> The case I was concerned about was from vmalloc() path, not kmalloc().
>> That was where the stack trace indicating OOM pointed to. As an example,
>> there could be really large allocation requests for maps where the map
>> has pre-allocated memory for its elements. Thus, if we get to the point
>> where we need to kill others due to shortage of mem for satisfying this,
>> I'd much much rather prefer to just not let vmalloc() work really hard
>> and fail early on instead.
>
> I see, but as already mentioned, chances are that by the time you get
> close to the OOM somebody else will hit the OOM before the vmalloc path
> manages to free the allocated memory.
>
>> In my (crafted) test case, I was connected
>> via ssh and it each time reliably killed my connection, which is really
>> suboptimal.
>>
>> F.e., I could also imagine a buggy or miscalculated map definition for
>> a prog that is provisioned to multiple places, which then accidentally
>> triggers this. Or if large on purpose, but we crossed the line, it
>> could be handled more gracefully, f.e. I could imagine an option to
>> falling back to a non-pre-allocated map flavor from the application
>> loading the program. Trade-off for sure, but still allowing it to
>> operate up to a certain extent. Granted, if vmalloc() succeeded without
>> trying hard and we then OOM elsewhere, too bad, but we don't have much
>> control over that one anyway, only about our own request. Reason I
>> asked above was whether having __GFP_NORETRY in would be fatal
>> somewhere down the path, but seems not as you say.
>>
>> So to answer your second email with the bpf and netfilter hunks, why
>> not replacing them with kvmalloc() and __GFP_NORETRY flag and add that
>> big fat FIXME comment above there, saying explicitly that __GFP_NORETRY
>> is not harmful though has only /partial/ effect right now and that full
>> support needs to be implemented in future. That would still be better
>> than not having it, imo, and the FIXME would make expectations clear
>> to anyone reading that code.
>
> Well, we can do that, I just would like to prevent from this (ab)use
> if there is no _real_ and _sensible_ usecase for it. Having a real bug

Understandable.

> report or a fallback mechanism you are mentioning above would justify
> the (ab)use IMHO. But that abuse would be documented properly and have a
> real reason to exist. That sounds like a better approach to me.
>
> But if you absolutely _insist_ I can change that.

Yeah, please do (with a big FIXME comment as mentioned), this originally
came from a real bug report. Anyway, feel free to add my Acked-by then.

Thanks again,
Daniel

^ permalink raw reply [flat|nested] 49+ messages in thread
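The application-side fallback mentioned in this message (retrying with a non-pre-allocated map flavor when the pre-allocated one cannot be satisfied) might look roughly like the sketch below. It is a userspace model under stated assumptions, not the bpf(2) interface: `struct mock_map_attr`, `mock_map_create()`, `mock_load_map()` and the `budget` parameter are hypothetical, and `MOCK_F_NO_PREALLOC` is only modelled on the kernel's `BPF_F_NO_PREALLOC` map flag.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag, modelled on BPF_F_NO_PREALLOC; the value is arbitrary. */
#define MOCK_F_NO_PREALLOC 0x1u

/* Hypothetical, simplified map description. */
struct mock_map_attr {
	uint32_t max_entries;
	uint32_t value_size;
	uint32_t flags;
};

/*
 * Mock map creation. "budget" models the memory that can be handed out
 * without trying hard. A pre-allocated map needs room for all elements up
 * front; a non-pre-allocated one needs nothing up front in this model.
 * Returns 0 on success or -ENOMEM.
 */
int mock_map_create(const struct mock_map_attr *attr, uint64_t budget)
{
	uint64_t upfront = 0;

	assert(attr != NULL);
	if (!(attr->flags & MOCK_F_NO_PREALLOC))
		upfront = (uint64_t)attr->max_entries * attr->value_size;

	return upfront <= budget ? 0 : -ENOMEM;
}

/* Loader-side policy: try as requested, then retry without pre-allocation. */
int mock_load_map(struct mock_map_attr *attr, uint64_t budget)
{
	int err = mock_map_create(attr, budget);

	if (err != -ENOMEM || (attr->flags & MOCK_F_NO_PREALLOC))
		return err;

	attr->flags |= MOCK_F_NO_PREALLOC;  /* degrade instead of giving up */
	return mock_map_create(attr, budget);
}
```

A policy like this only helps if the first attempt fails early rather than succeeding at the cost of an OOM kill, which is why the retry behaviour of the underlying allocation matters to the loader.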
* Re: [PATCH 0/6 v3] kvmalloc
2017-01-27 20:12 ` Daniel Borkmann
@ 2017-01-30 7:56 ` Michal Hocko
-1 siblings, 0 replies; 49+ messages in thread
From: Michal Hocko @ 2017-01-30 7:56 UTC (permalink / raw)
To: Daniel Borkmann
Cc: Alexei Starovoitov, Andrew Morton, Vlastimil Babka, Mel Gorman,
Johannes Weiner, linux-mm, LKML, netdev, marcelo.leitner

On Fri 27-01-17 21:12:26, Daniel Borkmann wrote:
> On 01/27/2017 11:05 AM, Michal Hocko wrote:
> > On Thu 26-01-17 21:34:04, Daniel Borkmann wrote:
[...]
> > > So to answer your second email with the bpf and netfilter hunks, why
> > > not replacing them with kvmalloc() and __GFP_NORETRY flag and add that
> > > big fat FIXME comment above there, saying explicitly that __GFP_NORETRY
> > > is not harmful though has only /partial/ effect right now and that full
> > > support needs to be implemented in future. That would still be better
> > > than not having it, imo, and the FIXME would make expectations clear
> > > to anyone reading that code.
> >
> > Well, we can do that, I just would like to prevent from this (ab)use
> > if there is no _real_ and _sensible_ usecase for it. Having a real bug
>
> Understandable.
>
> > report or a fallback mechanism you are mentioning above would justify
> > the (ab)use IMHO. But that abuse would be documented properly and have a
> > real reason to exist. That sounds like a better approach to me.
> >
> > But if you absolutely _insist_ I can change that.
>
> Yeah, please do (with a big FIXME comment as mentioned), this originally
> came from a real bug report. Anyway, feel free to add my Acked-by then.

Thanks! I will repost the whole series today.

--
Michal Hocko
SUSE Labs

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH 0/6 v3] kvmalloc
2017-01-30 7:56 ` Michal Hocko
@ 2017-01-30 16:15 ` Daniel Borkmann
-1 siblings, 0 replies; 49+ messages in thread
From: Daniel Borkmann @ 2017-01-30 16:15 UTC (permalink / raw)
To: Michal Hocko
Cc: Alexei Starovoitov, Andrew Morton, Vlastimil Babka, Mel Gorman,
Johannes Weiner, linux-mm, LKML, netdev, marcelo.leitner

On 01/30/2017 08:56 AM, Michal Hocko wrote:
> On Fri 27-01-17 21:12:26, Daniel Borkmann wrote:
>> On 01/27/2017 11:05 AM, Michal Hocko wrote:
>>> On Thu 26-01-17 21:34:04, Daniel Borkmann wrote:
> [...]
>>>> So to answer your second email with the bpf and netfilter hunks, why
>>>> not replacing them with kvmalloc() and __GFP_NORETRY flag and add that
>>>> big fat FIXME comment above there, saying explicitly that __GFP_NORETRY
>>>> is not harmful though has only /partial/ effect right now and that full
>>>> support needs to be implemented in future. That would still be better
>>>> than not having it, imo, and the FIXME would make expectations clear
>>>> to anyone reading that code.
>>>
>>> Well, we can do that, I just would like to prevent from this (ab)use
>>> if there is no _real_ and _sensible_ usecase for it. Having a real bug
>>
>> Understandable.
>>
>>> report or a fallback mechanism you are mentioning above would justify
>>> the (ab)use IMHO. But that abuse would be documented properly and have a
>>> real reason to exist. That sounds like a better approach to me.
>>>
>>> But if you absolutely _insist_ I can change that.
>>
>> Yeah, please do (with a big FIXME comment as mentioned), this originally
>> came from a real bug report. Anyway, feel free to add my Acked-by then.
>
> Thanks! I will repost the whole series today.

Looks like I got only Cc'ed on the cover letter of your v3 from today
(should have been v4 actually?). Anyway, I looked up the last patch
on lkml [1] and it seems you forgot the __GFP_NORETRY we talked about?

At least that was what was discussed above (insisting on __GFP_NORETRY
plus FIXME comment) for providing my Acked-by then. Can you still fix
that up in a final respin?

Thanks again,
Daniel

[1] https://lkml.org/lkml/2017/1/30/129

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH 0/6 v3] kvmalloc
2017-01-30 16:15 ` Daniel Borkmann
@ 2017-01-30 16:28 ` Michal Hocko
-1 siblings, 0 replies; 49+ messages in thread
From: Michal Hocko @ 2017-01-30 16:28 UTC (permalink / raw)
To: Daniel Borkmann
Cc: Alexei Starovoitov, Andrew Morton, Vlastimil Babka, Mel Gorman,
Johannes Weiner, linux-mm, LKML, netdev, marcelo.leitner

On Mon 30-01-17 17:15:08, Daniel Borkmann wrote:
> On 01/30/2017 08:56 AM, Michal Hocko wrote:
> > On Fri 27-01-17 21:12:26, Daniel Borkmann wrote:
> > > On 01/27/2017 11:05 AM, Michal Hocko wrote:
> > > > On Thu 26-01-17 21:34:04, Daniel Borkmann wrote:
> > [...]
> > > > > So to answer your second email with the bpf and netfilter hunks, why
> > > > > not replacing them with kvmalloc() and __GFP_NORETRY flag and add that
> > > > > big fat FIXME comment above there, saying explicitly that __GFP_NORETRY
> > > > > is not harmful though has only /partial/ effect right now and that full
> > > > > support needs to be implemented in future. That would still be better
> > > > > than not having it, imo, and the FIXME would make expectations clear
> > > > > to anyone reading that code.
> > > >
> > > > Well, we can do that, I just would like to prevent from this (ab)use
> > > > if there is no _real_ and _sensible_ usecase for it. Having a real bug
> > >
> > > Understandable.
> > >
> > > > report or a fallback mechanism you are mentioning above would justify
> > > > the (ab)use IMHO. But that abuse would be documented properly and have a
> > > > real reason to exist. That sounds like a better approach to me.
> > > >
> > > > But if you absolutely _insist_ I can change that.
> > >
> > > Yeah, please do (with a big FIXME comment as mentioned), this originally
> > > came from a real bug report. Anyway, feel free to add my Acked-by then.
> >
> > Thanks! I will repost the whole series today.
>
> Looks like I got only Cc'ed on the cover letter of your v3 from today
> (should have been v4 actually?).

Yes

> Anyway, I looked up the last patch
> on lkml [1] and it seems you forgot the __GFP_NORETRY we talked about?

I misread your response. I thought you were OK with the FIXME
explanation.

> At least that was what was discussed above (insisting on __GFP_NORETRY
> plus FIXME comment) for providing my Acked-by then. Can you still fix
> that up in a final respin?

I will probably just drop that last patch instead. I am not convinced
that we should bend the new API over and let people mimic that
throughout the code. I have just seen too many examples of this pattern
already.

I would also like to prevent the next rebase, unless there are any issues
with some patches of course.

--
Michal Hocko
SUSE Labs

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH 0/6 v3] kvmalloc
2017-01-30 16:28 ` Michal Hocko
@ 2017-01-30 16:45 ` Daniel Borkmann
-1 siblings, 0 replies; 49+ messages in thread
From: Daniel Borkmann @ 2017-01-30 16:45 UTC (permalink / raw)
To: Michal Hocko
Cc: Alexei Starovoitov, Andrew Morton, Vlastimil Babka, Mel Gorman,
Johannes Weiner, linux-mm, LKML, netdev, marcelo.leitner

On 01/30/2017 05:28 PM, Michal Hocko wrote:
> On Mon 30-01-17 17:15:08, Daniel Borkmann wrote:
>> On 01/30/2017 08:56 AM, Michal Hocko wrote:
>>> On Fri 27-01-17 21:12:26, Daniel Borkmann wrote:
>>>> On 01/27/2017 11:05 AM, Michal Hocko wrote:
>>>>> On Thu 26-01-17 21:34:04, Daniel Borkmann wrote:
>>> [...]
>>>>>> So to answer your second email with the bpf and netfilter hunks, why
>>>>>> not replacing them with kvmalloc() and __GFP_NORETRY flag and add that
>>>>>> big fat FIXME comment above there, saying explicitly that __GFP_NORETRY
>>>>>> is not harmful though has only /partial/ effect right now and that full
>>>>>> support needs to be implemented in future. That would still be better
>>>>>> than not having it, imo, and the FIXME would make expectations clear
>>>>>> to anyone reading that code.
>>>>>
>>>>> Well, we can do that, I just would like to prevent from this (ab)use
>>>>> if there is no _real_ and _sensible_ usecase for it. Having a real bug
>>>>
>>>> Understandable.
>>>>
>>>>> report or a fallback mechanism you are mentioning above would justify
>>>>> the (ab)use IMHO. But that abuse would be documented properly and have a
>>>>> real reason to exist. That sounds like a better approach to me.
>>>>>
>>>>> But if you absolutely _insist_ I can change that.
>>>>
>>>> Yeah, please do (with a big FIXME comment as mentioned), this originally
>>>> came from a real bug report. Anyway, feel free to add my Acked-by then.
>>>
>>> Thanks! I will repost the whole series today.
>>
>> Looks like I got only Cc'ed on the cover letter of your v3 from today
>> (should have been v4 actually?).
>
> Yes
>
>> Anyway, I looked up the last patch
>> on lkml [1] and it seems you forgot the __GFP_NORETRY we talked about?
>
> I misread your response. I thought you were OK with the FIXME
> explanation.
>
>> At least that was what was discussed above (insisting on __GFP_NORETRY
>> plus FIXME comment) for providing my Acked-by then. Can you still fix
>> that up in a final respin?
>
> I will probably just drop that last patch instead. I am not convinced
> that we should bend the new API over and let people mimic that
> throughout the code. I have just seen too many examples of this pattern
> already.
>
> I would also like to prevent the next rebase, unless there are any issues
> with some patches of course.

Ok, I'm fine with that as well.

Thanks,
Daniel

^ permalink raw reply [flat|nested] 49+ messages in thread
end of thread, other threads:[~2017-01-30 17:13 UTC | newest]

Thread overview: 49+ messages
2017-01-25 18:14 [PATCH 0/6 v3] kvmalloc Alexei Starovoitov
2017-01-25 18:14 ` Alexei Starovoitov
2017-01-25 20:16 ` Daniel Borkmann
2017-01-25 20:16 ` Daniel Borkmann
2017-01-26 7:43 ` Michal Hocko
2017-01-26 7:43 ` Michal Hocko
2017-01-26 9:36 ` Daniel Borkmann
2017-01-26 9:36 ` Daniel Borkmann
2017-01-26 9:48 ` David Laight
2017-01-26 9:48 ` David Laight
2017-01-26 10:08 ` Michal Hocko
2017-01-26 10:08 ` Michal Hocko
2017-01-26 10:32 ` Michal Hocko
2017-01-26 10:32 ` Michal Hocko
2017-01-26 11:04 ` Daniel Borkmann
2017-01-26 11:04 ` Daniel Borkmann
2017-01-26 11:49 ` Michal Hocko
2017-01-26 11:49 ` Michal Hocko
2017-01-26 12:14 ` Joe Perches
2017-01-26 12:14 ` Joe Perches
2017-01-26 12:27 ` Michal Hocko
2017-01-26 12:27 ` Michal Hocko
2017-01-26 11:33 ` Daniel Borkmann
2017-01-26 11:33 ` Daniel Borkmann
2017-01-26 11:58 ` Michal Hocko
2017-01-26 11:58 ` Michal Hocko
2017-01-26 13:10 ` Daniel Borkmann
2017-01-26 13:10 ` Daniel Borkmann
2017-01-26 13:40 ` Michal Hocko
2017-01-26 13:40 ` Michal Hocko
2017-01-26 14:13 ` Michal Hocko
2017-01-26 14:13 ` Michal Hocko
2017-01-26 14:13 ` Michal Hocko
2017-01-26 14:37 ` [PATCH] net, bpf: use kvzalloc helper kbuild test robot
2017-01-26 14:58 ` kbuild test robot
2017-01-26 20:34 ` [PATCH 0/6 v3] kvmalloc Daniel Borkmann
2017-01-26 20:34 ` Daniel Borkmann
2017-01-27 10:05 ` Michal Hocko
2017-01-27 10:05 ` Michal Hocko
2017-01-27 20:12 ` Daniel Borkmann
2017-01-27 20:12 ` Daniel Borkmann
2017-01-30 7:56 ` Michal Hocko
2017-01-30 7:56 ` Michal Hocko
2017-01-30 16:15 ` Daniel Borkmann
2017-01-30 16:15 ` Daniel Borkmann
2017-01-30 16:28 ` Michal Hocko
2017-01-30 16:28 ` Michal Hocko
2017-01-30 16:45 ` Daniel Borkmann
2017-01-30 16:45 ` Daniel Borkmann