Linux-mm Archive on lore.kernel.org
* [PATCH] mm/kvmalloc: do not confuse kmalloc with page order over MAX_ORDER
@ 2018-11-01  9:12 Konstantin Khlebnikov
  2018-11-01  9:33 ` Michal Hocko
  2018-11-01 10:09 ` [PATCH 2] mm/kvmalloc: do not call kmalloc for size > KMALLOC_MAX_SIZE Konstantin Khlebnikov
  0 siblings, 2 replies; 12+ messages in thread
From: Konstantin Khlebnikov @ 2018-11-01  9:12 UTC (permalink / raw)
  To: linux-mm, Andrew Morton, Michal Hocko, linux-kernel

Allocations over PAGE_SIZE << MAX_ORDER could be served only by vmalloc.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>

---

[Thu Nov  1 08:43:56 2018] ------------[ cut here ]------------
[Thu Nov  1 08:43:56 2018] WARNING: CPU: 0 PID: 6676 at mm/vmstat.c:986 __fragmentation_index+0x54/0x60
[Thu Nov  1 08:43:56 2018] Modules linked in: ipmi_devintf ipmi_ssif ipmi_si ipmi_msghandler netconsole configfs ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 xt_u32 ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_filter ip6_tables ipt_REJECT nf_reject_ipv4 nf_log_ipv4 nf_log_common xt_LOG xt_tcpudp xt_mark xt_owner xt_conntrack xt_multiport iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc cls_u32 sch_fq sch_prio intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp 8021q kvm_intel garp mrp stp i2c_algo_bit llc drm_kms_helper kvm syscopyarea sysfillrect sysimgblt irqbypass fb_sys_fops ghash_clmulni_intel ttm wdat_wdt drm mei_me lpc_ich mei shpchp mfd_core acpi_power_meter acpi_pad
[Thu Nov  1 08:43:56 2018]  ip6_tunnel tunnel6 ipip tunnel4 ip_tunnel tcp_nv mlx4_en ptp pps_core xfs btrfs zstd_decompress zstd_compress xxhash raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid10 mlx4_core nvme nvme_core devlink raid6_pq libcrc32c raid1 raid0 multipath linear [last unloaded: ipmi_msghandler]
[Thu Nov  1 08:43:56 2018] CPU: 0 PID: 6676 Comm: ip6tables Not tainted 4.14.78-31 #1
[Thu Nov  1 08:43:56 2018] Hardware name: AIC Inc. 21S-B312-B8/MB-DPHW1R AIDOS-M, BIOS AIDOS052 03/09/2017
[Thu Nov  1 08:43:56 2018] task: ffff881e909b8e40 task.stack: ffffc90023034000
[Thu Nov  1 08:43:56 2018] RIP: 0010:__fragmentation_index+0x54/0x60
[Thu Nov  1 08:43:56 2018] RSP: 0018:ffffc90023037b30 EFLAGS: 00010206
[Thu Nov  1 08:43:56 2018] RAX: 000000000000000b RBX: 0000000000064800 RCX: 000000000000000a
[Thu Nov  1 08:43:56 2018] RDX: 0000000000000192 RSI: ffffc90023037b38 RDI: 000000000000000d
[Thu Nov  1 08:43:56 2018] RBP: 000000000000000d R08: 0000000000065850 R09: 00000000000001c7
[Thu Nov  1 08:43:56 2018] R10: 000000000000000d R11: 0000000000000000 R12: ffff88207fffb5c0
[Thu Nov  1 08:43:56 2018] R13: 0000000000000004 R14: 0000000000000000 R15: ffffc90023037c10
[Thu Nov  1 08:43:56 2018] FS:  00007f6cf6ec9740(0000) GS:ffff881fffa00000(0000) knlGS:0000000000000000
[Thu Nov  1 08:43:56 2018] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Thu Nov  1 08:43:56 2018] CR2: 00007f6cf6e4f000 CR3: 0000001e91762002 CR4: 00000000003606f0
[Thu Nov  1 08:43:56 2018] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[Thu Nov  1 08:43:56 2018] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[Thu Nov  1 08:43:56 2018] Call Trace:
[Thu Nov  1 08:43:56 2018]  fragmentation_index+0x76/0x90
[Thu Nov  1 08:43:56 2018]  compaction_suitable+0x4f/0xf0
[Thu Nov  1 08:43:56 2018]  shrink_node+0x295/0x310
[Thu Nov  1 08:43:56 2018]  node_reclaim+0x205/0x250
[Thu Nov  1 08:43:56 2018]  get_page_from_freelist+0x649/0xad0
[Thu Nov  1 08:43:56 2018]  ? get_page_from_freelist+0x2d4/0xad0
[Thu Nov  1 08:43:56 2018]  ? release_sock+0x19/0x90
[Thu Nov  1 08:43:56 2018]  ? do_ipv6_setsockopt.isra.5+0x10da/0x1290
[Thu Nov  1 08:43:56 2018]  __alloc_pages_nodemask+0x12a/0x2a0
[Thu Nov  1 08:43:56 2018]  kmalloc_large_node+0x47/0x90
[Thu Nov  1 08:43:56 2018]  __kmalloc_node+0x22b/0x2e0
[Thu Nov  1 08:43:56 2018]  kvmalloc_node+0x3e/0x70
[Thu Nov  1 08:43:56 2018]  xt_alloc_table_info+0x3a/0x80 [x_tables]
[Thu Nov  1 08:43:56 2018]  do_ip6t_set_ctl+0xcd/0x1c0 [ip6_tables]
[Thu Nov  1 08:43:56 2018]  nf_setsockopt+0x44/0x60
[Thu Nov  1 08:43:56 2018]  SyS_setsockopt+0x6f/0xc0
[Thu Nov  1 08:43:56 2018]  do_syscall_64+0x67/0x120
[Thu Nov  1 08:43:56 2018]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[Thu Nov  1 08:43:56 2018] RIP: 0033:0x7f6cf63d121a
[Thu Nov  1 08:43:56 2018] RSP: 002b:00007ffe2b3568e8 EFLAGS: 00000206 ORIG_RAX: 0000000000000036
[Thu Nov  1 08:43:56 2018] RAX: ffffffffffffffda RBX: 0000000000000028 RCX: 00007f6cf63d121a
[Thu Nov  1 08:43:56 2018] RDX: 0000000000000040 RSI: 0000000000000029 RDI: 0000000000000008
[Thu Nov  1 08:43:56 2018] RBP: 00007f6cf4074070 R08: 000000000102c208 R09: ffff80930d5a91b0
[Thu Nov  1 08:43:56 2018] R10: 00007f6cf4074010 R11: 0000000000000206 R12: 00000000015e5018
[Thu Nov  1 08:43:56 2018] R13: 00000000015e5018 R14: 0000000000000000 R15: 00000000015e5010
[Thu Nov  1 08:43:56 2018] Code: 89 c0 48 89 c1 48 69 06 e8 03 00 00 48 f7 f1 31 d2 48 05 e8 03 00 00 49 f7 f0 ba e8 03 00 00 29 c2 89 d0 c3 b8 18 fc ff ff f3 c3 <0f> 0b 31 c0 c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 41 57 41 56
[Thu Nov  1 08:43:56 2018] ---[ end trace 344fe97463e06220 ]---
---
 mm/util.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/mm/util.c b/mm/util.c
index 8bf08b5b5760..9b15f846c281 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -392,6 +392,9 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
 	gfp_t kmalloc_flags = flags;
 	void *ret;
 
+	if (size > (PAGE_SIZE << MAX_ORDER))
+		goto fallback;
+
 	/*
 	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
 	 * so the given set of flags has to be compatible.
@@ -422,6 +425,7 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
 	if (ret || size <= PAGE_SIZE)
 		return ret;
 
+fallback:
 	return __vmalloc_node_flags_caller(size, node, flags,
 			__builtin_return_address(0));
 }


* Re: [PATCH] mm/kvmalloc: do not confuse kmalloc with page order over MAX_ORDER
  2018-11-01  9:12 [PATCH] mm/kvmalloc: do not confuse kmalloc with page order over MAX_ORDER Konstantin Khlebnikov
@ 2018-11-01  9:33 ` Michal Hocko
  2018-11-01 10:09 ` [PATCH 2] mm/kvmalloc: do not call kmalloc for size > KMALLOC_MAX_SIZE Konstantin Khlebnikov
  1 sibling, 0 replies; 12+ messages in thread
From: Michal Hocko @ 2018-11-01  9:33 UTC (permalink / raw)
  To: Konstantin Khlebnikov; +Cc: linux-mm, Andrew Morton, linux-kernel

On Thu 01-11-18 12:12:40, Konstantin Khlebnikov wrote:
> Allocations over PAGE_SIZE << MAX_ORDER could be served only by vmalloc.

Checking against KMALLOC_MAX_SIZE makes more sense IMHO. Other than that
this makes sense to me.
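
A rough way to compare the two bounds is a userspace sketch. It assumes
4 KiB pages and a MAX_ORDER of 11 (both are configuration dependent), and
that the largest valid page-allocator order is MAX_ORDER - 1. The MODEL_*
names and helper functions are hypothetical, not kernel API. Under those
assumptions a 6 MiB request stays below PAGE_SIZE << MAX_ORDER, so the
check in this patch would still send it to kmalloc, yet it needs order 11:

```c
#include <assert.h>
#include <stddef.h>

/* Assumed configuration: 4 KiB pages, MAX_ORDER of 11 (illustrative only). */
#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  ((size_t)1 << MODEL_PAGE_SHIFT)
#define MODEL_MAX_ORDER  11

/* Smallest order whose block (page size << order) covers size bytes. */
static unsigned int model_get_order(size_t size)
{
	unsigned int order = 0;
	size_t pages;

	assert(size > 0);
	pages = (size - 1) >> MODEL_PAGE_SHIFT;	/* pages needed, minus one */
	while (pages) {
		order++;
		pages >>= 1;
	}
	return order;
}

/* Bound used by the first patch: PAGE_SIZE << MAX_ORDER. */
static int model_over_v1_bound(size_t size)
{
	return size > (MODEL_PAGE_SIZE << MODEL_MAX_ORDER);
}

/* Assumption: valid page-allocator orders are 0 .. MAX_ORDER - 1. */
static int model_order_is_valid(size_t size)
{
	return model_get_order(size) < MODEL_MAX_ORDER;
}
```

Whether KMALLOC_MAX_SIZE coincides with the largest valid order depends on
the slab configuration, so the sketch only shows that PAGE_SIZE << MAX_ORDER
is one order too generous under these assumptions.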

> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> 
> ---
> 
> [Thu Nov  1 08:43:56 2018] ------------[ cut here ]------------
> [Thu Nov  1 08:43:56 2018] WARNING: CPU: 0 PID: 6676 at mm/vmstat.c:986 __fragmentation_index+0x54/0x60
> [Thu Nov  1 08:43:56 2018] Modules linked in: ipmi_devintf ipmi_ssif ipmi_si ipmi_msghandler netconsole configfs ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 xt_u32 ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_filter ip6_tables ipt_REJECT nf_reject_ipv4 nf_log_ipv4 nf_log_common xt_LOG xt_tcpudp xt_mark xt_owner xt_conntrack xt_multiport iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc cls_u32 sch_fq sch_prio intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp 8021q kvm_intel garp mrp stp i2c_algo_bit llc drm_kms_helper kvm syscopyarea sysfillrect sysimgblt irqbypass fb_sys_fops ghash_clmulni_intel ttm wdat_wdt drm mei_me lpc_ich mei shpchp mfd_core acpi_power_meter acpi_pad
> [Thu Nov  1 08:43:56 2018]  ip6_tunnel tunnel6 ipip tunnel4 ip_tunnel tcp_nv mlx4_en ptp pps_core xfs btrfs zstd_decompress zstd_compress xxhash raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid10 mlx4_core nvme nvme_core devlink raid6_pq libcrc32c raid1 raid0 multipath linear [last unloaded: ipmi_msghandler]
> [Thu Nov  1 08:43:56 2018] CPU: 0 PID: 6676 Comm: ip6tables Not tainted 4.14.78-31 #1
> [Thu Nov  1 08:43:56 2018] Hardware name: AIC Inc. 21S-B312-B8/MB-DPHW1R AIDOS-M, BIOS AIDOS052 03/09/2017
> [Thu Nov  1 08:43:56 2018] task: ffff881e909b8e40 task.stack: ffffc90023034000
> [Thu Nov  1 08:43:56 2018] RIP: 0010:__fragmentation_index+0x54/0x60
> [Thu Nov  1 08:43:56 2018] RSP: 0018:ffffc90023037b30 EFLAGS: 00010206
> [Thu Nov  1 08:43:56 2018] RAX: 000000000000000b RBX: 0000000000064800 RCX: 000000000000000a
> [Thu Nov  1 08:43:56 2018] RDX: 0000000000000192 RSI: ffffc90023037b38 RDI: 000000000000000d
> [Thu Nov  1 08:43:56 2018] RBP: 000000000000000d R08: 0000000000065850 R09: 00000000000001c7
> [Thu Nov  1 08:43:56 2018] R10: 000000000000000d R11: 0000000000000000 R12: ffff88207fffb5c0
> [Thu Nov  1 08:43:56 2018] R13: 0000000000000004 R14: 0000000000000000 R15: ffffc90023037c10
> [Thu Nov  1 08:43:56 2018] FS:  00007f6cf6ec9740(0000) GS:ffff881fffa00000(0000) knlGS:0000000000000000
> [Thu Nov  1 08:43:56 2018] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [Thu Nov  1 08:43:56 2018] CR2: 00007f6cf6e4f000 CR3: 0000001e91762002 CR4: 00000000003606f0
> [Thu Nov  1 08:43:56 2018] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [Thu Nov  1 08:43:56 2018] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [Thu Nov  1 08:43:56 2018] Call Trace:
> [Thu Nov  1 08:43:56 2018]  fragmentation_index+0x76/0x90
> [Thu Nov  1 08:43:56 2018]  compaction_suitable+0x4f/0xf0
> [Thu Nov  1 08:43:56 2018]  shrink_node+0x295/0x310
> [Thu Nov  1 08:43:56 2018]  node_reclaim+0x205/0x250
> [Thu Nov  1 08:43:56 2018]  get_page_from_freelist+0x649/0xad0
> [Thu Nov  1 08:43:56 2018]  ? get_page_from_freelist+0x2d4/0xad0
> [Thu Nov  1 08:43:56 2018]  ? release_sock+0x19/0x90
> [Thu Nov  1 08:43:56 2018]  ? do_ipv6_setsockopt.isra.5+0x10da/0x1290
> [Thu Nov  1 08:43:56 2018]  __alloc_pages_nodemask+0x12a/0x2a0
> [Thu Nov  1 08:43:56 2018]  kmalloc_large_node+0x47/0x90
> [Thu Nov  1 08:43:56 2018]  __kmalloc_node+0x22b/0x2e0
> [Thu Nov  1 08:43:56 2018]  kvmalloc_node+0x3e/0x70
> [Thu Nov  1 08:43:56 2018]  xt_alloc_table_info+0x3a/0x80 [x_tables]
> [Thu Nov  1 08:43:56 2018]  do_ip6t_set_ctl+0xcd/0x1c0 [ip6_tables]
> [Thu Nov  1 08:43:56 2018]  nf_setsockopt+0x44/0x60
> [Thu Nov  1 08:43:56 2018]  SyS_setsockopt+0x6f/0xc0
> [Thu Nov  1 08:43:56 2018]  do_syscall_64+0x67/0x120
> [Thu Nov  1 08:43:56 2018]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> [Thu Nov  1 08:43:56 2018] RIP: 0033:0x7f6cf63d121a
> [Thu Nov  1 08:43:56 2018] RSP: 002b:00007ffe2b3568e8 EFLAGS: 00000206 ORIG_RAX: 0000000000000036
> [Thu Nov  1 08:43:56 2018] RAX: ffffffffffffffda RBX: 0000000000000028 RCX: 00007f6cf63d121a
> [Thu Nov  1 08:43:56 2018] RDX: 0000000000000040 RSI: 0000000000000029 RDI: 0000000000000008
> [Thu Nov  1 08:43:56 2018] RBP: 00007f6cf4074070 R08: 000000000102c208 R09: ffff80930d5a91b0
> [Thu Nov  1 08:43:56 2018] R10: 00007f6cf4074010 R11: 0000000000000206 R12: 00000000015e5018
> [Thu Nov  1 08:43:56 2018] R13: 00000000015e5018 R14: 0000000000000000 R15: 00000000015e5010
> [Thu Nov  1 08:43:56 2018] Code: 89 c0 48 89 c1 48 69 06 e8 03 00 00 48 f7 f1 31 d2 48 05 e8 03 00 00 49 f7 f0 ba e8 03 00 00 29 c2 89 d0 c3 b8 18 fc ff ff f3 c3 <0f> 0b 31 c0 c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 41 57 41 56
> [Thu Nov  1 08:43:56 2018] ---[ end trace 344fe97463e06220 ]---
> ---
>  mm/util.c |    4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/mm/util.c b/mm/util.c
> index 8bf08b5b5760..9b15f846c281 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -392,6 +392,9 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
>  	gfp_t kmalloc_flags = flags;
>  	void *ret;
>  
> +	if (size > (PAGE_SIZE << MAX_ORDER))
> +		goto fallback;
> +
>  	/*
>  	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
>  	 * so the given set of flags has to be compatible.
> @@ -422,6 +425,7 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
>  	if (ret || size <= PAGE_SIZE)
>  		return ret;
>  
> +fallback:
>  	return __vmalloc_node_flags_caller(size, node, flags,
>  			__builtin_return_address(0));
>  }
> 

-- 
Michal Hocko
SUSE Labs


* [PATCH 2] mm/kvmalloc: do not call kmalloc for size > KMALLOC_MAX_SIZE
  2018-11-01  9:12 [PATCH] mm/kvmalloc: do not confuse kmalloc with page order over MAX_ORDER Konstantin Khlebnikov
  2018-11-01  9:33 ` Michal Hocko
@ 2018-11-01 10:09 ` Konstantin Khlebnikov
  2018-11-01 10:24   ` Michal Hocko
  2018-11-05 13:03   ` Vlastimil Babka
  1 sibling, 2 replies; 12+ messages in thread
From: Konstantin Khlebnikov @ 2018-11-01 10:09 UTC (permalink / raw)
  To: linux-mm, Andrew Morton, Michal Hocko, linux-kernel

Allocations over KMALLOC_MAX_SIZE could be served only by vmalloc.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
---
 mm/util.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/mm/util.c b/mm/util.c
index 8bf08b5b5760..f5f04fa22814 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -392,6 +392,9 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
 	gfp_t kmalloc_flags = flags;
 	void *ret;
 
+	if (size > KMALLOC_MAX_SIZE)
+		goto fallback;
+
 	/*
 	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
 	 * so the given set of flags has to be compatible.
@@ -422,6 +425,7 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
 	if (ret || size <= PAGE_SIZE)
 		return ret;
 
+fallback:
 	return __vmalloc_node_flags_caller(size, node, flags,
 			__builtin_return_address(0));
 }


* Re: [PATCH 2] mm/kvmalloc: do not call kmalloc for size > KMALLOC_MAX_SIZE
  2018-11-01 10:09 ` [PATCH 2] mm/kvmalloc: do not call kmalloc for size > KMALLOC_MAX_SIZE Konstantin Khlebnikov
@ 2018-11-01 10:24   ` Michal Hocko
  2018-11-01 10:48     ` Konstantin Khlebnikov
  2018-11-05 13:03   ` Vlastimil Babka
  1 sibling, 1 reply; 12+ messages in thread
From: Michal Hocko @ 2018-11-01 10:24 UTC (permalink / raw)
  To: Konstantin Khlebnikov; +Cc: linux-mm, Andrew Morton, linux-kernel

On Thu 01-11-18 13:09:16, Konstantin Khlebnikov wrote:
> Allocations over KMALLOC_MAX_SIZE could be served only by vmalloc.

I would go further and say that allocations with sizes that are too large
can actually trigger a warning (the one you posted in the previous
version, outside of the changelog area), because that might be
interesting to people: there are deployments that panic on warning, and
for them a warning is much more important.

> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>

Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!

> ---
>  mm/util.c |    4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/mm/util.c b/mm/util.c
> index 8bf08b5b5760..f5f04fa22814 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -392,6 +392,9 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
>  	gfp_t kmalloc_flags = flags;
>  	void *ret;
>  
> +	if (size > KMALLOC_MAX_SIZE)
> +		goto fallback;
> +
>  	/*
>  	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
>  	 * so the given set of flags has to be compatible.
> @@ -422,6 +425,7 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
>  	if (ret || size <= PAGE_SIZE)
>  		return ret;
>  
> +fallback:
>  	return __vmalloc_node_flags_caller(size, node, flags,
>  			__builtin_return_address(0));
>  }
> 

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH 2] mm/kvmalloc: do not call kmalloc for size > KMALLOC_MAX_SIZE
  2018-11-01 10:24   ` Michal Hocko
@ 2018-11-01 10:48     ` Konstantin Khlebnikov
  2018-11-01 12:55       ` Michal Hocko
  0 siblings, 1 reply; 12+ messages in thread
From: Konstantin Khlebnikov @ 2018-11-01 10:48 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm, Andrew Morton, linux-kernel



On 01.11.2018 13:24, Michal Hocko wrote:
> On Thu 01-11-18 13:09:16, Konstantin Khlebnikov wrote:
>> Allocations over KMALLOC_MAX_SIZE could be served only by vmalloc.
> 
> I would go on and say that allocations with sizes too large can actually
> trigger a warning (once you have posted in the previous version outside
> of the changelog area) because that might be interesting to people -
> there are deployments to panic on warning and then a warning is much
> more important.

It seems that the warning isn't completely valid.


__alloc_pages_slowpath() handles this more gracefully:

	/*
	 * In the slowpath, we sanity check order to avoid ever trying to
	 * reclaim >= MAX_ORDER areas which will never succeed. Callers may
	 * be using allocators in order of preference for an area that is
	 * too large.
	 */
	if (order >= MAX_ORDER) {
		WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
		return NULL;
	}


The fast path is ready for order >= MAX_ORDER.


The problem is in node_reclaim(), which is called earlier than
__alloc_pages_slowpath(), from a surprising place: get_page_from_freelist().


Probably node_reclaim() simply needs something like this:

	if (order >= MAX_ORDER)
		return NODE_RECLAIM_NOSCAN;
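
The ordering described here can be modelled in userspace. This is a
simplified sketch, not kernel code: the MODEL_* names, the stats structure
and the return values are hypothetical, MAX_ORDER is assumed to be 11,
reclaim is assumed to run on every valid-order request, and the "once"
part of WARN_ON_ONCE is not modelled. It shows the suggested early exit
keeping reclaim away from impossible orders, while the later sanity check
still decides whether to warn:

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_ORDER  11	/* illustrative; configuration dependent */
#define MODEL_GFP_NOWARN 0x1u	/* stand-in bit, not the kernel's value */

struct model_stats {
	unsigned int reclaim_scans;	/* times reclaim actually ran */
	unsigned int warnings;		/* times the sanity check would warn */
};

static struct model_stats model_st;	/* zero-initialised counters */

/* Reclaim step with the suggested early exit for impossible orders. */
static void model_node_reclaim(unsigned int order, struct model_stats *st)
{
	if (order >= MODEL_MAX_ORDER)
		return;			/* NODE_RECLAIM_NOSCAN in the suggestion */
	st->reclaim_scans++;
}

/* Fast path first (where reclaim is reached), then the slowpath check. */
static int model_alloc_pages(unsigned int order, unsigned int gfp,
			     struct model_stats *st)
{
	assert(st != NULL);

	model_node_reclaim(order, st);

	if (order >= MODEL_MAX_ORDER) {
		if (!(gfp & MODEL_GFP_NOWARN))
			st->warnings++;
		return -1;		/* stands in for returning NULL */
	}
	return 0;			/* stands in for a successful allocation */
}
```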


> 
>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> 
> Acked-by: Michal Hocko <mhocko@suse.com>
> 
> Thanks!
> 
>> ---
>>   mm/util.c |    4 ++++
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/mm/util.c b/mm/util.c
>> index 8bf08b5b5760..f5f04fa22814 100644
>> --- a/mm/util.c
>> +++ b/mm/util.c
>> @@ -392,6 +392,9 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
>>   	gfp_t kmalloc_flags = flags;
>>   	void *ret;
>>   
>> +	if (size > KMALLOC_MAX_SIZE)
>> +		goto fallback;
>> +
>>   	/*
>>   	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
>>   	 * so the given set of flags has to be compatible.
>> @@ -422,6 +425,7 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
>>   	if (ret || size <= PAGE_SIZE)
>>   		return ret;
>>   
>> +fallback:
>>   	return __vmalloc_node_flags_caller(size, node, flags,
>>   			__builtin_return_address(0));
>>   }
>>
> 


* Re: [PATCH 2] mm/kvmalloc: do not call kmalloc for size > KMALLOC_MAX_SIZE
  2018-11-01 10:48     ` Konstantin Khlebnikov
@ 2018-11-01 12:55       ` Michal Hocko
  2018-11-01 16:42         ` Konstantin Khlebnikov
  0 siblings, 1 reply; 12+ messages in thread
From: Michal Hocko @ 2018-11-01 12:55 UTC (permalink / raw)
  To: Konstantin Khlebnikov; +Cc: linux-mm, Andrew Morton, linux-kernel

On Thu 01-11-18 13:48:17, Konstantin Khlebnikov wrote:
> 
> 
> On 01.11.2018 13:24, Michal Hocko wrote:
> > On Thu 01-11-18 13:09:16, Konstantin Khlebnikov wrote:
> > > Allocations over KMALLOC_MAX_SIZE could be served only by vmalloc.
> > 
> > I would go on and say that allocations with sizes too large can actually
> > trigger a warning (once you have posted in the previous version outside
> > of the changelog area) because that might be interesting to people -
> > there are deployments to panic on warning and then a warning is much
> > more important.
> 
> It seems that warning isn't completely valid.
> 
> 
> __alloc_pages_slowpath() handles this more gracefully:
> 
> 	/*
> 	 * In the slowpath, we sanity check order to avoid ever trying to
> 	 * reclaim >= MAX_ORDER areas which will never succeed. Callers may
> 	 * be using allocators in order of preference for an area that is
> 	 * too large.
> 	 */
> 	if (order >= MAX_ORDER) {
> 		WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
> 		return NULL;
> 	}
> 
> 
> Fast path is ready for order >= MAX_ORDER
> 
> 
> Problem is in node_reclaim() which is called earlier than __alloc_pages_slowpath()
> from surprising place - get_page_from_freelist()
> 
> 
> Probably node_reclaim() simply needs something like this:
> 
> 	if (order >= MAX_ORDER)
> 		return NODE_RECLAIM_NOSCAN;

Maybe, but the point is that triggering this warning is possible. Even if
the warning is bogus, it doesn't really make much sense to even try
kmalloc if the size is not supported by the allocator.

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH 2] mm/kvmalloc: do not call kmalloc for size > KMALLOC_MAX_SIZE
  2018-11-01 12:55       ` Michal Hocko
@ 2018-11-01 16:42         ` Konstantin Khlebnikov
  2018-11-01 16:55           ` Michal Hocko
  0 siblings, 1 reply; 12+ messages in thread
From: Konstantin Khlebnikov @ 2018-11-01 16:42 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm, Andrew Morton, linux-kernel

On 01.11.2018 15:55, Michal Hocko wrote:
> On Thu 01-11-18 13:48:17, Konstantin Khlebnikov wrote:
>>
>>
>> On 01.11.2018 13:24, Michal Hocko wrote:
>>> On Thu 01-11-18 13:09:16, Konstantin Khlebnikov wrote:
>>>> Allocations over KMALLOC_MAX_SIZE could be served only by vmalloc.
>>>
>>> I would go on and say that allocations with sizes too large can actually
>>> trigger a warning (once you have posted in the previous version outside
>>> of the changelog area) because that might be interesting to people -
>>> there are deployments to panic on warning and then a warning is much
>>> more important.
>>
>> It seems that warning isn't completely valid.
>>
>>
>> __alloc_pages_slowpath() handles this more gracefully:
>>
>> 	/*
>> 	 * In the slowpath, we sanity check order to avoid ever trying to
>> 	 * reclaim >= MAX_ORDER areas which will never succeed. Callers may
>> 	 * be using allocators in order of preference for an area that is
>> 	 * too large.
>> 	 */
>> 	if (order >= MAX_ORDER) {
>> 		WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
>> 		return NULL;
>> 	}
>>
>>
>> Fast path is ready for order >= MAX_ORDER
>>
>>
>> Problem is in node_reclaim() which is called earlier than __alloc_pages_slowpath()
>> from surprising place - get_page_from_freelist()
>>
>>
>> Probably node_reclaim() simply needs something like this:
>>
>> 	if (order >= MAX_ORDER)
>> 		return NODE_RECLAIM_NOSCAN;
> 
> Maybe but the point is that triggering this warning is possible. Even if
> the warning is bogus it doesn't really make much sense to even try
> kmalloc if the size is not supported by the allocator.
> 

But a __GFP_NOWARN allocation (like in this case) should just fail silently,
without warnings, regardless of the reason, because the caller can deal
with that.

Without __GFP_NOWARN the allocator should print the standard warning.

The caller must handle a NULL/ENOMEM result anyway; this error path
should be used for handling impossible sizes too.
Of course it could check the size first, just as an optimization.


* Re: [PATCH 2] mm/kvmalloc: do not call kmalloc for size > KMALLOC_MAX_SIZE
  2018-11-01 16:42         ` Konstantin Khlebnikov
@ 2018-11-01 16:55           ` Michal Hocko
  0 siblings, 0 replies; 12+ messages in thread
From: Michal Hocko @ 2018-11-01 16:55 UTC (permalink / raw)
  To: Konstantin Khlebnikov; +Cc: linux-mm, Andrew Morton, linux-kernel

On Thu 01-11-18 19:42:48, Konstantin Khlebnikov wrote:
> On 01.11.2018 15:55, Michal Hocko wrote:
> > On Thu 01-11-18 13:48:17, Konstantin Khlebnikov wrote:
> > > 
> > > 
> > > On 01.11.2018 13:24, Michal Hocko wrote:
> > > > On Thu 01-11-18 13:09:16, Konstantin Khlebnikov wrote:
> > > > > Allocations over KMALLOC_MAX_SIZE could be served only by vmalloc.
> > > > 
> > > > I would go on and say that allocations with sizes too large can actually
> > > > trigger a warning (once you have posted in the previous version outside
> > > > of the changelog area) because that might be interesting to people -
> > > > there are deployments to panic on warning and then a warning is much
> > > > more important.
> > > 
> > > It seems that warning isn't completely valid.
> > > 
> > > 
> > > __alloc_pages_slowpath() handles this more gracefully:
> > > 
> > > 	/*
> > > 	 * In the slowpath, we sanity check order to avoid ever trying to
> > > 	 * reclaim >= MAX_ORDER areas which will never succeed. Callers may
> > > 	 * be using allocators in order of preference for an area that is
> > > 	 * too large.
> > > 	 */
> > > 	if (order >= MAX_ORDER) {
> > > 		WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
> > > 		return NULL;
> > > 	}
> > > 
> > > 
> > > Fast path is ready for order >= MAX_ORDER
> > > 
> > > 
> > > Problem is in node_reclaim() which is called earlier than __alloc_pages_slowpath()
> > > from surprising place - get_page_from_freelist()
> > > 
> > > 
> > > Probably node_reclaim() simply needs something like this:
> > > 
> > > 	if (order >= MAX_ORDER)
> > > 		return NODE_RECLAIM_NOSCAN;
> > 
> > Maybe but the point is that triggering this warning is possible. Even if
> > the warning is bogus it doesn't really make much sense to even try
> > kmalloc if the size is not supported by the allocator.
> > 
> 
> But __GFP_NOWARN allocation (like in this case) should just fail silently
> without warnings regardless of reason because caller can deal with that.

__GFP_NOWARN does not mean that no warning can be triggered from the
allocation context. It is more about not complaining about the allocation
failure. I do not think we want to check the gfp mask in all possible
paths triggered from the allocator/reclaim.

I have just looked at the original warning you hit, and it came from
88d6ac40c1c6 ("mm/vmstat: fix divide error at __fragmentation_index"). I
would argue that the warning is a bit of an over-reaction, regardless of
the gfp_mask.
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH 2] mm/kvmalloc: do not call kmalloc for size > KMALLOC_MAX_SIZE
  2018-11-01 10:09 ` [PATCH 2] mm/kvmalloc: do not call kmalloc for size > KMALLOC_MAX_SIZE Konstantin Khlebnikov
  2018-11-01 10:24   ` Michal Hocko
@ 2018-11-05 13:03   ` Vlastimil Babka
  2018-11-05 16:19     ` Konstantin Khlebnikov
  1 sibling, 1 reply; 12+ messages in thread
From: Vlastimil Babka @ 2018-11-05 13:03 UTC (permalink / raw)
  To: Konstantin Khlebnikov, linux-mm, Andrew Morton, Michal Hocko,
	linux-kernel

On 11/1/18 11:09 AM, Konstantin Khlebnikov wrote:
> Allocations over KMALLOC_MAX_SIZE could be served only by vmalloc.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>

Makes sense regardless of warnings stuff.

Acked-by: Vlastimil Babka <vbabka@suse.cz>

But it must be moved below the GFP_KERNEL check!
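
A minimal userspace sketch of the ordering asked for here, assuming the
GFP_KERNEL check hands incompatible flags to kmalloc alone (an assumption
about the surrounding code, which this thread does not quote). All MODEL_*
names, flag values and limits are stand-ins, not the kernel's:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in flag bits and limits; none of these are the kernel's real values. */
#define MODEL_GFP_KERNEL       0x3u
#define MODEL_GFP_SUBSET       0x1u	/* a strict subset of MODEL_GFP_KERNEL */
#define MODEL_PAGE_SIZE        4096UL
#define MODEL_KMALLOC_MAX_SIZE (MODEL_PAGE_SIZE << 10)

enum model_path {
	MODEL_KMALLOC_ONLY,
	MODEL_KMALLOC_THEN_VMALLOC,
	MODEL_VMALLOC_ONLY,
};

static enum model_path model_kvmalloc_ordered(size_t size, unsigned int flags)
{
	assert(size > 0);

	/* Flag compatibility first: never reach vmalloc with incompatible flags. */
	if ((flags & MODEL_GFP_KERNEL) != MODEL_GFP_KERNEL)
		return MODEL_KMALLOC_ONLY;

	/* Only then take the size shortcut straight to vmalloc. */
	if (size > MODEL_KMALLOC_MAX_SIZE)
		return MODEL_VMALLOC_ONLY;

	/* Sub-page sizes never fall back to vmalloc. */
	if (size <= MODEL_PAGE_SIZE)
		return MODEL_KMALLOC_ONLY;

	return MODEL_KMALLOC_THEN_VMALLOC;
}
```

With the size check placed first instead, an oversized request carrying
only MODEL_GFP_SUBSET would reach the vmalloc path, which is the case this
reply objects to.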

> ---
>  mm/util.c |    4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/mm/util.c b/mm/util.c
> index 8bf08b5b5760..f5f04fa22814 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -392,6 +392,9 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
>  	gfp_t kmalloc_flags = flags;
>  	void *ret;
>  
> +	if (size > KMALLOC_MAX_SIZE)
> +		goto fallback;
> +
>  	/*
>  	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
>  	 * so the given set of flags has to be compatible.
> @@ -422,6 +425,7 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
>  	if (ret || size <= PAGE_SIZE)
>  		return ret;
>  
> +fallback:
>  	return __vmalloc_node_flags_caller(size, node, flags,
>  			__builtin_return_address(0));
>  }
> 


* Re: [PATCH 2] mm/kvmalloc: do not call kmalloc for size > KMALLOC_MAX_SIZE
  2018-11-05 13:03   ` Vlastimil Babka
@ 2018-11-05 16:19     ` Konstantin Khlebnikov
  2018-11-05 16:52       ` Vlastimil Babka
  2018-11-05 16:57       ` Michal Hocko
  0 siblings, 2 replies; 12+ messages in thread
From: Konstantin Khlebnikov @ 2018-11-05 16:19 UTC (permalink / raw)
  To: Vlastimil Babka, linux-mm, Andrew Morton, Michal Hocko, linux-kernel



On 05.11.2018 16:03, Vlastimil Babka wrote:
> On 11/1/18 11:09 AM, Konstantin Khlebnikov wrote:
>> Allocations over KMALLOC_MAX_SIZE could be served only by vmalloc.
>>
>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> 
> Makes sense regardless of warnings stuff.
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> 
> But it must be moved below the GFP_KERNEL check!

But kmalloc cannot handle it regardless of GFP.

OK, maybe write something like this:

if (size > KMALLOC_MAX_SIZE) {
	if (WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL))
		return NULL;
	goto do_vmalloc;
}

or fix that uncertainty right in vmalloc.

For now, the comment in vmalloc declares:

  *	Any use of gfp flags outside of GFP_KERNEL should be consulted
  *	with mm people.

=)

> 
>> ---
>>   mm/util.c |    4 ++++
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/mm/util.c b/mm/util.c
>> index 8bf08b5b5760..f5f04fa22814 100644
>> --- a/mm/util.c
>> +++ b/mm/util.c
>> @@ -392,6 +392,9 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
>>   	gfp_t kmalloc_flags = flags;
>>   	void *ret;
>>   
>> +	if (size > KMALLOC_MAX_SIZE)
>> +		goto fallback;
>> +
>>   	/*
>>   	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
>>   	 * so the given set of flags has to be compatible.
>> @@ -422,6 +425,7 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
>>   	if (ret || size <= PAGE_SIZE)
>>   		return ret;
>>   
>> +fallback:
>>   	return __vmalloc_node_flags_caller(size, node, flags,
>>   			__builtin_return_address(0));
>>   }
>>
> 


* Re: [PATCH 2] mm/kvmalloc: do not call kmalloc for size > KMALLOC_MAX_SIZE
  2018-11-05 16:19     ` Konstantin Khlebnikov
@ 2018-11-05 16:52       ` Vlastimil Babka
  2018-11-05 16:57       ` Michal Hocko
  1 sibling, 0 replies; 12+ messages in thread
From: Vlastimil Babka @ 2018-11-05 16:52 UTC (permalink / raw)
  To: Konstantin Khlebnikov, linux-mm, Andrew Morton, Michal Hocko,
	linux-kernel

On 11/5/18 5:19 PM, Konstantin Khlebnikov wrote:
> 
> 
> On 05.11.2018 16:03, Vlastimil Babka wrote:
>> On 11/1/18 11:09 AM, Konstantin Khlebnikov wrote:
>>> Allocations over KMALLOC_MAX_SIZE could be served only by vmalloc.
>>>
>>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>>
>> Makes sense regardless of warnings stuff.
>>
>> Acked-by: Vlastimil Babka <vbabka@suse.cz>
>>
>> But it must be moved below the GFP_KERNEL check!
> 
> But kmalloc cannot handle it regardless of GFP.

Sure, but that's less problematic than skipping to vmalloc() for
!GFP_KERNEL, especially for large sizes, where it's likely that page
tables might get allocated (with GFP_KERNEL).

> Ok maybe write something like this
> 
> if (size > KMALLOC_MAX_SIZE) {
> 	if (WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL))
> 		return NULL;
> 	goto do_vmalloc;
> }

Probably should check also for __GFP_NOWARN.

> or fix that uncertainty right in vmalloc
> 
> For now comment in vmalloc declares
> 
>   *	Any use of gfp flags outside of GFP_KERNEL should be consulted
>   *	with mm people.

Dunno, what does Michal think?

> =)
> 
>>
>>> ---
>>>   mm/util.c |    4 ++++
>>>   1 file changed, 4 insertions(+)
>>>
>>> diff --git a/mm/util.c b/mm/util.c
>>> index 8bf08b5b5760..f5f04fa22814 100644
>>> --- a/mm/util.c
>>> +++ b/mm/util.c
>>> @@ -392,6 +392,9 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
>>>   	gfp_t kmalloc_flags = flags;
>>>   	void *ret;
>>>   
>>> +	if (size > KMALLOC_MAX_SIZE)
>>> +		goto fallback;
>>> +
>>>   	/*
>>>   	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
>>>   	 * so the given set of flags has to be compatible.
>>> @@ -422,6 +425,7 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
>>>   	if (ret || size <= PAGE_SIZE)
>>>   		return ret;
>>>   
>>> +fallback:
>>>   	return __vmalloc_node_flags_caller(size, node, flags,
>>>   			__builtin_return_address(0));
>>>   }
>>>
>>


* Re: [PATCH 2] mm/kvmalloc: do not call kmalloc for size > KMALLOC_MAX_SIZE
  2018-11-05 16:19     ` Konstantin Khlebnikov
  2018-11-05 16:52       ` Vlastimil Babka
@ 2018-11-05 16:57       ` Michal Hocko
  1 sibling, 0 replies; 12+ messages in thread
From: Michal Hocko @ 2018-11-05 16:57 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Vlastimil Babka, linux-mm, Andrew Morton, linux-kernel

On Mon 05-11-18 19:19:28, Konstantin Khlebnikov wrote:
> 
> 
> On 05.11.2018 16:03, Vlastimil Babka wrote:
> > On 11/1/18 11:09 AM, Konstantin Khlebnikov wrote:
> > > Allocations over KMALLOC_MAX_SIZE could be served only by vmalloc.
> > > 
> > > Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> > 
> > Makes sense regardless of warnings stuff.
> > 
> > Acked-by: Vlastimil Babka <vbabka@suse.cz>
> > 
> > But it must be moved below the GFP_KERNEL check!
> 
> But kmalloc cannot handle it regardless of GFP.
> 
> Ok maybe write something like this
> 
> if (size > KMALLOC_MAX_SIZE) {
> 	if (WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL))
> 		return NULL;
> 	goto do_vmalloc;
> }

Do we really have to be so defensive? I agree with Vlastimil that the
check should be done after the GFP_KERNEL check (I should have noticed
that). kmalloc should already complain about the allocation size request.

> or fix that uncertainty right in vmalloc
> 
> For now comment in vmalloc declares
> 
>  *	Any use of gfp flags outside of GFP_KERNEL should be consulted
>  *	with mm people.

Which is what we want. There are some exceptional cases where using a
subset of GFP_KERNEL works fine (e.g. a scoped nofs/noio context).

-- 
Michal Hocko
SUSE Labs

