* [PATCH 0/6 v3] kvmalloc
@ 2017-01-12 15:37 ` Michal Hocko
  0 siblings, 0 replies; 180+ messages in thread
From: Michal Hocko @ 2017-01-12 15:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, David Rientjes, Mel Gorman, Johannes Weiner,
	Al Viro, linux-mm, LKML, Alexei Starovoitov, Anatoly Stepanov,
	Andreas Dilger, Andreas Dilger, Anton Vorontsov, Ben Skeggs,
	Boris Ostrovsky, Colin Cross, Dan Williams, David Sterba,
	Eric Dumazet, Eric Dumazet, Hariprasad S, Heiko Carstens,
	Herbert Xu, Ilya Dryomov, Kees Cook, Kent Overstreet,
	Martin Schwidefsky, Michael S. Tsirkin, Michal Hocko,
	Mike Snitzer, Oleg Drokin, Paolo Bonzini, Rafael J. Wysocki,
	Santosh Raspatur, Tariq Toukan, Theodore Ts'o, Tom Herbert,
	Tony Luck, Yan, Zheng, Yishai Hadas

Hi,
this was previously posted as a single patch [1], but more has since been
built on top of it. It turned out that there are users who would like to
have __GFP_REPEAT semantics. These are currently implemented only for
costly (>64kB) requests. Doing the same for smaller requests would require
redefining the __GFP_REPEAT semantics in the page allocator, which is out
of scope for this series.

There are many open-coded instances of kmalloc with a vmalloc fallback in
the tree.  Most of them are not careful enough, or simply do not care,
about the underlying semantics of the kmalloc/page allocator, which means
that a) some vmalloc fallbacks are basically unreachable because the
kmalloc part will keep retrying until it succeeds, and b) the page
allocator can invoke really disruptive steps like the OOM killer to move
forward, which doesn't sound appropriate when we consider that the vmalloc
fallback is available.
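
To illustrate points a) and b), here is a rough userspace toy model, not
kernel code. Every toy_* name and the TOY_NORETRY bit are made up for this
sketch, and the allocator behaviour is deliberately oversimplified: it
assumes the contiguous request can only be satisfied after a disruptive
step unless the caller asks it to give up early.

```c
#include <stdbool.h>

#define TOY_NORETRY 0x1u	/* hypothetical stand-in for __GFP_NORETRY */

struct toy_stats {
	unsigned int oom_kills;		/* disruptive steps taken */
	unsigned int vmalloc_calls;	/* how often the fallback ran */
};

/*
 * Toy contiguous allocator on a fragmented system. Without TOY_NORETRY
 * it keeps going, takes a disruptive step and then succeeds. With
 * TOY_NORETRY it fails immediately instead.
 */
bool toy_kmalloc(unsigned int flags, struct toy_stats *st)
{
	if (flags & TOY_NORETRY)
		return false;
	st->oom_kills++;
	return true;
}

/* Toy fallback: always succeeds and records that it was used. */
bool toy_vmalloc(struct toy_stats *st)
{
	st->vmalloc_calls++;
	return true;
}

/* The open-coded pattern: try the contiguous allocator, then fall back. */
bool toy_alloc(unsigned int flags, struct toy_stats *st)
{
	if (toy_kmalloc(flags, st))
		return true;
	return toy_vmalloc(st);
}
```

In this model, calling toy_alloc() with no flags never reaches
toy_vmalloc() and pays for one disruptive step, while passing TOY_NORETRY
reaches the fallback without any.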

As can be seen, implementing kvmalloc requires quite intimate knowledge
of the page allocator and the memory reclaim internals, which strongly
suggests that a helper should be implemented in the memory subsystem
proper.

Most callers I could find have been converted to use the helper instead.
This is patch 5. There are some more relying on __GFP_REPEAT in the
networking stack which I have converted as well, but considering we do
not have support for __GFP_REPEAT for requests smaller than 64kB I
have marked that patch RFC.

[1] http://lkml.kernel.org/r/20170102133700.1734-1-mhocko@kernel.org

* [PATCH 1/6] mm: introduce kv[mz]alloc helpers
  2017-01-12 15:37 ` Michal Hocko
@ 2017-01-12 15:37   ` Michal Hocko
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Hocko @ 2017-01-12 15:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, David Rientjes, Mel Gorman, Johannes Weiner,
	Al Viro, linux-mm, LKML, Michal Hocko, Anatoly Stepanov,
	Paolo Bonzini, Mike Snitzer, Michael S. Tsirkin,
	Theodore Ts'o

From: Michal Hocko <mhocko@suse.com>

Using kmalloc with a vmalloc fallback for larger allocations is a
common pattern in kernel code. Yet we do not have any common helper
for it, so users have invented their own helpers. Some of them are
really creative when doing so. Let's just add kv[mz]alloc and make sure
it is implemented properly. This implementation makes sure not to create
large memory pressure for > PAGE_SIZE requests (__GFP_NORETRY) and also
not to warn about allocation failures. This also rules out the OOM
killer, as vmalloc is a more appropriate fallback than a disruptive
user-visible action.
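
The size-dependent decisions described above can be sketched as a small
userspace model. This is not the kernel implementation: the MODEL_* names
and bit values are illustrative assumptions, and the page size is assumed
to be 4096 bytes. Only the two conditions mirror what kvmalloc_node() in
this patch does.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE   4096u	/* assumed page size */
#define MODEL_GFP_KERNEL  0x07u	/* illustrative value, not the kernel's */
#define MODEL_GFP_NORETRY 0x10u	/* illustrative value, not the kernel's */
#define MODEL_GFP_NOWARN  0x20u	/* illustrative value, not the kernel's */

/*
 * Flags used for the kmalloc attempt: requests larger than a page must
 * fail quickly and quietly because a fallback exists.
 */
unsigned int model_kmalloc_flags(size_t size, unsigned int flags)
{
	if (size > MODEL_PAGE_SIZE)
		flags |= MODEL_GFP_NORETRY | MODEL_GFP_NOWARN;
	return flags;
}

/*
 * The vmalloc fallback is attempted only when kmalloc failed and the
 * request is larger than a page.
 */
bool model_should_try_vmalloc(size_t size, bool kmalloc_failed)
{
	return kmalloc_failed && size > MODEL_PAGE_SIZE;
}
```

Note that a request of exactly one page gets neither the extra flags nor
the fallback in this model, matching the v2 change listed below.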

This patch also changes some existing users and removes helpers which
are specific to them. In some cases this is not possible (e.g.
ext4_kvmalloc, libcfs_kvzalloc, __aa_kvmalloc) because those seem to be
broken and require a GFP_NO{FS,IO} context, which is not vmalloc
compatible in general (note that the page table allocation is
GFP_KERNEL). Those need to be fixed separately.
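
Why GFP_NO{FS,IO} callers cannot simply be converted can be shown with a
small userspace model of the flag check that kvmalloc_node() performs via
WARN_ON_ONCE(). The MODEL_* bit values are illustrative assumptions and
differ from the kernel's real values. Only the way the masks are composed
follows include/linux/gfp.h, where GFP_NOFS and GFP_NOIO are GFP_KERNEL
with bits removed.

```c
#include <stdbool.h>

/* Illustrative bit values only. */
#define MODEL_GFP_RECLAIM 0x1u
#define MODEL_GFP_IO      0x2u
#define MODEL_GFP_FS      0x4u

/* GFP_NOIO and GFP_NOFS are strict subsets of GFP_KERNEL. */
#define MODEL_GFP_NOIO   (MODEL_GFP_RECLAIM)
#define MODEL_GFP_NOFS   (MODEL_GFP_RECLAIM | MODEL_GFP_IO)
#define MODEL_GFP_KERNEL (MODEL_GFP_RECLAIM | MODEL_GFP_IO | MODEL_GFP_FS)

/*
 * True when the mask contains every GFP_KERNEL bit, i.e. when the
 * WARN_ON_ONCE() condition in kvmalloc_node() would not trigger.
 */
bool model_kvmalloc_flags_ok(unsigned int flags)
{
	return (flags & MODEL_GFP_KERNEL) == MODEL_GFP_KERNEL;
}
```

Under these assumptions both MODEL_GFP_NOFS and MODEL_GFP_NOIO fail the
check, while MODEL_GFP_KERNEL with additional modifier bits passes.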

apparmor has already claimed kv[mz]alloc, so remove those and use
__aa_kvmalloc instead to prevent naming clashes.

Changes since v3
- add ipc_alloc

Changes since v2
- s@WARN_ON@WARN_ON_ONCE@ as per Vlastimil
- do not fall back to vmalloc for size = PAGE_SIZE as per Vlastimil

Changes since v1
- define __vmalloc_node_flags for CONFIG_MMU=n

Cc: Anatoly Stepanov <astepanov@cloudlinux.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca> # ext4 part
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/x86/kvm/lapic.c                 |  4 ++--
 arch/x86/kvm/page_track.c            |  4 ++--
 arch/x86/kvm/x86.c                   |  4 ++--
 drivers/md/dm-stats.c                |  7 +-----
 fs/ext4/mballoc.c                    |  2 +-
 fs/ext4/super.c                      |  4 ++--
 fs/f2fs/f2fs.h                       | 20 -----------------
 fs/f2fs/file.c                       |  4 ++--
 fs/f2fs/segment.c                    | 14 ++++++------
 fs/seq_file.c                        | 16 +-------------
 include/linux/kvm_host.h             |  2 --
 include/linux/mm.h                   | 14 ++++++++++++
 include/linux/vmalloc.h              |  1 +
 ipc/util.c                           |  7 +-----
 mm/nommu.c                           |  5 +++++
 mm/util.c                            | 42 ++++++++++++++++++++++++++++++++++++
 mm/vmalloc.c                         |  2 +-
 security/apparmor/apparmorfs.c       |  2 +-
 security/apparmor/include/apparmor.h | 10 ---------
 security/apparmor/match.c            |  2 +-
 virt/kvm/kvm_main.c                  | 18 +++-------------
 21 files changed, 89 insertions(+), 95 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 5fe290c1b7d8..daf114c3b8ad 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -167,8 +167,8 @@ static void recalculate_apic_map(struct kvm *kvm)
 		if (kvm_apic_present(vcpu))
 			max_id = max(max_id, kvm_apic_id(vcpu->arch.apic));
 
-	new = kvm_kvzalloc(sizeof(struct kvm_apic_map) +
-	                   sizeof(struct kvm_lapic *) * ((u64)max_id + 1));
+	new = kvzalloc(sizeof(struct kvm_apic_map) +
+	                   sizeof(struct kvm_lapic *) * ((u64)max_id + 1), GFP_KERNEL);
 
 	if (!new)
 		goto out;
diff --git a/arch/x86/kvm/page_track.c b/arch/x86/kvm/page_track.c
index 4a1c13eaa518..d46663e655b0 100644
--- a/arch/x86/kvm/page_track.c
+++ b/arch/x86/kvm/page_track.c
@@ -38,8 +38,8 @@ int kvm_page_track_create_memslot(struct kvm_memory_slot *slot,
 	int  i;
 
 	for (i = 0; i < KVM_PAGE_TRACK_MAX; i++) {
-		slot->arch.gfn_track[i] = kvm_kvzalloc(npages *
-					    sizeof(*slot->arch.gfn_track[i]));
+		slot->arch.gfn_track[i] = kvzalloc(npages *
+					    sizeof(*slot->arch.gfn_track[i]), GFP_KERNEL);
 		if (!slot->arch.gfn_track[i])
 			goto track_free;
 	}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 51ccfe08e32f..ba55bc338f25 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8082,13 +8082,13 @@ int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
 				      slot->base_gfn, level) + 1;
 
 		slot->arch.rmap[i] =
-			kvm_kvzalloc(lpages * sizeof(*slot->arch.rmap[i]));
+			kvzalloc(lpages * sizeof(*slot->arch.rmap[i]), GFP_KERNEL);
 		if (!slot->arch.rmap[i])
 			goto out_free;
 		if (i == 0)
 			continue;
 
-		linfo = kvm_kvzalloc(lpages * sizeof(*linfo));
+		linfo = kvzalloc(lpages * sizeof(*linfo), GFP_KERNEL);
 		if (!linfo)
 			goto out_free;
 
diff --git a/drivers/md/dm-stats.c b/drivers/md/dm-stats.c
index 38b05f23b96c..674f9a1686f7 100644
--- a/drivers/md/dm-stats.c
+++ b/drivers/md/dm-stats.c
@@ -146,12 +146,7 @@ static void *dm_kvzalloc(size_t alloc_size, int node)
 	if (!claim_shared_memory(alloc_size))
 		return NULL;
 
-	if (alloc_size <= KMALLOC_MAX_SIZE) {
-		p = kzalloc_node(alloc_size, GFP_KERNEL | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN, node);
-		if (p)
-			return p;
-	}
-	p = vzalloc_node(alloc_size, node);
+	p = kvzalloc_node(alloc_size, GFP_KERNEL | __GFP_NOMEMALLOC, node);
 	if (p)
 		return p;
 
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index d9fd184b049e..31a761dd76f5 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2381,7 +2381,7 @@ int ext4_mb_alloc_groupinfo(struct super_block *sb, ext4_group_t ngroups)
 		return 0;
 
 	size = roundup_pow_of_two(sizeof(*sbi->s_group_info) * size);
-	new_groupinfo = ext4_kvzalloc(size, GFP_KERNEL);
+	new_groupinfo = kvzalloc(size, GFP_KERNEL);
 	if (!new_groupinfo) {
 		ext4_msg(sb, KERN_ERR, "can't allocate buddy meta group");
 		return -ENOMEM;
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 66845a08a87a..c65fe19a2a4f 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2116,7 +2116,7 @@ int ext4_alloc_flex_bg_array(struct super_block *sb, ext4_group_t ngroup)
 		return 0;
 
 	size = roundup_pow_of_two(size * sizeof(struct flex_groups));
-	new_groups = ext4_kvzalloc(size, GFP_KERNEL);
+	new_groups = kvzalloc(size, GFP_KERNEL);
 	if (!new_groups) {
 		ext4_msg(sb, KERN_ERR, "not enough memory for %d flex groups",
 			 size / (int) sizeof(struct flex_groups));
@@ -3850,7 +3850,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 			goto failed_mount;
 		}
 	}
-	sbi->s_group_desc = ext4_kvmalloc(db_count *
+	sbi->s_group_desc = kvmalloc(db_count *
 					  sizeof(struct buffer_head *),
 					  GFP_KERNEL);
 	if (sbi->s_group_desc == NULL) {
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 2da8c3aa0ce5..4130df0a8e64 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1929,26 +1929,6 @@ static inline void *f2fs_kmalloc(struct f2fs_sb_info *sbi,
 	return kmalloc(size, flags);
 }
 
-static inline void *f2fs_kvmalloc(size_t size, gfp_t flags)
-{
-	void *ret;
-
-	ret = kmalloc(size, flags | __GFP_NOWARN);
-	if (!ret)
-		ret = __vmalloc(size, flags, PAGE_KERNEL);
-	return ret;
-}
-
-static inline void *f2fs_kvzalloc(size_t size, gfp_t flags)
-{
-	void *ret;
-
-	ret = kzalloc(size, flags | __GFP_NOWARN);
-	if (!ret)
-		ret = __vmalloc(size, flags | __GFP_ZERO, PAGE_KERNEL);
-	return ret;
-}
-
 #define get_inode_mode(i) \
 	((is_inode_flag_set(i, FI_ACL_MODE)) ? \
 	 (F2FS_I(i)->i_acl_mode) : ((i)->i_mode))
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 49f10dce817d..fb2e0c156135 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -1013,11 +1013,11 @@ static int __exchange_data_block(struct inode *src_inode,
 	while (len) {
 		olen = min((pgoff_t)4 * ADDRS_PER_BLOCK, len);
 
-		src_blkaddr = f2fs_kvzalloc(sizeof(block_t) * olen, GFP_KERNEL);
+		src_blkaddr = kvzalloc(sizeof(block_t) * olen, GFP_KERNEL);
 		if (!src_blkaddr)
 			return -ENOMEM;
 
-		do_replace = f2fs_kvzalloc(sizeof(int) * olen, GFP_KERNEL);
+		do_replace = kvzalloc(sizeof(int) * olen, GFP_KERNEL);
 		if (!do_replace) {
 			kvfree(src_blkaddr);
 			return -ENOMEM;
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 0738f48293cc..c50c883bfc1a 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -2286,13 +2286,13 @@ static int build_sit_info(struct f2fs_sb_info *sbi)
 
 	SM_I(sbi)->sit_info = sit_i;
 
-	sit_i->sentries = f2fs_kvzalloc(MAIN_SEGS(sbi) *
+	sit_i->sentries = kvzalloc(MAIN_SEGS(sbi) *
 					sizeof(struct seg_entry), GFP_KERNEL);
 	if (!sit_i->sentries)
 		return -ENOMEM;
 
 	bitmap_size = f2fs_bitmap_size(MAIN_SEGS(sbi));
-	sit_i->dirty_sentries_bitmap = f2fs_kvzalloc(bitmap_size, GFP_KERNEL);
+	sit_i->dirty_sentries_bitmap = kvzalloc(bitmap_size, GFP_KERNEL);
 	if (!sit_i->dirty_sentries_bitmap)
 		return -ENOMEM;
 
@@ -2318,7 +2318,7 @@ static int build_sit_info(struct f2fs_sb_info *sbi)
 		return -ENOMEM;
 
 	if (sbi->segs_per_sec > 1) {
-		sit_i->sec_entries = f2fs_kvzalloc(MAIN_SECS(sbi) *
+		sit_i->sec_entries = kvzalloc(MAIN_SECS(sbi) *
 					sizeof(struct sec_entry), GFP_KERNEL);
 		if (!sit_i->sec_entries)
 			return -ENOMEM;
@@ -2364,12 +2364,12 @@ static int build_free_segmap(struct f2fs_sb_info *sbi)
 	SM_I(sbi)->free_info = free_i;
 
 	bitmap_size = f2fs_bitmap_size(MAIN_SEGS(sbi));
-	free_i->free_segmap = f2fs_kvmalloc(bitmap_size, GFP_KERNEL);
+	free_i->free_segmap = kvmalloc(bitmap_size, GFP_KERNEL);
 	if (!free_i->free_segmap)
 		return -ENOMEM;
 
 	sec_bitmap_size = f2fs_bitmap_size(MAIN_SECS(sbi));
-	free_i->free_secmap = f2fs_kvmalloc(sec_bitmap_size, GFP_KERNEL);
+	free_i->free_secmap = kvmalloc(sec_bitmap_size, GFP_KERNEL);
 	if (!free_i->free_secmap)
 		return -ENOMEM;
 
@@ -2537,7 +2537,7 @@ static int init_victim_secmap(struct f2fs_sb_info *sbi)
 	struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
 	unsigned int bitmap_size = f2fs_bitmap_size(MAIN_SECS(sbi));
 
-	dirty_i->victim_secmap = f2fs_kvzalloc(bitmap_size, GFP_KERNEL);
+	dirty_i->victim_secmap = kvzalloc(bitmap_size, GFP_KERNEL);
 	if (!dirty_i->victim_secmap)
 		return -ENOMEM;
 	return 0;
@@ -2559,7 +2559,7 @@ static int build_dirty_segmap(struct f2fs_sb_info *sbi)
 	bitmap_size = f2fs_bitmap_size(MAIN_SEGS(sbi));
 
 	for (i = 0; i < NR_DIRTY_TYPE; i++) {
-		dirty_i->dirty_segmap[i] = f2fs_kvzalloc(bitmap_size, GFP_KERNEL);
+		dirty_i->dirty_segmap[i] = kvzalloc(bitmap_size, GFP_KERNEL);
 		if (!dirty_i->dirty_segmap[i])
 			return -ENOMEM;
 	}
diff --git a/fs/seq_file.c b/fs/seq_file.c
index ca69fb99e41a..dc7c2be963ed 100644
--- a/fs/seq_file.c
+++ b/fs/seq_file.c
@@ -25,21 +25,7 @@ static void seq_set_overflow(struct seq_file *m)
 
 static void *seq_buf_alloc(unsigned long size)
 {
-	void *buf;
-	gfp_t gfp = GFP_KERNEL;
-
-	/*
-	 * For high order allocations, use __GFP_NORETRY to avoid oom-killing -
-	 * it's better to fall back to vmalloc() than to kill things.  For small
-	 * allocations, just use GFP_KERNEL which will oom kill, thus no need
-	 * for vmalloc fallback.
-	 */
-	if (size > PAGE_SIZE)
-		gfp |= __GFP_NORETRY | __GFP_NOWARN;
-	buf = kmalloc(size, gfp);
-	if (!buf && size > PAGE_SIZE)
-		buf = vmalloc(size);
-	return buf;
+	return kvmalloc(size, GFP_KERNEL);
 }
 
 /**
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 1c5190dab2c1..00e6f93d1ee0 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -768,8 +768,6 @@ void kvm_arch_check_processor_compat(void *rtn);
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
 int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
 
-void *kvm_kvzalloc(unsigned long size);
-
 #ifndef __KVM_HAVE_ARCH_VM_ALLOC
 static inline struct kvm *kvm_arch_alloc_vm(void)
 {
diff --git a/include/linux/mm.h b/include/linux/mm.h
index fe6b4036664a..55fd570c3e1e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -484,6 +484,20 @@ static inline int is_vmalloc_or_module_addr(const void *x)
 }
 #endif
 
+extern void *kvmalloc_node(size_t size, gfp_t flags, int node);
+static inline void *kvmalloc(size_t size, gfp_t flags)
+{
+	return kvmalloc_node(size, flags, NUMA_NO_NODE);
+}
+static inline void *kvzalloc_node(size_t size, gfp_t flags, int node)
+{
+	return kvmalloc_node(size, flags | __GFP_ZERO, node);
+}
+static inline void *kvzalloc(size_t size, gfp_t flags)
+{
+	return kvmalloc(size, flags | __GFP_ZERO);
+}
+
 extern void kvfree(const void *addr);
 
 static inline atomic_t *compound_mapcount_ptr(struct page *page)
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index d68edffbf142..46991ad3ddd5 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -80,6 +80,7 @@ extern void *__vmalloc_node_range(unsigned long size, unsigned long align,
 			unsigned long start, unsigned long end, gfp_t gfp_mask,
 			pgprot_t prot, unsigned long vm_flags, int node,
 			const void *caller);
+extern void *__vmalloc_node_flags(unsigned long size, int node, gfp_t flags);
 
 extern void vfree(const void *addr);
 extern void vfree_atomic(const void *addr);
diff --git a/ipc/util.c b/ipc/util.c
index 798cad18dd87..74c2adc62086 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -403,12 +403,7 @@ void ipc_rmid(struct ipc_ids *ids, struct kern_ipc_perm *ipcp)
  */
 void *ipc_alloc(int size)
 {
-	void *out;
-	if (size > PAGE_SIZE)
-		out = vmalloc(size);
-	else
-		out = kmalloc(size, GFP_KERNEL);
-	return out;
+	return kvmalloc(size, GFP_KERNEL);
 }
 
 /**
diff --git a/mm/nommu.c b/mm/nommu.c
index 24f9f5f39145..f1927890f75e 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -236,6 +236,11 @@ void *__vmalloc(unsigned long size, gfp_t gfp_mask, pgprot_t prot)
 }
 EXPORT_SYMBOL(__vmalloc);
 
+void *__vmalloc_node_flags(unsigned long size, int node, gfp_t flags)
+{
+	return __vmalloc(size, flags, PAGE_KERNEL);
+}
+
 void *vmalloc_user(unsigned long size)
 {
 	void *ret;
diff --git a/mm/util.c b/mm/util.c
index 3cb2164f4099..7e0c240b5760 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -324,6 +324,48 @@ unsigned long vm_mmap(struct file *file, unsigned long addr,
 }
 EXPORT_SYMBOL(vm_mmap);
 
+/**
+ * kvmalloc_node - allocate contiguous memory from SLAB with vmalloc fallback
+ * @size: size of the request.
+ * @flags: gfp mask for the allocation - must be compatible (superset) with GFP_KERNEL.
+ * @node: numa node to allocate from
+ *
+ * Uses kmalloc to get the memory but if the allocation fails then falls back
+ * to the vmalloc allocator. Use kvfree for freeing the memory.
+ *
+ * Reclaim modifiers - __GFP_NORETRY, __GFP_REPEAT and __GFP_NOFAIL are not supported
+ */
+void *kvmalloc_node(size_t size, gfp_t flags, int node)
+{
+	gfp_t kmalloc_flags = flags;
+	void *ret;
+
+	/*
+	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
+	 * so the given set of flags has to be compatible.
+	 */
+	WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL);
+
+	/*
+	 * Make sure that larger requests are not too disruptive - no OOM
+	 * killer and no allocation failure warnings as we have a fallback
+	 */
+	if (size > PAGE_SIZE)
+		kmalloc_flags |= __GFP_NORETRY | __GFP_NOWARN;
+
+	ret = kmalloc_node(size, kmalloc_flags, node);
+
+	/*
+	 * It doesn't really make sense to fallback to vmalloc for sub page
+	 * requests
+	 */
+	if (ret || size <= PAGE_SIZE)
+		return ret;
+
+	return __vmalloc_node_flags(size, node, flags);
+}
+EXPORT_SYMBOL(kvmalloc_node);
+
 void kvfree(const void *addr)
 {
 	if (is_vmalloc_addr(addr))
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 3ca82d44edd3..1039b1230889 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1757,7 +1757,7 @@ void *__vmalloc(unsigned long size, gfp_t gfp_mask, pgprot_t prot)
 }
 EXPORT_SYMBOL(__vmalloc);
 
-static inline void *__vmalloc_node_flags(unsigned long size,
+void *__vmalloc_node_flags(unsigned long size,
 					int node, gfp_t flags)
 {
 	return __vmalloc_node(size, 1, flags, PAGE_KERNEL,
diff --git a/security/apparmor/apparmorfs.c b/security/apparmor/apparmorfs.c
index 5923d5665209..83789a03379f 100644
--- a/security/apparmor/apparmorfs.c
+++ b/security/apparmor/apparmorfs.c
@@ -100,7 +100,7 @@ static char *aa_simple_write_to_buffer(int op, const char __user *userbuf,
 		return ERR_PTR(-EACCES);
 
 	/* freed by caller to simple_write_to_buffer */
-	data = kvmalloc(alloc_size);
+	data = __aa_kvmalloc(alloc_size, 0);
 	if (data == NULL)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/security/apparmor/include/apparmor.h b/security/apparmor/include/apparmor.h
index 5d721e990876..c88fb0ebc756 100644
--- a/security/apparmor/include/apparmor.h
+++ b/security/apparmor/include/apparmor.h
@@ -68,16 +68,6 @@ char *aa_split_fqname(char *args, char **ns_name);
 void aa_info_message(const char *str);
 void *__aa_kvmalloc(size_t size, gfp_t flags);
 
-static inline void *kvmalloc(size_t size)
-{
-	return __aa_kvmalloc(size, 0);
-}
-
-static inline void *kvzalloc(size_t size)
-{
-	return __aa_kvmalloc(size, __GFP_ZERO);
-}
-
 /* returns 0 if kref not incremented */
 static inline int kref_get_not0(struct kref *kref)
 {
diff --git a/security/apparmor/match.c b/security/apparmor/match.c
index 3f900fcca8fb..55f6ae0067a3 100644
--- a/security/apparmor/match.c
+++ b/security/apparmor/match.c
@@ -61,7 +61,7 @@ static struct table_header *unpack_table(char *blob, size_t bsize)
 	if (bsize < tsize)
 		goto out;
 
-	table = kvzalloc(tsize);
+	table = __aa_kvmalloc(tsize, __GFP_ZERO);
 	if (table) {
 		table->td_id = th.td_id;
 		table->td_flags = th.td_flags;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 482612b4e496..dbfe0e79232d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -502,7 +502,7 @@ static struct kvm_memslots *kvm_alloc_memslots(void)
 	int i;
 	struct kvm_memslots *slots;
 
-	slots = kvm_kvzalloc(sizeof(struct kvm_memslots));
+	slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
 	if (!slots)
 		return NULL;
 
@@ -685,18 +685,6 @@ static struct kvm *kvm_create_vm(unsigned long type)
 	return ERR_PTR(r);
 }
 
-/*
- * Avoid using vmalloc for a small buffer.
- * Should not be used when the size is statically known.
- */
-void *kvm_kvzalloc(unsigned long size)
-{
-	if (size > PAGE_SIZE)
-		return vzalloc(size);
-	else
-		return kzalloc(size, GFP_KERNEL);
-}
-
 static void kvm_destroy_devices(struct kvm *kvm)
 {
 	struct kvm_device *dev, *tmp;
@@ -775,7 +763,7 @@ static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
 {
 	unsigned long dirty_bytes = 2 * kvm_dirty_bitmap_bytes(memslot);
 
-	memslot->dirty_bitmap = kvm_kvzalloc(dirty_bytes);
+	memslot->dirty_bitmap = kvzalloc(dirty_bytes, GFP_KERNEL);
 	if (!memslot->dirty_bitmap)
 		return -ENOMEM;
 
@@ -995,7 +983,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
 			goto out_free;
 	}
 
-	slots = kvm_kvzalloc(sizeof(struct kvm_memslots));
+	slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
 	if (!slots)
 		goto out_free;
 	memcpy(slots, __kvm_memslots(kvm, as_id), sizeof(struct kvm_memslots));
-- 
2.11.0

+	free_i->free_secmap = kvmalloc(sec_bitmap_size, GFP_KERNEL);
 	if (!free_i->free_secmap)
 		return -ENOMEM;
 
@@ -2537,7 +2537,7 @@ static int init_victim_secmap(struct f2fs_sb_info *sbi)
 	struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
 	unsigned int bitmap_size = f2fs_bitmap_size(MAIN_SECS(sbi));
 
-	dirty_i->victim_secmap = f2fs_kvzalloc(bitmap_size, GFP_KERNEL);
+	dirty_i->victim_secmap = kvzalloc(bitmap_size, GFP_KERNEL);
 	if (!dirty_i->victim_secmap)
 		return -ENOMEM;
 	return 0;
@@ -2559,7 +2559,7 @@ static int build_dirty_segmap(struct f2fs_sb_info *sbi)
 	bitmap_size = f2fs_bitmap_size(MAIN_SEGS(sbi));
 
 	for (i = 0; i < NR_DIRTY_TYPE; i++) {
-		dirty_i->dirty_segmap[i] = f2fs_kvzalloc(bitmap_size, GFP_KERNEL);
+		dirty_i->dirty_segmap[i] = kvzalloc(bitmap_size, GFP_KERNEL);
 		if (!dirty_i->dirty_segmap[i])
 			return -ENOMEM;
 	}
diff --git a/fs/seq_file.c b/fs/seq_file.c
index ca69fb99e41a..dc7c2be963ed 100644
--- a/fs/seq_file.c
+++ b/fs/seq_file.c
@@ -25,21 +25,7 @@ static void seq_set_overflow(struct seq_file *m)
 
 static void *seq_buf_alloc(unsigned long size)
 {
-	void *buf;
-	gfp_t gfp = GFP_KERNEL;
-
-	/*
-	 * For high order allocations, use __GFP_NORETRY to avoid oom-killing -
-	 * it's better to fall back to vmalloc() than to kill things.  For small
-	 * allocations, just use GFP_KERNEL which will oom kill, thus no need
-	 * for vmalloc fallback.
-	 */
-	if (size > PAGE_SIZE)
-		gfp |= __GFP_NORETRY | __GFP_NOWARN;
-	buf = kmalloc(size, gfp);
-	if (!buf && size > PAGE_SIZE)
-		buf = vmalloc(size);
-	return buf;
+	return kvmalloc(size, GFP_KERNEL);
 }
 
 /**
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 1c5190dab2c1..00e6f93d1ee0 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -768,8 +768,6 @@ void kvm_arch_check_processor_compat(void *rtn);
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
 int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
 
-void *kvm_kvzalloc(unsigned long size);
-
 #ifndef __KVM_HAVE_ARCH_VM_ALLOC
 static inline struct kvm *kvm_arch_alloc_vm(void)
 {
diff --git a/include/linux/mm.h b/include/linux/mm.h
index fe6b4036664a..55fd570c3e1e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -484,6 +484,20 @@ static inline int is_vmalloc_or_module_addr(const void *x)
 }
 #endif
 
+extern void *kvmalloc_node(size_t size, gfp_t flags, int node);
+static inline void *kvmalloc(size_t size, gfp_t flags)
+{
+	return kvmalloc_node(size, flags, NUMA_NO_NODE);
+}
+static inline void *kvzalloc_node(size_t size, gfp_t flags, int node)
+{
+	return kvmalloc_node(size, flags | __GFP_ZERO, node);
+}
+static inline void *kvzalloc(size_t size, gfp_t flags)
+{
+	return kvmalloc(size, flags | __GFP_ZERO);
+}
+
 extern void kvfree(const void *addr);
 
 static inline atomic_t *compound_mapcount_ptr(struct page *page)
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index d68edffbf142..46991ad3ddd5 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -80,6 +80,7 @@ extern void *__vmalloc_node_range(unsigned long size, unsigned long align,
 			unsigned long start, unsigned long end, gfp_t gfp_mask,
 			pgprot_t prot, unsigned long vm_flags, int node,
 			const void *caller);
+extern void *__vmalloc_node_flags(unsigned long size, int node, gfp_t flags);
 
 extern void vfree(const void *addr);
 extern void vfree_atomic(const void *addr);
diff --git a/ipc/util.c b/ipc/util.c
index 798cad18dd87..74c2adc62086 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -403,12 +403,7 @@ void ipc_rmid(struct ipc_ids *ids, struct kern_ipc_perm *ipcp)
  */
 void *ipc_alloc(int size)
 {
-	void *out;
-	if (size > PAGE_SIZE)
-		out = vmalloc(size);
-	else
-		out = kmalloc(size, GFP_KERNEL);
-	return out;
+	return kvmalloc(size, GFP_KERNEL);
 }
 
 /**
diff --git a/mm/nommu.c b/mm/nommu.c
index 24f9f5f39145..f1927890f75e 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -236,6 +236,11 @@ void *__vmalloc(unsigned long size, gfp_t gfp_mask, pgprot_t prot)
 }
 EXPORT_SYMBOL(__vmalloc);
 
+void *__vmalloc_node_flags(unsigned long size, int node, gfp_t flags)
+{
+	return __vmalloc(size, flags, PAGE_KERNEL);
+}
+
 void *vmalloc_user(unsigned long size)
 {
 	void *ret;
diff --git a/mm/util.c b/mm/util.c
index 3cb2164f4099..7e0c240b5760 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -324,6 +324,48 @@ unsigned long vm_mmap(struct file *file, unsigned long addr,
 }
 EXPORT_SYMBOL(vm_mmap);
 
+/**
+ * kvmalloc_node - allocate contiguous memory from SLAB with vmalloc fallback
+ * @size: size of the request.
+ * @flags: gfp mask for the allocation - must be compatible (superset) with GFP_KERNEL.
+ * @node: numa node to allocate from
+ *
+ * Uses kmalloc to get the memory but if the allocation fails then falls back
+ * to the vmalloc allocator. Use kvfree for freeing the memory.
+ *
+ * Reclaim modifiers - __GFP_NORETRY, __GFP_REPEAT and __GFP_NOFAIL are not supported
+ */
+void *kvmalloc_node(size_t size, gfp_t flags, int node)
+{
+	gfp_t kmalloc_flags = flags;
+	void *ret;
+
+	/*
+	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
+	 * so the given set of flags has to be compatible.
+	 */
+	WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL);
+
+	/*
+	 * Make sure that larger requests are not too disruptive - no OOM
+	 * killer and no allocation failure warnings as we have a fallback
+	 */
+	if (size > PAGE_SIZE)
+		kmalloc_flags |= __GFP_NORETRY | __GFP_NOWARN;
+
+	ret = kmalloc_node(size, kmalloc_flags, node);
+
+	/*
+	 * It doesn't really make sense to fallback to vmalloc for sub page
+	 * requests
+	 */
+	if (ret || size <= PAGE_SIZE)
+		return ret;
+
+	return __vmalloc_node_flags(size, node, flags);
+}
+EXPORT_SYMBOL(kvmalloc_node);
+
 void kvfree(const void *addr)
 {
 	if (is_vmalloc_addr(addr))
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 3ca82d44edd3..1039b1230889 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1757,7 +1757,7 @@ void *__vmalloc(unsigned long size, gfp_t gfp_mask, pgprot_t prot)
 }
 EXPORT_SYMBOL(__vmalloc);
 
-static inline void *__vmalloc_node_flags(unsigned long size,
+void *__vmalloc_node_flags(unsigned long size,
 					int node, gfp_t flags)
 {
 	return __vmalloc_node(size, 1, flags, PAGE_KERNEL,
diff --git a/security/apparmor/apparmorfs.c b/security/apparmor/apparmorfs.c
index 5923d5665209..83789a03379f 100644
--- a/security/apparmor/apparmorfs.c
+++ b/security/apparmor/apparmorfs.c
@@ -100,7 +100,7 @@ static char *aa_simple_write_to_buffer(int op, const char __user *userbuf,
 		return ERR_PTR(-EACCES);
 
 	/* freed by caller to simple_write_to_buffer */
-	data = kvmalloc(alloc_size);
+	data = __aa_kvmalloc(alloc_size, 0);
 	if (data == NULL)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/security/apparmor/include/apparmor.h b/security/apparmor/include/apparmor.h
index 5d721e990876..c88fb0ebc756 100644
--- a/security/apparmor/include/apparmor.h
+++ b/security/apparmor/include/apparmor.h
@@ -68,16 +68,6 @@ char *aa_split_fqname(char *args, char **ns_name);
 void aa_info_message(const char *str);
 void *__aa_kvmalloc(size_t size, gfp_t flags);
 
-static inline void *kvmalloc(size_t size)
-{
-	return __aa_kvmalloc(size, 0);
-}
-
-static inline void *kvzalloc(size_t size)
-{
-	return __aa_kvmalloc(size, __GFP_ZERO);
-}
-
 /* returns 0 if kref not incremented */
 static inline int kref_get_not0(struct kref *kref)
 {
diff --git a/security/apparmor/match.c b/security/apparmor/match.c
index 3f900fcca8fb..55f6ae0067a3 100644
--- a/security/apparmor/match.c
+++ b/security/apparmor/match.c
@@ -61,7 +61,7 @@ static struct table_header *unpack_table(char *blob, size_t bsize)
 	if (bsize < tsize)
 		goto out;
 
-	table = kvzalloc(tsize);
+	table = __aa_kvmalloc(tsize, __GFP_ZERO);
 	if (table) {
 		table->td_id = th.td_id;
 		table->td_flags = th.td_flags;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 482612b4e496..dbfe0e79232d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -502,7 +502,7 @@ static struct kvm_memslots *kvm_alloc_memslots(void)
 	int i;
 	struct kvm_memslots *slots;
 
-	slots = kvm_kvzalloc(sizeof(struct kvm_memslots));
+	slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
 	if (!slots)
 		return NULL;
 
@@ -685,18 +685,6 @@ static struct kvm *kvm_create_vm(unsigned long type)
 	return ERR_PTR(r);
 }
 
-/*
- * Avoid using vmalloc for a small buffer.
- * Should not be used when the size is statically known.
- */
-void *kvm_kvzalloc(unsigned long size)
-{
-	if (size > PAGE_SIZE)
-		return vzalloc(size);
-	else
-		return kzalloc(size, GFP_KERNEL);
-}
-
 static void kvm_destroy_devices(struct kvm *kvm)
 {
 	struct kvm_device *dev, *tmp;
@@ -775,7 +763,7 @@ static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
 {
 	unsigned long dirty_bytes = 2 * kvm_dirty_bitmap_bytes(memslot);
 
-	memslot->dirty_bitmap = kvm_kvzalloc(dirty_bytes);
+	memslot->dirty_bitmap = kvzalloc(dirty_bytes, GFP_KERNEL);
 	if (!memslot->dirty_bitmap)
 		return -ENOMEM;
 
@@ -995,7 +983,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
 			goto out_free;
 	}
 
-	slots = kvm_kvzalloc(sizeof(struct kvm_memslots));
+	slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
 	if (!slots)
 		goto out_free;
 	memcpy(slots, __kvm_memslots(kvm, as_id), sizeof(struct kvm_memslots));
-- 
2.11.0


* [PATCH 2/6] mm: support __GFP_REPEAT in kvmalloc_node for >=64kB
  2017-01-12 15:37 ` Michal Hocko
@ 2017-01-12 15:37   ` Michal Hocko
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Hocko @ 2017-01-12 15:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, David Rientjes, Mel Gorman, Johannes Weiner,
	Al Viro, linux-mm, LKML, Michal Hocko, Michael S. Tsirkin

From: Michal Hocko <mhocko@suse.com>

vhost code uses __GFP_REPEAT when allocating vhost_virtqueue and
vhost_vsock because it would really like to prefer kmalloc to the
vmalloc fallback - see 23cc5a991c7a ("vhost-net: extend device
allocation to vmalloc") for more context. Michael Tsirkin has also
noted:
"
__GFP_REPEAT overhead is during allocation time.  Using vmalloc means all
accesses are slowed down.  Allocation is not on data path, accesses are.
"

The same applies to the other vhost_kvzalloc users.

Let's teach kvmalloc_node to handle __GFP_REPEAT properly. There are two
things to be careful about. First, we should avoid invoking the OOM
killer and so have to apply __GFP_NORETRY by default. Second, we have to
override __GFP_REPEAT for !costly order requests, because __GFP_REPEAT
is ignored for !costly orders.
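
The flag handling described above can be modelled in userspace as a
rough sketch. The SK_* names and bit values are hypothetical stand-ins
for the kernel's gfp bits, and the 4 kB page size and costly order of 3
are assumptions for illustration, not something this patch defines.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's gfp bits; values are arbitrary. */
#define SK_GFP_NOWARN	(1u << 0)
#define SK_GFP_NORETRY	(1u << 1)
#define SK_GFP_REPEAT	(1u << 2)

/* Assumed configuration: 4 kB pages and a costly order of 3. */
#define SK_PAGE_SIZE	4096u
#define SK_COSTLY_ORDER	3u

/*
 * Model of the adjustment made to the flags before the kmalloc attempt:
 * sub-page requests are left alone, larger ones never warn, and they
 * only keep retrying when the caller asked for __GFP_REPEAT and the
 * request is above the costly-order size.
 */
static unsigned int sk_kmalloc_flags(size_t size, unsigned int flags)
{
	unsigned int kmalloc_flags = flags;

	if (size > SK_PAGE_SIZE) {
		kmalloc_flags |= SK_GFP_NOWARN;

		if (!(kmalloc_flags & SK_GFP_REPEAT) ||
		    size <= ((size_t)SK_PAGE_SIZE << SK_COSTLY_ORDER))
			kmalloc_flags |= SK_GFP_NORETRY;
	}

	return kmalloc_flags;
}
```

With the assumed 4 kB pages and a costly order of 3 the cutoff in this
sketch is 32 kB; the real cutoff scales with the page size.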

Supporting __GFP_REPEAT-like semantics for !costly requests is possible,
but it would require changes in the page allocator. This is out of scope
for this patch.

This patch shouldn't introduce any functional change.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 drivers/vhost/net.c   |  9 +++------
 drivers/vhost/vhost.c | 15 +++------------
 drivers/vhost/vsock.c |  9 +++------
 mm/util.c             | 17 ++++++++++++++---
 4 files changed, 23 insertions(+), 27 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 5dc34653274a..105cd04c7414 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -797,12 +797,9 @@ static int vhost_net_open(struct inode *inode, struct file *f)
 	struct vhost_virtqueue **vqs;
 	int i;
 
-	n = kmalloc(sizeof *n, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT);
-	if (!n) {
-		n = vmalloc(sizeof *n);
-		if (!n)
-			return -ENOMEM;
-	}
+	n = kvmalloc(sizeof *n, GFP_KERNEL | __GFP_REPEAT);
+	if (!n)
+		return -ENOMEM;
 	vqs = kmalloc(VHOST_NET_VQ_MAX * sizeof(*vqs), GFP_KERNEL);
 	if (!vqs) {
 		kvfree(n);
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index d6432603880c..d2bf8a41f55e 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -515,18 +515,9 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
 }
 EXPORT_SYMBOL_GPL(vhost_dev_set_owner);
 
-static void *vhost_kvzalloc(unsigned long size)
-{
-	void *n = kzalloc(size, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT);
-
-	if (!n)
-		n = vzalloc(size);
-	return n;
-}
-
 struct vhost_umem *vhost_dev_reset_owner_prepare(void)
 {
-	return vhost_kvzalloc(sizeof(struct vhost_umem));
+	return kvzalloc(sizeof(struct vhost_umem), GFP_KERNEL);
 }
 EXPORT_SYMBOL_GPL(vhost_dev_reset_owner_prepare);
 
@@ -1190,7 +1181,7 @@ EXPORT_SYMBOL_GPL(vhost_vq_access_ok);
 
 static struct vhost_umem *vhost_umem_alloc(void)
 {
-	struct vhost_umem *umem = vhost_kvzalloc(sizeof(*umem));
+	struct vhost_umem *umem = kvzalloc(sizeof(*umem), GFP_KERNEL);
 
 	if (!umem)
 		return NULL;
@@ -1216,7 +1207,7 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
 		return -EOPNOTSUPP;
 	if (mem.nregions > max_mem_regions)
 		return -E2BIG;
-	newmem = vhost_kvzalloc(size + mem.nregions * sizeof(*m->regions));
+	newmem = kvzalloc(size + mem.nregions * sizeof(*m->regions), GFP_KERNEL);
 	if (!newmem)
 		return -ENOMEM;
 
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index bbbf588540ed..7e0159867553 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -455,12 +455,9 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
 	/* This struct is large and allocation could fail, fall back to vmalloc
 	 * if there is no other way.
 	 */
-	vsock = kzalloc(sizeof(*vsock), GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT);
-	if (!vsock) {
-		vsock = vmalloc(sizeof(*vsock));
-		if (!vsock)
-			return -ENOMEM;
-	}
+	vsock = kvmalloc(sizeof(*vsock), GFP_KERNEL | __GFP_REPEAT);
+	if (!vsock)
+		return -ENOMEM;
 
 	vqs = kmalloc_array(ARRAY_SIZE(vsock->vqs), sizeof(*vqs), GFP_KERNEL);
 	if (!vqs) {
diff --git a/mm/util.c b/mm/util.c
index 7e0c240b5760..9306244b9f41 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -333,7 +333,8 @@ EXPORT_SYMBOL(vm_mmap);
  * Uses kmalloc to get the memory but if the allocation fails then falls back
  * to the vmalloc allocator. Use kvfree for freeing the memory.
  *
- * Reclaim modifiers - __GFP_NORETRY, __GFP_REPEAT and __GFP_NOFAIL are not supported
+ * Reclaim modifiers - __GFP_NORETRY and __GFP_NOFAIL are not supported. __GFP_REPEAT
+ * is supported only for large (>64kB) allocations
  */
 void *kvmalloc_node(size_t size, gfp_t flags, int node)
 {
@@ -350,8 +351,18 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
 	 * Make sure that larger requests are not too disruptive - no OOM
 	 * killer and no allocation failure warnings as we have a fallback
 	 */
-	if (size > PAGE_SIZE)
-		kmalloc_flags |= __GFP_NORETRY | __GFP_NOWARN;
+	if (size > PAGE_SIZE) {
+		kmalloc_flags |= __GFP_NOWARN;
+
+		/*
+		 * We have to override __GFP_REPEAT by __GFP_NORETRY for !costly
+		 * requests because there is no other way to tell the allocator
+		 * that we want to fail rather than retry endlessly.
+		 */
+		if (!(kmalloc_flags & __GFP_REPEAT) ||
+				(size <= PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
+			kmalloc_flags |= __GFP_NORETRY;
+	}
 
 	ret = kmalloc_node(size, kmalloc_flags, node);
 
-- 
2.11.0


* [PATCH 3/6] rhashtable: simplify a strange allocation pattern
  2017-01-12 15:37 ` Michal Hocko
@ 2017-01-12 15:37   ` Michal Hocko
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Hocko @ 2017-01-12 15:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, David Rientjes, Mel Gorman, Johannes Weiner,
	Al Viro, linux-mm, LKML, Michal Hocko, Tom Herbert, Eric Dumazet

From: Michal Hocko <mhocko@suse.com>

The alloc_bucket_locks allocation pattern is quite unusual. We are
preferring vmalloc when CONFIG_NUMA is enabled. The rationale is that
vmalloc will respect the memory policy of the current process and so the
backing memory will get distributed over multiple nodes if the requester
is configured properly. At least that is the intention. In reality,
rhashtable is shrunk and expanded from a kernel worker, so no mempolicy
can be assumed.

Let's just simplify the code and use the kvmalloc helper, which is a
transparent way to use kmalloc with vmalloc fallback, if the caller
is allowed to block, and call kmalloc_array with the caller's gfp flags
otherwise.
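
As a rough userspace sketch of that decision: the SK_* bits below are
hypothetical stand-ins, and the check models gfpflags_allow_blocking()
under the assumption that it tests the direct-reclaim bit of the mask.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the reclaim bits of a gfp mask. */
#define SK_GFP_DIRECT_RECLAIM	(1u << 0)
#define SK_GFP_KSWAPD_RECLAIM	(1u << 1)

/* Masks shaped like a blocking and a non-blocking request (assumed). */
#define SK_GFP_KERNEL_LIKE	(SK_GFP_DIRECT_RECLAIM | SK_GFP_KSWAPD_RECLAIM)
#define SK_GFP_ATOMIC_LIKE	(SK_GFP_KSWAPD_RECLAIM)

enum sk_alloc_path {
	SK_USE_KVMALLOC,	/* caller may sleep: vmalloc fallback is usable */
	SK_USE_KMALLOC_ARRAY,	/* caller must not sleep: kmalloc only */
};

/* Assumed model: a request may block if it allows direct reclaim. */
static bool sk_allow_blocking(unsigned int gfp)
{
	return (gfp & SK_GFP_DIRECT_RECLAIM) != 0;
}

/* Pick the allocator the way the converted alloc_bucket_locks does. */
static enum sk_alloc_path sk_pick_lock_allocator(unsigned int gfp)
{
	return sk_allow_blocking(gfp) ? SK_USE_KVMALLOC : SK_USE_KMALLOC_ARRAY;
}
```

The point of the split is that the vmalloc fallback may sleep, so it is
only offered to callers whose flags say that sleeping is fine.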

Cc: Tom Herbert <tom@herbertland.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 lib/rhashtable.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 32d0ad058380..1a487ea70829 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -77,16 +77,9 @@ static int alloc_bucket_locks(struct rhashtable *ht, struct bucket_table *tbl,
 	size = min_t(unsigned int, size, tbl->size >> 1);
 
 	if (sizeof(spinlock_t) != 0) {
-		tbl->locks = NULL;
-#ifdef CONFIG_NUMA
-		if (size * sizeof(spinlock_t) > PAGE_SIZE &&
-		    gfp == GFP_KERNEL)
-			tbl->locks = vmalloc(size * sizeof(spinlock_t));
-#endif
-		if (gfp != GFP_KERNEL)
-			gfp |= __GFP_NOWARN | __GFP_NORETRY;
-
-		if (!tbl->locks)
+		if (gfpflags_allow_blocking(gfp))
+			tbl->locks = kvmalloc(size * sizeof(spinlock_t), gfp);
+		else
 			tbl->locks = kmalloc_array(size, sizeof(spinlock_t),
 						   gfp);
 		if (!tbl->locks)
-- 
2.11.0


* [PATCH 4/6] ila: simplify a strange allocation pattern
  2017-01-12 15:37 ` Michal Hocko
@ 2017-01-12 15:37   ` Michal Hocko
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Hocko @ 2017-01-12 15:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, David Rientjes, Mel Gorman, Johannes Weiner,
	Al Viro, linux-mm, LKML, Michal Hocko, Tom Herbert, Eric Dumazet

From: Michal Hocko <mhocko@suse.com>

alloc_ila_locks seems to have copied and pasted the alloc_bucket_locks
allocation pattern, which is quite unusual. The default allocation size
is 320 * sizeof(spinlock_t), which is sub-page unless lockdep is
enabled, in which case the performance benefit is really questionable
and not worth the subtle code IMHO. Also note that the context in which
we call ila_init_net (modprobe or a task creating a net namespace) has
to be properly configured.
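
The size argument can be checked with a little arithmetic. This is only
a sketch: the 4 kB page size and the per-lock sizes passed in by the
caller are assumptions, since sizeof(spinlock_t) depends on the kernel
configuration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed page size and the default lock count quoted in the text. */
#define SK_PAGE_SIZE		4096u
#define SK_DEFAULT_LOCKS	320u

/* Bytes needed for an array of nlocks locks of lock_size bytes each. */
static size_t sk_lock_array_bytes(size_t nlocks, size_t lock_size)
{
	return nlocks * lock_size;
}

/* True when the whole lock array fits in a single page. */
static bool sk_is_sub_page(size_t nlocks, size_t lock_size)
{
	return sk_lock_array_bytes(nlocks, lock_size) <= SK_PAGE_SIZE;
}
```

For example, with a hypothetical 4-byte lock the default array needs
1280 bytes and stays within one page, while any lock larger than 12
bytes (as a debugging build might produce) pushes 320 locks past 4 kB.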

Let's just simplify the code and use the kvmalloc helper, which is a
transparent way to use kmalloc with vmalloc fallback.

Cc: Tom Herbert <tom@herbertland.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 net/ipv6/ila/ila_xlat.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/net/ipv6/ila/ila_xlat.c b/net/ipv6/ila/ila_xlat.c
index af8f52ee7180..2fd5ca151dcf 100644
--- a/net/ipv6/ila/ila_xlat.c
+++ b/net/ipv6/ila/ila_xlat.c
@@ -41,13 +41,7 @@ static int alloc_ila_locks(struct ila_net *ilan)
 	size = roundup_pow_of_two(nr_pcpus * LOCKS_PER_CPU);
 
 	if (sizeof(spinlock_t) != 0) {
-#ifdef CONFIG_NUMA
-		if (size * sizeof(spinlock_t) > PAGE_SIZE)
-			ilan->locks = vmalloc(size * sizeof(spinlock_t));
-		else
-#endif
-		ilan->locks = kmalloc_array(size, sizeof(spinlock_t),
-					    GFP_KERNEL);
+		ilan->locks = kvmalloc(size * sizeof(spinlock_t), GFP_KERNEL);
 		if (!ilan->locks)
 			return -ENOMEM;
 		for (i = 0; i < size; i++)
-- 
2.11.0


* [PATCH 5/6] treewide: use kv[mz]alloc* rather than opencoded variants
  2017-01-12 15:37 ` Michal Hocko
  (?)
@ 2017-01-12 15:37   ` Michal Hocko
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Hocko @ 2017-01-12 15:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, David Rientjes, Mel Gorman, Johannes Weiner,
	Al Viro, linux-mm, LKML, Michal Hocko, Martin Schwidefsky,
	Heiko Carstens, Herbert Xu, Anton Vorontsov, Colin Cross,
	Kees Cook, Tony Luck, Rafael J. Wysocki, Ben Skeggs,
	Kent Overstreet, Santosh Raspatur, Hariprasad S, Tariq Toukan,
	Yishai Hadas, Dan Williams, Oleg Drokin, Andreas Dilger,
	Boris Ostrovsky, David Sterba, Yan, Zheng, Ilya Dryomov,
	Alexei Starovoitov, Eric Dumazet, netdev

From: Michal Hocko <mhocko@suse.com>

There are many code paths that open-code kvmalloc. Let's use the helper
instead. The main difference from kvmalloc is that those users usually
do not consider all the aspects of the memory allocator. E.g. allocation
requests <= 32kB (with 4kB pages) basically never fail and invoke the
OOM killer to satisfy the allocation. This sounds too disruptive for
something that has a reasonable fallback - vmalloc. On the other hand,
with the helper those requests might now fall back to vmalloc even when
the memory allocator would previously have succeeded after several more
reclaim/compaction attempts. There is no guarantee something like that
happens though.
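
The size boundary at which the page allocator stops retrying
indefinitely follows from the allocation order. A rough userspace
sketch of that arithmetic, assuming 4kB pages and a costly threshold of
order 3 (PAGE_ALLOC_COSTLY_ORDER in the kernel); the model_* names and
MODEL_* constants are made up for illustration:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE    4096u /* assumption: 4kB pages */
#define MODEL_COSTLY_ORDER 3     /* assumption: mirrors PAGE_ALLOC_COSTLY_ORDER */

/*
 * Smallest order such that (MODEL_PAGE_SIZE << order) >= size.
 * The second loop condition only guards against shift overflow for
 * absurdly large sizes; realistic requests never reach it.
 */
static unsigned int model_get_order(size_t size)
{
	unsigned int order = 0;
	size_t span = MODEL_PAGE_SIZE;

	while (span < size && span <= SIZE_MAX / 2) {
		span <<= 1;
		order++;
	}
	return order;
}

/* Requests above the costly order are the ones allowed to fail early. */
static bool model_is_costly(size_t size)
{
	return model_get_order(size) > MODEL_COSTLY_ORDER;
}
```

Under these assumptions a 32kB request is order 3 and still not costly,
while anything larger needs order 4 and may fail without the OOM killer
being invoked.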

This patch converts many of those places to the kv[mz]alloc* helpers
because they are more conservative.
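
One thing to watch in such conversions is whether the replaced helper
returned cleared memory (cxgb_alloc_mem, t4_alloc_mem, fq_codel_zalloc
and hhf_zalloc all used kzalloc/vzalloc); those need the zeroing
variant. A small userspace sketch of the distinction, where the
sketch_* names are hypothetical and the 0xAA fill merely stands in for
"contents unspecified":

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Model of a non-zeroing allocation: poison the buffer so misuse shows. */
static void *sketch_kvmalloc(size_t size)
{
	void *p = malloc(size);

	if (p)
		memset(p, 0xAA, size);
	return p;
}

/* Model of the zeroing variant: the caller may rely on all-zero contents. */
static void *sketch_kvzalloc(size_t size)
{
	return calloc(1, size);
}

/* Returns true if every byte of the buffer is zero. */
static bool sketch_is_zeroed(const void *p, size_t size)
{
	const unsigned char *b = p;
	size_t i;

	for (i = 0; i < size; i++)
		if (b[i] != 0)
			return false;
	return true;
}
```

A table that is walked before every slot has been written only behaves
correctly with the second allocator.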

Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Colin Cross <ccross@android.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Santosh Raspatur <santosh@chelsio.com>
Cc: Hariprasad S <hariprasad@chelsio.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Yishai Hadas <yishaih@mellanox.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: Andreas Dilger <andreas.dilger@intel.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Sterba <dsterba@suse.com>
Cc: "Yan, Zheng" <zyan@redhat.com>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/s390/kvm/kvm-s390.c                           | 10 ++-----
 crypto/lzo.c                                       |  4 +--
 drivers/acpi/apei/erst.c                           |  8 ++---
 drivers/char/agp/generic.c                         |  8 +----
 drivers/gpu/drm/nouveau/nouveau_gem.c              |  4 +--
 drivers/md/bcache/util.h                           | 12 ++------
 drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h    |  3 --
 drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c | 25 ++--------------
 drivers/net/ethernet/chelsio/cxgb3/l2t.c           |  2 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c    | 31 ++++----------------
 drivers/net/ethernet/mellanox/mlx4/en_tx.c         |  9 ++----
 drivers/net/ethernet/mellanox/mlx4/mr.c            |  9 ++----
 drivers/nvdimm/dimm_devs.c                         |  5 +---
 .../staging/lustre/lnet/libcfs/linux/linux-mem.c   | 11 +------
 drivers/xen/evtchn.c                               | 14 +--------
 fs/btrfs/ctree.c                                   |  9 ++----
 fs/btrfs/ioctl.c                                   |  9 ++----
 fs/btrfs/send.c                                    | 27 ++++++-----------
 fs/ceph/file.c                                     |  9 ++----
 fs/select.c                                        |  5 +---
 fs/xattr.c                                         | 27 ++++++-----------
 kernel/bpf/hashtab.c                               | 11 ++-----
 lib/iov_iter.c                                     |  5 +---
 mm/frame_vector.c                                  |  5 +---
 net/ipv4/inet_hashtables.c                         |  6 +---
 net/ipv4/tcp_metrics.c                             |  5 +---
 net/mpls/af_mpls.c                                 |  5 +---
 net/netfilter/x_tables.c                           | 34 ++++++----------------
 net/netfilter/xt_recent.c                          |  5 +---
 net/sched/sch_choke.c                              |  5 +---
 net/sched/sch_fq_codel.c                           | 26 ++++-------------
 net/sched/sch_hhf.c                                | 33 ++++++---------------
 net/sched/sch_netem.c                              |  6 +---
 net/sched/sch_sfq.c                                |  6 +---
 security/keys/keyctl.c                             | 22 ++++----------
 35 files changed, 96 insertions(+), 319 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 4f74511015b8..e6bbb33d2956 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -1126,10 +1126,7 @@ static long kvm_s390_get_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
 	if (args->count < 1 || args->count > KVM_S390_SKEYS_MAX)
 		return -EINVAL;
 
-	keys = kmalloc_array(args->count, sizeof(uint8_t),
-			     GFP_KERNEL | __GFP_NOWARN);
-	if (!keys)
-		keys = vmalloc(sizeof(uint8_t) * args->count);
+	keys = kvmalloc(args->count * sizeof(uint8_t), GFP_KERNEL);
 	if (!keys)
 		return -ENOMEM;
 
@@ -1171,10 +1168,7 @@ static long kvm_s390_set_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
 	if (args->count < 1 || args->count > KVM_S390_SKEYS_MAX)
 		return -EINVAL;
 
-	keys = kmalloc_array(args->count, sizeof(uint8_t),
-			     GFP_KERNEL | __GFP_NOWARN);
-	if (!keys)
-		keys = vmalloc(sizeof(uint8_t) * args->count);
+	keys = kvmalloc(sizeof(uint8_t) * args->count, GFP_KERNEL);
 	if (!keys)
 		return -ENOMEM;
 
diff --git a/crypto/lzo.c b/crypto/lzo.c
index 168df784da84..218567d717d6 100644
--- a/crypto/lzo.c
+++ b/crypto/lzo.c
@@ -32,9 +32,7 @@ static void *lzo_alloc_ctx(struct crypto_scomp *tfm)
 {
 	void *ctx;
 
-	ctx = kmalloc(LZO1X_MEM_COMPRESS, GFP_KERNEL | __GFP_NOWARN);
-	if (!ctx)
-		ctx = vmalloc(LZO1X_MEM_COMPRESS);
+	ctx = kvmalloc(LZO1X_MEM_COMPRESS, GFP_KERNEL);
 	if (!ctx)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/drivers/acpi/apei/erst.c b/drivers/acpi/apei/erst.c
index ec4f507b524f..a2898df61744 100644
--- a/drivers/acpi/apei/erst.c
+++ b/drivers/acpi/apei/erst.c
@@ -513,7 +513,7 @@ static int __erst_record_id_cache_add_one(void)
 	if (i < erst_record_id_cache.len)
 		goto retry;
 	if (erst_record_id_cache.len >= erst_record_id_cache.size) {
-		int new_size, alloc_size;
+		int new_size;
 		u64 *new_entries;
 
 		new_size = erst_record_id_cache.size * 2;
@@ -524,11 +524,7 @@ static int __erst_record_id_cache_add_one(void)
 				pr_warn(FW_WARN "too many record IDs!\n");
 			return 0;
 		}
-		alloc_size = new_size * sizeof(entries[0]);
-		if (alloc_size < PAGE_SIZE)
-			new_entries = kmalloc(alloc_size, GFP_KERNEL);
-		else
-			new_entries = vmalloc(alloc_size);
+		new_entries = kvmalloc(new_size * sizeof(entries[0]), GFP_KERNEL);
 		if (!new_entries)
 			return -ENOMEM;
 		memcpy(new_entries, entries,
diff --git a/drivers/char/agp/generic.c b/drivers/char/agp/generic.c
index f002fa5d1887..bdf418cac8ef 100644
--- a/drivers/char/agp/generic.c
+++ b/drivers/char/agp/generic.c
@@ -88,13 +88,7 @@ static int agp_get_key(void)
 
 void agp_alloc_page_array(size_t size, struct agp_memory *mem)
 {
-	mem->pages = NULL;
-
-	if (size <= 2*PAGE_SIZE)
-		mem->pages = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
-	if (mem->pages == NULL) {
-		mem->pages = vmalloc(size);
-	}
+	mem->pages = kvmalloc(size, GFP_KERNEL);
 }
 EXPORT_SYMBOL(agp_alloc_page_array);
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index 201b52b750dd..77dd73ff126f 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -568,9 +568,7 @@ u_memcpya(uint64_t user, unsigned nmemb, unsigned size)
 
 	size *= nmemb;
 
-	mem = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
-	if (!mem)
-		mem = vmalloc(size);
+	mem = kvmalloc(size, GFP_KERNEL);
 	if (!mem)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/drivers/md/bcache/util.h b/drivers/md/bcache/util.h
index cf2cbc211d83..d00bcb64d3a8 100644
--- a/drivers/md/bcache/util.h
+++ b/drivers/md/bcache/util.h
@@ -43,11 +43,7 @@ struct closure;
 	(heap)->used = 0;						\
 	(heap)->size = (_size);						\
 	_bytes = (heap)->size * sizeof(*(heap)->data);			\
-	(heap)->data = NULL;						\
-	if (_bytes < KMALLOC_MAX_SIZE)					\
-		(heap)->data = kmalloc(_bytes, (gfp));			\
-	if ((!(heap)->data) && ((gfp) & GFP_KERNEL))			\
-		(heap)->data = vmalloc(_bytes);				\
+	(heap)->data = kvmalloc(_bytes, (gfp) & GFP_KERNEL);		\
 	(heap)->data;							\
 })
 
@@ -136,12 +132,8 @@ do {									\
 									\
 	(fifo)->mask = _allocated_size - 1;				\
 	(fifo)->front = (fifo)->back = 0;				\
-	(fifo)->data = NULL;						\
 									\
-	if (_bytes < KMALLOC_MAX_SIZE)					\
-		(fifo)->data = kmalloc(_bytes, (gfp));			\
-	if ((!(fifo)->data) && ((gfp) & GFP_KERNEL))			\
-		(fifo)->data = vmalloc(_bytes);				\
+	(fifo)->data = kvmalloc(_bytes, (gfp) & GFP_KERNEL);		\
 	(fifo)->data;							\
 })
 
diff --git a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h
index 920d918ed193..f04e81f33795 100644
--- a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h
+++ b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h
@@ -41,9 +41,6 @@
 
 #define VALIDATE_TID 1
 
-void *cxgb_alloc_mem(unsigned long size);
-void cxgb_free_mem(void *addr);
-
 /*
  * Map an ATID or STID to their entries in the corresponding TID tables.
  */
diff --git a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c
index 76684dcb874c..606d4a3ade04 100644
--- a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c
+++ b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c
@@ -1152,27 +1152,6 @@ static void cxgb_redirect(struct dst_entry *old, struct dst_entry *new,
 }
 
 /*
- * Allocate a chunk of memory using kmalloc or, if that fails, vmalloc.
- * The allocated memory is cleared.
- */
-void *cxgb_alloc_mem(unsigned long size)
-{
-	void *p = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
-
-	if (!p)
-		p = vzalloc(size);
-	return p;
-}
-
-/*
- * Free memory allocated through t3_alloc_mem().
- */
-void cxgb_free_mem(void *addr)
-{
-	kvfree(addr);
-}
-
-/*
  * Allocate and initialize the TID tables.  Returns 0 on success.
  */
 static int init_tid_tabs(struct tid_info *t, unsigned int ntids,
@@ -1182,7 +1161,7 @@ static int init_tid_tabs(struct tid_info *t, unsigned int ntids,
 	unsigned long size = ntids * sizeof(*t->tid_tab) +
 	    natids * sizeof(*t->atid_tab) + nstids * sizeof(*t->stid_tab);
 
-	t->tid_tab = cxgb_alloc_mem(size);
+	t->tid_tab = kvzalloc(size, GFP_KERNEL);
 	if (!t->tid_tab)
 		return -ENOMEM;
 
@@ -1218,7 +1197,7 @@ static int init_tid_tabs(struct tid_info *t, unsigned int ntids,
 
 static void free_tid_maps(struct tid_info *t)
 {
-	cxgb_free_mem(t->tid_tab);
+	kvfree(t->tid_tab);
 }
 
 static inline void add_adapter(struct adapter *adap)
diff --git a/drivers/net/ethernet/chelsio/cxgb3/l2t.c b/drivers/net/ethernet/chelsio/cxgb3/l2t.c
index 5f226eda8cd6..c9b06501ee0c 100644
--- a/drivers/net/ethernet/chelsio/cxgb3/l2t.c
+++ b/drivers/net/ethernet/chelsio/cxgb3/l2t.c
@@ -444,7 +444,7 @@ struct l2t_data *t3_init_l2t(unsigned int l2t_capacity)
 	struct l2t_data *d;
 	int i, size = sizeof(*d) + l2t_capacity * sizeof(struct l2t_entry);
 
-	d = cxgb_alloc_mem(size);
+	d = kvzalloc(size, GFP_KERNEL);
 	if (!d)
 		return NULL;
 
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 6f951877430b..671695cb3c15 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -881,27 +881,6 @@ static int setup_sge_queues(struct adapter *adap)
 	return err;
 }
 
-/*
- * Allocate a chunk of memory using kmalloc or, if that fails, vmalloc.
- * The allocated memory is cleared.
- */
-void *t4_alloc_mem(size_t size)
-{
-	void *p = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
-
-	if (!p)
-		p = vzalloc(size);
-	return p;
-}
-
-/*
- * Free memory allocated through alloc_mem().
- */
-void t4_free_mem(void *addr)
-{
-	kvfree(addr);
-}
-
 static u16 cxgb_select_queue(struct net_device *dev, struct sk_buff *skb,
 			     void *accel_priv, select_queue_fallback_t fallback)
 {
@@ -1300,7 +1279,7 @@ static int tid_init(struct tid_info *t)
 	       max_ftids * sizeof(*t->ftid_tab) +
 	       ftid_bmap_size * sizeof(long);
 
-	t->tid_tab = t4_alloc_mem(size);
+	t->tid_tab = kvzalloc(size, GFP_KERNEL);
 	if (!t->tid_tab)
 		return -ENOMEM;
 
@@ -3416,7 +3395,7 @@ static int adap_init0(struct adapter *adap)
 		/* allocate memory to read the header of the firmware on the
 		 * card
 		 */
-		card_fw = t4_alloc_mem(sizeof(*card_fw));
+		card_fw = kvzalloc(sizeof(*card_fw), GFP_KERNEL);
 
 		/* Get FW from from /lib/firmware/ */
 		ret = request_firmware(&fw, fw_info->fw_mod_name,
@@ -3436,7 +3415,7 @@ static int adap_init0(struct adapter *adap)
 
 		/* Cleaning up */
 		release_firmware(fw);
-		t4_free_mem(card_fw);
+		kvfree(card_fw);
 
 		if (ret < 0)
 			goto bye;
@@ -4432,9 +4411,9 @@ static void free_some_resources(struct adapter *adapter)
 {
 	unsigned int i;
 
-	t4_free_mem(adapter->l2t);
+	kvfree(adapter->l2t);
 	t4_cleanup_sched(adapter);
-	t4_free_mem(adapter->tids.tid_tab);
+	kvfree(adapter->tids.tid_tab);
 	cxgb4_cleanup_tc_u32(adapter);
 	kfree(adapter->sge.egr_map);
 	kfree(adapter->sge.ingr_map);
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 5886ad78058f..a5c1b815145e 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -70,13 +70,10 @@ int mlx4_en_create_tx_ring(struct mlx4_en_priv *priv,
 	ring->full_size = ring->size - HEADROOM - MAX_DESC_TXBBS;
 
 	tmp = size * sizeof(struct mlx4_en_tx_info);
-	ring->tx_info = kmalloc_node(tmp, GFP_KERNEL | __GFP_NOWARN, node);
+	ring->tx_info = kvmalloc_node(tmp, GFP_KERNEL, node);
 	if (!ring->tx_info) {
-		ring->tx_info = vmalloc(tmp);
-		if (!ring->tx_info) {
-			err = -ENOMEM;
-			goto err_ring;
-		}
+		err = -ENOMEM;
+		goto err_ring;
 	}
 
 	en_dbg(DRV, priv, "Allocated tx_info ring at addr:%p size:%d\n",
diff --git a/drivers/net/ethernet/mellanox/mlx4/mr.c b/drivers/net/ethernet/mellanox/mlx4/mr.c
index 395b5463cfd9..82354fd0a87e 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mr.c
+++ b/drivers/net/ethernet/mellanox/mlx4/mr.c
@@ -115,12 +115,9 @@ static int mlx4_buddy_init(struct mlx4_buddy *buddy, int max_order)
 
 	for (i = 0; i <= buddy->max_order; ++i) {
 		s = BITS_TO_LONGS(1 << (buddy->max_order - i));
-		buddy->bits[i] = kcalloc(s, sizeof (long), GFP_KERNEL | __GFP_NOWARN);
-		if (!buddy->bits[i]) {
-			buddy->bits[i] = vzalloc(s * sizeof(long));
-			if (!buddy->bits[i])
-				goto err_out_free;
-		}
+		buddy->bits[i] = kvzalloc(s * sizeof(long), GFP_KERNEL);
+		if (!buddy->bits[i])
+			goto err_out_free;
 	}
 
 	set_bit(0, buddy->bits[buddy->max_order]);
diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
index 0eedc49e0d47..3bd332b167d9 100644
--- a/drivers/nvdimm/dimm_devs.c
+++ b/drivers/nvdimm/dimm_devs.c
@@ -102,10 +102,7 @@ int nvdimm_init_config_data(struct nvdimm_drvdata *ndd)
 		return -ENXIO;
 	}
 
-	ndd->data = kmalloc(ndd->nsarea.config_size, GFP_KERNEL);
-	if (!ndd->data)
-		ndd->data = vmalloc(ndd->nsarea.config_size);
-
+	ndd->data = kvmalloc(ndd->nsarea.config_size, GFP_KERNEL);
 	if (!ndd->data)
 		return -ENOMEM;
 
diff --git a/drivers/staging/lustre/lnet/libcfs/linux/linux-mem.c b/drivers/staging/lustre/lnet/libcfs/linux/linux-mem.c
index a6a76a681ea9..8f638267e704 100644
--- a/drivers/staging/lustre/lnet/libcfs/linux/linux-mem.c
+++ b/drivers/staging/lustre/lnet/libcfs/linux/linux-mem.c
@@ -45,15 +45,6 @@ EXPORT_SYMBOL(libcfs_kvzalloc);
 void *libcfs_kvzalloc_cpt(struct cfs_cpt_table *cptab, int cpt, size_t size,
 			  gfp_t flags)
 {
-	void *ret;
-
-	ret = kzalloc_node(size, flags | __GFP_NOWARN,
-			   cfs_cpt_spread_node(cptab, cpt));
-	if (!ret) {
-		WARN_ON(!(flags & (__GFP_FS | __GFP_HIGH)));
-		ret = vmalloc_node(size, cfs_cpt_spread_node(cptab, cpt));
-	}
-
-	return ret;
+	return kvzalloc_node(size, flags, cfs_cpt_spread_node(cptab, cpt));
 }
 EXPORT_SYMBOL(libcfs_kvzalloc_cpt);
diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
index 6890897a6f30..10f1ef582659 100644
--- a/drivers/xen/evtchn.c
+++ b/drivers/xen/evtchn.c
@@ -87,18 +87,6 @@ struct user_evtchn {
 	bool enabled;
 };
 
-static evtchn_port_t *evtchn_alloc_ring(unsigned int size)
-{
-	evtchn_port_t *ring;
-	size_t s = size * sizeof(*ring);
-
-	ring = kmalloc(s, GFP_KERNEL);
-	if (!ring)
-		ring = vmalloc(s);
-
-	return ring;
-}
-
 static void evtchn_free_ring(evtchn_port_t *ring)
 {
 	kvfree(ring);
@@ -334,7 +322,7 @@ static int evtchn_resize_ring(struct per_user_data *u)
 	else
 		new_size = 2 * u->ring_size;
 
-	new_ring = evtchn_alloc_ring(new_size);
+	new_ring = kvmalloc(new_size * sizeof(*new_ring), GFP_KERNEL);
 	if (!new_ring)
 		return -ENOMEM;
 
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 146b2dc0d2cf..4fc9712d927d 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -5391,13 +5391,10 @@ int btrfs_compare_trees(struct btrfs_root *left_root,
 		goto out;
 	}
 
-	tmp_buf = kmalloc(fs_info->nodesize, GFP_KERNEL | __GFP_NOWARN);
+	tmp_buf = kvmalloc(fs_info->nodesize, GFP_KERNEL);
 	if (!tmp_buf) {
-		tmp_buf = vmalloc(fs_info->nodesize);
-		if (!tmp_buf) {
-			ret = -ENOMEM;
-			goto out;
-		}
+		ret = -ENOMEM;
+		goto out;
 	}
 
 	left_path->search_commit_root = 1;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 77dabfed3a5d..6f0b488c7428 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3547,12 +3547,9 @@ static int btrfs_clone(struct inode *src, struct inode *inode,
 	u64 last_dest_end = destoff;
 
 	ret = -ENOMEM;
-	buf = kmalloc(fs_info->nodesize, GFP_KERNEL | __GFP_NOWARN);
-	if (!buf) {
-		buf = vmalloc(fs_info->nodesize);
-		if (!buf)
-			return ret;
-	}
+	buf = kvmalloc(fs_info->nodesize, GFP_KERNEL);
+	if (!buf)
+		return ret;
 
 	path = btrfs_alloc_path();
 	if (!path) {
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index d145ce804620..0621ca2a7b5d 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -6242,22 +6242,16 @@ long btrfs_ioctl_send(struct file *mnt_file, void __user *arg_)
 	sctx->clone_roots_cnt = arg->clone_sources_count;
 
 	sctx->send_max_size = BTRFS_SEND_BUF_SIZE;
-	sctx->send_buf = kmalloc(sctx->send_max_size, GFP_KERNEL | __GFP_NOWARN);
+	sctx->send_buf = kvmalloc(sctx->send_max_size, GFP_KERNEL);
 	if (!sctx->send_buf) {
-		sctx->send_buf = vmalloc(sctx->send_max_size);
-		if (!sctx->send_buf) {
-			ret = -ENOMEM;
-			goto out;
-		}
+		ret = -ENOMEM;
+		goto out;
 	}
 
-	sctx->read_buf = kmalloc(BTRFS_SEND_READ_SIZE, GFP_KERNEL | __GFP_NOWARN);
+	sctx->read_buf = kvmalloc(BTRFS_SEND_READ_SIZE, GFP_KERNEL);
 	if (!sctx->read_buf) {
-		sctx->read_buf = vmalloc(BTRFS_SEND_READ_SIZE);
-		if (!sctx->read_buf) {
-			ret = -ENOMEM;
-			goto out;
-		}
+		ret = -ENOMEM;
+		goto out;
 	}
 
 	sctx->pending_dir_moves = RB_ROOT;
@@ -6278,13 +6272,10 @@ long btrfs_ioctl_send(struct file *mnt_file, void __user *arg_)
 	alloc_size = arg->clone_sources_count * sizeof(*arg->clone_sources);
 
 	if (arg->clone_sources_count) {
-		clone_sources_tmp = kmalloc(alloc_size, GFP_KERNEL | __GFP_NOWARN);
+		clone_sources_tmp = kvmalloc(alloc_size, GFP_KERNEL);
 		if (!clone_sources_tmp) {
-			clone_sources_tmp = vmalloc(alloc_size);
-			if (!clone_sources_tmp) {
-				ret = -ENOMEM;
-				goto out;
-			}
+			ret = -ENOMEM;
+			goto out;
 		}
 
 		ret = copy_from_user(clone_sources_tmp, arg->clone_sources,
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 045d30d26624..78b18acf33ba 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -74,12 +74,9 @@ dio_get_pages_alloc(const struct iov_iter *it, size_t nbytes,
 	align = (unsigned long)(it->iov->iov_base + it->iov_offset) &
 		(PAGE_SIZE - 1);
 	npages = calc_pages_for(align, nbytes);
-	pages = kmalloc(sizeof(*pages) * npages, GFP_KERNEL);
-	if (!pages) {
-		pages = vmalloc(sizeof(*pages) * npages);
-		if (!pages)
-			return ERR_PTR(-ENOMEM);
-	}
+	pages = kvmalloc(sizeof(*pages) * npages, GFP_KERNEL);
+	if (!pages)
+		return ERR_PTR(-ENOMEM);
 
 	for (idx = 0; idx < npages; ) {
 		size_t start;
diff --git a/fs/select.c b/fs/select.c
index 305c0daf5d67..9e8e1189eb99 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -586,10 +586,7 @@ int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp,
 			goto out_nofds;
 
 		alloc_size = 6 * size;
-		bits = kmalloc(alloc_size, GFP_KERNEL|__GFP_NOWARN);
-		if (!bits && alloc_size > PAGE_SIZE)
-			bits = vmalloc(alloc_size);
-
+		bits = kvmalloc(alloc_size, GFP_KERNEL);
 		if (!bits)
 			goto out_nofds;
 	}
diff --git a/fs/xattr.c b/fs/xattr.c
index 7e3317cf4045..4269a7c26db7 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -431,12 +431,9 @@ setxattr(struct dentry *d, const char __user *name, const void __user *value,
 	if (size) {
 		if (size > XATTR_SIZE_MAX)
 			return -E2BIG;
-		kvalue = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
-		if (!kvalue) {
-			kvalue = vmalloc(size);
-			if (!kvalue)
-				return -ENOMEM;
-		}
+		kvalue = kvmalloc(size, GFP_KERNEL);
+		if (!kvalue)
+			return -ENOMEM;
 		if (copy_from_user(kvalue, value, size)) {
 			error = -EFAULT;
 			goto out;
@@ -528,12 +525,9 @@ getxattr(struct dentry *d, const char __user *name, void __user *value,
 	if (size) {
 		if (size > XATTR_SIZE_MAX)
 			size = XATTR_SIZE_MAX;
-		kvalue = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
-		if (!kvalue) {
-			kvalue = vmalloc(size);
-			if (!kvalue)
-				return -ENOMEM;
-		}
+		kvalue = kvzalloc(size, GFP_KERNEL);
+		if (!kvalue)
+			return -ENOMEM;
 	}
 
 	error = vfs_getxattr(d, kname, kvalue, size);
@@ -611,12 +605,9 @@ listxattr(struct dentry *d, char __user *list, size_t size)
 	if (size) {
 		if (size > XATTR_LIST_MAX)
 			size = XATTR_LIST_MAX;
-		klist = kmalloc(size, __GFP_NOWARN | GFP_KERNEL);
-		if (!klist) {
-			klist = vmalloc(size);
-			if (!klist)
-				return -ENOMEM;
-		}
+		klist = kvmalloc(size, GFP_KERNEL);
+		if (!klist)
+			return -ENOMEM;
 	}
 
 	error = vfs_listxattr(d, klist, size);
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 34debc1a9641..4ca30a951bbc 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -320,14 +320,9 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 		goto free_htab;
 
 	err = -ENOMEM;
-	htab->buckets = kmalloc_array(htab->n_buckets, sizeof(struct bucket),
-				      GFP_USER | __GFP_NOWARN);
-
-	if (!htab->buckets) {
-		htab->buckets = vmalloc(htab->n_buckets * sizeof(struct bucket));
-		if (!htab->buckets)
-			goto free_htab;
-	}
+	htab->buckets = kvmalloc(htab->n_buckets * sizeof(struct bucket), GFP_USER);
+	if (!htab->buckets)
+		goto free_htab;
 
 	for (i = 0; i < htab->n_buckets; i++) {
 		INIT_HLIST_HEAD(&htab->buckets[i].head);
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 25f572303801..45c17b5562b5 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -957,10 +957,7 @@ EXPORT_SYMBOL(iov_iter_get_pages);
 
 static struct page **get_pages_array(size_t n)
 {
-	struct page **p = kmalloc(n * sizeof(struct page *), GFP_KERNEL);
-	if (!p)
-		p = vmalloc(n * sizeof(struct page *));
-	return p;
+	return kvmalloc(n * sizeof(struct page *), GFP_KERNEL);
 }
 
 static ssize_t pipe_get_pages_alloc(struct iov_iter *i,
diff --git a/mm/frame_vector.c b/mm/frame_vector.c
index db77dcb38afd..72ebec18629c 100644
--- a/mm/frame_vector.c
+++ b/mm/frame_vector.c
@@ -200,10 +200,7 @@ struct frame_vector *frame_vector_create(unsigned int nr_frames)
 	 * Avoid higher order allocations, use vmalloc instead. It should
 	 * be rare anyway.
 	 */
-	if (size <= PAGE_SIZE)
-		vec = kmalloc(size, GFP_KERNEL);
-	else
-		vec = vmalloc(size);
+	vec = kvmalloc(size, GFP_KERNEL);
 	if (!vec)
 		return NULL;
 	vec->nr_allocated = nr_frames;
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index ca97835bfec4..a46a9fd8b540 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -687,11 +687,7 @@ int inet_ehash_locks_alloc(struct inet_hashinfo *hashinfo)
 		/* no more locks than number of hash buckets */
 		nblocks = min(nblocks, hashinfo->ehash_mask + 1);
 
-		hashinfo->ehash_locks =	kmalloc_array(nblocks, locksz,
-						      GFP_KERNEL | __GFP_NOWARN);
-		if (!hashinfo->ehash_locks)
-			hashinfo->ehash_locks = vmalloc(nblocks * locksz);
-
+		hashinfo->ehash_locks = kvmalloc(nblocks * locksz, GFP_KERNEL);
 		if (!hashinfo->ehash_locks)
 			return -ENOMEM;
 
diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
index d46f4d5b1c62..39b2166d3be8 100644
--- a/net/ipv4/tcp_metrics.c
+++ b/net/ipv4/tcp_metrics.c
@@ -1155,10 +1155,7 @@ static int __net_init tcp_net_metrics_init(struct net *net)
 	tcp_metrics_hash_log = order_base_2(slots);
 	size = sizeof(struct tcpm_hash_bucket) << tcp_metrics_hash_log;
 
-	tcp_metrics_hash = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
-	if (!tcp_metrics_hash)
-		tcp_metrics_hash = vzalloc(size);
-
+	tcp_metrics_hash = kvzalloc(size, GFP_KERNEL);
 	if (!tcp_metrics_hash)
 		return -ENOMEM;
 
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 15fe97644ffe..a0c82ef74389 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -1525,10 +1525,7 @@ static int resize_platform_label_table(struct net *net, size_t limit)
 	unsigned index;
 
 	if (size) {
-		labels = kzalloc(size, GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY);
-		if (!labels)
-			labels = vzalloc(size);
-
+		labels = kvzalloc(size, GFP_KERNEL);
 		if (!labels)
 			goto nolabels;
 	}
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index a011322a027d..eeed0af3ea25 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -712,17 +712,11 @@ EXPORT_SYMBOL(xt_check_entry_offsets);
  */
 unsigned int *xt_alloc_entry_offsets(unsigned int size)
 {
-	unsigned int *off;
-
-	off = kcalloc(size, sizeof(unsigned int), GFP_KERNEL | __GFP_NOWARN);
-
-	if (off)
-		return off;
-
 	if (size < (SIZE_MAX / sizeof(unsigned int)))
-		off = vmalloc(size * sizeof(unsigned int));
+		return kvzalloc(size * sizeof(unsigned int), GFP_KERNEL);
+
+	return NULL;
 
-	return off;
 }
 EXPORT_SYMBOL(xt_alloc_entry_offsets);
 
@@ -956,15 +950,9 @@ struct xt_table_info *xt_alloc_table_info(unsigned int size)
 	if ((SMP_ALIGN(size) >> PAGE_SHIFT) + 2 > totalram_pages)
 		return NULL;
 
-	if (sz <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
-		info = kmalloc(sz, GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY);
-	if (!info) {
-		info = __vmalloc(sz, GFP_KERNEL | __GFP_NOWARN |
-				     __GFP_NORETRY | __GFP_HIGHMEM,
-				 PAGE_KERNEL);
-		if (!info)
-			return NULL;
-	}
+	info = kvmalloc(sz, GFP_KERNEL);
+	if (!info)
+		return NULL;
 	memset(info, 0, sizeof(*info));
 	info->size = size;
 	return info;
@@ -1066,7 +1054,7 @@ static int xt_jumpstack_alloc(struct xt_table_info *i)
 
 	size = sizeof(void **) * nr_cpu_ids;
 	if (size > PAGE_SIZE)
-		i->jumpstack = vzalloc(size);
+		i->jumpstack = kvzalloc(size, GFP_KERNEL);
 	else
 		i->jumpstack = kzalloc(size, GFP_KERNEL);
 	if (i->jumpstack == NULL)
@@ -1088,12 +1076,8 @@ static int xt_jumpstack_alloc(struct xt_table_info *i)
 	 */
 	size = sizeof(void *) * i->stacksize * 2u;
 	for_each_possible_cpu(cpu) {
-		if (size > PAGE_SIZE)
-			i->jumpstack[cpu] = vmalloc_node(size,
-				cpu_to_node(cpu));
-		else
-			i->jumpstack[cpu] = kmalloc_node(size,
-				GFP_KERNEL, cpu_to_node(cpu));
+		i->jumpstack[cpu] = kvmalloc_node(size, GFP_KERNEL,
+			cpu_to_node(cpu));
 		if (i->jumpstack[cpu] == NULL)
 			/*
 			 * Freeing will be done later on by the callers. The
diff --git a/net/netfilter/xt_recent.c b/net/netfilter/xt_recent.c
index 1d89a4eaf841..d6aa8f63ed2e 100644
--- a/net/netfilter/xt_recent.c
+++ b/net/netfilter/xt_recent.c
@@ -388,10 +388,7 @@ static int recent_mt_check(const struct xt_mtchk_param *par,
 	}
 
 	sz = sizeof(*t) + sizeof(t->iphash[0]) * ip_list_hash_size;
-	if (sz <= PAGE_SIZE)
-		t = kzalloc(sz, GFP_KERNEL);
-	else
-		t = vzalloc(sz);
+	t = kvzalloc(sz, GFP_KERNEL);
 	if (t == NULL) {
 		ret = -ENOMEM;
 		goto out;
diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c
index 3b6d5bd69101..30d6a39fd2c8 100644
--- a/net/sched/sch_choke.c
+++ b/net/sched/sch_choke.c
@@ -431,10 +431,7 @@ static int choke_change(struct Qdisc *sch, struct nlattr *opt)
 	if (mask != q->tab_mask) {
 		struct sk_buff **ntab;
 
-		ntab = kcalloc(mask + 1, sizeof(struct sk_buff *),
-			       GFP_KERNEL | __GFP_NOWARN);
-		if (!ntab)
-			ntab = vzalloc((mask + 1) * sizeof(struct sk_buff *));
+		ntab = kvzalloc((mask + 1) * sizeof(struct sk_buff *), GFP_KERNEL);
 		if (!ntab)
 			return -ENOMEM;
 
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index a5ea0e9b6be4..04e2d006f277 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -449,27 +449,13 @@ static int fq_codel_change(struct Qdisc *sch, struct nlattr *opt)
 	return 0;
 }
 
-static void *fq_codel_zalloc(size_t sz)
-{
-	void *ptr = kzalloc(sz, GFP_KERNEL | __GFP_NOWARN);
-
-	if (!ptr)
-		ptr = vzalloc(sz);
-	return ptr;
-}
-
-static void fq_codel_free(void *addr)
-{
-	kvfree(addr);
-}
-
 static void fq_codel_destroy(struct Qdisc *sch)
 {
 	struct fq_codel_sched_data *q = qdisc_priv(sch);
 
 	tcf_destroy_chain(&q->filter_list);
-	fq_codel_free(q->backlogs);
-	fq_codel_free(q->flows);
+	kvfree(q->backlogs);
+	kvfree(q->flows);
 }
 
 static int fq_codel_init(struct Qdisc *sch, struct nlattr *opt)
@@ -497,13 +483,13 @@ static int fq_codel_init(struct Qdisc *sch, struct nlattr *opt)
 	}
 
 	if (!q->flows) {
-		q->flows = fq_codel_zalloc(q->flows_cnt *
-					   sizeof(struct fq_codel_flow));
+		q->flows = kvzalloc(q->flows_cnt *
+					   sizeof(struct fq_codel_flow), GFP_KERNEL);
 		if (!q->flows)
 			return -ENOMEM;
-		q->backlogs = fq_codel_zalloc(q->flows_cnt * sizeof(u32));
+		q->backlogs = kvzalloc(q->flows_cnt * sizeof(u32), GFP_KERNEL);
 		if (!q->backlogs) {
-			fq_codel_free(q->flows);
+			kvfree(q->flows);
 			return -ENOMEM;
 		}
 		for (i = 0; i < q->flows_cnt; i++) {
diff --git a/net/sched/sch_hhf.c b/net/sched/sch_hhf.c
index e3d0458af17b..858b2de5db59 100644
--- a/net/sched/sch_hhf.c
+++ b/net/sched/sch_hhf.c
@@ -467,29 +467,14 @@ static void hhf_reset(struct Qdisc *sch)
 		rtnl_kfree_skbs(skb, skb);
 }
 
-static void *hhf_zalloc(size_t sz)
-{
-	void *ptr = kzalloc(sz, GFP_KERNEL | __GFP_NOWARN);
-
-	if (!ptr)
-		ptr = vzalloc(sz);
-
-	return ptr;
-}
-
-static void hhf_free(void *addr)
-{
-	kvfree(addr);
-}
-
 static void hhf_destroy(struct Qdisc *sch)
 {
 	int i;
 	struct hhf_sched_data *q = qdisc_priv(sch);
 
 	for (i = 0; i < HHF_ARRAYS_CNT; i++) {
-		hhf_free(q->hhf_arrays[i]);
-		hhf_free(q->hhf_valid_bits[i]);
+		kvfree(q->hhf_arrays[i]);
+		kvfree(q->hhf_valid_bits[i]);
 	}
 
 	for (i = 0; i < HH_FLOWS_CNT; i++) {
@@ -503,7 +488,7 @@ static void hhf_destroy(struct Qdisc *sch)
 			kfree(flow);
 		}
 	}
-	hhf_free(q->hh_flows);
+	kvfree(q->hh_flows);
 }
 
 static const struct nla_policy hhf_policy[TCA_HHF_MAX + 1] = {
@@ -609,8 +594,8 @@ static int hhf_init(struct Qdisc *sch, struct nlattr *opt)
 
 	if (!q->hh_flows) {
 		/* Initialize heavy-hitter flow table. */
-		q->hh_flows = hhf_zalloc(HH_FLOWS_CNT *
-					 sizeof(struct list_head));
+		q->hh_flows = kvzalloc(HH_FLOWS_CNT *
+					 sizeof(struct list_head), GFP_KERNEL);
 		if (!q->hh_flows)
 			return -ENOMEM;
 		for (i = 0; i < HH_FLOWS_CNT; i++)
@@ -624,8 +609,8 @@ static int hhf_init(struct Qdisc *sch, struct nlattr *opt)
 
 		/* Initialize heavy-hitter filter arrays. */
 		for (i = 0; i < HHF_ARRAYS_CNT; i++) {
-			q->hhf_arrays[i] = hhf_zalloc(HHF_ARRAYS_LEN *
-						      sizeof(u32));
+			q->hhf_arrays[i] = kvzalloc(HHF_ARRAYS_LEN *
+						      sizeof(u32), GFP_KERNEL);
 			if (!q->hhf_arrays[i]) {
 				hhf_destroy(sch);
 				return -ENOMEM;
@@ -635,8 +620,8 @@ static int hhf_init(struct Qdisc *sch, struct nlattr *opt)
 
 		/* Initialize valid bits of heavy-hitter filter arrays. */
 		for (i = 0; i < HHF_ARRAYS_CNT; i++) {
-			q->hhf_valid_bits[i] = hhf_zalloc(HHF_ARRAYS_LEN /
-							  BITS_PER_BYTE);
+			q->hhf_valid_bits[i] = kvzalloc(HHF_ARRAYS_LEN /
+							  BITS_PER_BYTE, GFP_KERNEL);
 			if (!q->hhf_valid_bits[i]) {
 				hhf_destroy(sch);
 				return -ENOMEM;
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index bcfadfdea8e0..08a3d2af1792 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -692,15 +692,11 @@ static int get_dist_table(struct Qdisc *sch, const struct nlattr *attr)
 	spinlock_t *root_lock;
 	struct disttable *d;
 	int i;
-	size_t s;
 
 	if (n > NETEM_DIST_MAX)
 		return -EINVAL;
 
-	s = sizeof(struct disttable) + n * sizeof(s16);
-	d = kmalloc(s, GFP_KERNEL | __GFP_NOWARN);
-	if (!d)
-		d = vmalloc(s);
+	d = kvmalloc(sizeof(struct disttable) + n * sizeof(s16), GFP_KERNEL);
 	if (!d)
 		return -ENOMEM;
 
diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 7f195ed4d568..5d70cd6a032d 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -684,11 +684,7 @@ static int sfq_change(struct Qdisc *sch, struct nlattr *opt)
 
 static void *sfq_alloc(size_t sz)
 {
-	void *ptr = kmalloc(sz, GFP_KERNEL | __GFP_NOWARN);
-
-	if (!ptr)
-		ptr = vmalloc(sz);
-	return ptr;
+	return kvmalloc(sz, GFP_KERNEL);
 }
 
 static void sfq_free(void *addr)
diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c
index 38c00e867bda..a5c21f05ece4 100644
--- a/security/keys/keyctl.c
+++ b/security/keys/keyctl.c
@@ -99,14 +99,9 @@ SYSCALL_DEFINE5(add_key, const char __user *, _type,
 
 	if (_payload) {
 		ret = -ENOMEM;
-		payload = kmalloc(plen, GFP_KERNEL | __GFP_NOWARN);
-		if (!payload) {
-			if (plen <= PAGE_SIZE)
-				goto error2;
-			payload = vmalloc(plen);
-			if (!payload)
-				goto error2;
-		}
+		payload = kvmalloc(plen, GFP_KERNEL);
+		if (!payload)
+			goto error2;
 
 		ret = -EFAULT;
 		if (copy_from_user(payload, _payload, plen) != 0)
@@ -1064,14 +1059,9 @@ long keyctl_instantiate_key_common(key_serial_t id,
 
 	if (from) {
 		ret = -ENOMEM;
-		payload = kmalloc(plen, GFP_KERNEL);
-		if (!payload) {
-			if (plen <= PAGE_SIZE)
-				goto error;
-			payload = vmalloc(plen);
-			if (!payload)
-				goto error;
-		}
+		payload = kvmalloc(plen, GFP_KERNEL);
+		if (!payload)
+			goto error;
 
 		ret = -EFAULT;
 		if (!copy_from_iter_full(payload, plen, from))
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH 5/6] treewide: use kv[mz]alloc* rather than opencoded variants
@ 2017-01-12 15:37   ` Michal Hocko
  0 siblings, 0 replies; 180+ messages in thread
From: Michal Hocko @ 2017-01-12 15:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, David Rientjes, Mel Gorman, Johannes Weiner,
	Al Viro, linux-mm, LKML, Michal Hocko, Martin Schwidefsky,
	Heiko Carstens, Herbert Xu, Anton Vorontsov, Colin Cross,
	Kees Cook, Tony Luck, Rafael J. Wysocki, Ben Skeggs,
	Kent Overstreet, Santosh Raspatur, Hariprasad S, Tariq Toukan,
	Yishai Hadas, Dan Williams, Oleg Drokin

From: Michal Hocko <mhocko@suse.com>

There are many code paths that open-code kvmalloc. Let's use the helper
instead. The main difference from kvmalloc is that those users usually
do not consider all aspects of the memory allocator. E.g. allocation
requests <= 32kB (with 4kB pages) basically never fail and invoke the
OOM killer to satisfy the allocation. This sounds too disruptive for
something that has a reasonable fallback - vmalloc. On the other hand,
with the helper those requests might fall back to vmalloc in cases where
the page allocator would previously have succeeded after several more
reclaim/compaction attempts. There is no guarantee that this would have
happened, though.

This patch converts many of those places to the kv[mz]alloc* helpers
because they are more conservative.
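
To make the behavioural difference concrete, here is a rough user-space
model of the fallback order. It is only a sketch, not the kernel
implementation: every model_* name and the slab_ok switch are made up
for illustration, malloc() stands in for both allocators, and gfp flags
are left out entirely. It assumes the helper gives up on the kmalloc
attempt early and only falls back to vmalloc for requests larger than
one page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* User-space model only: every model_* name here is hypothetical. */
#define MODEL_PAGE_SIZE 4096u

enum model_backend { MODEL_NONE, MODEL_SLAB, MODEL_VMALLOC };

struct model_buf {
	void *ptr;
	enum model_backend backend;	/* which allocator served the request */
};

/* Stand-in for a kmalloc() attempt that gives up early instead of looping. */
void *model_kmalloc(size_t size, bool slab_ok)
{
	return slab_ok ? malloc(size) : NULL;
}

/* Stand-in for vmalloc(), modelled here as a plain malloc(). */
void *model_vmalloc(size_t size)
{
	return malloc(size);
}

struct model_buf model_kvmalloc(size_t size, bool slab_ok)
{
	struct model_buf buf = { NULL, MODEL_NONE };

	buf.ptr = model_kmalloc(size, slab_ok);
	if (buf.ptr) {
		buf.backend = MODEL_SLAB;
		return buf;
	}

	/* A sub-page request gains nothing from vmalloc, so just fail. */
	if (size <= MODEL_PAGE_SIZE)
		return buf;

	buf.ptr = model_vmalloc(size);
	if (buf.ptr)
		buf.backend = MODEL_VMALLOC;
	return buf;
}

/* One free routine for both backends, mirroring what kvfree() offers. */
void model_kvfree(struct model_buf *buf)
{
	assert(buf != NULL);
	free(buf->ptr);
	buf->ptr = NULL;
	buf->backend = MODEL_NONE;
}
```

With slab_ok false, a 64kB request is served by the vmalloc stand-in,
while a 128 byte request fails outright rather than retrying. Callers
release either kind of buffer through the same free routine.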

Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Colin Cross <ccross@android.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Santosh Raspatur <santosh@chelsio.com>
Cc: Hariprasad S <hariprasad@chelsio.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Yishai Hadas <yishaih@mellanox.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: Andreas Dilger <andreas.dilger@intel.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Sterba <dsterba@suse.com>
Cc: "Yan, Zheng" <zyan@redhat.com>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/s390/kvm/kvm-s390.c                           | 10 ++-----
 crypto/lzo.c                                       |  4 +--
 drivers/acpi/apei/erst.c                           |  8 ++---
 drivers/char/agp/generic.c                         |  8 +----
 drivers/gpu/drm/nouveau/nouveau_gem.c              |  4 +--
 drivers/md/bcache/util.h                           | 12 ++------
 drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h    |  3 --
 drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c | 25 ++--------------
 drivers/net/ethernet/chelsio/cxgb3/l2t.c           |  2 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c    | 31 ++++----------------
 drivers/net/ethernet/mellanox/mlx4/en_tx.c         |  9 ++----
 drivers/net/ethernet/mellanox/mlx4/mr.c            |  9 ++----
 drivers/nvdimm/dimm_devs.c                         |  5 +---
 .../staging/lustre/lnet/libcfs/linux/linux-mem.c   | 11 +------
 drivers/xen/evtchn.c                               | 14 +--------
 fs/btrfs/ctree.c                                   |  9 ++----
 fs/btrfs/ioctl.c                                   |  9 ++----
 fs/btrfs/send.c                                    | 27 ++++++-----------
 fs/ceph/file.c                                     |  9 ++----
 fs/select.c                                        |  5 +---
 fs/xattr.c                                         | 27 ++++++-----------
 kernel/bpf/hashtab.c                               | 11 ++-----
 lib/iov_iter.c                                     |  5 +---
 mm/frame_vector.c                                  |  5 +---
 net/ipv4/inet_hashtables.c                         |  6 +---
 net/ipv4/tcp_metrics.c                             |  5 +---
 net/mpls/af_mpls.c                                 |  5 +---
 net/netfilter/x_tables.c                           | 34 ++++++----------------
 net/netfilter/xt_recent.c                          |  5 +---
 net/sched/sch_choke.c                              |  5 +---
 net/sched/sch_fq_codel.c                           | 26 ++++-------------
 net/sched/sch_hhf.c                                | 33 ++++++---------------
 net/sched/sch_netem.c                              |  6 +---
 net/sched/sch_sfq.c                                |  6 +---
 security/keys/keyctl.c                             | 22 ++++----------
 35 files changed, 96 insertions(+), 319 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 4f74511015b8..e6bbb33d2956 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -1126,10 +1126,7 @@ static long kvm_s390_get_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
 	if (args->count < 1 || args->count > KVM_S390_SKEYS_MAX)
 		return -EINVAL;
 
-	keys = kmalloc_array(args->count, sizeof(uint8_t),
-			     GFP_KERNEL | __GFP_NOWARN);
-	if (!keys)
-		keys = vmalloc(sizeof(uint8_t) * args->count);
+	keys = kvmalloc(args->count * sizeof(uint8_t), GFP_KERNEL);
 	if (!keys)
 		return -ENOMEM;
 
@@ -1171,10 +1168,7 @@ static long kvm_s390_set_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
 	if (args->count < 1 || args->count > KVM_S390_SKEYS_MAX)
 		return -EINVAL;
 
-	keys = kmalloc_array(args->count, sizeof(uint8_t),
-			     GFP_KERNEL | __GFP_NOWARN);
-	if (!keys)
-		keys = vmalloc(sizeof(uint8_t) * args->count);
+	keys = kvmalloc(sizeof(uint8_t) * args->count, GFP_KERNEL);
 	if (!keys)
 		return -ENOMEM;
 
diff --git a/crypto/lzo.c b/crypto/lzo.c
index 168df784da84..218567d717d6 100644
--- a/crypto/lzo.c
+++ b/crypto/lzo.c
@@ -32,9 +32,7 @@ static void *lzo_alloc_ctx(struct crypto_scomp *tfm)
 {
 	void *ctx;
 
-	ctx = kmalloc(LZO1X_MEM_COMPRESS, GFP_KERNEL | __GFP_NOWARN);
-	if (!ctx)
-		ctx = vmalloc(LZO1X_MEM_COMPRESS);
+	ctx = kvmalloc(LZO1X_MEM_COMPRESS, GFP_KERNEL);
 	if (!ctx)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/drivers/acpi/apei/erst.c b/drivers/acpi/apei/erst.c
index ec4f507b524f..a2898df61744 100644
--- a/drivers/acpi/apei/erst.c
+++ b/drivers/acpi/apei/erst.c
@@ -513,7 +513,7 @@ static int __erst_record_id_cache_add_one(void)
 	if (i < erst_record_id_cache.len)
 		goto retry;
 	if (erst_record_id_cache.len >= erst_record_id_cache.size) {
-		int new_size, alloc_size;
+		int new_size;
 		u64 *new_entries;
 
 		new_size = erst_record_id_cache.size * 2;
@@ -524,11 +524,7 @@ static int __erst_record_id_cache_add_one(void)
 				pr_warn(FW_WARN "too many record IDs!\n");
 			return 0;
 		}
-		alloc_size = new_size * sizeof(entries[0]);
-		if (alloc_size < PAGE_SIZE)
-			new_entries = kmalloc(alloc_size, GFP_KERNEL);
-		else
-			new_entries = vmalloc(alloc_size);
+		new_entries = kvmalloc(new_size * sizeof(entries[0]), GFP_KERNEL);
 		if (!new_entries)
 			return -ENOMEM;
 		memcpy(new_entries, entries,
diff --git a/drivers/char/agp/generic.c b/drivers/char/agp/generic.c
index f002fa5d1887..bdf418cac8ef 100644
--- a/drivers/char/agp/generic.c
+++ b/drivers/char/agp/generic.c
@@ -88,13 +88,7 @@ static int agp_get_key(void)
 
 void agp_alloc_page_array(size_t size, struct agp_memory *mem)
 {
-	mem->pages = NULL;
-
-	if (size <= 2*PAGE_SIZE)
-		mem->pages = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
-	if (mem->pages == NULL) {
-		mem->pages = vmalloc(size);
-	}
+	mem->pages = kvmalloc(size, GFP_KERNEL);
 }
 EXPORT_SYMBOL(agp_alloc_page_array);
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index 201b52b750dd..77dd73ff126f 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -568,9 +568,7 @@ u_memcpya(uint64_t user, unsigned nmemb, unsigned size)
 
 	size *= nmemb;
 
-	mem = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
-	if (!mem)
-		mem = vmalloc(size);
+	mem = kvmalloc(size, GFP_KERNEL);
 	if (!mem)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/drivers/md/bcache/util.h b/drivers/md/bcache/util.h
index cf2cbc211d83..d00bcb64d3a8 100644
--- a/drivers/md/bcache/util.h
+++ b/drivers/md/bcache/util.h
@@ -43,11 +43,7 @@ struct closure;
 	(heap)->used = 0;						\
 	(heap)->size = (_size);						\
 	_bytes = (heap)->size * sizeof(*(heap)->data);			\
-	(heap)->data = NULL;						\
-	if (_bytes < KMALLOC_MAX_SIZE)					\
-		(heap)->data = kmalloc(_bytes, (gfp));			\
-	if ((!(heap)->data) && ((gfp) & GFP_KERNEL))			\
-		(heap)->data = vmalloc(_bytes);				\
+	(heap)->data = kvmalloc(_bytes, (gfp) & GFP_KERNEL);		\
 	(heap)->data;							\
 })
 
@@ -136,12 +132,8 @@ do {									\
 									\
 	(fifo)->mask = _allocated_size - 1;				\
 	(fifo)->front = (fifo)->back = 0;				\
-	(fifo)->data = NULL;						\
 									\
-	if (_bytes < KMALLOC_MAX_SIZE)					\
-		(fifo)->data = kmalloc(_bytes, (gfp));			\
-	if ((!(fifo)->data) && ((gfp) & GFP_KERNEL))			\
-		(fifo)->data = vmalloc(_bytes);				\
+	(fifo)->data = kvmalloc(_bytes, (gfp) & GFP_KERNEL);		\
 	(fifo)->data;							\
 })
 
diff --git a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h
index 920d918ed193..f04e81f33795 100644
--- a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h
+++ b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h
@@ -41,9 +41,6 @@
 
 #define VALIDATE_TID 1
 
-void *cxgb_alloc_mem(unsigned long size);
-void cxgb_free_mem(void *addr);
-
 /*
  * Map an ATID or STID to their entries in the corresponding TID tables.
  */
diff --git a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c
index 76684dcb874c..606d4a3ade04 100644
--- a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c
+++ b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c
@@ -1152,27 +1152,6 @@ static void cxgb_redirect(struct dst_entry *old, struct dst_entry *new,
 }
 
 /*
- * Allocate a chunk of memory using kmalloc or, if that fails, vmalloc.
- * The allocated memory is cleared.
- */
-void *cxgb_alloc_mem(unsigned long size)
-{
-	void *p = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
-
-	if (!p)
-		p = vzalloc(size);
-	return p;
-}
-
-/*
- * Free memory allocated through t3_alloc_mem().
- */
-void cxgb_free_mem(void *addr)
-{
-	kvfree(addr);
-}
-
-/*
  * Allocate and initialize the TID tables.  Returns 0 on success.
  */
 static int init_tid_tabs(struct tid_info *t, unsigned int ntids,
@@ -1182,7 +1161,7 @@ static int init_tid_tabs(struct tid_info *t, unsigned int ntids,
 	unsigned long size = ntids * sizeof(*t->tid_tab) +
 	    natids * sizeof(*t->atid_tab) + nstids * sizeof(*t->stid_tab);
 
-	t->tid_tab = cxgb_alloc_mem(size);
+	t->tid_tab = kvzalloc(size, GFP_KERNEL);
 	if (!t->tid_tab)
 		return -ENOMEM;
 
@@ -1218,7 +1197,7 @@ static int init_tid_tabs(struct tid_info *t, unsigned int ntids,
 
 static void free_tid_maps(struct tid_info *t)
 {
-	cxgb_free_mem(t->tid_tab);
+	kvfree(t->tid_tab);
 }
 
 static inline void add_adapter(struct adapter *adap)
diff --git a/drivers/net/ethernet/chelsio/cxgb3/l2t.c b/drivers/net/ethernet/chelsio/cxgb3/l2t.c
index 5f226eda8cd6..c9b06501ee0c 100644
--- a/drivers/net/ethernet/chelsio/cxgb3/l2t.c
+++ b/drivers/net/ethernet/chelsio/cxgb3/l2t.c
@@ -444,7 +444,7 @@ struct l2t_data *t3_init_l2t(unsigned int l2t_capacity)
 	struct l2t_data *d;
 	int i, size = sizeof(*d) + l2t_capacity * sizeof(struct l2t_entry);
 
-	d = cxgb_alloc_mem(size);
+	d = kvzalloc(size, GFP_KERNEL);
 	if (!d)
 		return NULL;
 
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 6f951877430b..671695cb3c15 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -881,27 +881,6 @@ static int setup_sge_queues(struct adapter *adap)
 	return err;
 }
 
-/*
- * Allocate a chunk of memory using kmalloc or, if that fails, vmalloc.
- * The allocated memory is cleared.
- */
-void *t4_alloc_mem(size_t size)
-{
-	void *p = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
-
-	if (!p)
-		p = vzalloc(size);
-	return p;
-}
-
-/*
- * Free memory allocated through alloc_mem().
- */
-void t4_free_mem(void *addr)
-{
-	kvfree(addr);
-}
-
 static u16 cxgb_select_queue(struct net_device *dev, struct sk_buff *skb,
 			     void *accel_priv, select_queue_fallback_t fallback)
 {
@@ -1300,7 +1279,7 @@ static int tid_init(struct tid_info *t)
 	       max_ftids * sizeof(*t->ftid_tab) +
 	       ftid_bmap_size * sizeof(long);
 
-	t->tid_tab = t4_alloc_mem(size);
+	t->tid_tab = kvzalloc(size, GFP_KERNEL);
 	if (!t->tid_tab)
 		return -ENOMEM;
 
@@ -3416,7 +3395,7 @@ static int adap_init0(struct adapter *adap)
 		/* allocate memory to read the header of the firmware on the
 		 * card
 		 */
-		card_fw = t4_alloc_mem(sizeof(*card_fw));
+		card_fw = kvzalloc(sizeof(*card_fw), GFP_KERNEL);
 
 		/* Get FW from from /lib/firmware/ */
 		ret = request_firmware(&fw, fw_info->fw_mod_name,
@@ -3436,7 +3415,7 @@ static int adap_init0(struct adapter *adap)
 
 		/* Cleaning up */
 		release_firmware(fw);
-		t4_free_mem(card_fw);
+		kvfree(card_fw);
 
 		if (ret < 0)
 			goto bye;
@@ -4432,9 +4411,9 @@ static void free_some_resources(struct adapter *adapter)
 {
 	unsigned int i;
 
-	t4_free_mem(adapter->l2t);
+	kvfree(adapter->l2t);
 	t4_cleanup_sched(adapter);
-	t4_free_mem(adapter->tids.tid_tab);
+	kvfree(adapter->tids.tid_tab);
 	cxgb4_cleanup_tc_u32(adapter);
 	kfree(adapter->sge.egr_map);
 	kfree(adapter->sge.ingr_map);
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 5886ad78058f..a5c1b815145e 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -70,13 +70,10 @@ int mlx4_en_create_tx_ring(struct mlx4_en_priv *priv,
 	ring->full_size = ring->size - HEADROOM - MAX_DESC_TXBBS;
 
 	tmp = size * sizeof(struct mlx4_en_tx_info);
-	ring->tx_info = kmalloc_node(tmp, GFP_KERNEL | __GFP_NOWARN, node);
+	ring->tx_info = kvmalloc_node(tmp, GFP_KERNEL, node);
 	if (!ring->tx_info) {
-		ring->tx_info = vmalloc(tmp);
-		if (!ring->tx_info) {
-			err = -ENOMEM;
-			goto err_ring;
-		}
+		err = -ENOMEM;
+		goto err_ring;
 	}
 
 	en_dbg(DRV, priv, "Allocated tx_info ring at addr:%p size:%d\n",
diff --git a/drivers/net/ethernet/mellanox/mlx4/mr.c b/drivers/net/ethernet/mellanox/mlx4/mr.c
index 395b5463cfd9..82354fd0a87e 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mr.c
+++ b/drivers/net/ethernet/mellanox/mlx4/mr.c
@@ -115,12 +115,9 @@ static int mlx4_buddy_init(struct mlx4_buddy *buddy, int max_order)
 
 	for (i = 0; i <= buddy->max_order; ++i) {
 		s = BITS_TO_LONGS(1 << (buddy->max_order - i));
-		buddy->bits[i] = kcalloc(s, sizeof (long), GFP_KERNEL | __GFP_NOWARN);
-		if (!buddy->bits[i]) {
-			buddy->bits[i] = vzalloc(s * sizeof(long));
-			if (!buddy->bits[i])
-				goto err_out_free;
-		}
+		buddy->bits[i] = kvzalloc(s * sizeof(long), GFP_KERNEL);
+		if (!buddy->bits[i])
+			goto err_out_free;
 	}
 
 	set_bit(0, buddy->bits[buddy->max_order]);
diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
index 0eedc49e0d47..3bd332b167d9 100644
--- a/drivers/nvdimm/dimm_devs.c
+++ b/drivers/nvdimm/dimm_devs.c
@@ -102,10 +102,7 @@ int nvdimm_init_config_data(struct nvdimm_drvdata *ndd)
 		return -ENXIO;
 	}
 
-	ndd->data = kmalloc(ndd->nsarea.config_size, GFP_KERNEL);
-	if (!ndd->data)
-		ndd->data = vmalloc(ndd->nsarea.config_size);
-
+	ndd->data = kvmalloc(ndd->nsarea.config_size, GFP_KERNEL);
 	if (!ndd->data)
 		return -ENOMEM;
 
diff --git a/drivers/staging/lustre/lnet/libcfs/linux/linux-mem.c b/drivers/staging/lustre/lnet/libcfs/linux/linux-mem.c
index a6a76a681ea9..8f638267e704 100644
--- a/drivers/staging/lustre/lnet/libcfs/linux/linux-mem.c
+++ b/drivers/staging/lustre/lnet/libcfs/linux/linux-mem.c
@@ -45,15 +45,6 @@ EXPORT_SYMBOL(libcfs_kvzalloc);
 void *libcfs_kvzalloc_cpt(struct cfs_cpt_table *cptab, int cpt, size_t size,
 			  gfp_t flags)
 {
-	void *ret;
-
-	ret = kzalloc_node(size, flags | __GFP_NOWARN,
-			   cfs_cpt_spread_node(cptab, cpt));
-	if (!ret) {
-		WARN_ON(!(flags & (__GFP_FS | __GFP_HIGH)));
-		ret = vmalloc_node(size, cfs_cpt_spread_node(cptab, cpt));
-	}
-
-	return ret;
+	return kvzalloc_node(size, flags, cfs_cpt_spread_node(cptab, cpt));
 }
 EXPORT_SYMBOL(libcfs_kvzalloc_cpt);
diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
index 6890897a6f30..10f1ef582659 100644
--- a/drivers/xen/evtchn.c
+++ b/drivers/xen/evtchn.c
@@ -87,18 +87,6 @@ struct user_evtchn {
 	bool enabled;
 };
 
-static evtchn_port_t *evtchn_alloc_ring(unsigned int size)
-{
-	evtchn_port_t *ring;
-	size_t s = size * sizeof(*ring);
-
-	ring = kmalloc(s, GFP_KERNEL);
-	if (!ring)
-		ring = vmalloc(s);
-
-	return ring;
-}
-
 static void evtchn_free_ring(evtchn_port_t *ring)
 {
 	kvfree(ring);
@@ -334,7 +322,7 @@ static int evtchn_resize_ring(struct per_user_data *u)
 	else
 		new_size = 2 * u->ring_size;
 
-	new_ring = evtchn_alloc_ring(new_size);
+	new_ring = kvmalloc(new_size * sizeof(*new_ring), GFP_KERNEL);
 	if (!new_ring)
 		return -ENOMEM;
 
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 146b2dc0d2cf..4fc9712d927d 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -5391,13 +5391,10 @@ int btrfs_compare_trees(struct btrfs_root *left_root,
 		goto out;
 	}
 
-	tmp_buf = kmalloc(fs_info->nodesize, GFP_KERNEL | __GFP_NOWARN);
+	tmp_buf = kvmalloc(fs_info->nodesize, GFP_KERNEL);
 	if (!tmp_buf) {
-		tmp_buf = vmalloc(fs_info->nodesize);
-		if (!tmp_buf) {
-			ret = -ENOMEM;
-			goto out;
-		}
+		ret = -ENOMEM;
+		goto out;
 	}
 
 	left_path->search_commit_root = 1;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 77dabfed3a5d..6f0b488c7428 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3547,12 +3547,9 @@ static int btrfs_clone(struct inode *src, struct inode *inode,
 	u64 last_dest_end = destoff;
 
 	ret = -ENOMEM;
-	buf = kmalloc(fs_info->nodesize, GFP_KERNEL | __GFP_NOWARN);
-	if (!buf) {
-		buf = vmalloc(fs_info->nodesize);
-		if (!buf)
-			return ret;
-	}
+	buf = kvmalloc(fs_info->nodesize, GFP_KERNEL);
+	if (!buf)
+		return ret;
 
 	path = btrfs_alloc_path();
 	if (!path) {
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index d145ce804620..0621ca2a7b5d 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -6242,22 +6242,16 @@ long btrfs_ioctl_send(struct file *mnt_file, void __user *arg_)
 	sctx->clone_roots_cnt = arg->clone_sources_count;
 
 	sctx->send_max_size = BTRFS_SEND_BUF_SIZE;
-	sctx->send_buf = kmalloc(sctx->send_max_size, GFP_KERNEL | __GFP_NOWARN);
+	sctx->send_buf = kvmalloc(sctx->send_max_size, GFP_KERNEL);
 	if (!sctx->send_buf) {
-		sctx->send_buf = vmalloc(sctx->send_max_size);
-		if (!sctx->send_buf) {
-			ret = -ENOMEM;
-			goto out;
-		}
+		ret = -ENOMEM;
+		goto out;
 	}
 
-	sctx->read_buf = kmalloc(BTRFS_SEND_READ_SIZE, GFP_KERNEL | __GFP_NOWARN);
+	sctx->read_buf = kvmalloc(BTRFS_SEND_READ_SIZE, GFP_KERNEL);
 	if (!sctx->read_buf) {
-		sctx->read_buf = vmalloc(BTRFS_SEND_READ_SIZE);
-		if (!sctx->read_buf) {
-			ret = -ENOMEM;
-			goto out;
-		}
+		ret = -ENOMEM;
+		goto out;
 	}
 
 	sctx->pending_dir_moves = RB_ROOT;
@@ -6278,13 +6272,10 @@ long btrfs_ioctl_send(struct file *mnt_file, void __user *arg_)
 	alloc_size = arg->clone_sources_count * sizeof(*arg->clone_sources);
 
 	if (arg->clone_sources_count) {
-		clone_sources_tmp = kmalloc(alloc_size, GFP_KERNEL | __GFP_NOWARN);
+		clone_sources_tmp = kvmalloc(alloc_size, GFP_KERNEL);
 		if (!clone_sources_tmp) {
-			clone_sources_tmp = vmalloc(alloc_size);
-			if (!clone_sources_tmp) {
-				ret = -ENOMEM;
-				goto out;
-			}
+			ret = -ENOMEM;
+			goto out;
 		}
 
 		ret = copy_from_user(clone_sources_tmp, arg->clone_sources,
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 045d30d26624..78b18acf33ba 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -74,12 +74,9 @@ dio_get_pages_alloc(const struct iov_iter *it, size_t nbytes,
 	align = (unsigned long)(it->iov->iov_base + it->iov_offset) &
 		(PAGE_SIZE - 1);
 	npages = calc_pages_for(align, nbytes);
-	pages = kmalloc(sizeof(*pages) * npages, GFP_KERNEL);
-	if (!pages) {
-		pages = vmalloc(sizeof(*pages) * npages);
-		if (!pages)
-			return ERR_PTR(-ENOMEM);
-	}
+	pages = kvmalloc(sizeof(*pages) * npages, GFP_KERNEL);
+	if (!pages)
+		return ERR_PTR(-ENOMEM);
 
 	for (idx = 0; idx < npages; ) {
 		size_t start;
diff --git a/fs/select.c b/fs/select.c
index 305c0daf5d67..9e8e1189eb99 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -586,10 +586,7 @@ int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp,
 			goto out_nofds;
 
 		alloc_size = 6 * size;
-		bits = kmalloc(alloc_size, GFP_KERNEL|__GFP_NOWARN);
-		if (!bits && alloc_size > PAGE_SIZE)
-			bits = vmalloc(alloc_size);
-
+		bits = kvmalloc(alloc_size, GFP_KERNEL);
 		if (!bits)
 			goto out_nofds;
 	}
diff --git a/fs/xattr.c b/fs/xattr.c
index 7e3317cf4045..4269a7c26db7 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -431,12 +431,9 @@ setxattr(struct dentry *d, const char __user *name, const void __user *value,
 	if (size) {
 		if (size > XATTR_SIZE_MAX)
 			return -E2BIG;
-		kvalue = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
-		if (!kvalue) {
-			kvalue = vmalloc(size);
-			if (!kvalue)
-				return -ENOMEM;
-		}
+		kvalue = kvmalloc(size, GFP_KERNEL);
+		if (!kvalue)
+			return -ENOMEM;
 		if (copy_from_user(kvalue, value, size)) {
 			error = -EFAULT;
 			goto out;
@@ -528,12 +525,9 @@ getxattr(struct dentry *d, const char __user *name, void __user *value,
 	if (size) {
 		if (size > XATTR_SIZE_MAX)
 			size = XATTR_SIZE_MAX;
-		kvalue = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
-		if (!kvalue) {
-			kvalue = vmalloc(size);
-			if (!kvalue)
-				return -ENOMEM;
-		}
+		kvalue = kvzalloc(size, GFP_KERNEL);
+		if (!kvalue)
+			return -ENOMEM;
 	}
 
 	error = vfs_getxattr(d, kname, kvalue, size);
@@ -611,12 +605,9 @@ listxattr(struct dentry *d, char __user *list, size_t size)
 	if (size) {
 		if (size > XATTR_LIST_MAX)
 			size = XATTR_LIST_MAX;
-		klist = kmalloc(size, __GFP_NOWARN | GFP_KERNEL);
-		if (!klist) {
-			klist = vmalloc(size);
-			if (!klist)
-				return -ENOMEM;
-		}
+		klist = kvmalloc(size, GFP_KERNEL);
+		if (!klist)
+			return -ENOMEM;
 	}
 
 	error = vfs_listxattr(d, klist, size);
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 34debc1a9641..4ca30a951bbc 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -320,14 +320,9 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 		goto free_htab;
 
 	err = -ENOMEM;
-	htab->buckets = kmalloc_array(htab->n_buckets, sizeof(struct bucket),
-				      GFP_USER | __GFP_NOWARN);
-
-	if (!htab->buckets) {
-		htab->buckets = vmalloc(htab->n_buckets * sizeof(struct bucket));
-		if (!htab->buckets)
-			goto free_htab;
-	}
+	htab->buckets = kvmalloc(htab->n_buckets * sizeof(struct bucket), GFP_USER);
+	if (!htab->buckets)
+		goto free_htab;
 
 	for (i = 0; i < htab->n_buckets; i++) {
 		INIT_HLIST_HEAD(&htab->buckets[i].head);
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 25f572303801..45c17b5562b5 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -957,10 +957,7 @@ EXPORT_SYMBOL(iov_iter_get_pages);
 
 static struct page **get_pages_array(size_t n)
 {
-	struct page **p = kmalloc(n * sizeof(struct page *), GFP_KERNEL);
-	if (!p)
-		p = vmalloc(n * sizeof(struct page *));
-	return p;
+	return kvmalloc(n * sizeof(struct page *), GFP_KERNEL);
 }
 
 static ssize_t pipe_get_pages_alloc(struct iov_iter *i,
diff --git a/mm/frame_vector.c b/mm/frame_vector.c
index db77dcb38afd..72ebec18629c 100644
--- a/mm/frame_vector.c
+++ b/mm/frame_vector.c
@@ -200,10 +200,7 @@ struct frame_vector *frame_vector_create(unsigned int nr_frames)
 	 * Avoid higher order allocations, use vmalloc instead. It should
 	 * be rare anyway.
 	 */
-	if (size <= PAGE_SIZE)
-		vec = kmalloc(size, GFP_KERNEL);
-	else
-		vec = vmalloc(size);
+	vec = kvmalloc(size, GFP_KERNEL);
 	if (!vec)
 		return NULL;
 	vec->nr_allocated = nr_frames;
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index ca97835bfec4..a46a9fd8b540 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -687,11 +687,7 @@ int inet_ehash_locks_alloc(struct inet_hashinfo *hashinfo)
 		/* no more locks than number of hash buckets */
 		nblocks = min(nblocks, hashinfo->ehash_mask + 1);
 
-		hashinfo->ehash_locks =	kmalloc_array(nblocks, locksz,
-						      GFP_KERNEL | __GFP_NOWARN);
-		if (!hashinfo->ehash_locks)
-			hashinfo->ehash_locks = vmalloc(nblocks * locksz);
-
+		hashinfo->ehash_locks = kvmalloc(nblocks * locksz, GFP_KERNEL);
 		if (!hashinfo->ehash_locks)
 			return -ENOMEM;
 
diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
index d46f4d5b1c62..39b2166d3be8 100644
--- a/net/ipv4/tcp_metrics.c
+++ b/net/ipv4/tcp_metrics.c
@@ -1155,10 +1155,7 @@ static int __net_init tcp_net_metrics_init(struct net *net)
 	tcp_metrics_hash_log = order_base_2(slots);
 	size = sizeof(struct tcpm_hash_bucket) << tcp_metrics_hash_log;
 
-	tcp_metrics_hash = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
-	if (!tcp_metrics_hash)
-		tcp_metrics_hash = vzalloc(size);
-
+	tcp_metrics_hash = kvzalloc(size, GFP_KERNEL);
 	if (!tcp_metrics_hash)
 		return -ENOMEM;
 
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 15fe97644ffe..a0c82ef74389 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -1525,10 +1525,7 @@ static int resize_platform_label_table(struct net *net, size_t limit)
 	unsigned index;
 
 	if (size) {
-		labels = kzalloc(size, GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY);
-		if (!labels)
-			labels = vzalloc(size);
-
+		labels = kvzalloc(size, GFP_KERNEL);
 		if (!labels)
 			goto nolabels;
 	}
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index a011322a027d..eeed0af3ea25 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -712,17 +712,11 @@ EXPORT_SYMBOL(xt_check_entry_offsets);
  */
 unsigned int *xt_alloc_entry_offsets(unsigned int size)
 {
-	unsigned int *off;
-
-	off = kcalloc(size, sizeof(unsigned int), GFP_KERNEL | __GFP_NOWARN);
-
-	if (off)
-		return off;
-
 	if (size < (SIZE_MAX / sizeof(unsigned int)))
-		off = vmalloc(size * sizeof(unsigned int));
+		return kvmalloc(size * sizeof(unsigned int), GFP_KERNEL);
+
+	return NULL;
 
-	return off;
 }
 EXPORT_SYMBOL(xt_alloc_entry_offsets);
 
@@ -956,15 +950,9 @@ struct xt_table_info *xt_alloc_table_info(unsigned int size)
 	if ((SMP_ALIGN(size) >> PAGE_SHIFT) + 2 > totalram_pages)
 		return NULL;
 
-	if (sz <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
-		info = kmalloc(sz, GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY);
-	if (!info) {
-		info = __vmalloc(sz, GFP_KERNEL | __GFP_NOWARN |
-				     __GFP_NORETRY | __GFP_HIGHMEM,
-				 PAGE_KERNEL);
-		if (!info)
-			return NULL;
-	}
+	info = kvmalloc(sz, GFP_KERNEL);
+	if (!info)
+		return NULL;
 	memset(info, 0, sizeof(*info));
 	info->size = size;
 	return info;
@@ -1066,7 +1054,7 @@ static int xt_jumpstack_alloc(struct xt_table_info *i)
 
 	size = sizeof(void **) * nr_cpu_ids;
 	if (size > PAGE_SIZE)
-		i->jumpstack = vzalloc(size);
+		i->jumpstack = kvzalloc(size, GFP_KERNEL);
 	else
 		i->jumpstack = kzalloc(size, GFP_KERNEL);
 	if (i->jumpstack == NULL)
@@ -1088,12 +1076,8 @@ static int xt_jumpstack_alloc(struct xt_table_info *i)
 	 */
 	size = sizeof(void *) * i->stacksize * 2u;
 	for_each_possible_cpu(cpu) {
-		if (size > PAGE_SIZE)
-			i->jumpstack[cpu] = vmalloc_node(size,
-				cpu_to_node(cpu));
-		else
-			i->jumpstack[cpu] = kmalloc_node(size,
-				GFP_KERNEL, cpu_to_node(cpu));
+		i->jumpstack[cpu] = kvmalloc_node(size, GFP_KERNEL,
+			cpu_to_node(cpu));
 		if (i->jumpstack[cpu] == NULL)
 			/*
 			 * Freeing will be done later on by the callers. The
diff --git a/net/netfilter/xt_recent.c b/net/netfilter/xt_recent.c
index 1d89a4eaf841..d6aa8f63ed2e 100644
--- a/net/netfilter/xt_recent.c
+++ b/net/netfilter/xt_recent.c
@@ -388,10 +388,7 @@ static int recent_mt_check(const struct xt_mtchk_param *par,
 	}
 
 	sz = sizeof(*t) + sizeof(t->iphash[0]) * ip_list_hash_size;
-	if (sz <= PAGE_SIZE)
-		t = kzalloc(sz, GFP_KERNEL);
-	else
-		t = vzalloc(sz);
+	t = kvzalloc(sz, GFP_KERNEL);
 	if (t == NULL) {
 		ret = -ENOMEM;
 		goto out;
diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c
index 3b6d5bd69101..30d6a39fd2c8 100644
--- a/net/sched/sch_choke.c
+++ b/net/sched/sch_choke.c
@@ -431,10 +431,7 @@ static int choke_change(struct Qdisc *sch, struct nlattr *opt)
 	if (mask != q->tab_mask) {
 		struct sk_buff **ntab;
 
-		ntab = kcalloc(mask + 1, sizeof(struct sk_buff *),
-			       GFP_KERNEL | __GFP_NOWARN);
-		if (!ntab)
-			ntab = vzalloc((mask + 1) * sizeof(struct sk_buff *));
+		ntab = kvzalloc((mask + 1) * sizeof(struct sk_buff *), GFP_KERNEL);
 		if (!ntab)
 			return -ENOMEM;
 
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index a5ea0e9b6be4..04e2d006f277 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -449,27 +449,13 @@ static int fq_codel_change(struct Qdisc *sch, struct nlattr *opt)
 	return 0;
 }
 
-static void *fq_codel_zalloc(size_t sz)
-{
-	void *ptr = kzalloc(sz, GFP_KERNEL | __GFP_NOWARN);
-
-	if (!ptr)
-		ptr = vzalloc(sz);
-	return ptr;
-}
-
-static void fq_codel_free(void *addr)
-{
-	kvfree(addr);
-}
-
 static void fq_codel_destroy(struct Qdisc *sch)
 {
 	struct fq_codel_sched_data *q = qdisc_priv(sch);
 
 	tcf_destroy_chain(&q->filter_list);
-	fq_codel_free(q->backlogs);
-	fq_codel_free(q->flows);
+	kvfree(q->backlogs);
+	kvfree(q->flows);
 }
 
 static int fq_codel_init(struct Qdisc *sch, struct nlattr *opt)
@@ -497,13 +483,13 @@ static int fq_codel_init(struct Qdisc *sch, struct nlattr *opt)
 	}
 
 	if (!q->flows) {
-		q->flows = fq_codel_zalloc(q->flows_cnt *
-					   sizeof(struct fq_codel_flow));
+		q->flows = kvzalloc(q->flows_cnt *
+					   sizeof(struct fq_codel_flow), GFP_KERNEL);
 		if (!q->flows)
 			return -ENOMEM;
-		q->backlogs = fq_codel_zalloc(q->flows_cnt * sizeof(u32));
+		q->backlogs = kvzalloc(q->flows_cnt * sizeof(u32), GFP_KERNEL);
 		if (!q->backlogs) {
-			fq_codel_free(q->flows);
+			kvfree(q->flows);
 			return -ENOMEM;
 		}
 		for (i = 0; i < q->flows_cnt; i++) {
diff --git a/net/sched/sch_hhf.c b/net/sched/sch_hhf.c
index e3d0458af17b..858b2de5db59 100644
--- a/net/sched/sch_hhf.c
+++ b/net/sched/sch_hhf.c
@@ -467,29 +467,14 @@ static void hhf_reset(struct Qdisc *sch)
 		rtnl_kfree_skbs(skb, skb);
 }
 
-static void *hhf_zalloc(size_t sz)
-{
-	void *ptr = kzalloc(sz, GFP_KERNEL | __GFP_NOWARN);
-
-	if (!ptr)
-		ptr = vzalloc(sz);
-
-	return ptr;
-}
-
-static void hhf_free(void *addr)
-{
-	kvfree(addr);
-}
-
 static void hhf_destroy(struct Qdisc *sch)
 {
 	int i;
 	struct hhf_sched_data *q = qdisc_priv(sch);
 
 	for (i = 0; i < HHF_ARRAYS_CNT; i++) {
-		hhf_free(q->hhf_arrays[i]);
-		hhf_free(q->hhf_valid_bits[i]);
+		kvfree(q->hhf_arrays[i]);
+		kvfree(q->hhf_valid_bits[i]);
 	}
 
 	for (i = 0; i < HH_FLOWS_CNT; i++) {
@@ -503,7 +488,7 @@ static void hhf_destroy(struct Qdisc *sch)
 			kfree(flow);
 		}
 	}
-	hhf_free(q->hh_flows);
+	kvfree(q->hh_flows);
 }
 
 static const struct nla_policy hhf_policy[TCA_HHF_MAX + 1] = {
@@ -609,8 +594,8 @@ static int hhf_init(struct Qdisc *sch, struct nlattr *opt)
 
 	if (!q->hh_flows) {
 		/* Initialize heavy-hitter flow table. */
-		q->hh_flows = hhf_zalloc(HH_FLOWS_CNT *
-					 sizeof(struct list_head));
+		q->hh_flows = kvzalloc(HH_FLOWS_CNT *
+					 sizeof(struct list_head), GFP_KERNEL);
 		if (!q->hh_flows)
 			return -ENOMEM;
 		for (i = 0; i < HH_FLOWS_CNT; i++)
@@ -624,8 +609,8 @@ static int hhf_init(struct Qdisc *sch, struct nlattr *opt)
 
 		/* Initialize heavy-hitter filter arrays. */
 		for (i = 0; i < HHF_ARRAYS_CNT; i++) {
-			q->hhf_arrays[i] = hhf_zalloc(HHF_ARRAYS_LEN *
-						      sizeof(u32));
+			q->hhf_arrays[i] = kvzalloc(HHF_ARRAYS_LEN *
+						      sizeof(u32), GFP_KERNEL);
 			if (!q->hhf_arrays[i]) {
 				hhf_destroy(sch);
 				return -ENOMEM;
@@ -635,8 +620,8 @@ static int hhf_init(struct Qdisc *sch, struct nlattr *opt)
 
 		/* Initialize valid bits of heavy-hitter filter arrays. */
 		for (i = 0; i < HHF_ARRAYS_CNT; i++) {
-			q->hhf_valid_bits[i] = hhf_zalloc(HHF_ARRAYS_LEN /
-							  BITS_PER_BYTE);
+			q->hhf_valid_bits[i] = kvzalloc(HHF_ARRAYS_LEN /
+							  BITS_PER_BYTE, GFP_KERNEL);
 			if (!q->hhf_valid_bits[i]) {
 				hhf_destroy(sch);
 				return -ENOMEM;
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index bcfadfdea8e0..08a3d2af1792 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -692,15 +692,11 @@ static int get_dist_table(struct Qdisc *sch, const struct nlattr *attr)
 	spinlock_t *root_lock;
 	struct disttable *d;
 	int i;
-	size_t s;
 
 	if (n > NETEM_DIST_MAX)
 		return -EINVAL;
 
-	s = sizeof(struct disttable) + n * sizeof(s16);
-	d = kmalloc(s, GFP_KERNEL | __GFP_NOWARN);
-	if (!d)
-		d = vmalloc(s);
+	d = kvmalloc(sizeof(struct disttable) + n * sizeof(s16), GFP_KERNEL);
 	if (!d)
 		return -ENOMEM;
 
diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 7f195ed4d568..5d70cd6a032d 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -684,11 +684,7 @@ static int sfq_change(struct Qdisc *sch, struct nlattr *opt)
 
 static void *sfq_alloc(size_t sz)
 {
-	void *ptr = kmalloc(sz, GFP_KERNEL | __GFP_NOWARN);
-
-	if (!ptr)
-		ptr = vmalloc(sz);
-	return ptr;
+	return kvmalloc(sz, GFP_KERNEL);
 }
 
 static void sfq_free(void *addr)
diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c
index 38c00e867bda..a5c21f05ece4 100644
--- a/security/keys/keyctl.c
+++ b/security/keys/keyctl.c
@@ -99,14 +99,9 @@ SYSCALL_DEFINE5(add_key, const char __user *, _type,
 
 	if (_payload) {
 		ret = -ENOMEM;
-		payload = kmalloc(plen, GFP_KERNEL | __GFP_NOWARN);
-		if (!payload) {
-			if (plen <= PAGE_SIZE)
-				goto error2;
-			payload = vmalloc(plen);
-			if (!payload)
-				goto error2;
-		}
+		payload = kvmalloc(plen, GFP_KERNEL);
+		if (!payload)
+			goto error2;
 
 		ret = -EFAULT;
 		if (copy_from_user(payload, _payload, plen) != 0)
@@ -1064,14 +1059,9 @@ long keyctl_instantiate_key_common(key_serial_t id,
 
 	if (from) {
 		ret = -ENOMEM;
-		payload = kmalloc(plen, GFP_KERNEL);
-		if (!payload) {
-			if (plen <= PAGE_SIZE)
-				goto error;
-			payload = vmalloc(plen);
-			if (!payload)
-				goto error;
-		}
+		payload = kvmalloc(plen, GFP_KERNEL);
+		if (!payload)
+			goto error;
 
 		ret = -EFAULT;
 		if (!copy_from_iter_full(payload, plen, from))
-- 
2.11.0



* [PATCH 5/6] treewide: use kv[mz]alloc* rather than opencoded variants
@ 2017-01-12 15:37   ` Michal Hocko
  0 siblings, 0 replies; 180+ messages in thread
From: Michal Hocko @ 2017-01-12 15:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, David Rientjes, Mel Gorman, Johannes Weiner,
	Al Viro, linux-mm, LKML, Michal Hocko, Martin Schwidefsky,
	Heiko Carstens, Herbert Xu, Anton Vorontsov, Colin Cross,
	Kees Cook, Tony Luck, Rafael J. Wysocki, Ben Skeggs,
	Kent Overstreet, Santosh Raspatur, Hariprasad S, Tariq Toukan,
	Yishai Hadas, Dan Williams, Oleg Drokin, Andreas Dilger,
	Boris Ostrovsky, David Sterba, Yan, Zheng, Ilya Dryomov,
	Alexei Starovoitov, Eric Dumazet, netdev

From: Michal Hocko <mhocko@suse.com>

There are many code paths that open-code kvmalloc. Let's use the helper
instead. The main difference from kvmalloc is that those users usually
do not consider all the aspects of the memory allocator. E.g. allocation
requests <= 32kB (with 4kB pages) basically never fail and invoke the
OOM killer to satisfy the allocation. This sounds too disruptive for
something that has a reasonable fallback - vmalloc. On the other hand,
those requests might now fall back to vmalloc even when the memory
allocator would previously have succeeded after several more
reclaim/compaction attempts. There is no guarantee something like that
happens, though.

This patch converts many of those places to the kv[mz]alloc* helpers
because they are more conservative.
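
To illustrate the pattern being replaced, here is a minimal userspace
sketch of the "try the contiguous allocator first, then fall back" logic.
It is only a model under stated assumptions, not the kernel
implementation: malloc() stands in for both kmalloc() and vmalloc(), and
MODEL_CONTIG_LIMIT as well as every model_* name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical cut-off above which the "contiguous" allocator gives up
 * quickly instead of retrying. Not a kernel constant.
 */
#define MODEL_CONTIG_LIMIT ((size_t)32 * 1024)

enum model_backend { MODEL_NONE, MODEL_CONTIG, MODEL_FALLBACK };

/* Stand-in for a kmalloc() attempt that fails fast on large requests. */
void *model_contig_alloc(size_t size)
{
	if (size > MODEL_CONTIG_LIMIT)
		return NULL;
	return malloc(size);
}

/* Stand-in for vmalloc(): no contiguity requirement, so no size limit. */
void *model_fallback_alloc(size_t size)
{
	return malloc(size);
}

/* Try the contiguous path first and fall back only if it fails. */
void *model_kvmalloc(size_t size, enum model_backend *used)
{
	void *p;

	assert(used != NULL);

	p = model_contig_alloc(size);
	*used = MODEL_CONTIG;
	if (!p) {
		p = model_fallback_alloc(size);
		*used = p ? MODEL_FALLBACK : MODEL_NONE;
	}
	return p;
}

/* Zeroing variant: the memory is cleared whichever backend served it. */
void *model_kvzalloc(size_t size, enum model_backend *used)
{
	void *p = model_kvmalloc(size, used);

	if (p)
		memset(p, 0, size);
	return p;
}

/* One free routine for both backends, mirroring the role of kvfree(). */
void model_kvfree(void *p)
{
	free(p);
}
```

The zeroing variant is kept separate on purpose: a call site that
previously used kzalloc()/vzalloc() needs the zeroing helper to preserve
its behaviour, while a kmalloc()/vmalloc() call site does not.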

Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Colin Cross <ccross@android.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Santosh Raspatur <santosh@chelsio.com>
Cc: Hariprasad S <hariprasad@chelsio.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Yishai Hadas <yishaih@mellanox.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: Andreas Dilger <andreas.dilger@intel.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Sterba <dsterba@suse.com>
Cc: "Yan, Zheng" <zyan@redhat.com>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/s390/kvm/kvm-s390.c                           | 10 ++-----
 crypto/lzo.c                                       |  4 +--
 drivers/acpi/apei/erst.c                           |  8 ++---
 drivers/char/agp/generic.c                         |  8 +----
 drivers/gpu/drm/nouveau/nouveau_gem.c              |  4 +--
 drivers/md/bcache/util.h                           | 12 ++------
 drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h    |  3 --
 drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c | 25 ++--------------
 drivers/net/ethernet/chelsio/cxgb3/l2t.c           |  2 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c    | 31 ++++----------------
 drivers/net/ethernet/mellanox/mlx4/en_tx.c         |  9 ++----
 drivers/net/ethernet/mellanox/mlx4/mr.c            |  9 ++----
 drivers/nvdimm/dimm_devs.c                         |  5 +---
 .../staging/lustre/lnet/libcfs/linux/linux-mem.c   | 11 +------
 drivers/xen/evtchn.c                               | 14 +--------
 fs/btrfs/ctree.c                                   |  9 ++----
 fs/btrfs/ioctl.c                                   |  9 ++----
 fs/btrfs/send.c                                    | 27 ++++++-----------
 fs/ceph/file.c                                     |  9 ++----
 fs/select.c                                        |  5 +---
 fs/xattr.c                                         | 27 ++++++-----------
 kernel/bpf/hashtab.c                               | 11 ++-----
 lib/iov_iter.c                                     |  5 +---
 mm/frame_vector.c                                  |  5 +---
 net/ipv4/inet_hashtables.c                         |  6 +---
 net/ipv4/tcp_metrics.c                             |  5 +---
 net/mpls/af_mpls.c                                 |  5 +---
 net/netfilter/x_tables.c                           | 34 ++++++----------------
 net/netfilter/xt_recent.c                          |  5 +---
 net/sched/sch_choke.c                              |  5 +---
 net/sched/sch_fq_codel.c                           | 26 ++++-------------
 net/sched/sch_hhf.c                                | 33 ++++++---------------
 net/sched/sch_netem.c                              |  6 +---
 net/sched/sch_sfq.c                                |  6 +---
 security/keys/keyctl.c                             | 22 ++++----------
 35 files changed, 96 insertions(+), 319 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 4f74511015b8..e6bbb33d2956 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -1126,10 +1126,7 @@ static long kvm_s390_get_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
 	if (args->count < 1 || args->count > KVM_S390_SKEYS_MAX)
 		return -EINVAL;
 
-	keys = kmalloc_array(args->count, sizeof(uint8_t),
-			     GFP_KERNEL | __GFP_NOWARN);
-	if (!keys)
-		keys = vmalloc(sizeof(uint8_t) * args->count);
+	keys = kvmalloc(args->count * sizeof(uint8_t), GFP_KERNEL);
 	if (!keys)
 		return -ENOMEM;
 
@@ -1171,10 +1168,7 @@ static long kvm_s390_set_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
 	if (args->count < 1 || args->count > KVM_S390_SKEYS_MAX)
 		return -EINVAL;
 
-	keys = kmalloc_array(args->count, sizeof(uint8_t),
-			     GFP_KERNEL | __GFP_NOWARN);
-	if (!keys)
-		keys = vmalloc(sizeof(uint8_t) * args->count);
+	keys = kvmalloc(sizeof(uint8_t) * args->count, GFP_KERNEL);
 	if (!keys)
 		return -ENOMEM;
 
diff --git a/crypto/lzo.c b/crypto/lzo.c
index 168df784da84..218567d717d6 100644
--- a/crypto/lzo.c
+++ b/crypto/lzo.c
@@ -32,9 +32,7 @@ static void *lzo_alloc_ctx(struct crypto_scomp *tfm)
 {
 	void *ctx;
 
-	ctx = kmalloc(LZO1X_MEM_COMPRESS, GFP_KERNEL | __GFP_NOWARN);
-	if (!ctx)
-		ctx = vmalloc(LZO1X_MEM_COMPRESS);
+	ctx = kvmalloc(LZO1X_MEM_COMPRESS, GFP_KERNEL);
 	if (!ctx)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/drivers/acpi/apei/erst.c b/drivers/acpi/apei/erst.c
index ec4f507b524f..a2898df61744 100644
--- a/drivers/acpi/apei/erst.c
+++ b/drivers/acpi/apei/erst.c
@@ -513,7 +513,7 @@ static int __erst_record_id_cache_add_one(void)
 	if (i < erst_record_id_cache.len)
 		goto retry;
 	if (erst_record_id_cache.len >= erst_record_id_cache.size) {
-		int new_size, alloc_size;
+		int new_size;
 		u64 *new_entries;
 
 		new_size = erst_record_id_cache.size * 2;
@@ -524,11 +524,7 @@ static int __erst_record_id_cache_add_one(void)
 				pr_warn(FW_WARN "too many record IDs!\n");
 			return 0;
 		}
-		alloc_size = new_size * sizeof(entries[0]);
-		if (alloc_size < PAGE_SIZE)
-			new_entries = kmalloc(alloc_size, GFP_KERNEL);
-		else
-			new_entries = vmalloc(alloc_size);
+		new_entries = kvmalloc(new_size * sizeof(entries[0]), GFP_KERNEL);
 		if (!new_entries)
 			return -ENOMEM;
 		memcpy(new_entries, entries,
diff --git a/drivers/char/agp/generic.c b/drivers/char/agp/generic.c
index f002fa5d1887..bdf418cac8ef 100644
--- a/drivers/char/agp/generic.c
+++ b/drivers/char/agp/generic.c
@@ -88,13 +88,7 @@ static int agp_get_key(void)
 
 void agp_alloc_page_array(size_t size, struct agp_memory *mem)
 {
-	mem->pages = NULL;
-
-	if (size <= 2*PAGE_SIZE)
-		mem->pages = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
-	if (mem->pages == NULL) {
-		mem->pages = vmalloc(size);
-	}
+	mem->pages = kvmalloc(size, GFP_KERNEL);
 }
 EXPORT_SYMBOL(agp_alloc_page_array);
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index 201b52b750dd..77dd73ff126f 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -568,9 +568,7 @@ u_memcpya(uint64_t user, unsigned nmemb, unsigned size)
 
 	size *= nmemb;
 
-	mem = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
-	if (!mem)
-		mem = vmalloc(size);
+	mem = kvmalloc(size, GFP_KERNEL);
 	if (!mem)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/drivers/md/bcache/util.h b/drivers/md/bcache/util.h
index cf2cbc211d83..d00bcb64d3a8 100644
--- a/drivers/md/bcache/util.h
+++ b/drivers/md/bcache/util.h
@@ -43,11 +43,7 @@ struct closure;
 	(heap)->used = 0;						\
 	(heap)->size = (_size);						\
 	_bytes = (heap)->size * sizeof(*(heap)->data);			\
-	(heap)->data = NULL;						\
-	if (_bytes < KMALLOC_MAX_SIZE)					\
-		(heap)->data = kmalloc(_bytes, (gfp));			\
-	if ((!(heap)->data) && ((gfp) & GFP_KERNEL))			\
-		(heap)->data = vmalloc(_bytes);				\
+	(heap)->data = kvmalloc(_bytes, (gfp) & GFP_KERNEL);		\
 	(heap)->data;							\
 })
 
@@ -136,12 +132,8 @@ do {									\
 									\
 	(fifo)->mask = _allocated_size - 1;				\
 	(fifo)->front = (fifo)->back = 0;				\
-	(fifo)->data = NULL;						\
 									\
-	if (_bytes < KMALLOC_MAX_SIZE)					\
-		(fifo)->data = kmalloc(_bytes, (gfp));			\
-	if ((!(fifo)->data) && ((gfp) & GFP_KERNEL))			\
-		(fifo)->data = vmalloc(_bytes);				\
+	(fifo)->data = kvmalloc(_bytes, (gfp) & GFP_KERNEL);		\
 	(fifo)->data;							\
 })
 
diff --git a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h
index 920d918ed193..f04e81f33795 100644
--- a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h
+++ b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h
@@ -41,9 +41,6 @@
 
 #define VALIDATE_TID 1
 
-void *cxgb_alloc_mem(unsigned long size);
-void cxgb_free_mem(void *addr);
-
 /*
  * Map an ATID or STID to their entries in the corresponding TID tables.
  */
diff --git a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c
index 76684dcb874c..606d4a3ade04 100644
--- a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c
+++ b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c
@@ -1152,27 +1152,6 @@ static void cxgb_redirect(struct dst_entry *old, struct dst_entry *new,
 }
 
 /*
- * Allocate a chunk of memory using kmalloc or, if that fails, vmalloc.
- * The allocated memory is cleared.
- */
-void *cxgb_alloc_mem(unsigned long size)
-{
-	void *p = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
-
-	if (!p)
-		p = vzalloc(size);
-	return p;
-}
-
-/*
- * Free memory allocated through t3_alloc_mem().
- */
-void cxgb_free_mem(void *addr)
-{
-	kvfree(addr);
-}
-
-/*
  * Allocate and initialize the TID tables.  Returns 0 on success.
  */
 static int init_tid_tabs(struct tid_info *t, unsigned int ntids,
@@ -1182,7 +1161,7 @@ static int init_tid_tabs(struct tid_info *t, unsigned int ntids,
 	unsigned long size = ntids * sizeof(*t->tid_tab) +
 	    natids * sizeof(*t->atid_tab) + nstids * sizeof(*t->stid_tab);
 
-	t->tid_tab = cxgb_alloc_mem(size);
+	t->tid_tab = kvzalloc(size, GFP_KERNEL);
 	if (!t->tid_tab)
 		return -ENOMEM;
 
@@ -1218,7 +1197,7 @@ static int init_tid_tabs(struct tid_info *t, unsigned int ntids,
 
 static void free_tid_maps(struct tid_info *t)
 {
-	cxgb_free_mem(t->tid_tab);
+	kvfree(t->tid_tab);
 }
 
 static inline void add_adapter(struct adapter *adap)
diff --git a/drivers/net/ethernet/chelsio/cxgb3/l2t.c b/drivers/net/ethernet/chelsio/cxgb3/l2t.c
index 5f226eda8cd6..c9b06501ee0c 100644
--- a/drivers/net/ethernet/chelsio/cxgb3/l2t.c
+++ b/drivers/net/ethernet/chelsio/cxgb3/l2t.c
@@ -444,7 +444,7 @@ struct l2t_data *t3_init_l2t(unsigned int l2t_capacity)
 	struct l2t_data *d;
 	int i, size = sizeof(*d) + l2t_capacity * sizeof(struct l2t_entry);
 
-	d = cxgb_alloc_mem(size);
+	d = kvzalloc(size, GFP_KERNEL);
 	if (!d)
 		return NULL;
 
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 6f951877430b..671695cb3c15 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -881,27 +881,6 @@ static int setup_sge_queues(struct adapter *adap)
 	return err;
 }
 
-/*
- * Allocate a chunk of memory using kmalloc or, if that fails, vmalloc.
- * The allocated memory is cleared.
- */
-void *t4_alloc_mem(size_t size)
-{
-	void *p = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
-
-	if (!p)
-		p = vzalloc(size);
-	return p;
-}
-
-/*
- * Free memory allocated through alloc_mem().
- */
-void t4_free_mem(void *addr)
-{
-	kvfree(addr);
-}
-
 static u16 cxgb_select_queue(struct net_device *dev, struct sk_buff *skb,
 			     void *accel_priv, select_queue_fallback_t fallback)
 {
@@ -1300,7 +1279,7 @@ static int tid_init(struct tid_info *t)
 	       max_ftids * sizeof(*t->ftid_tab) +
 	       ftid_bmap_size * sizeof(long);
 
-	t->tid_tab = t4_alloc_mem(size);
+	t->tid_tab = kvzalloc(size, GFP_KERNEL);
 	if (!t->tid_tab)
 		return -ENOMEM;
 
@@ -3416,7 +3395,7 @@ static int adap_init0(struct adapter *adap)
 		/* allocate memory to read the header of the firmware on the
 		 * card
 		 */
-		card_fw = t4_alloc_mem(sizeof(*card_fw));
+		card_fw = kvzalloc(sizeof(*card_fw), GFP_KERNEL);
 
 		/* Get FW from from /lib/firmware/ */
 		ret = request_firmware(&fw, fw_info->fw_mod_name,
@@ -3436,7 +3415,7 @@ static int adap_init0(struct adapter *adap)
 
 		/* Cleaning up */
 		release_firmware(fw);
-		t4_free_mem(card_fw);
+		kvfree(card_fw);
 
 		if (ret < 0)
 			goto bye;
@@ -4432,9 +4411,9 @@ static void free_some_resources(struct adapter *adapter)
 {
 	unsigned int i;
 
-	t4_free_mem(adapter->l2t);
+	kvfree(adapter->l2t);
 	t4_cleanup_sched(adapter);
-	t4_free_mem(adapter->tids.tid_tab);
+	kvfree(adapter->tids.tid_tab);
 	cxgb4_cleanup_tc_u32(adapter);
 	kfree(adapter->sge.egr_map);
 	kfree(adapter->sge.ingr_map);
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 5886ad78058f..a5c1b815145e 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -70,13 +70,10 @@ int mlx4_en_create_tx_ring(struct mlx4_en_priv *priv,
 	ring->full_size = ring->size - HEADROOM - MAX_DESC_TXBBS;
 
 	tmp = size * sizeof(struct mlx4_en_tx_info);
-	ring->tx_info = kmalloc_node(tmp, GFP_KERNEL | __GFP_NOWARN, node);
+	ring->tx_info = kvmalloc_node(tmp, GFP_KERNEL, node);
 	if (!ring->tx_info) {
-		ring->tx_info = vmalloc(tmp);
-		if (!ring->tx_info) {
-			err = -ENOMEM;
-			goto err_ring;
-		}
+		err = -ENOMEM;
+		goto err_ring;
 	}
 
 	en_dbg(DRV, priv, "Allocated tx_info ring at addr:%p size:%d\n",
diff --git a/drivers/net/ethernet/mellanox/mlx4/mr.c b/drivers/net/ethernet/mellanox/mlx4/mr.c
index 395b5463cfd9..82354fd0a87e 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mr.c
+++ b/drivers/net/ethernet/mellanox/mlx4/mr.c
@@ -115,12 +115,9 @@ static int mlx4_buddy_init(struct mlx4_buddy *buddy, int max_order)
 
 	for (i = 0; i <= buddy->max_order; ++i) {
 		s = BITS_TO_LONGS(1 << (buddy->max_order - i));
-		buddy->bits[i] = kcalloc(s, sizeof (long), GFP_KERNEL | __GFP_NOWARN);
-		if (!buddy->bits[i]) {
-			buddy->bits[i] = vzalloc(s * sizeof(long));
-			if (!buddy->bits[i])
-				goto err_out_free;
-		}
+		buddy->bits[i] = kvzalloc(s * sizeof(long), GFP_KERNEL);
+		if (!buddy->bits[i])
+			goto err_out_free;
 	}
 
 	set_bit(0, buddy->bits[buddy->max_order]);
diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
index 0eedc49e0d47..3bd332b167d9 100644
--- a/drivers/nvdimm/dimm_devs.c
+++ b/drivers/nvdimm/dimm_devs.c
@@ -102,10 +102,7 @@ int nvdimm_init_config_data(struct nvdimm_drvdata *ndd)
 		return -ENXIO;
 	}
 
-	ndd->data = kmalloc(ndd->nsarea.config_size, GFP_KERNEL);
-	if (!ndd->data)
-		ndd->data = vmalloc(ndd->nsarea.config_size);
-
+	ndd->data = kvmalloc(ndd->nsarea.config_size, GFP_KERNEL);
 	if (!ndd->data)
 		return -ENOMEM;
 
diff --git a/drivers/staging/lustre/lnet/libcfs/linux/linux-mem.c b/drivers/staging/lustre/lnet/libcfs/linux/linux-mem.c
index a6a76a681ea9..8f638267e704 100644
--- a/drivers/staging/lustre/lnet/libcfs/linux/linux-mem.c
+++ b/drivers/staging/lustre/lnet/libcfs/linux/linux-mem.c
@@ -45,15 +45,6 @@ EXPORT_SYMBOL(libcfs_kvzalloc);
 void *libcfs_kvzalloc_cpt(struct cfs_cpt_table *cptab, int cpt, size_t size,
 			  gfp_t flags)
 {
-	void *ret;
-
-	ret = kzalloc_node(size, flags | __GFP_NOWARN,
-			   cfs_cpt_spread_node(cptab, cpt));
-	if (!ret) {
-		WARN_ON(!(flags & (__GFP_FS | __GFP_HIGH)));
-		ret = vmalloc_node(size, cfs_cpt_spread_node(cptab, cpt));
-	}
-
-	return ret;
+	return kvzalloc_node(size, flags, cfs_cpt_spread_node(cptab, cpt));
 }
 EXPORT_SYMBOL(libcfs_kvzalloc_cpt);
diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
index 6890897a6f30..10f1ef582659 100644
--- a/drivers/xen/evtchn.c
+++ b/drivers/xen/evtchn.c
@@ -87,18 +87,6 @@ struct user_evtchn {
 	bool enabled;
 };
 
-static evtchn_port_t *evtchn_alloc_ring(unsigned int size)
-{
-	evtchn_port_t *ring;
-	size_t s = size * sizeof(*ring);
-
-	ring = kmalloc(s, GFP_KERNEL);
-	if (!ring)
-		ring = vmalloc(s);
-
-	return ring;
-}
-
 static void evtchn_free_ring(evtchn_port_t *ring)
 {
 	kvfree(ring);
@@ -334,7 +322,7 @@ static int evtchn_resize_ring(struct per_user_data *u)
 	else
 		new_size = 2 * u->ring_size;
 
-	new_ring = evtchn_alloc_ring(new_size);
+	new_ring = kvmalloc(new_size * sizeof(*new_ring), GFP_KERNEL);
 	if (!new_ring)
 		return -ENOMEM;
 
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 146b2dc0d2cf..4fc9712d927d 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -5391,13 +5391,10 @@ int btrfs_compare_trees(struct btrfs_root *left_root,
 		goto out;
 	}
 
-	tmp_buf = kmalloc(fs_info->nodesize, GFP_KERNEL | __GFP_NOWARN);
+	tmp_buf = kvmalloc(fs_info->nodesize, GFP_KERNEL);
 	if (!tmp_buf) {
-		tmp_buf = vmalloc(fs_info->nodesize);
-		if (!tmp_buf) {
-			ret = -ENOMEM;
-			goto out;
-		}
+		ret = -ENOMEM;
+		goto out;
 	}
 
 	left_path->search_commit_root = 1;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 77dabfed3a5d..6f0b488c7428 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3547,12 +3547,9 @@ static int btrfs_clone(struct inode *src, struct inode *inode,
 	u64 last_dest_end = destoff;
 
 	ret = -ENOMEM;
-	buf = kmalloc(fs_info->nodesize, GFP_KERNEL | __GFP_NOWARN);
-	if (!buf) {
-		buf = vmalloc(fs_info->nodesize);
-		if (!buf)
-			return ret;
-	}
+	buf = kvmalloc(fs_info->nodesize, GFP_KERNEL);
+	if (!buf)
+		return ret;
 
 	path = btrfs_alloc_path();
 	if (!path) {
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index d145ce804620..0621ca2a7b5d 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -6242,22 +6242,16 @@ long btrfs_ioctl_send(struct file *mnt_file, void __user *arg_)
 	sctx->clone_roots_cnt = arg->clone_sources_count;
 
 	sctx->send_max_size = BTRFS_SEND_BUF_SIZE;
-	sctx->send_buf = kmalloc(sctx->send_max_size, GFP_KERNEL | __GFP_NOWARN);
+	sctx->send_buf = kvmalloc(sctx->send_max_size, GFP_KERNEL);
 	if (!sctx->send_buf) {
-		sctx->send_buf = vmalloc(sctx->send_max_size);
-		if (!sctx->send_buf) {
-			ret = -ENOMEM;
-			goto out;
-		}
+		ret = -ENOMEM;
+		goto out;
 	}
 
-	sctx->read_buf = kmalloc(BTRFS_SEND_READ_SIZE, GFP_KERNEL | __GFP_NOWARN);
+	sctx->read_buf = kvmalloc(BTRFS_SEND_READ_SIZE, GFP_KERNEL);
 	if (!sctx->read_buf) {
-		sctx->read_buf = vmalloc(BTRFS_SEND_READ_SIZE);
-		if (!sctx->read_buf) {
-			ret = -ENOMEM;
-			goto out;
-		}
+		ret = -ENOMEM;
+		goto out;
 	}
 
 	sctx->pending_dir_moves = RB_ROOT;
@@ -6278,13 +6272,10 @@ long btrfs_ioctl_send(struct file *mnt_file, void __user *arg_)
 	alloc_size = arg->clone_sources_count * sizeof(*arg->clone_sources);
 
 	if (arg->clone_sources_count) {
-		clone_sources_tmp = kmalloc(alloc_size, GFP_KERNEL | __GFP_NOWARN);
+		clone_sources_tmp = kvmalloc(alloc_size, GFP_KERNEL);
 		if (!clone_sources_tmp) {
-			clone_sources_tmp = vmalloc(alloc_size);
-			if (!clone_sources_tmp) {
-				ret = -ENOMEM;
-				goto out;
-			}
+			ret = -ENOMEM;
+			goto out;
 		}
 
 		ret = copy_from_user(clone_sources_tmp, arg->clone_sources,
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 045d30d26624..78b18acf33ba 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -74,12 +74,9 @@ dio_get_pages_alloc(const struct iov_iter *it, size_t nbytes,
 	align = (unsigned long)(it->iov->iov_base + it->iov_offset) &
 		(PAGE_SIZE - 1);
 	npages = calc_pages_for(align, nbytes);
-	pages = kmalloc(sizeof(*pages) * npages, GFP_KERNEL);
-	if (!pages) {
-		pages = vmalloc(sizeof(*pages) * npages);
-		if (!pages)
-			return ERR_PTR(-ENOMEM);
-	}
+	pages = kvmalloc(sizeof(*pages) * npages, GFP_KERNEL);
+	if (!pages)
+		return ERR_PTR(-ENOMEM);
 
 	for (idx = 0; idx < npages; ) {
 		size_t start;
diff --git a/fs/select.c b/fs/select.c
index 305c0daf5d67..9e8e1189eb99 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -586,10 +586,7 @@ int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp,
 			goto out_nofds;
 
 		alloc_size = 6 * size;
-		bits = kmalloc(alloc_size, GFP_KERNEL|__GFP_NOWARN);
-		if (!bits && alloc_size > PAGE_SIZE)
-			bits = vmalloc(alloc_size);
-
+		bits = kvmalloc(alloc_size, GFP_KERNEL);
 		if (!bits)
 			goto out_nofds;
 	}
diff --git a/fs/xattr.c b/fs/xattr.c
index 7e3317cf4045..4269a7c26db7 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -431,12 +431,9 @@ setxattr(struct dentry *d, const char __user *name, const void __user *value,
 	if (size) {
 		if (size > XATTR_SIZE_MAX)
 			return -E2BIG;
-		kvalue = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
-		if (!kvalue) {
-			kvalue = vmalloc(size);
-			if (!kvalue)
-				return -ENOMEM;
-		}
+		kvalue = kvmalloc(size, GFP_KERNEL);
+		if (!kvalue)
+			return -ENOMEM;
 		if (copy_from_user(kvalue, value, size)) {
 			error = -EFAULT;
 			goto out;
@@ -528,12 +525,9 @@ getxattr(struct dentry *d, const char __user *name, void __user *value,
 	if (size) {
 		if (size > XATTR_SIZE_MAX)
 			size = XATTR_SIZE_MAX;
-		kvalue = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
-		if (!kvalue) {
-			kvalue = vmalloc(size);
-			if (!kvalue)
-				return -ENOMEM;
-		}
+		kvalue = kvzalloc(size, GFP_KERNEL);
+		if (!kvalue)
+			return -ENOMEM;
 	}
 
 	error = vfs_getxattr(d, kname, kvalue, size);
@@ -611,12 +605,9 @@ listxattr(struct dentry *d, char __user *list, size_t size)
 	if (size) {
 		if (size > XATTR_LIST_MAX)
 			size = XATTR_LIST_MAX;
-		klist = kmalloc(size, __GFP_NOWARN | GFP_KERNEL);
-		if (!klist) {
-			klist = vmalloc(size);
-			if (!klist)
-				return -ENOMEM;
-		}
+		klist = kvmalloc(size, GFP_KERNEL);
+		if (!klist)
+			return -ENOMEM;
 	}
 
 	error = vfs_listxattr(d, klist, size);
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 34debc1a9641..4ca30a951bbc 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -320,14 +320,9 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 		goto free_htab;
 
 	err = -ENOMEM;
-	htab->buckets = kmalloc_array(htab->n_buckets, sizeof(struct bucket),
-				      GFP_USER | __GFP_NOWARN);
-
-	if (!htab->buckets) {
-		htab->buckets = vmalloc(htab->n_buckets * sizeof(struct bucket));
-		if (!htab->buckets)
-			goto free_htab;
-	}
+	htab->buckets = kvmalloc(htab->n_buckets * sizeof(struct bucket), GFP_USER);
+	if (!htab->buckets)
+		goto free_htab;
 
 	for (i = 0; i < htab->n_buckets; i++) {
 		INIT_HLIST_HEAD(&htab->buckets[i].head);
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 25f572303801..45c17b5562b5 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -957,10 +957,7 @@ EXPORT_SYMBOL(iov_iter_get_pages);
 
 static struct page **get_pages_array(size_t n)
 {
-	struct page **p = kmalloc(n * sizeof(struct page *), GFP_KERNEL);
-	if (!p)
-		p = vmalloc(n * sizeof(struct page *));
-	return p;
+	return kvmalloc(n * sizeof(struct page *), GFP_KERNEL);
 }
 
 static ssize_t pipe_get_pages_alloc(struct iov_iter *i,
diff --git a/mm/frame_vector.c b/mm/frame_vector.c
index db77dcb38afd..72ebec18629c 100644
--- a/mm/frame_vector.c
+++ b/mm/frame_vector.c
@@ -200,10 +200,7 @@ struct frame_vector *frame_vector_create(unsigned int nr_frames)
 	 * Avoid higher order allocations, use vmalloc instead. It should
 	 * be rare anyway.
 	 */
-	if (size <= PAGE_SIZE)
-		vec = kmalloc(size, GFP_KERNEL);
-	else
-		vec = vmalloc(size);
+	vec = kvmalloc(size, GFP_KERNEL);
 	if (!vec)
 		return NULL;
 	vec->nr_allocated = nr_frames;
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index ca97835bfec4..a46a9fd8b540 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -687,11 +687,7 @@ int inet_ehash_locks_alloc(struct inet_hashinfo *hashinfo)
 		/* no more locks than number of hash buckets */
 		nblocks = min(nblocks, hashinfo->ehash_mask + 1);
 
-		hashinfo->ehash_locks =	kmalloc_array(nblocks, locksz,
-						      GFP_KERNEL | __GFP_NOWARN);
-		if (!hashinfo->ehash_locks)
-			hashinfo->ehash_locks = vmalloc(nblocks * locksz);
-
+		hashinfo->ehash_locks = kvmalloc(nblocks * locksz, GFP_KERNEL);
 		if (!hashinfo->ehash_locks)
 			return -ENOMEM;
 
diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
index d46f4d5b1c62..39b2166d3be8 100644
--- a/net/ipv4/tcp_metrics.c
+++ b/net/ipv4/tcp_metrics.c
@@ -1155,10 +1155,7 @@ static int __net_init tcp_net_metrics_init(struct net *net)
 	tcp_metrics_hash_log = order_base_2(slots);
 	size = sizeof(struct tcpm_hash_bucket) << tcp_metrics_hash_log;
 
-	tcp_metrics_hash = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
-	if (!tcp_metrics_hash)
-		tcp_metrics_hash = vzalloc(size);
-
+	tcp_metrics_hash = kvzalloc(size, GFP_KERNEL);
 	if (!tcp_metrics_hash)
 		return -ENOMEM;
 
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 15fe97644ffe..a0c82ef74389 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -1525,10 +1525,7 @@ static int resize_platform_label_table(struct net *net, size_t limit)
 	unsigned index;
 
 	if (size) {
-		labels = kzalloc(size, GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY);
-		if (!labels)
-			labels = vzalloc(size);
-
+		labels = kvzalloc(size, GFP_KERNEL);
 		if (!labels)
 			goto nolabels;
 	}
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index a011322a027d..eeed0af3ea25 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -712,17 +712,11 @@ EXPORT_SYMBOL(xt_check_entry_offsets);
  */
 unsigned int *xt_alloc_entry_offsets(unsigned int size)
 {
-	unsigned int *off;
-
-	off = kcalloc(size, sizeof(unsigned int), GFP_KERNEL | __GFP_NOWARN);
-
-	if (off)
-		return off;
-
 	if (size < (SIZE_MAX / sizeof(unsigned int)))
-		off = vmalloc(size * sizeof(unsigned int));
+		return kvmalloc(size * sizeof(unsigned int), GFP_KERNEL);
+
+	return NULL;
 
-	return off;
 }
 EXPORT_SYMBOL(xt_alloc_entry_offsets);
 
@@ -956,15 +950,9 @@ struct xt_table_info *xt_alloc_table_info(unsigned int size)
 	if ((SMP_ALIGN(size) >> PAGE_SHIFT) + 2 > totalram_pages)
 		return NULL;
 
-	if (sz <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
-		info = kmalloc(sz, GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY);
-	if (!info) {
-		info = __vmalloc(sz, GFP_KERNEL | __GFP_NOWARN |
-				     __GFP_NORETRY | __GFP_HIGHMEM,
-				 PAGE_KERNEL);
-		if (!info)
-			return NULL;
-	}
+	info = kvmalloc(sz, GFP_KERNEL);
+	if (!info)
+		return NULL;
 	memset(info, 0, sizeof(*info));
 	info->size = size;
 	return info;
@@ -1066,7 +1054,7 @@ static int xt_jumpstack_alloc(struct xt_table_info *i)
 
 	size = sizeof(void **) * nr_cpu_ids;
 	if (size > PAGE_SIZE)
-		i->jumpstack = vzalloc(size);
+		i->jumpstack = kvzalloc(size, GFP_KERNEL);
 	else
 		i->jumpstack = kzalloc(size, GFP_KERNEL);
 	if (i->jumpstack == NULL)
@@ -1088,12 +1076,8 @@ static int xt_jumpstack_alloc(struct xt_table_info *i)
 	 */
 	size = sizeof(void *) * i->stacksize * 2u;
 	for_each_possible_cpu(cpu) {
-		if (size > PAGE_SIZE)
-			i->jumpstack[cpu] = vmalloc_node(size,
-				cpu_to_node(cpu));
-		else
-			i->jumpstack[cpu] = kmalloc_node(size,
-				GFP_KERNEL, cpu_to_node(cpu));
+		i->jumpstack[cpu] = kvmalloc_node(size, GFP_KERNEL,
+			cpu_to_node(cpu));
 		if (i->jumpstack[cpu] == NULL)
 			/*
 			 * Freeing will be done later on by the callers. The
diff --git a/net/netfilter/xt_recent.c b/net/netfilter/xt_recent.c
index 1d89a4eaf841..d6aa8f63ed2e 100644
--- a/net/netfilter/xt_recent.c
+++ b/net/netfilter/xt_recent.c
@@ -388,10 +388,7 @@ static int recent_mt_check(const struct xt_mtchk_param *par,
 	}
 
 	sz = sizeof(*t) + sizeof(t->iphash[0]) * ip_list_hash_size;
-	if (sz <= PAGE_SIZE)
-		t = kzalloc(sz, GFP_KERNEL);
-	else
-		t = vzalloc(sz);
+	t = kvzalloc(sz, GFP_KERNEL);
 	if (t == NULL) {
 		ret = -ENOMEM;
 		goto out;
diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c
index 3b6d5bd69101..30d6a39fd2c8 100644
--- a/net/sched/sch_choke.c
+++ b/net/sched/sch_choke.c
@@ -431,10 +431,7 @@ static int choke_change(struct Qdisc *sch, struct nlattr *opt)
 	if (mask != q->tab_mask) {
 		struct sk_buff **ntab;
 
-		ntab = kcalloc(mask + 1, sizeof(struct sk_buff *),
-			       GFP_KERNEL | __GFP_NOWARN);
-		if (!ntab)
-			ntab = vzalloc((mask + 1) * sizeof(struct sk_buff *));
+		ntab = kvzalloc((mask + 1) * sizeof(struct sk_buff *), GFP_KERNEL);
 		if (!ntab)
 			return -ENOMEM;
 
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index a5ea0e9b6be4..04e2d006f277 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -449,27 +449,13 @@ static int fq_codel_change(struct Qdisc *sch, struct nlattr *opt)
 	return 0;
 }
 
-static void *fq_codel_zalloc(size_t sz)
-{
-	void *ptr = kzalloc(sz, GFP_KERNEL | __GFP_NOWARN);
-
-	if (!ptr)
-		ptr = vzalloc(sz);
-	return ptr;
-}
-
-static void fq_codel_free(void *addr)
-{
-	kvfree(addr);
-}
-
 static void fq_codel_destroy(struct Qdisc *sch)
 {
 	struct fq_codel_sched_data *q = qdisc_priv(sch);
 
 	tcf_destroy_chain(&q->filter_list);
-	fq_codel_free(q->backlogs);
-	fq_codel_free(q->flows);
+	kvfree(q->backlogs);
+	kvfree(q->flows);
 }
 
 static int fq_codel_init(struct Qdisc *sch, struct nlattr *opt)
@@ -497,13 +483,13 @@ static int fq_codel_init(struct Qdisc *sch, struct nlattr *opt)
 	}
 
 	if (!q->flows) {
-		q->flows = fq_codel_zalloc(q->flows_cnt *
-					   sizeof(struct fq_codel_flow));
+		q->flows = kvzalloc(q->flows_cnt *
+					   sizeof(struct fq_codel_flow), GFP_KERNEL);
 		if (!q->flows)
 			return -ENOMEM;
-		q->backlogs = fq_codel_zalloc(q->flows_cnt * sizeof(u32));
+		q->backlogs = kvzalloc(q->flows_cnt * sizeof(u32), GFP_KERNEL);
 		if (!q->backlogs) {
-			fq_codel_free(q->flows);
+			kvfree(q->flows);
 			return -ENOMEM;
 		}
 		for (i = 0; i < q->flows_cnt; i++) {
diff --git a/net/sched/sch_hhf.c b/net/sched/sch_hhf.c
index e3d0458af17b..858b2de5db59 100644
--- a/net/sched/sch_hhf.c
+++ b/net/sched/sch_hhf.c
@@ -467,29 +467,14 @@ static void hhf_reset(struct Qdisc *sch)
 		rtnl_kfree_skbs(skb, skb);
 }
 
-static void *hhf_zalloc(size_t sz)
-{
-	void *ptr = kzalloc(sz, GFP_KERNEL | __GFP_NOWARN);
-
-	if (!ptr)
-		ptr = vzalloc(sz);
-
-	return ptr;
-}
-
-static void hhf_free(void *addr)
-{
-	kvfree(addr);
-}
-
 static void hhf_destroy(struct Qdisc *sch)
 {
 	int i;
 	struct hhf_sched_data *q = qdisc_priv(sch);
 
 	for (i = 0; i < HHF_ARRAYS_CNT; i++) {
-		hhf_free(q->hhf_arrays[i]);
-		hhf_free(q->hhf_valid_bits[i]);
+		kvfree(q->hhf_arrays[i]);
+		kvfree(q->hhf_valid_bits[i]);
 	}
 
 	for (i = 0; i < HH_FLOWS_CNT; i++) {
@@ -503,7 +488,7 @@ static void hhf_destroy(struct Qdisc *sch)
 			kfree(flow);
 		}
 	}
-	hhf_free(q->hh_flows);
+	kvfree(q->hh_flows);
 }
 
 static const struct nla_policy hhf_policy[TCA_HHF_MAX + 1] = {
@@ -609,8 +594,8 @@ static int hhf_init(struct Qdisc *sch, struct nlattr *opt)
 
 	if (!q->hh_flows) {
 		/* Initialize heavy-hitter flow table. */
-		q->hh_flows = hhf_zalloc(HH_FLOWS_CNT *
-					 sizeof(struct list_head));
+		q->hh_flows = kvzalloc(HH_FLOWS_CNT *
+					 sizeof(struct list_head), GFP_KERNEL);
 		if (!q->hh_flows)
 			return -ENOMEM;
 		for (i = 0; i < HH_FLOWS_CNT; i++)
@@ -624,8 +609,8 @@ static int hhf_init(struct Qdisc *sch, struct nlattr *opt)
 
 		/* Initialize heavy-hitter filter arrays. */
 		for (i = 0; i < HHF_ARRAYS_CNT; i++) {
-			q->hhf_arrays[i] = hhf_zalloc(HHF_ARRAYS_LEN *
-						      sizeof(u32));
+			q->hhf_arrays[i] = kvzalloc(HHF_ARRAYS_LEN *
+						      sizeof(u32), GFP_KERNEL);
 			if (!q->hhf_arrays[i]) {
 				hhf_destroy(sch);
 				return -ENOMEM;
@@ -635,8 +620,8 @@ static int hhf_init(struct Qdisc *sch, struct nlattr *opt)
 
 		/* Initialize valid bits of heavy-hitter filter arrays. */
 		for (i = 0; i < HHF_ARRAYS_CNT; i++) {
-			q->hhf_valid_bits[i] = hhf_zalloc(HHF_ARRAYS_LEN /
-							  BITS_PER_BYTE);
+			q->hhf_valid_bits[i] = kvzalloc(HHF_ARRAYS_LEN /
+							  BITS_PER_BYTE, GFP_KERNEL);
 			if (!q->hhf_valid_bits[i]) {
 				hhf_destroy(sch);
 				return -ENOMEM;
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index bcfadfdea8e0..08a3d2af1792 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -692,15 +692,11 @@ static int get_dist_table(struct Qdisc *sch, const struct nlattr *attr)
 	spinlock_t *root_lock;
 	struct disttable *d;
 	int i;
-	size_t s;
 
 	if (n > NETEM_DIST_MAX)
 		return -EINVAL;
 
-	s = sizeof(struct disttable) + n * sizeof(s16);
-	d = kmalloc(s, GFP_KERNEL | __GFP_NOWARN);
-	if (!d)
-		d = vmalloc(s);
+	d = kvmalloc(sizeof(struct disttable) + n * sizeof(s16), GFP_KERNEL);
 	if (!d)
 		return -ENOMEM;
 
diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 7f195ed4d568..5d70cd6a032d 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -684,11 +684,7 @@ static int sfq_change(struct Qdisc *sch, struct nlattr *opt)
 
 static void *sfq_alloc(size_t sz)
 {
-	void *ptr = kmalloc(sz, GFP_KERNEL | __GFP_NOWARN);
-
-	if (!ptr)
-		ptr = vmalloc(sz);
-	return ptr;
+	return kvmalloc(sz, GFP_KERNEL);
 }
 
 static void sfq_free(void *addr)
diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c
index 38c00e867bda..a5c21f05ece4 100644
--- a/security/keys/keyctl.c
+++ b/security/keys/keyctl.c
@@ -99,14 +99,9 @@ SYSCALL_DEFINE5(add_key, const char __user *, _type,
 
 	if (_payload) {
 		ret = -ENOMEM;
-		payload = kmalloc(plen, GFP_KERNEL | __GFP_NOWARN);
-		if (!payload) {
-			if (plen <= PAGE_SIZE)
-				goto error2;
-			payload = vmalloc(plen);
-			if (!payload)
-				goto error2;
-		}
+		payload = kvmalloc(plen, GFP_KERNEL);
+		if (!payload)
+			goto error2;
 
 		ret = -EFAULT;
 		if (copy_from_user(payload, _payload, plen) != 0)
@@ -1064,14 +1059,9 @@ long keyctl_instantiate_key_common(key_serial_t id,
 
 	if (from) {
 		ret = -ENOMEM;
-		payload = kmalloc(plen, GFP_KERNEL);
-		if (!payload) {
-			if (plen <= PAGE_SIZE)
-				goto error;
-			payload = vmalloc(plen);
-			if (!payload)
-				goto error;
-		}
+		payload = kvmalloc(plen, GFP_KERNEL);
+		if (!payload)
+			goto error;
 
 		ret = -EFAULT;
 		if (!copy_from_iter_full(payload, plen, from))
-- 
2.11.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [RFC PATCH 6/6] net: use kvmalloc with __GFP_REPEAT rather than open coded variant
  2017-01-12 15:37 ` Michal Hocko
  (?)
@ 2017-01-12 15:37   ` Michal Hocko
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Hocko @ 2017-01-12 15:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, David Rientjes, Mel Gorman, Johannes Weiner,
	Al Viro, linux-mm, LKML, Michal Hocko, Eric Dumazet, netdev

From: Michal Hocko <mhocko@suse.com>

fq_alloc_node, alloc_netdev_mqs and netif_alloc* open code kmalloc
with vmalloc fallback. Use the kvmalloc variant instead. Keep the
__GFP_REPEAT flag based on explanation from Eric:
"
At the time, tests on the hardware I had in my labs showed that
vmalloc() could deliver pages spread all over the memory and that was a
small penalty (once memory is fragmented enough, not at boot time)
"

The way the code is constructed means, however, that we prefer to go
and hit the OOM killer before we fall back to vmalloc for requests
smaller than 64kB in the current code. This is rather disruptive for
something that can be achieved with the fallback. On the other hand,
__GFP_REPEAT doesn't have any useful semantics for these requests. So the
effect of this patch is that requests smaller than 64kB will fall back to
vmalloc more easily now.

Cc: Eric Dumazet <edumazet@google.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 net/core/dev.c     | 24 +++++++++---------------
 net/sched/sch_fq.c | 12 +-----------
 2 files changed, 10 insertions(+), 26 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 56818f7eab2b..5cf2762387aa 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7111,12 +7111,10 @@ static int netif_alloc_rx_queues(struct net_device *dev)
 
 	BUG_ON(count < 1);
 
-	rx = kzalloc(sz, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT);
-	if (!rx) {
-		rx = vzalloc(sz);
-		if (!rx)
-			return -ENOMEM;
-	}
+	rx = kvzalloc(sz, GFP_KERNEL | __GFP_REPEAT);
+	if (!rx)
+		return -ENOMEM;
+
 	dev->_rx = rx;
 
 	for (i = 0; i < count; i++)
@@ -7153,12 +7151,10 @@ static int netif_alloc_netdev_queues(struct net_device *dev)
 	if (count < 1 || count > 0xffff)
 		return -EINVAL;
 
-	tx = kzalloc(sz, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT);
-	if (!tx) {
-		tx = vzalloc(sz);
-		if (!tx)
-			return -ENOMEM;
-	}
+	tx = kvzalloc(sz, GFP_KERNEL | __GFP_REPEAT);
+	if (!tx)
+		return -ENOMEM;
+
 	dev->_tx = tx;
 
 	netdev_for_each_tx_queue(dev, netdev_init_one_queue, NULL);
@@ -7691,9 +7687,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 	/* ensure 32-byte alignment of whole construct */
 	alloc_size += NETDEV_ALIGN - 1;
 
-	p = kzalloc(alloc_size, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT);
-	if (!p)
-		p = vzalloc(alloc_size);
+	p = kvzalloc(alloc_size, GFP_KERNEL | __GFP_REPEAT);
 	if (!p)
 		return NULL;
 
diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index a4f738ac7728..594f77d89f6c 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -624,16 +624,6 @@ static void fq_rehash(struct fq_sched_data *q,
 	q->stat_gc_flows += fcnt;
 }
 
-static void *fq_alloc_node(size_t sz, int node)
-{
-	void *ptr;
-
-	ptr = kmalloc_node(sz, GFP_KERNEL | __GFP_REPEAT | __GFP_NOWARN, node);
-	if (!ptr)
-		ptr = vmalloc_node(sz, node);
-	return ptr;
-}
-
 static void fq_free(void *addr)
 {
 	kvfree(addr);
@@ -650,7 +640,7 @@ static int fq_resize(struct Qdisc *sch, u32 log)
 		return 0;
 
 	/* If XPS was setup, we can allocate memory on right NUMA node */
-	array = fq_alloc_node(sizeof(struct rb_root) << log,
+	array = kvmalloc_node(sizeof(struct rb_root) << log, GFP_KERNEL | __GFP_REPEAT,
 			      netdev_queue_numa_node_read(sch->dev_queue));
 	if (!array)
 		return -ENOMEM;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 180+ messages in thread
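
To make the size classes in the commit message above concrete, here is a
small userspace sketch (not kernel code) that sorts the fq_resize() array
size into three buckets. Everything prefixed with model_ or MODEL_ is
hypothetical. The sketch assumes 4 kB pages, a costly-order limit of 3, an
8-byte struct rb_root, and that sub-page requests never reach the vmalloc
fallback; none of these are stated in the message itself. Under these
assumptions the non-costly boundary works out to 32 kB rather than the
64 kB quoted in the text, so treat the threshold as a parameter.

```c
#include <stddef.h>

#define MODEL_PAGE_SIZE    4096u /* assumption: 4 kB pages */
#define MODEL_COSTLY_ORDER 3u    /* assumption: largest non-costly order */

enum model_class {
	MODEL_SUBPAGE,   /* fits in one page: assumed to be kmalloc only */
	MODEL_NONCOSTLY, /* above one page, up to the costly boundary */
	MODEL_COSTLY     /* above the boundary: __GFP_REPEAT can matter here */
};

/* Classify a request size against the assumed page size and costly order. */
enum model_class model_classify(size_t size)
{
	if (size <= MODEL_PAGE_SIZE)
		return MODEL_SUBPAGE;
	if (size <= ((size_t)MODEL_PAGE_SIZE << MODEL_COSTLY_ORDER))
		return MODEL_NONCOSTLY;
	return MODEL_COSTLY;
}

/* Array size requested by fq_resize() for a given log (8-byte rb_root assumed). */
size_t model_fq_array_size(unsigned int log)
{
	return (size_t)8 << log;
}
```

With these numbers, log values up to 9 stay within a single page, 10 to 12
land in the non-costly bucket where the fallback becomes easier to reach,
and 13 or more are costly requests for which __GFP_REPEAT is kept.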

* Re: [PATCH 5/6] treewide: use kv[mz]alloc* rather than opencoded variants
  2017-01-12 15:37   ` Michal Hocko
  (?)
@ 2017-01-12 15:57     ` David Sterba
  -1 siblings, 0 replies; 180+ messages in thread
From: David Sterba @ 2017-01-12 15:57 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Colin Cross, Hariprasad S, Santosh Raspatur,
	Kees Cook, Johannes Weiner, Heiko Carstens, Martin Schwidefsky,
	Anton Vorontsov, Eric Dumazet, Ilya Dryomov, Kent Overstreet,
	Herbert Xu, David Rientjes, Andreas Dilger, Dan Williams,
	Oleg Drokin, Tony Luck, Alexei Starovoitov, linux-mm,
	Tariq Toukan, Yishai Hadas, Boris Ostrovsky, Ben Skeggs,
	Zheng Yan, Rafael J. Wysocki, Michal Hocko, Vlastimil Babka,
	Mel Gorman, LKML, netdev, Al Viro

On Thu, Jan 12, 2017 at 04:37:16PM +0100, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> There are many code paths opencoding kvmalloc. Let's use the helper
> instead. The main difference to kvmalloc is that those users are usually
> not considering all the aspects of the memory allocator. E.g. allocation
> requests < 64kB are basically never failing and invoke OOM killer to
> satisfy the allocation. This sounds too disruptive for something that
> has a reasonable fallback - the vmalloc. On the other hand those
> requests might fallback to vmalloc even when the memory allocator would
> succeed after several more reclaim/compaction attempts previously. There
> is no guarantee something like that happens though.
> 
> This patch converts many of those places to kv[mz]alloc* helpers because
> they are more conservative.

For the btrfs bits,

Acked-by: David Sterba <dsterba@suse.com>

^ permalink raw reply	[flat|nested] 180+ messages in thread
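
The open-coded pattern that the quoted patch replaces can be reduced to a
userspace sketch like the one below. It is an illustration only: the
model_* names are hypothetical, malloc() stands in for both allocators, and
the 4096-byte limit on the first allocator is an arbitrary stand-in for
"the contiguous allocation failed".

```c
#include <stddef.h>
#include <stdlib.h>

typedef void *(*model_alloc_fn)(size_t size);

/* Stand-in for a contiguous allocator: refuses anything above 4096 bytes. */
void *model_contig_alloc(size_t size)
{
	return size <= 4096 ? malloc(size) : NULL;
}

/* Stand-in for the fallback allocator: simply defers to malloc(). */
void *model_fallback_alloc(size_t size)
{
	return malloc(size);
}

/*
 * Try the primary allocator first and use the fallback only when it
 * fails, so that the caller needs a single NULL check.
 */
void *model_alloc_with_fallback(size_t size, model_alloc_fn primary,
				model_alloc_fn fallback)
{
	void *p = primary(size);

	if (!p)
		p = fallback(size);
	return p;
}
```

Each converted call site used to carry its own copy of this two-step logic,
along with its own choice of flags for the first attempt. Moving it into one
helper lets that flag policy be decided in a single place.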

* Re: [PATCH 5/6] treewide: use kv[mz]alloc* rather than opencoded variants
  2017-01-12 15:37   ` Michal Hocko
  (?)
@ 2017-01-12 16:05     ` Christian Borntraeger
  -1 siblings, 0 replies; 180+ messages in thread
From: Christian Borntraeger @ 2017-01-12 16:05 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: Vlastimil Babka, David Rientjes, Mel Gorman, Johannes Weiner,
	Al Viro, linux-mm, LKML, Michal Hocko, Martin Schwidefsky,
	Heiko Carstens, Herbert Xu, Anton Vorontsov, Colin Cross,
	Kees Cook, Tony Luck, Rafael J. Wysocki, Ben Skeggs,
	Kent Overstreet, Santosh Raspatur, Hariprasad S, Tariq Toukan,
	Yishai Hadas, Dan Williams, Oleg Drokin, Andreas Dilger,
	Boris Ostrovsky, David Sterba, Yan, Zheng, Ilya Dryomov,
	Alexei Starovoitov, Eric Dumazet, netdev

On 01/12/2017 04:37 PM, Michal Hocko wrote:
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 4f74511015b8..e6bbb33d2956 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -1126,10 +1126,7 @@ static long kvm_s390_get_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
>  	if (args->count < 1 || args->count > KVM_S390_SKEYS_MAX)
>  		return -EINVAL;
> 
> -	keys = kmalloc_array(args->count, sizeof(uint8_t),
> -			     GFP_KERNEL | __GFP_NOWARN);
> -	if (!keys)
> -		keys = vmalloc(sizeof(uint8_t) * args->count);
> +	keys = kvmalloc(args->count * sizeof(uint8_t), GFP_KERNEL);
>  	if (!keys)
>  		return -ENOMEM;
> 
> @@ -1171,10 +1168,7 @@ static long kvm_s390_set_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
>  	if (args->count < 1 || args->count > KVM_S390_SKEYS_MAX)
>  		return -EINVAL;
> 
> -	keys = kmalloc_array(args->count, sizeof(uint8_t),
> -			     GFP_KERNEL | __GFP_NOWARN);
> -	if (!keys)
> -		keys = vmalloc(sizeof(uint8_t) * args->count);
> +	keys = kvmalloc(sizeof(uint8_t) * args->count, GFP_KERNEL);
>  	if (!keys)
>  		return -ENOMEM;

KVM/s390 parts

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 2/6] mm: support __GFP_REPEAT in kvmalloc_node for >=64kB
  2017-01-12 15:37   ` Michal Hocko
@ 2017-01-12 16:12     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 180+ messages in thread
From: Michael S. Tsirkin @ 2017-01-12 16:12 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Vlastimil Babka, David Rientjes, Mel Gorman,
	Johannes Weiner, Al Viro, linux-mm, LKML, Michal Hocko

On Thu, Jan 12, 2017 at 04:37:13PM +0100, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> vhost code uses __GFP_REPEAT when allocating vhost_virtqueue resp.
> vhost_vsock because it would really like to prefer kmalloc to the
> vmalloc fallback - see 23cc5a991c7a ("vhost-net: extend device
> allocation to vmalloc") for more context. Michael Tsirkin has also
> noted:
> "
> __GFP_REPEAT overhead is during allocation time.  Using vmalloc means all
> accesses are slowed down.  Allocation is not on data path, accesses are.
> "
> 
> The same applies to the other vhost_kvzalloc users.
> 
> Let's teach kvmalloc_node to handle __GFP_REPEAT properly. There are two
> things to be careful about. First, we should keep the OOM killer from
> being invoked, so __GFP_NORETRY has to be used by default. Second, we
> have to override __GFP_REPEAT for !costly order requests, because
> __GFP_REPEAT is ignored for !costly orders.
> 
> Supporting __GFP_REPEAT-like semantics for !costly requests is possible,
> but it would require changes in the page allocator. This is out of scope
> of this patch.
> 
> This patch shouldn't introduce any functional change.
> 
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Michal Hocko <mhocko@suse.com>


Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  drivers/vhost/net.c   |  9 +++------
>  drivers/vhost/vhost.c | 15 +++------------
>  drivers/vhost/vsock.c |  9 +++------
>  mm/util.c             | 17 ++++++++++++++---
>  4 files changed, 23 insertions(+), 27 deletions(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 5dc34653274a..105cd04c7414 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -797,12 +797,9 @@ static int vhost_net_open(struct inode *inode, struct file *f)
>  	struct vhost_virtqueue **vqs;
>  	int i;
>  
> -	n = kmalloc(sizeof *n, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT);
> -	if (!n) {
> -		n = vmalloc(sizeof *n);
> -		if (!n)
> -			return -ENOMEM;
> -	}
> +	n = kvmalloc(sizeof *n, GFP_KERNEL | __GFP_REPEAT);
> +	if (!n)
> +		return -ENOMEM;
>  	vqs = kmalloc(VHOST_NET_VQ_MAX * sizeof(*vqs), GFP_KERNEL);
>  	if (!vqs) {
>  		kvfree(n);
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index d6432603880c..d2bf8a41f55e 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -515,18 +515,9 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
>  }
>  EXPORT_SYMBOL_GPL(vhost_dev_set_owner);
>  
> -static void *vhost_kvzalloc(unsigned long size)
> -{
> -	void *n = kzalloc(size, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT);
> -
> -	if (!n)
> -		n = vzalloc(size);
> -	return n;
> -}
> -
>  struct vhost_umem *vhost_dev_reset_owner_prepare(void)
>  {
> -	return vhost_kvzalloc(sizeof(struct vhost_umem));
> +	return kvzalloc(sizeof(struct vhost_umem), GFP_KERNEL);
>  }
>  EXPORT_SYMBOL_GPL(vhost_dev_reset_owner_prepare);
>  
> @@ -1190,7 +1181,7 @@ EXPORT_SYMBOL_GPL(vhost_vq_access_ok);
>  
>  static struct vhost_umem *vhost_umem_alloc(void)
>  {
> -	struct vhost_umem *umem = vhost_kvzalloc(sizeof(*umem));
> +	struct vhost_umem *umem = kvzalloc(sizeof(*umem), GFP_KERNEL);
>  
>  	if (!umem)
>  		return NULL;
> @@ -1216,7 +1207,7 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
>  		return -EOPNOTSUPP;
>  	if (mem.nregions > max_mem_regions)
>  		return -E2BIG;
> -	newmem = vhost_kvzalloc(size + mem.nregions * sizeof(*m->regions));
> +	newmem = kvzalloc(size + mem.nregions * sizeof(*m->regions), GFP_KERNEL);
>  	if (!newmem)
>  		return -ENOMEM;
>  
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index bbbf588540ed..7e0159867553 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -455,12 +455,9 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
>  	/* This struct is large and allocation could fail, fall back to vmalloc
>  	 * if there is no other way.
>  	 */
> -	vsock = kzalloc(sizeof(*vsock), GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT);
> -	if (!vsock) {
> -		vsock = vmalloc(sizeof(*vsock));
> -		if (!vsock)
> -			return -ENOMEM;
> -	}
> +	vsock = kvmalloc(sizeof(*vsock), GFP_KERNEL | __GFP_REPEAT);
> +	if (!vsock)
> +		return -ENOMEM;
>  
>  	vqs = kmalloc_array(ARRAY_SIZE(vsock->vqs), sizeof(*vqs), GFP_KERNEL);
>  	if (!vqs) {
> diff --git a/mm/util.c b/mm/util.c
> index 7e0c240b5760..9306244b9f41 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -333,7 +333,8 @@ EXPORT_SYMBOL(vm_mmap);
>   * Uses kmalloc to get the memory but if the allocation fails then falls back
>   * to the vmalloc allocator. Use kvfree for freeing the memory.
>   *
> - * Reclaim modifiers - __GFP_NORETRY, __GFP_REPEAT and __GFP_NOFAIL are not supported
> + * Reclaim modifiers - __GFP_NORETRY and __GFP_NOFAIL are not supported. __GFP_REPEAT
> + * is supported only for large (>64kB) allocations
>   */
>  void *kvmalloc_node(size_t size, gfp_t flags, int node)
>  {
> @@ -350,8 +351,18 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
>  	 * Make sure that larger requests are not too disruptive - no OOM
>  	 * killer and no allocation failure warnings as we have a fallback
>  	 */
> -	if (size > PAGE_SIZE)
> -		kmalloc_flags |= __GFP_NORETRY | __GFP_NOWARN;
> +	if (size > PAGE_SIZE) {
> +		kmalloc_flags |= __GFP_NOWARN;
> +
> +		/*
> +		 * We have to override __GFP_REPEAT by __GFP_NORETRY for !costly
> +		 * requests because there is no other way to tell the allocator
> +		 * that we want to fail rather than retry endlessly.
> +		 */
> +		if (!(kmalloc_flags & __GFP_REPEAT) ||
> +				(size <= PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
> +			kmalloc_flags |= __GFP_NORETRY;
> +	}
>  
>  	ret = kmalloc_node(size, kmalloc_flags, node);
>  
> -- 
> 2.11.0
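
The flag selection in the mm/util.c hunk above can be modelled in userspace
to see which sizes keep __GFP_REPEAT. This is a minimal sketch, not the
kernel implementation: the MODEL_* flag values are made up, 4 kB pages are
assumed, and PAGE_ALLOC_COSTLY_ORDER is taken to be 3. Under those
assumptions the cut-off computed by PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER is
32 kB, which may be worth comparing against the 64kB figure in the comment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for gfp bits; the values are illustrative only. */
#define MODEL_GFP_KERNEL   0x01u
#define MODEL_GFP_NOWARN   0x02u
#define MODEL_GFP_NORETRY  0x04u
#define MODEL_GFP_REPEAT   0x08u

#define MODEL_PAGE_SIZE     ((size_t)4096)	/* assumption: 4 kB pages */
#define MODEL_COSTLY_ORDER  3			/* assumption: PAGE_ALLOC_COSTLY_ORDER */

/*
 * Mirror of the decision made before the kmalloc attempt in the hunk:
 * returns the flags that the kmalloc part of the helper would use.
 */
unsigned int kvmalloc_model_flags(size_t size, unsigned int flags)
{
	unsigned int kmalloc_flags = flags;

	if (size > MODEL_PAGE_SIZE) {
		/* A vmalloc fallback exists, so do not warn on failure. */
		kmalloc_flags |= MODEL_GFP_NOWARN;

		/*
		 * Fail fast unless the caller asked for __GFP_REPEAT and the
		 * request is larger than the costly-order threshold.
		 */
		if (!(kmalloc_flags & MODEL_GFP_REPEAT) ||
		    size <= (MODEL_PAGE_SIZE << MODEL_COSTLY_ORDER))
			kmalloc_flags |= MODEL_GFP_NORETRY;
	}

	return kmalloc_flags;
}

/* Small self-check of the three interesting size classes. */
void kvmalloc_model_demo(void)
{
	unsigned int repeat = MODEL_GFP_KERNEL | MODEL_GFP_REPEAT;

	/* At most one page: flags pass through untouched. */
	assert(kvmalloc_model_flags(512, repeat) == repeat);

	/* Above one page but not costly: __GFP_NORETRY is forced. */
	assert(kvmalloc_model_flags(16384, repeat) & MODEL_GFP_NORETRY);

	/* Costly request with __GFP_REPEAT: no __GFP_NORETRY is added. */
	assert(!(kvmalloc_model_flags(65536, repeat) & MODEL_GFP_NORETRY));
}
```

In this model a request of exactly 32768 bytes still gets __GFP_NORETRY,
because the comparison in the hunk is "size <=" rather than "size <".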

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 5/6] treewide: use kv[mz]alloc* rather than opencoded variants
  2017-01-12 15:37   ` Michal Hocko
  (?)
@ 2017-01-12 16:54     ` Ilya Dryomov
  -1 siblings, 0 replies; 180+ messages in thread
From: Ilya Dryomov @ 2017-01-12 16:54 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Vlastimil Babka, David Rientjes, Mel Gorman,
	Johannes Weiner, Al Viro, linux-mm, LKML, Michal Hocko,
	Martin Schwidefsky, Heiko Carstens, Herbert Xu, Anton Vorontsov,
	Colin Cross, Kees Cook, Tony Luck, Rafael J. Wysocki, Ben Skeggs,
	Kent Overstreet, Santosh Raspatur, Hariprasad S, Tariq Toukan,
	Yishai Hadas, Dan Williams, Oleg Drokin, Andreas Dilger,
	Boris Ostrovsky, David Sterba, Yan, Zheng, Alexei Starovoitov,
	Eric Dumazet, netdev

On Thu, Jan 12, 2017 at 4:37 PM, Michal Hocko <mhocko@kernel.org> wrote:
> From: Michal Hocko <mhocko@suse.com>
>
> There are many code paths open-coding kvmalloc. Let's use the helper
> instead. The main difference from kvmalloc is that those users usually
> do not consider all the aspects of the memory allocator. E.g. allocation
> requests < 64kB basically never fail and invoke the OOM killer to
> satisfy the allocation. This sounds too disruptive for something that
> has a reasonable fallback - the vmalloc. On the other hand, those
> requests might now fall back to vmalloc even when the memory allocator
> would previously have succeeded after several more reclaim/compaction
> attempts. There is no guarantee something like that happens though.
>
> This patch converts many of those places to kv[mz]alloc* helpers because
> they are more conservative.
>
> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: Anton Vorontsov <anton@enomsg.org>
> Cc: Colin Cross <ccross@android.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> Cc: Ben Skeggs <bskeggs@redhat.com>
> Cc: Kent Overstreet <kent.overstreet@gmail.com>
> Cc: Santosh Raspatur <santosh@chelsio.com>
> Cc: Hariprasad S <hariprasad@chelsio.com>
> Cc: Tariq Toukan <tariqt@mellanox.com>
> Cc: Yishai Hadas <yishaih@mellanox.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Oleg Drokin <oleg.drokin@intel.com>
> Cc: Andreas Dilger <andreas.dilger@intel.com>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: David Sterba <dsterba@suse.com>
> Cc: "Yan, Zheng" <zyan@redhat.com>
> Cc: Ilya Dryomov <idryomov@gmail.com>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: netdev@vger.kernel.org
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  arch/s390/kvm/kvm-s390.c                           | 10 ++-----
>  crypto/lzo.c                                       |  4 +--
>  drivers/acpi/apei/erst.c                           |  8 ++---
>  drivers/char/agp/generic.c                         |  8 +----
>  drivers/gpu/drm/nouveau/nouveau_gem.c              |  4 +--
>  drivers/md/bcache/util.h                           | 12 ++------
>  drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h    |  3 --
>  drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c | 25 ++--------------
>  drivers/net/ethernet/chelsio/cxgb3/l2t.c           |  2 +-
>  drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c    | 31 ++++----------------
>  drivers/net/ethernet/mellanox/mlx4/en_tx.c         |  9 ++----
>  drivers/net/ethernet/mellanox/mlx4/mr.c            |  9 ++----
>  drivers/nvdimm/dimm_devs.c                         |  5 +---
>  .../staging/lustre/lnet/libcfs/linux/linux-mem.c   | 11 +------
>  drivers/xen/evtchn.c                               | 14 +--------
>  fs/btrfs/ctree.c                                   |  9 ++----
>  fs/btrfs/ioctl.c                                   |  9 ++----
>  fs/btrfs/send.c                                    | 27 ++++++-----------
>  fs/ceph/file.c                                     |  9 ++----
>  fs/select.c                                        |  5 +---
>  fs/xattr.c                                         | 27 ++++++-----------
>  kernel/bpf/hashtab.c                               | 11 ++-----
>  lib/iov_iter.c                                     |  5 +---
>  mm/frame_vector.c                                  |  5 +---
>  net/ipv4/inet_hashtables.c                         |  6 +---
>  net/ipv4/tcp_metrics.c                             |  5 +---
>  net/mpls/af_mpls.c                                 |  5 +---
>  net/netfilter/x_tables.c                           | 34 ++++++----------------
>  net/netfilter/xt_recent.c                          |  5 +---
>  net/sched/sch_choke.c                              |  5 +---
>  net/sched/sch_fq_codel.c                           | 26 ++++-------------
>  net/sched/sch_hhf.c                                | 33 ++++++---------------
>  net/sched/sch_netem.c                              |  6 +---
>  net/sched/sch_sfq.c                                |  6 +---
>  security/keys/keyctl.c                             | 22 ++++----------
>  35 files changed, 96 insertions(+), 319 deletions(-)
>
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 4f74511015b8..e6bbb33d2956 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -1126,10 +1126,7 @@ static long kvm_s390_get_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
>         if (args->count < 1 || args->count > KVM_S390_SKEYS_MAX)
>                 return -EINVAL;
>
> -       keys = kmalloc_array(args->count, sizeof(uint8_t),
> -                            GFP_KERNEL | __GFP_NOWARN);
> -       if (!keys)
> -               keys = vmalloc(sizeof(uint8_t) * args->count);
> +       keys = kvmalloc(args->count * sizeof(uint8_t), GFP_KERNEL);
>         if (!keys)
>                 return -ENOMEM;
>
> @@ -1171,10 +1168,7 @@ static long kvm_s390_set_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
>         if (args->count < 1 || args->count > KVM_S390_SKEYS_MAX)
>                 return -EINVAL;
>
> -       keys = kmalloc_array(args->count, sizeof(uint8_t),
> -                            GFP_KERNEL | __GFP_NOWARN);
> -       if (!keys)
> -               keys = vmalloc(sizeof(uint8_t) * args->count);
> +       keys = kvmalloc(sizeof(uint8_t) * args->count, GFP_KERNEL);
>         if (!keys)
>                 return -ENOMEM;
>
> diff --git a/crypto/lzo.c b/crypto/lzo.c
> index 168df784da84..218567d717d6 100644
> --- a/crypto/lzo.c
> +++ b/crypto/lzo.c
> @@ -32,9 +32,7 @@ static void *lzo_alloc_ctx(struct crypto_scomp *tfm)
>  {
>         void *ctx;
>
> -       ctx = kmalloc(LZO1X_MEM_COMPRESS, GFP_KERNEL | __GFP_NOWARN);
> -       if (!ctx)
> -               ctx = vmalloc(LZO1X_MEM_COMPRESS);
> +       ctx = kvmalloc(LZO1X_MEM_COMPRESS, GFP_KERNEL);
>         if (!ctx)
>                 return ERR_PTR(-ENOMEM);
>
> diff --git a/drivers/acpi/apei/erst.c b/drivers/acpi/apei/erst.c
> index ec4f507b524f..a2898df61744 100644
> --- a/drivers/acpi/apei/erst.c
> +++ b/drivers/acpi/apei/erst.c
> @@ -513,7 +513,7 @@ static int __erst_record_id_cache_add_one(void)
>         if (i < erst_record_id_cache.len)
>                 goto retry;
>         if (erst_record_id_cache.len >= erst_record_id_cache.size) {
> -               int new_size, alloc_size;
> +               int new_size;
>                 u64 *new_entries;
>
>                 new_size = erst_record_id_cache.size * 2;
> @@ -524,11 +524,7 @@ static int __erst_record_id_cache_add_one(void)
>                                 pr_warn(FW_WARN "too many record IDs!\n");
>                         return 0;
>                 }
> -               alloc_size = new_size * sizeof(entries[0]);
> -               if (alloc_size < PAGE_SIZE)
> -                       new_entries = kmalloc(alloc_size, GFP_KERNEL);
> -               else
> -                       new_entries = vmalloc(alloc_size);
> +               new_entries = kvmalloc(new_size * sizeof(entries[0]), GFP_KERNEL);
>                 if (!new_entries)
>                         return -ENOMEM;
>                 memcpy(new_entries, entries,
> diff --git a/drivers/char/agp/generic.c b/drivers/char/agp/generic.c
> index f002fa5d1887..bdf418cac8ef 100644
> --- a/drivers/char/agp/generic.c
> +++ b/drivers/char/agp/generic.c
> @@ -88,13 +88,7 @@ static int agp_get_key(void)
>
>  void agp_alloc_page_array(size_t size, struct agp_memory *mem)
>  {
> -       mem->pages = NULL;
> -
> -       if (size <= 2*PAGE_SIZE)
> -               mem->pages = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
> -       if (mem->pages == NULL) {
> -               mem->pages = vmalloc(size);
> -       }
> +       mem->pages = kvmalloc(size, GFP_KERNEL);
>  }
>  EXPORT_SYMBOL(agp_alloc_page_array);
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
> index 201b52b750dd..77dd73ff126f 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_gem.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
> @@ -568,9 +568,7 @@ u_memcpya(uint64_t user, unsigned nmemb, unsigned size)
>
>         size *= nmemb;
>
> -       mem = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
> -       if (!mem)
> -               mem = vmalloc(size);
> +       mem = kvmalloc(size, GFP_KERNEL);
>         if (!mem)
>                 return ERR_PTR(-ENOMEM);
>
> diff --git a/drivers/md/bcache/util.h b/drivers/md/bcache/util.h
> index cf2cbc211d83..d00bcb64d3a8 100644
> --- a/drivers/md/bcache/util.h
> +++ b/drivers/md/bcache/util.h
> @@ -43,11 +43,7 @@ struct closure;
>         (heap)->used = 0;                                               \
>         (heap)->size = (_size);                                         \
>         _bytes = (heap)->size * sizeof(*(heap)->data);                  \
> -       (heap)->data = NULL;                                            \
> -       if (_bytes < KMALLOC_MAX_SIZE)                                  \
> -               (heap)->data = kmalloc(_bytes, (gfp));                  \
> -       if ((!(heap)->data) && ((gfp) & GFP_KERNEL))                    \
> -               (heap)->data = vmalloc(_bytes);                         \
> +       (heap)->data = kvmalloc(_bytes, (gfp) & GFP_KERNEL);            \
>         (heap)->data;                                                   \
>  })
>
> @@ -136,12 +132,8 @@ do {                                                                       \
>                                                                         \
>         (fifo)->mask = _allocated_size - 1;                             \
>         (fifo)->front = (fifo)->back = 0;                               \
> -       (fifo)->data = NULL;                                            \
>                                                                         \
> -       if (_bytes < KMALLOC_MAX_SIZE)                                  \
> -               (fifo)->data = kmalloc(_bytes, (gfp));                  \
> -       if ((!(fifo)->data) && ((gfp) & GFP_KERNEL))                    \
> -               (fifo)->data = vmalloc(_bytes);                         \
> +       (fifo)->data = kvmalloc(_bytes, (gfp) & GFP_KERNEL);            \
>         (fifo)->data;                                                   \
>  })
>
> diff --git a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h
> index 920d918ed193..f04e81f33795 100644
> --- a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h
> +++ b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h
> @@ -41,9 +41,6 @@
>
>  #define VALIDATE_TID 1
>
> -void *cxgb_alloc_mem(unsigned long size);
> -void cxgb_free_mem(void *addr);
> -
>  /*
>   * Map an ATID or STID to their entries in the corresponding TID tables.
>   */
> diff --git a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c
> index 76684dcb874c..606d4a3ade04 100644
> --- a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c
> +++ b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c
> @@ -1152,27 +1152,6 @@ static void cxgb_redirect(struct dst_entry *old, struct dst_entry *new,
>  }
>
>  /*
> - * Allocate a chunk of memory using kmalloc or, if that fails, vmalloc.
> - * The allocated memory is cleared.
> - */
> -void *cxgb_alloc_mem(unsigned long size)
> -{
> -       void *p = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
> -
> -       if (!p)
> -               p = vzalloc(size);
> -       return p;
> -}
> -
> -/*
> - * Free memory allocated through t3_alloc_mem().
> - */
> -void cxgb_free_mem(void *addr)
> -{
> -       kvfree(addr);
> -}
> -
> -/*
>   * Allocate and initialize the TID tables.  Returns 0 on success.
>   */
>  static int init_tid_tabs(struct tid_info *t, unsigned int ntids,
> @@ -1182,7 +1161,7 @@ static int init_tid_tabs(struct tid_info *t, unsigned int ntids,
>         unsigned long size = ntids * sizeof(*t->tid_tab) +
>             natids * sizeof(*t->atid_tab) + nstids * sizeof(*t->stid_tab);
>
> -       t->tid_tab = cxgb_alloc_mem(size);
> +       t->tid_tab = kvmalloc(size, GFP_KERNEL);
>         if (!t->tid_tab)
>                 return -ENOMEM;
>
> @@ -1218,7 +1197,7 @@ static int init_tid_tabs(struct tid_info *t, unsigned int ntids,
>
>  static void free_tid_maps(struct tid_info *t)
>  {
> -       cxgb_free_mem(t->tid_tab);
> +       kvfree(t->tid_tab);
>  }
>
>  static inline void add_adapter(struct adapter *adap)
> diff --git a/drivers/net/ethernet/chelsio/cxgb3/l2t.c b/drivers/net/ethernet/chelsio/cxgb3/l2t.c
> index 5f226eda8cd6..c9b06501ee0c 100644
> --- a/drivers/net/ethernet/chelsio/cxgb3/l2t.c
> +++ b/drivers/net/ethernet/chelsio/cxgb3/l2t.c
> @@ -444,7 +444,7 @@ struct l2t_data *t3_init_l2t(unsigned int l2t_capacity)
>         struct l2t_data *d;
>         int i, size = sizeof(*d) + l2t_capacity * sizeof(struct l2t_entry);
>
> -       d = cxgb_alloc_mem(size);
> +       d = kvmalloc(size, GFP_KERNEL);
>         if (!d)
>                 return NULL;
>
> diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> index 6f951877430b..671695cb3c15 100644
> --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> @@ -881,27 +881,6 @@ static int setup_sge_queues(struct adapter *adap)
>         return err;
>  }
>
> -/*
> - * Allocate a chunk of memory using kmalloc or, if that fails, vmalloc.
> - * The allocated memory is cleared.
> - */
> -void *t4_alloc_mem(size_t size)
> -{
> -       void *p = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
> -
> -       if (!p)
> -               p = vzalloc(size);
> -       return p;
> -}
> -
> -/*
> - * Free memory allocated through alloc_mem().
> - */
> -void t4_free_mem(void *addr)
> -{
> -       kvfree(addr);
> -}
> -
>  static u16 cxgb_select_queue(struct net_device *dev, struct sk_buff *skb,
>                              void *accel_priv, select_queue_fallback_t fallback)
>  {
> @@ -1300,7 +1279,7 @@ static int tid_init(struct tid_info *t)
>                max_ftids * sizeof(*t->ftid_tab) +
>                ftid_bmap_size * sizeof(long);
>
> -       t->tid_tab = t4_alloc_mem(size);
> +       t->tid_tab = kvmalloc(size, GFP_KERNEL);
>         if (!t->tid_tab)
>                 return -ENOMEM;
>
> @@ -3416,7 +3395,7 @@ static int adap_init0(struct adapter *adap)
>                 /* allocate memory to read the header of the firmware on the
>                  * card
>                  */
> -               card_fw = t4_alloc_mem(sizeof(*card_fw));
> +               card_fw = kvmalloc(sizeof(*card_fw), GFP_KERNEL);
>
>                 /* Get FW from from /lib/firmware/ */
>                 ret = request_firmware(&fw, fw_info->fw_mod_name,
> @@ -3436,7 +3415,7 @@ static int adap_init0(struct adapter *adap)
>
>                 /* Cleaning up */
>                 release_firmware(fw);
> -               t4_free_mem(card_fw);
> +               kvfree(card_fw);
>
>                 if (ret < 0)
>                         goto bye;
> @@ -4432,9 +4411,9 @@ static void free_some_resources(struct adapter *adapter)
>  {
>         unsigned int i;
>
> -       t4_free_mem(adapter->l2t);
> +       kvfree(adapter->l2t);
>         t4_cleanup_sched(adapter);
> -       t4_free_mem(adapter->tids.tid_tab);
> +       kvfree(adapter->tids.tid_tab);
>         cxgb4_cleanup_tc_u32(adapter);
>         kfree(adapter->sge.egr_map);
>         kfree(adapter->sge.ingr_map);
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> index 5886ad78058f..a5c1b815145e 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> @@ -70,13 +70,10 @@ int mlx4_en_create_tx_ring(struct mlx4_en_priv *priv,
>         ring->full_size = ring->size - HEADROOM - MAX_DESC_TXBBS;
>
>         tmp = size * sizeof(struct mlx4_en_tx_info);
> -       ring->tx_info = kmalloc_node(tmp, GFP_KERNEL | __GFP_NOWARN, node);
> +       ring->tx_info = kvmalloc_node(tmp, GFP_KERNEL, node);
>         if (!ring->tx_info) {
> -               ring->tx_info = vmalloc(tmp);
> -               if (!ring->tx_info) {
> -                       err = -ENOMEM;
> -                       goto err_ring;
> -               }
> +               err = -ENOMEM;
> +               goto err_ring;
>         }
>
>         en_dbg(DRV, priv, "Allocated tx_info ring at addr:%p size:%d\n",
> diff --git a/drivers/net/ethernet/mellanox/mlx4/mr.c b/drivers/net/ethernet/mellanox/mlx4/mr.c
> index 395b5463cfd9..82354fd0a87e 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/mr.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/mr.c
> @@ -115,12 +115,9 @@ static int mlx4_buddy_init(struct mlx4_buddy *buddy, int max_order)
>
>         for (i = 0; i <= buddy->max_order; ++i) {
>                 s = BITS_TO_LONGS(1 << (buddy->max_order - i));
> -               buddy->bits[i] = kcalloc(s, sizeof (long), GFP_KERNEL | __GFP_NOWARN);
> -               if (!buddy->bits[i]) {
> -                       buddy->bits[i] = vzalloc(s * sizeof(long));
> -                       if (!buddy->bits[i])
> -                               goto err_out_free;
> -               }
> +               buddy->bits[i] = kvzalloc(s * sizeof(long), GFP_KERNEL);
> +               if (!buddy->bits[i])
> +                       goto err_out_free;
>         }
>
>         set_bit(0, buddy->bits[buddy->max_order]);
> diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
> index 0eedc49e0d47..3bd332b167d9 100644
> --- a/drivers/nvdimm/dimm_devs.c
> +++ b/drivers/nvdimm/dimm_devs.c
> @@ -102,10 +102,7 @@ int nvdimm_init_config_data(struct nvdimm_drvdata *ndd)
>                 return -ENXIO;
>         }
>
> -       ndd->data = kmalloc(ndd->nsarea.config_size, GFP_KERNEL);
> -       if (!ndd->data)
> -               ndd->data = vmalloc(ndd->nsarea.config_size);
> -
> +       ndd->data = kvmalloc(ndd->nsarea.config_size, GFP_KERNEL);
>         if (!ndd->data)
>                 return -ENOMEM;
>
> diff --git a/drivers/staging/lustre/lnet/libcfs/linux/linux-mem.c b/drivers/staging/lustre/lnet/libcfs/linux/linux-mem.c
> index a6a76a681ea9..8f638267e704 100644
> --- a/drivers/staging/lustre/lnet/libcfs/linux/linux-mem.c
> +++ b/drivers/staging/lustre/lnet/libcfs/linux/linux-mem.c
> @@ -45,15 +45,6 @@ EXPORT_SYMBOL(libcfs_kvzalloc);
>  void *libcfs_kvzalloc_cpt(struct cfs_cpt_table *cptab, int cpt, size_t size,
>                           gfp_t flags)
>  {
> -       void *ret;
> -
> -       ret = kzalloc_node(size, flags | __GFP_NOWARN,
> -                          cfs_cpt_spread_node(cptab, cpt));
> -       if (!ret) {
> -               WARN_ON(!(flags & (__GFP_FS | __GFP_HIGH)));
> -               ret = vmalloc_node(size, cfs_cpt_spread_node(cptab, cpt));
> -       }
> -
> -       return ret;
> +       return kvzalloc_node(size, flags, cfs_cpt_spread_node(cptab, cpt));
>  }
>  EXPORT_SYMBOL(libcfs_kvzalloc_cpt);
> diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
> index 6890897a6f30..10f1ef582659 100644
> --- a/drivers/xen/evtchn.c
> +++ b/drivers/xen/evtchn.c
> @@ -87,18 +87,6 @@ struct user_evtchn {
>         bool enabled;
>  };
>
> -static evtchn_port_t *evtchn_alloc_ring(unsigned int size)
> -{
> -       evtchn_port_t *ring;
> -       size_t s = size * sizeof(*ring);
> -
> -       ring = kmalloc(s, GFP_KERNEL);
> -       if (!ring)
> -               ring = vmalloc(s);
> -
> -       return ring;
> -}
> -
>  static void evtchn_free_ring(evtchn_port_t *ring)
>  {
>         kvfree(ring);
> @@ -334,7 +322,7 @@ static int evtchn_resize_ring(struct per_user_data *u)
>         else
>                 new_size = 2 * u->ring_size;
>
> -       new_ring = evtchn_alloc_ring(new_size);
> +       new_ring = kvmalloc(new_size * sizeof(*new_ring), GFP_KERNEL);
>         if (!new_ring)
>                 return -ENOMEM;
>
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 146b2dc0d2cf..4fc9712d927d 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -5391,13 +5391,10 @@ int btrfs_compare_trees(struct btrfs_root *left_root,
>                 goto out;
>         }
>
> -       tmp_buf = kmalloc(fs_info->nodesize, GFP_KERNEL | __GFP_NOWARN);
> +       tmp_buf = kvmalloc(fs_info->nodesize, GFP_KERNEL);
>         if (!tmp_buf) {
> -               tmp_buf = vmalloc(fs_info->nodesize);
> -               if (!tmp_buf) {
> -                       ret = -ENOMEM;
> -                       goto out;
> -               }
> +               ret = -ENOMEM;
> +               goto out;
>         }
>
>         left_path->search_commit_root = 1;
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 77dabfed3a5d..6f0b488c7428 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -3547,12 +3547,9 @@ static int btrfs_clone(struct inode *src, struct inode *inode,
>         u64 last_dest_end = destoff;
>
>         ret = -ENOMEM;
> -       buf = kmalloc(fs_info->nodesize, GFP_KERNEL | __GFP_NOWARN);
> -       if (!buf) {
> -               buf = vmalloc(fs_info->nodesize);
> -               if (!buf)
> -                       return ret;
> -       }
> +       buf = kvmalloc(fs_info->nodesize, GFP_KERNEL);
> +       if (!buf)
> +               return ret;
>
>         path = btrfs_alloc_path();
>         if (!path) {
> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> index d145ce804620..0621ca2a7b5d 100644
> --- a/fs/btrfs/send.c
> +++ b/fs/btrfs/send.c
> @@ -6242,22 +6242,16 @@ long btrfs_ioctl_send(struct file *mnt_file, void __user *arg_)
>         sctx->clone_roots_cnt = arg->clone_sources_count;
>
>         sctx->send_max_size = BTRFS_SEND_BUF_SIZE;
> -       sctx->send_buf = kmalloc(sctx->send_max_size, GFP_KERNEL | __GFP_NOWARN);
> +       sctx->send_buf = kvmalloc(sctx->send_max_size, GFP_KERNEL);
>         if (!sctx->send_buf) {
> -               sctx->send_buf = vmalloc(sctx->send_max_size);
> -               if (!sctx->send_buf) {
> -                       ret = -ENOMEM;
> -                       goto out;
> -               }
> +               ret = -ENOMEM;
> +               goto out;
>         }
>
> -       sctx->read_buf = kmalloc(BTRFS_SEND_READ_SIZE, GFP_KERNEL | __GFP_NOWARN);
> +       sctx->read_buf = kvmalloc(BTRFS_SEND_READ_SIZE, GFP_KERNEL);
>         if (!sctx->read_buf) {
> -               sctx->read_buf = vmalloc(BTRFS_SEND_READ_SIZE);
> -               if (!sctx->read_buf) {
> -                       ret = -ENOMEM;
> -                       goto out;
> -               }
> +               ret = -ENOMEM;
> +               goto out;
>         }
>
>         sctx->pending_dir_moves = RB_ROOT;
> @@ -6278,13 +6272,10 @@ long btrfs_ioctl_send(struct file *mnt_file, void __user *arg_)
>         alloc_size = arg->clone_sources_count * sizeof(*arg->clone_sources);
>
>         if (arg->clone_sources_count) {
> -               clone_sources_tmp = kmalloc(alloc_size, GFP_KERNEL | __GFP_NOWARN);
> +               clone_sources_tmp = kvmalloc(alloc_size, GFP_KERNEL);
>                 if (!clone_sources_tmp) {
> -                       clone_sources_tmp = vmalloc(alloc_size);
> -                       if (!clone_sources_tmp) {
> -                               ret = -ENOMEM;
> -                               goto out;
> -                       }
> +                       ret = -ENOMEM;
> +                       goto out;
>                 }
>
>                 ret = copy_from_user(clone_sources_tmp, arg->clone_sources,
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index 045d30d26624..78b18acf33ba 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -74,12 +74,9 @@ dio_get_pages_alloc(const struct iov_iter *it, size_t nbytes,
>         align = (unsigned long)(it->iov->iov_base + it->iov_offset) &
>                 (PAGE_SIZE - 1);
>         npages = calc_pages_for(align, nbytes);
> -       pages = kmalloc(sizeof(*pages) * npages, GFP_KERNEL);
> -       if (!pages) {
> -               pages = vmalloc(sizeof(*pages) * npages);
> -               if (!pages)
> -                       return ERR_PTR(-ENOMEM);
> -       }
> +       pages = kvmalloc(sizeof(*pages) * npages, GFP_KERNEL);
> +       if (!pages)
> +               return ERR_PTR(-ENOMEM);

ceph hunk looks fine:

Acked-by: Ilya Dryomov <idryomov@gmail.com>

However, I noticed that in some cases you've dropped the zeroing:
fq_codel_init() and hhf_zalloc() zeroed both the kmalloc and the
vmalloc path ("k" and "v"), while some other callers were inconsistent
and zeroed only the kmalloc path.  Given that the fallback branch was
probably dead, I'd keep the kmalloc-path behaviour, i.e. keep zeroing
wherever the kmalloc call zeroed.  Was dropping it intentional?
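
To illustrate the concern, here is a minimal userspace sketch, not
kernel code.  The model_* names and the poison value are hypothetical
stand-ins: model_kvmalloc() plays the role of a non-zeroing allocation
(kmalloc/vmalloc/kvmalloc) and model_kvzalloc() that of a zeroing one
(kzalloc/vzalloc/kvzalloc).  The poison fill only makes "unspecified
contents" visible; the real allocators give no such guarantee.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical poison byte; makes uninitialised contents visible. */
#define MODEL_POISON 0xA5

/* Stand-in for a non-zeroing allocation: contents are unspecified. */
static void *model_kvmalloc(size_t size)
{
	void *p = malloc(size);

	if (p)
		memset(p, MODEL_POISON, size);
	return p;
}

/* Stand-in for a zeroing allocation: every byte starts as 0. */
static void *model_kvzalloc(size_t size)
{
	return calloc(1, size);
}

/* Return true if all 'size' bytes of 'buf' are zero. */
static bool model_all_zero(const void *buf, size_t size)
{
	const unsigned char *b = buf;
	size_t i;

	for (i = 0; i < size; i++)
		if (b[i])
			return false;
	return true;
}

/*
 * Allocate a table of 'nslots' pointers the way a converted caller
 * would, and report whether every slot starts out as zero.  Code that
 * used to rely on a zeroing allocation expects this to be true.
 */
static bool model_table_starts_clean(bool zeroing, size_t nslots)
{
	size_t bytes = nslots * sizeof(void *);
	void *tab = zeroing ? model_kvzalloc(bytes) : model_kvmalloc(bytes);
	bool clean;

	if (!tab)
		return false;
	clean = model_all_zero(tab, bytes);
	free(tab);
	return clean;
}
```

In this model, model_table_starts_clean(true, 64) holds and
model_table_starts_clean(false, 64) does not.  The kernel analogue
would presumably be to convert callers that zeroed before to the
zeroing helper (kvzalloc, as the mlx4 mr.c hunk does) rather than to
kvmalloc.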

Thanks,

                Ilya

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 5/6] treewide: use kv[mz]alloc* rather than opencoded variants
  2017-01-12 15:37   ` Michal Hocko
  (?)
@ 2017-01-12 17:00     ` Dan Williams
  -1 siblings, 0 replies; 180+ messages in thread
From: Dan Williams @ 2017-01-12 17:00 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Vlastimil Babka, David Rientjes, Mel Gorman,
	Johannes Weiner, Al Viro, Linux MM, LKML, Michal Hocko,
	Martin Schwidefsky, Heiko Carstens, Herbert Xu, Anton Vorontsov,
	Colin Cross, Kees Cook, Tony Luck, Rafael J. Wysocki, Ben Skeggs,
	Kent Overstreet, Santosh Raspatur, Hariprasad S, Tariq Toukan,
	Yishai Hadas, Oleg Drokin, Andreas Dilger, Boris Ostrovsky,
	David Sterba, Yan, Zheng, Ilya Dryomov, Alexei Starovoitov,
	Eric Dumazet, Netdev

On Thu, Jan 12, 2017 at 7:37 AM, Michal Hocko <mhocko@kernel.org> wrote:
> From: Michal Hocko <mhocko@suse.com>
>
> There are many code paths open-coding kvmalloc. Let's use the helper
> instead. The main difference from kvmalloc is that those users usually
> do not consider all aspects of the memory allocator. E.g. allocation
> requests < 64kB basically never fail and invoke the OOM killer to
> satisfy the allocation. This sounds too disruptive for something that
> has a reasonable fallback - vmalloc. On the other hand, those requests
> might now fall back to vmalloc even when the memory allocator would
> previously have succeeded after several more reclaim/compaction
> attempts. There is no guarantee something like that happens, though.
>
> This patch converts many of those places to kv[mz]alloc* helpers because
> they are more conservative.
>
[..]
> Cc: Dan Williams <dan.j.williams@intel.com>
[..]
>  drivers/nvdimm/dimm_devs.c                         |  5 +---

Acked-by: Dan Williams <dan.j.williams@intel.com>
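
The pattern the quoted commit message describes (try the physically
contiguous allocator first, without insisting on success, then fall back
to vmalloc) can be sketched in userspace roughly as follows. This is an
illustration under stated assumptions, not the kernel implementation:
every mock_* name and MOCK_CONTIG_LIMIT are hypothetical, the "contiguous"
path simply refuses requests above a fixed size, and a small header
records which path served the request, whereas the kernel's kvfree()
decides by inspecting the address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical size above which the mock contiguous path gives up. */
#define MOCK_CONTIG_LIMIT 4096

enum mock_origin { MOCK_FROM_CONTIG, MOCK_FROM_FALLBACK };

/* Per-allocation header; the union keeps the payload suitably aligned. */
union mock_hdr {
	enum mock_origin origin;
	max_align_t align;
};

static void *mock_alloc(size_t size, enum mock_origin origin)
{
	union mock_hdr *h;

	if (size > SIZE_MAX - sizeof(*h))
		return NULL;		/* header + payload would overflow */
	h = malloc(sizeof(*h) + size);
	if (!h)
		return NULL;
	h->origin = origin;
	return h + 1;			/* payload starts after the header */
}

/* Stand-in for the kmalloc attempt: fails early instead of retrying. */
static void *mock_contig_alloc(size_t size)
{
	if (size > MOCK_CONTIG_LIMIT)
		return NULL;
	return mock_alloc(size, MOCK_FROM_CONTIG);
}

/* Stand-in for the vmalloc fallback. */
static void *mock_fallback_alloc(size_t size)
{
	return mock_alloc(size, MOCK_FROM_FALLBACK);
}

/* One helper replaces the open-coded "try, then fall back" sequence. */
static void *mock_kvmalloc(size_t size)
{
	void *p = mock_contig_alloc(size);

	if (!p)
		p = mock_fallback_alloc(size);
	return p;
}

/* Reports which path served a pointer returned by mock_kvmalloc(). */
static enum mock_origin mock_origin_of(const void *p)
{
	return ((const union mock_hdr *)p - 1)->origin;
}

/* Single free routine for both paths; NULL is accepted. */
static void mock_kvfree(void *p)
{
	if (p)
		free((union mock_hdr *)p - 1);
}
```

With this shape a caller needs only one NULL check and one free call,
which is the simplification most of the converted hunks show.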

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 5/6] treewide: use kv[mz]alloc* rather than opencoded variants
  2017-01-12 16:54     ` Ilya Dryomov
  (?)
@ 2017-01-12 17:18       ` Michal Hocko
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Hocko @ 2017-01-12 17:18 UTC (permalink / raw)
  To: Ilya Dryomov
  Cc: Andrew Morton, Vlastimil Babka, David Rientjes, Mel Gorman,
	Johannes Weiner, Al Viro, linux-mm, LKML, Martin Schwidefsky,
	Heiko Carstens, Herbert Xu, Anton Vorontsov, Colin Cross,
	Kees Cook, Tony Luck, Rafael J. Wysocki, Ben Skeggs,
	Kent Overstreet, Santosh Raspatur, Hariprasad S, Tariq Toukan,
	Yishai Hadas, Dan Williams, Oleg Drokin, Andreas Dilger,
	Boris Ostrovsky, David Sterba, Yan, Zheng, Alexei Starovoitov,
	Eric Dumazet, netdev

On Thu 12-01-17 17:54:34, Ilya Dryomov wrote:
> On Thu, Jan 12, 2017 at 4:37 PM, Michal Hocko <mhocko@kernel.org> wrote:
> > From: Michal Hocko <mhocko@suse.com>
> >
> > There are many code paths open-coding kvmalloc. Let's use the helper
> > instead. The main difference from kvmalloc is that those users usually
> > do not consider all the aspects of the memory allocator. E.g. allocation
> > requests < 64kB basically never fail and invoke the OOM killer to
> > satisfy the allocation. This sounds too disruptive for something that
> > has a reasonable fallback - vmalloc. On the other hand, those
> > requests might fall back to vmalloc even when the memory allocator would
> > previously have succeeded after several more reclaim/compaction
> > attempts. There is no guarantee something like that happens, though.
> >
> > This patch converts many of those places to kv[mz]alloc* helpers because
> > they are more conservative.
> >
> > Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> > Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> > Cc: Herbert Xu <herbert@gondor.apana.org.au>
> > Cc: Anton Vorontsov <anton@enomsg.org>
> > Cc: Colin Cross <ccross@android.com>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Tony Luck <tony.luck@intel.com>
> > Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> > Cc: Ben Skeggs <bskeggs@redhat.com>
> > Cc: Kent Overstreet <kent.overstreet@gmail.com>
> > Cc: Santosh Raspatur <santosh@chelsio.com>
> > Cc: Hariprasad S <hariprasad@chelsio.com>
> > Cc: Tariq Toukan <tariqt@mellanox.com>
> > Cc: Yishai Hadas <yishaih@mellanox.com>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: Oleg Drokin <oleg.drokin@intel.com>
> > Cc: Andreas Dilger <andreas.dilger@intel.com>
> > Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> > Cc: David Sterba <dsterba@suse.com>
> > Cc: "Yan, Zheng" <zyan@redhat.com>
> > Cc: Ilya Dryomov <idryomov@gmail.com>
> > Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> > Cc: Alexei Starovoitov <ast@kernel.org>
> > Cc: Eric Dumazet <eric.dumazet@gmail.com>
> > Cc: netdev@vger.kernel.org
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > ---
> >  arch/s390/kvm/kvm-s390.c                           | 10 ++-----
> >  crypto/lzo.c                                       |  4 +--
> >  drivers/acpi/apei/erst.c                           |  8 ++---
> >  drivers/char/agp/generic.c                         |  8 +----
> >  drivers/gpu/drm/nouveau/nouveau_gem.c              |  4 +--
> >  drivers/md/bcache/util.h                           | 12 ++------
> >  drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h    |  3 --
> >  drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c | 25 ++--------------
> >  drivers/net/ethernet/chelsio/cxgb3/l2t.c           |  2 +-
> >  drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c    | 31 ++++----------------
> >  drivers/net/ethernet/mellanox/mlx4/en_tx.c         |  9 ++----
> >  drivers/net/ethernet/mellanox/mlx4/mr.c            |  9 ++----
> >  drivers/nvdimm/dimm_devs.c                         |  5 +---
> >  .../staging/lustre/lnet/libcfs/linux/linux-mem.c   | 11 +------
> >  drivers/xen/evtchn.c                               | 14 +--------
> >  fs/btrfs/ctree.c                                   |  9 ++----
> >  fs/btrfs/ioctl.c                                   |  9 ++----
> >  fs/btrfs/send.c                                    | 27 ++++++-----------
> >  fs/ceph/file.c                                     |  9 ++----
> >  fs/select.c                                        |  5 +---
> >  fs/xattr.c                                         | 27 ++++++-----------
> >  kernel/bpf/hashtab.c                               | 11 ++-----
> >  lib/iov_iter.c                                     |  5 +---
> >  mm/frame_vector.c                                  |  5 +---
> >  net/ipv4/inet_hashtables.c                         |  6 +---
> >  net/ipv4/tcp_metrics.c                             |  5 +---
> >  net/mpls/af_mpls.c                                 |  5 +---
> >  net/netfilter/x_tables.c                           | 34 ++++++----------------
> >  net/netfilter/xt_recent.c                          |  5 +---
> >  net/sched/sch_choke.c                              |  5 +---
> >  net/sched/sch_fq_codel.c                           | 26 ++++-------------
> >  net/sched/sch_hhf.c                                | 33 ++++++---------------
> >  net/sched/sch_netem.c                              |  6 +---
> >  net/sched/sch_sfq.c                                |  6 +---
> >  security/keys/keyctl.c                             | 22 ++++----------
> >  35 files changed, 96 insertions(+), 319 deletions(-)
> >
> > diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> > index 4f74511015b8..e6bbb33d2956 100644
> > --- a/arch/s390/kvm/kvm-s390.c
> > +++ b/arch/s390/kvm/kvm-s390.c
> > @@ -1126,10 +1126,7 @@ static long kvm_s390_get_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
> >         if (args->count < 1 || args->count > KVM_S390_SKEYS_MAX)
> >                 return -EINVAL;
> >
> > -       keys = kmalloc_array(args->count, sizeof(uint8_t),
> > -                            GFP_KERNEL | __GFP_NOWARN);
> > -       if (!keys)
> > -               keys = vmalloc(sizeof(uint8_t) * args->count);
> > +       keys = kvmalloc(args->count * sizeof(uint8_t), GFP_KERNEL);
> >         if (!keys)
> >                 return -ENOMEM;
> >
> > @@ -1171,10 +1168,7 @@ static long kvm_s390_set_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
> >         if (args->count < 1 || args->count > KVM_S390_SKEYS_MAX)
> >                 return -EINVAL;
> >
> > -       keys = kmalloc_array(args->count, sizeof(uint8_t),
> > -                            GFP_KERNEL | __GFP_NOWARN);
> > -       if (!keys)
> > -               keys = vmalloc(sizeof(uint8_t) * args->count);
> > +       keys = kvmalloc(sizeof(uint8_t) * args->count, GFP_KERNEL);
> >         if (!keys)
> >                 return -ENOMEM;
> >
> > diff --git a/crypto/lzo.c b/crypto/lzo.c
> > index 168df784da84..218567d717d6 100644
> > --- a/crypto/lzo.c
> > +++ b/crypto/lzo.c
> > @@ -32,9 +32,7 @@ static void *lzo_alloc_ctx(struct crypto_scomp *tfm)
> >  {
> >         void *ctx;
> >
> > -       ctx = kmalloc(LZO1X_MEM_COMPRESS, GFP_KERNEL | __GFP_NOWARN);
> > -       if (!ctx)
> > -               ctx = vmalloc(LZO1X_MEM_COMPRESS);
> > +       ctx = kvmalloc(LZO1X_MEM_COMPRESS, GFP_KERNEL);
> >         if (!ctx)
> >                 return ERR_PTR(-ENOMEM);
> >
> > diff --git a/drivers/acpi/apei/erst.c b/drivers/acpi/apei/erst.c
> > index ec4f507b524f..a2898df61744 100644
> > --- a/drivers/acpi/apei/erst.c
> > +++ b/drivers/acpi/apei/erst.c
> > @@ -513,7 +513,7 @@ static int __erst_record_id_cache_add_one(void)
> >         if (i < erst_record_id_cache.len)
> >                 goto retry;
> >         if (erst_record_id_cache.len >= erst_record_id_cache.size) {
> > -               int new_size, alloc_size;
> > +               int new_size;
> >                 u64 *new_entries;
> >
> >                 new_size = erst_record_id_cache.size * 2;
> > @@ -524,11 +524,7 @@ static int __erst_record_id_cache_add_one(void)
> >                                 pr_warn(FW_WARN "too many record IDs!\n");
> >                         return 0;
> >                 }
> > -               alloc_size = new_size * sizeof(entries[0]);
> > -               if (alloc_size < PAGE_SIZE)
> > -                       new_entries = kmalloc(alloc_size, GFP_KERNEL);
> > -               else
> > -                       new_entries = vmalloc(alloc_size);
> > +               new_entries = kvmalloc(new_size * sizeof(entries[0]), GFP_KERNEL);
> >                 if (!new_entries)
> >                         return -ENOMEM;
> >                 memcpy(new_entries, entries,
> > diff --git a/drivers/char/agp/generic.c b/drivers/char/agp/generic.c
> > index f002fa5d1887..bdf418cac8ef 100644
> > --- a/drivers/char/agp/generic.c
> > +++ b/drivers/char/agp/generic.c
> > @@ -88,13 +88,7 @@ static int agp_get_key(void)
> >
> >  void agp_alloc_page_array(size_t size, struct agp_memory *mem)
> >  {
> > -       mem->pages = NULL;
> > -
> > -       if (size <= 2*PAGE_SIZE)
> > -               mem->pages = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
> > -       if (mem->pages == NULL) {
> > -               mem->pages = vmalloc(size);
> > -       }
> > +       mem->pages = kvmalloc(size, GFP_KERNEL);
> >  }
> >  EXPORT_SYMBOL(agp_alloc_page_array);
> >
> > diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
> > index 201b52b750dd..77dd73ff126f 100644
> > --- a/drivers/gpu/drm/nouveau/nouveau_gem.c
> > +++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
> > @@ -568,9 +568,7 @@ u_memcpya(uint64_t user, unsigned nmemb, unsigned size)
> >
> >         size *= nmemb;
> >
> > -       mem = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
> > -       if (!mem)
> > -               mem = vmalloc(size);
> > +       mem = kvmalloc(size, GFP_KERNEL);
> >         if (!mem)
> >                 return ERR_PTR(-ENOMEM);
> >
> > diff --git a/drivers/md/bcache/util.h b/drivers/md/bcache/util.h
> > index cf2cbc211d83..d00bcb64d3a8 100644
> > --- a/drivers/md/bcache/util.h
> > +++ b/drivers/md/bcache/util.h
> > @@ -43,11 +43,7 @@ struct closure;
> >         (heap)->used = 0;                                               \
> >         (heap)->size = (_size);                                         \
> >         _bytes = (heap)->size * sizeof(*(heap)->data);                  \
> > -       (heap)->data = NULL;                                            \
> > -       if (_bytes < KMALLOC_MAX_SIZE)                                  \
> > -               (heap)->data = kmalloc(_bytes, (gfp));                  \
> > -       if ((!(heap)->data) && ((gfp) & GFP_KERNEL))                    \
> > -               (heap)->data = vmalloc(_bytes);                         \
> > +       (heap)->data = kvmalloc(_bytes, (gfp) & GFP_KERNEL);            \
> >         (heap)->data;                                                   \
> >  })
> >
> > @@ -136,12 +132,8 @@ do {                                                                       \
> >                                                                         \
> >         (fifo)->mask = _allocated_size - 1;                             \
> >         (fifo)->front = (fifo)->back = 0;                               \
> > -       (fifo)->data = NULL;                                            \
> >                                                                         \
> > -       if (_bytes < KMALLOC_MAX_SIZE)                                  \
> > -               (fifo)->data = kmalloc(_bytes, (gfp));                  \
> > -       if ((!(fifo)->data) && ((gfp) & GFP_KERNEL))                    \
> > -               (fifo)->data = vmalloc(_bytes);                         \
> > +       (fifo)->data = kvmalloc(_bytes, (gfp) & GFP_KERNEL);            \
> >         (fifo)->data;                                                   \
> >  })
> >
> > diff --git a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h
> > index 920d918ed193..f04e81f33795 100644
> > --- a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h
> > +++ b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h
> > @@ -41,9 +41,6 @@
> >
> >  #define VALIDATE_TID 1
> >
> > -void *cxgb_alloc_mem(unsigned long size);
> > -void cxgb_free_mem(void *addr);
> > -
> >  /*
> >   * Map an ATID or STID to their entries in the corresponding TID tables.
> >   */
> > diff --git a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c
> > index 76684dcb874c..606d4a3ade04 100644
> > --- a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c
> > +++ b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c
> > @@ -1152,27 +1152,6 @@ static void cxgb_redirect(struct dst_entry *old, struct dst_entry *new,
> >  }
> >
> >  /*
> > - * Allocate a chunk of memory using kmalloc or, if that fails, vmalloc.
> > - * The allocated memory is cleared.
> > - */
> > -void *cxgb_alloc_mem(unsigned long size)
> > -{
> > -       void *p = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
> > -
> > -       if (!p)
> > -               p = vzalloc(size);
> > -       return p;
> > -}
> > -
> > -/*
> > - * Free memory allocated through t3_alloc_mem().
> > - */
> > -void cxgb_free_mem(void *addr)
> > -{
> > -       kvfree(addr);
> > -}
> > -
> > -/*
> >   * Allocate and initialize the TID tables.  Returns 0 on success.
> >   */
> >  static int init_tid_tabs(struct tid_info *t, unsigned int ntids,
> > @@ -1182,7 +1161,7 @@ static int init_tid_tabs(struct tid_info *t, unsigned int ntids,
> >         unsigned long size = ntids * sizeof(*t->tid_tab) +
> >             natids * sizeof(*t->atid_tab) + nstids * sizeof(*t->stid_tab);
> >
> > -       t->tid_tab = cxgb_alloc_mem(size);
> > +       t->tid_tab = kvmalloc(size, GFP_KERNEL);
> >         if (!t->tid_tab)
> >                 return -ENOMEM;
> >
> > @@ -1218,7 +1197,7 @@ static int init_tid_tabs(struct tid_info *t, unsigned int ntids,
> >
> >  static void free_tid_maps(struct tid_info *t)
> >  {
> > -       cxgb_free_mem(t->tid_tab);
> > +       kvfree(t->tid_tab);
> >  }
> >
> >  static inline void add_adapter(struct adapter *adap)
> > diff --git a/drivers/net/ethernet/chelsio/cxgb3/l2t.c b/drivers/net/ethernet/chelsio/cxgb3/l2t.c
> > index 5f226eda8cd6..c9b06501ee0c 100644
> > --- a/drivers/net/ethernet/chelsio/cxgb3/l2t.c
> > +++ b/drivers/net/ethernet/chelsio/cxgb3/l2t.c
> > @@ -444,7 +444,7 @@ struct l2t_data *t3_init_l2t(unsigned int l2t_capacity)
> >         struct l2t_data *d;
> >         int i, size = sizeof(*d) + l2t_capacity * sizeof(struct l2t_entry);
> >
> > -       d = cxgb_alloc_mem(size);
> > +       d = kvmalloc(size, GFP_KERNEL);
> >         if (!d)
> >                 return NULL;
> >
> > diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> > index 6f951877430b..671695cb3c15 100644
> > --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> > +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> > @@ -881,27 +881,6 @@ static int setup_sge_queues(struct adapter *adap)
> >         return err;
> >  }
> >
> > -/*
> > - * Allocate a chunk of memory using kmalloc or, if that fails, vmalloc.
> > - * The allocated memory is cleared.
> > - */
> > -void *t4_alloc_mem(size_t size)
> > -{
> > -       void *p = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
> > -
> > -       if (!p)
> > -               p = vzalloc(size);
> > -       return p;
> > -}
> > -
> > -/*
> > - * Free memory allocated through alloc_mem().
> > - */
> > -void t4_free_mem(void *addr)
> > -{
> > -       kvfree(addr);
> > -}
> > -
> >  static u16 cxgb_select_queue(struct net_device *dev, struct sk_buff *skb,
> >                              void *accel_priv, select_queue_fallback_t fallback)
> >  {
> > @@ -1300,7 +1279,7 @@ static int tid_init(struct tid_info *t)
> >                max_ftids * sizeof(*t->ftid_tab) +
> >                ftid_bmap_size * sizeof(long);
> >
> > -       t->tid_tab = t4_alloc_mem(size);
> > +       t->tid_tab = kvmalloc(size, GFP_KERNEL);
> >         if (!t->tid_tab)
> >                 return -ENOMEM;
> >
> > @@ -3416,7 +3395,7 @@ static int adap_init0(struct adapter *adap)
> >                 /* allocate memory to read the header of the firmware on the
> >                  * card
> >                  */
> > -               card_fw = t4_alloc_mem(sizeof(*card_fw));
> > +               card_fw = kvmalloc(sizeof(*card_fw), GFP_KERNEL);
> >
> >                 /* Get FW from from /lib/firmware/ */
> >                 ret = request_firmware(&fw, fw_info->fw_mod_name,
> > @@ -3436,7 +3415,7 @@ static int adap_init0(struct adapter *adap)
> >
> >                 /* Cleaning up */
> >                 release_firmware(fw);
> > -               t4_free_mem(card_fw);
> > +               kvfree(card_fw);
> >
> >                 if (ret < 0)
> >                         goto bye;
> > @@ -4432,9 +4411,9 @@ static void free_some_resources(struct adapter *adapter)
> >  {
> >         unsigned int i;
> >
> > -       t4_free_mem(adapter->l2t);
> > +       kvfree(adapter->l2t);
> >         t4_cleanup_sched(adapter);
> > -       t4_free_mem(adapter->tids.tid_tab);
> > +       kvfree(adapter->tids.tid_tab);
> >         cxgb4_cleanup_tc_u32(adapter);
> >         kfree(adapter->sge.egr_map);
> >         kfree(adapter->sge.ingr_map);
> > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> > index 5886ad78058f..a5c1b815145e 100644
> > --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> > +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> > @@ -70,13 +70,10 @@ int mlx4_en_create_tx_ring(struct mlx4_en_priv *priv,
> >         ring->full_size = ring->size - HEADROOM - MAX_DESC_TXBBS;
> >
> >         tmp = size * sizeof(struct mlx4_en_tx_info);
> > -       ring->tx_info = kmalloc_node(tmp, GFP_KERNEL | __GFP_NOWARN, node);
> > +       ring->tx_info = kvmalloc_node(tmp, GFP_KERNEL, node);
> >         if (!ring->tx_info) {
> > -               ring->tx_info = vmalloc(tmp);
> > -               if (!ring->tx_info) {
> > -                       err = -ENOMEM;
> > -                       goto err_ring;
> > -               }
> > +               err = -ENOMEM;
> > +               goto err_ring;
> >         }
> >
> >         en_dbg(DRV, priv, "Allocated tx_info ring at addr:%p size:%d\n",
> > diff --git a/drivers/net/ethernet/mellanox/mlx4/mr.c b/drivers/net/ethernet/mellanox/mlx4/mr.c
> > index 395b5463cfd9..82354fd0a87e 100644
> > --- a/drivers/net/ethernet/mellanox/mlx4/mr.c
> > +++ b/drivers/net/ethernet/mellanox/mlx4/mr.c
> > @@ -115,12 +115,9 @@ static int mlx4_buddy_init(struct mlx4_buddy *buddy, int max_order)
> >
> >         for (i = 0; i <= buddy->max_order; ++i) {
> >                 s = BITS_TO_LONGS(1 << (buddy->max_order - i));
> > -               buddy->bits[i] = kcalloc(s, sizeof (long), GFP_KERNEL | __GFP_NOWARN);
> > -               if (!buddy->bits[i]) {
> > -                       buddy->bits[i] = vzalloc(s * sizeof(long));
> > -                       if (!buddy->bits[i])
> > -                               goto err_out_free;
> > -               }
> > +               buddy->bits[i] = kvzalloc(s * sizeof(long), GFP_KERNEL);
> > +               if (!buddy->bits[i])
> > +                       goto err_out_free;
> >         }
> >
> >         set_bit(0, buddy->bits[buddy->max_order]);
> > diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
> > index 0eedc49e0d47..3bd332b167d9 100644
> > --- a/drivers/nvdimm/dimm_devs.c
> > +++ b/drivers/nvdimm/dimm_devs.c
> > @@ -102,10 +102,7 @@ int nvdimm_init_config_data(struct nvdimm_drvdata *ndd)
> >                 return -ENXIO;
> >         }
> >
> > -       ndd->data = kmalloc(ndd->nsarea.config_size, GFP_KERNEL);
> > -       if (!ndd->data)
> > -               ndd->data = vmalloc(ndd->nsarea.config_size);
> > -
> > +       ndd->data = kvmalloc(ndd->nsarea.config_size, GFP_KERNEL);
> >         if (!ndd->data)
> >                 return -ENOMEM;
> >
> > diff --git a/drivers/staging/lustre/lnet/libcfs/linux/linux-mem.c b/drivers/staging/lustre/lnet/libcfs/linux/linux-mem.c
> > index a6a76a681ea9..8f638267e704 100644
> > --- a/drivers/staging/lustre/lnet/libcfs/linux/linux-mem.c
> > +++ b/drivers/staging/lustre/lnet/libcfs/linux/linux-mem.c
> > @@ -45,15 +45,6 @@ EXPORT_SYMBOL(libcfs_kvzalloc);
> >  void *libcfs_kvzalloc_cpt(struct cfs_cpt_table *cptab, int cpt, size_t size,
> >                           gfp_t flags)
> >  {
> > -       void *ret;
> > -
> > -       ret = kzalloc_node(size, flags | __GFP_NOWARN,
> > -                          cfs_cpt_spread_node(cptab, cpt));
> > -       if (!ret) {
> > -               WARN_ON(!(flags & (__GFP_FS | __GFP_HIGH)));
> > -               ret = vmalloc_node(size, cfs_cpt_spread_node(cptab, cpt));
> > -       }
> > -
> > -       return ret;
> > +       return kvzalloc_node(size, flags, cfs_cpt_spread_node(cptab, cpt));
> >  }
> >  EXPORT_SYMBOL(libcfs_kvzalloc_cpt);
> > diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
> > index 6890897a6f30..10f1ef582659 100644
> > --- a/drivers/xen/evtchn.c
> > +++ b/drivers/xen/evtchn.c
> > @@ -87,18 +87,6 @@ struct user_evtchn {
> >         bool enabled;
> >  };
> >
> > -static evtchn_port_t *evtchn_alloc_ring(unsigned int size)
> > -{
> > -       evtchn_port_t *ring;
> > -       size_t s = size * sizeof(*ring);
> > -
> > -       ring = kmalloc(s, GFP_KERNEL);
> > -       if (!ring)
> > -               ring = vmalloc(s);
> > -
> > -       return ring;
> > -}
> > -
> >  static void evtchn_free_ring(evtchn_port_t *ring)
> >  {
> >         kvfree(ring);
> > @@ -334,7 +322,7 @@ static int evtchn_resize_ring(struct per_user_data *u)
> >         else
> >                 new_size = 2 * u->ring_size;
> >
> > -       new_ring = evtchn_alloc_ring(new_size);
> > +       new_ring = kvmalloc(new_size * sizeof(*new_ring), GFP_KERNEL);
> >         if (!new_ring)
> >                 return -ENOMEM;
> >
> > diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> > index 146b2dc0d2cf..4fc9712d927d 100644
> > --- a/fs/btrfs/ctree.c
> > +++ b/fs/btrfs/ctree.c
> > @@ -5391,13 +5391,10 @@ int btrfs_compare_trees(struct btrfs_root *left_root,
> >                 goto out;
> >         }
> >
> > -       tmp_buf = kmalloc(fs_info->nodesize, GFP_KERNEL | __GFP_NOWARN);
> > +       tmp_buf = kvmalloc(fs_info->nodesize, GFP_KERNEL);
> >         if (!tmp_buf) {
> > -               tmp_buf = vmalloc(fs_info->nodesize);
> > -               if (!tmp_buf) {
> > -                       ret = -ENOMEM;
> > -                       goto out;
> > -               }
> > +               ret = -ENOMEM;
> > +               goto out;
> >         }
> >
> >         left_path->search_commit_root = 1;
> > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> > index 77dabfed3a5d..6f0b488c7428 100644
> > --- a/fs/btrfs/ioctl.c
> > +++ b/fs/btrfs/ioctl.c
> > @@ -3547,12 +3547,9 @@ static int btrfs_clone(struct inode *src, struct inode *inode,
> >         u64 last_dest_end = destoff;
> >
> >         ret = -ENOMEM;
> > -       buf = kmalloc(fs_info->nodesize, GFP_KERNEL | __GFP_NOWARN);
> > -       if (!buf) {
> > -               buf = vmalloc(fs_info->nodesize);
> > -               if (!buf)
> > -                       return ret;
> > -       }
> > +       buf = kvmalloc(fs_info->nodesize, GFP_KERNEL);
> > +       if (!buf)
> > +               return ret;
> >
> >         path = btrfs_alloc_path();
> >         if (!path) {
> > diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> > index d145ce804620..0621ca2a7b5d 100644
> > --- a/fs/btrfs/send.c
> > +++ b/fs/btrfs/send.c
> > @@ -6242,22 +6242,16 @@ long btrfs_ioctl_send(struct file *mnt_file, void __user *arg_)
> >         sctx->clone_roots_cnt = arg->clone_sources_count;
> >
> >         sctx->send_max_size = BTRFS_SEND_BUF_SIZE;
> > -       sctx->send_buf = kmalloc(sctx->send_max_size, GFP_KERNEL | __GFP_NOWARN);
> > +       sctx->send_buf = kvmalloc(sctx->send_max_size, GFP_KERNEL);
> >         if (!sctx->send_buf) {
> > -               sctx->send_buf = vmalloc(sctx->send_max_size);
> > -               if (!sctx->send_buf) {
> > -                       ret = -ENOMEM;
> > -                       goto out;
> > -               }
> > +               ret = -ENOMEM;
> > +               goto out;
> >         }
> >
> > -       sctx->read_buf = kmalloc(BTRFS_SEND_READ_SIZE, GFP_KERNEL | __GFP_NOWARN);
> > +       sctx->read_buf = kvmalloc(BTRFS_SEND_READ_SIZE, GFP_KERNEL);
> >         if (!sctx->read_buf) {
> > -               sctx->read_buf = vmalloc(BTRFS_SEND_READ_SIZE);
> > -               if (!sctx->read_buf) {
> > -                       ret = -ENOMEM;
> > -                       goto out;
> > -               }
> > +               ret = -ENOMEM;
> > +               goto out;
> >         }
> >
> >         sctx->pending_dir_moves = RB_ROOT;
> > @@ -6278,13 +6272,10 @@ long btrfs_ioctl_send(struct file *mnt_file, void __user *arg_)
> >         alloc_size = arg->clone_sources_count * sizeof(*arg->clone_sources);
> >
> >         if (arg->clone_sources_count) {
> > -               clone_sources_tmp = kmalloc(alloc_size, GFP_KERNEL | __GFP_NOWARN);
> > +               clone_sources_tmp = kvmalloc(alloc_size, GFP_KERNEL);
> >                 if (!clone_sources_tmp) {
> > -                       clone_sources_tmp = vmalloc(alloc_size);
> > -                       if (!clone_sources_tmp) {
> > -                               ret = -ENOMEM;
> > -                               goto out;
> > -                       }
> > +                       ret = -ENOMEM;
> > +                       goto out;
> >                 }
> >
> >                 ret = copy_from_user(clone_sources_tmp, arg->clone_sources,
> > diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> > index 045d30d26624..78b18acf33ba 100644
> > --- a/fs/ceph/file.c
> > +++ b/fs/ceph/file.c
> > @@ -74,12 +74,9 @@ dio_get_pages_alloc(const struct iov_iter *it, size_t nbytes,
> >         align = (unsigned long)(it->iov->iov_base + it->iov_offset) &
> >                 (PAGE_SIZE - 1);
> >         npages = calc_pages_for(align, nbytes);
> > -       pages = kmalloc(sizeof(*pages) * npages, GFP_KERNEL);
> > -       if (!pages) {
> > -               pages = vmalloc(sizeof(*pages) * npages);
> > -               if (!pages)
> > -                       return ERR_PTR(-ENOMEM);
> > -       }
> > +       pages = kvmalloc(sizeof(*pages) * npages, GFP_KERNEL);
> > +       if (!pages)
> > +               return ERR_PTR(-ENOMEM);
> 
> ceph hunk looks fine:
> 
> Acked-by: Ilya Dryomov <idryomov@gmail.com>

thanks!

[...]

> However I noticed that in some cases you've dropped the zeroing part:
> fq_codel_init() and hhf_zalloc() zeroed both k and v, and some others
> were inconsistent and zeroed only k.  Given that the fallback branch
> was probably dead, I'd keep the k behaviour.  Was that intentional?

No, that is an omission. Thanks for noticing. I will send the updated
patch.
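
The difference at stake can be shown with a small userspace sketch: a call
site that used kzalloc()/vzalloc() has to become kvzalloc(), not
kvmalloc(), or its callers start seeing uninitialized memory. The model_*
helpers below are hypothetical and only model the behaviour. The 0xAA
poison is an assumption made so the missing initialization shows up
deterministically, whereas real kmalloc()/vmalloc() memory simply has
unspecified contents.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_POISON 0xAA	/* assumed marker for "uninitialized" bytes */

/* Models kvmalloc(): contents unspecified, poisoned here for visibility. */
void *model_kvmalloc(size_t size)
{
	void *p = malloc(size);

	if (p)
		memset(p, MODEL_POISON, size);
	return p;
}

/* Models kvzalloc(): same allocation, but the memory is cleared. */
void *model_kvzalloc(size_t size)
{
	void *p = malloc(size);

	if (p)
		memset(p, 0, size);
	return p;
}

/* Returns 1 if every byte of buf is zero, 0 otherwise. */
int model_all_zero(const void *buf, size_t size)
{
	const unsigned char *p = buf;
	size_t i;

	assert(buf != NULL);

	for (i = 0; i < size; i++)
		if (p[i] != 0)
			return 0;
	return 1;
}
```

A table allocated with model_kvzalloc() passes model_all_zero(), while the
same table from model_kvmalloc() does not, which is the kind of behaviour
change a conversion from a zeroing helper has to avoid.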
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 5/6] treewide: use kv[mz]alloc* rather than opencoded variants
@ 2017-01-12 17:18       ` Michal Hocko
  0 siblings, 0 replies; 180+ messages in thread
From: Michal Hocko @ 2017-01-12 17:18 UTC (permalink / raw)
  To: Ilya Dryomov
  Cc: Andrew Morton, Vlastimil Babka, David Rientjes, Mel Gorman,
	Johannes Weiner, Al Viro, linux-mm, LKML, Martin Schwidefsky,
	Heiko Carstens, Herbert Xu, Anton Vorontsov, Colin Cross,
	Kees Cook, Tony Luck, Rafael J. Wysocki, Ben Skeggs,
	Kent Overstreet, Santosh Raspatur, Hariprasad S, Tariq Toukan,
	Yishai Hadas, Dan Williams, Oleg

On Thu 12-01-17 17:54:34, Ilya Dryomov wrote:
> On Thu, Jan 12, 2017 at 4:37 PM, Michal Hocko <mhocko@kernel.org> wrote:
> > From: Michal Hocko <mhocko@suse.com>
> >
> > There are many code paths open-coding kvmalloc. Let's use the helper
> > instead. The main difference from kvmalloc is that those users usually
> > do not consider all the aspects of the memory allocator. E.g. allocation
> > requests < 64kB basically never fail and invoke the OOM killer to
> > satisfy the allocation. This sounds too disruptive for something that
> > has a reasonable fallback - vmalloc. On the other hand, those
> > requests might fall back to vmalloc even when the memory allocator would
> > previously have succeeded after several more reclaim/compaction
> > attempts. There is no guarantee something like that happens, though.
> >
> > This patch converts many of those places to kv[mz]alloc* helpers because
> > they are more conservative.
> >
> > Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> > Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> > Cc: Herbert Xu <herbert@gondor.apana.org.au>
> > Cc: Anton Vorontsov <anton@enomsg.org>
> > Cc: Colin Cross <ccross@android.com>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Tony Luck <tony.luck@intel.com>
> > Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> > Cc: Ben Skeggs <bskeggs@redhat.com>
> > Cc: Kent Overstreet <kent.overstreet@gmail.com>
> > Cc: Santosh Raspatur <santosh@chelsio.com>
> > Cc: Hariprasad S <hariprasad@chelsio.com>
> > Cc: Tariq Toukan <tariqt@mellanox.com>
> > Cc: Yishai Hadas <yishaih@mellanox.com>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: Oleg Drokin <oleg.drokin@intel.com>
> > Cc: Andreas Dilger <andreas.dilger@intel.com>
> > Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> > Cc: David Sterba <dsterba@suse.com>
> > Cc: "Yan, Zheng" <zyan@redhat.com>
> > Cc: Ilya Dryomov <idryomov@gmail.com>
> > Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> > Cc: Alexei Starovoitov <ast@kernel.org>
> > Cc: Eric Dumazet <eric.dumazet@gmail.com>
> > Cc: netdev@vger.kernel.org
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > ---
> >  arch/s390/kvm/kvm-s390.c                           | 10 ++-----
> >  crypto/lzo.c                                       |  4 +--
> >  drivers/acpi/apei/erst.c                           |  8 ++---
> >  drivers/char/agp/generic.c                         |  8 +----
> >  drivers/gpu/drm/nouveau/nouveau_gem.c              |  4 +--
> >  drivers/md/bcache/util.h                           | 12 ++------
> >  drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h    |  3 --
> >  drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c | 25 ++--------------
> >  drivers/net/ethernet/chelsio/cxgb3/l2t.c           |  2 +-
> >  drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c    | 31 ++++----------------
> >  drivers/net/ethernet/mellanox/mlx4/en_tx.c         |  9 ++----
> >  drivers/net/ethernet/mellanox/mlx4/mr.c            |  9 ++----
> >  drivers/nvdimm/dimm_devs.c                         |  5 +---
> >  .../staging/lustre/lnet/libcfs/linux/linux-mem.c   | 11 +------
> >  drivers/xen/evtchn.c                               | 14 +--------
> >  fs/btrfs/ctree.c                                   |  9 ++----
> >  fs/btrfs/ioctl.c                                   |  9 ++----
> >  fs/btrfs/send.c                                    | 27 ++++++-----------
> >  fs/ceph/file.c                                     |  9 ++----
> >  fs/select.c                                        |  5 +---
> >  fs/xattr.c                                         | 27 ++++++-----------
> >  kernel/bpf/hashtab.c                               | 11 ++-----
> >  lib/iov_iter.c                                     |  5 +---
> >  mm/frame_vector.c                                  |  5 +---
> >  net/ipv4/inet_hashtables.c                         |  6 +---
> >  net/ipv4/tcp_metrics.c                             |  5 +---
> >  net/mpls/af_mpls.c                                 |  5 +---
> >  net/netfilter/x_tables.c                           | 34 ++++++----------------
> >  net/netfilter/xt_recent.c                          |  5 +---
> >  net/sched/sch_choke.c                              |  5 +---
> >  net/sched/sch_fq_codel.c                           | 26 ++++-------------
> >  net/sched/sch_hhf.c                                | 33 ++++++---------------
> >  net/sched/sch_netem.c                              |  6 +---
> >  net/sched/sch_sfq.c                                |  6 +---
> >  security/keys/keyctl.c                             | 22 ++++----------
> >  35 files changed, 96 insertions(+), 319 deletions(-)
> >
> > diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> > index 4f74511015b8..e6bbb33d2956 100644
> > --- a/arch/s390/kvm/kvm-s390.c
> > +++ b/arch/s390/kvm/kvm-s390.c
> > @@ -1126,10 +1126,7 @@ static long kvm_s390_get_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
> >         if (args->count < 1 || args->count > KVM_S390_SKEYS_MAX)
> >                 return -EINVAL;
> >
> > -       keys = kmalloc_array(args->count, sizeof(uint8_t),
> > -                            GFP_KERNEL | __GFP_NOWARN);
> > -       if (!keys)
> > -               keys = vmalloc(sizeof(uint8_t) * args->count);
> > +       keys = kvmalloc(args->count * sizeof(uint8_t), GFP_KERNEL);
> >         if (!keys)
> >                 return -ENOMEM;
> >
> > @@ -1171,10 +1168,7 @@ static long kvm_s390_set_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
> >         if (args->count < 1 || args->count > KVM_S390_SKEYS_MAX)
> >                 return -EINVAL;
> >
> > -       keys = kmalloc_array(args->count, sizeof(uint8_t),
> > -                            GFP_KERNEL | __GFP_NOWARN);
> > -       if (!keys)
> > -               keys = vmalloc(sizeof(uint8_t) * args->count);
> > +       keys = kvmalloc(sizeof(uint8_t) * args->count, GFP_KERNEL);
> >         if (!keys)
> >                 return -ENOMEM;
> >
> > diff --git a/crypto/lzo.c b/crypto/lzo.c
> > index 168df784da84..218567d717d6 100644
> > --- a/crypto/lzo.c
> > +++ b/crypto/lzo.c
> > @@ -32,9 +32,7 @@ static void *lzo_alloc_ctx(struct crypto_scomp *tfm)
> >  {
> >         void *ctx;
> >
> > -       ctx = kmalloc(LZO1X_MEM_COMPRESS, GFP_KERNEL | __GFP_NOWARN);
> > -       if (!ctx)
> > -               ctx = vmalloc(LZO1X_MEM_COMPRESS);
> > +       ctx = kvmalloc(LZO1X_MEM_COMPRESS, GFP_KERNEL);
> >         if (!ctx)
> >                 return ERR_PTR(-ENOMEM);
> >
> > diff --git a/drivers/acpi/apei/erst.c b/drivers/acpi/apei/erst.c
> > index ec4f507b524f..a2898df61744 100644
> > --- a/drivers/acpi/apei/erst.c
> > +++ b/drivers/acpi/apei/erst.c
> > @@ -513,7 +513,7 @@ static int __erst_record_id_cache_add_one(void)
> >         if (i < erst_record_id_cache.len)
> >                 goto retry;
> >         if (erst_record_id_cache.len >= erst_record_id_cache.size) {
> > -               int new_size, alloc_size;
> > +               int new_size;
> >                 u64 *new_entries;
> >
> >                 new_size = erst_record_id_cache.size * 2;
> > @@ -524,11 +524,7 @@ static int __erst_record_id_cache_add_one(void)
> >                                 pr_warn(FW_WARN "too many record IDs!\n");
> >                         return 0;
> >                 }
> > -               alloc_size = new_size * sizeof(entries[0]);
> > -               if (alloc_size < PAGE_SIZE)
> > -                       new_entries = kmalloc(alloc_size, GFP_KERNEL);
> > -               else
> > -                       new_entries = vmalloc(alloc_size);
> > +               new_entries = kvmalloc(new_size * sizeof(entries[0]), GFP_KERNEL);
> >                 if (!new_entries)
> >                         return -ENOMEM;
> >                 memcpy(new_entries, entries,
> > diff --git a/drivers/char/agp/generic.c b/drivers/char/agp/generic.c
> > index f002fa5d1887..bdf418cac8ef 100644
> > --- a/drivers/char/agp/generic.c
> > +++ b/drivers/char/agp/generic.c
> > @@ -88,13 +88,7 @@ static int agp_get_key(void)
> >
> >  void agp_alloc_page_array(size_t size, struct agp_memory *mem)
> >  {
> > -       mem->pages = NULL;
> > -
> > -       if (size <= 2*PAGE_SIZE)
> > -               mem->pages = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
> > -       if (mem->pages == NULL) {
> > -               mem->pages = vmalloc(size);
> > -       }
> > +       mem->pages = kvmalloc(size, GFP_KERNEL);
> >  }
> >  EXPORT_SYMBOL(agp_alloc_page_array);
> >
> > diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
> > index 201b52b750dd..77dd73ff126f 100644
> > --- a/drivers/gpu/drm/nouveau/nouveau_gem.c
> > +++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
> > @@ -568,9 +568,7 @@ u_memcpya(uint64_t user, unsigned nmemb, unsigned size)
> >
> >         size *= nmemb;
> >
> > -       mem = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
> > -       if (!mem)
> > -               mem = vmalloc(size);
> > +       mem = kvmalloc(size, GFP_KERNEL);
> >         if (!mem)
> >                 return ERR_PTR(-ENOMEM);
> >
> > diff --git a/drivers/md/bcache/util.h b/drivers/md/bcache/util.h
> > index cf2cbc211d83..d00bcb64d3a8 100644
> > --- a/drivers/md/bcache/util.h
> > +++ b/drivers/md/bcache/util.h
> > @@ -43,11 +43,7 @@ struct closure;
> >         (heap)->used = 0;                                               \
> >         (heap)->size = (_size);                                         \
> >         _bytes = (heap)->size * sizeof(*(heap)->data);                  \
> > -       (heap)->data = NULL;                                            \
> > -       if (_bytes < KMALLOC_MAX_SIZE)                                  \
> > -               (heap)->data = kmalloc(_bytes, (gfp));                  \
> > -       if ((!(heap)->data) && ((gfp) & GFP_KERNEL))                    \
> > -               (heap)->data = vmalloc(_bytes);                         \
> > +       (heap)->data = kvmalloc(_bytes, (gfp) & GFP_KERNEL);            \
> >         (heap)->data;                                                   \
> >  })
> >
> > @@ -136,12 +132,8 @@ do {                                                                       \
> >                                                                         \
> >         (fifo)->mask = _allocated_size - 1;                             \
> >         (fifo)->front = (fifo)->back = 0;                               \
> > -       (fifo)->data = NULL;                                            \
> >                                                                         \
> > -       if (_bytes < KMALLOC_MAX_SIZE)                                  \
> > -               (fifo)->data = kmalloc(_bytes, (gfp));                  \
> > -       if ((!(fifo)->data) && ((gfp) & GFP_KERNEL))                    \
> > -               (fifo)->data = vmalloc(_bytes);                         \
> > +       (fifo)->data = kvmalloc(_bytes, (gfp) & GFP_KERNEL);            \
> >         (fifo)->data;                                                   \
> >  })
> >
> > diff --git a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h
> > index 920d918ed193..f04e81f33795 100644
> > --- a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h
> > +++ b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h
> > @@ -41,9 +41,6 @@
> >
> >  #define VALIDATE_TID 1
> >
> > -void *cxgb_alloc_mem(unsigned long size);
> > -void cxgb_free_mem(void *addr);
> > -
> >  /*
> >   * Map an ATID or STID to their entries in the corresponding TID tables.
> >   */
> > diff --git a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c
> > index 76684dcb874c..606d4a3ade04 100644
> > --- a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c
> > +++ b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c
> > @@ -1152,27 +1152,6 @@ static void cxgb_redirect(struct dst_entry *old, struct dst_entry *new,
> >  }
> >
> >  /*
> > - * Allocate a chunk of memory using kmalloc or, if that fails, vmalloc.
> > - * The allocated memory is cleared.
> > - */
> > -void *cxgb_alloc_mem(unsigned long size)
> > -{
> > -       void *p = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
> > -
> > -       if (!p)
> > -               p = vzalloc(size);
> > -       return p;
> > -}
> > -
> > -/*
> > - * Free memory allocated through t3_alloc_mem().
> > - */
> > -void cxgb_free_mem(void *addr)
> > -{
> > -       kvfree(addr);
> > -}
> > -
> > -/*
> >   * Allocate and initialize the TID tables.  Returns 0 on success.
> >   */
> >  static int init_tid_tabs(struct tid_info *t, unsigned int ntids,
> > @@ -1182,7 +1161,7 @@ static int init_tid_tabs(struct tid_info *t, unsigned int ntids,
> >         unsigned long size = ntids * sizeof(*t->tid_tab) +
> >             natids * sizeof(*t->atid_tab) + nstids * sizeof(*t->stid_tab);
> >
> > -       t->tid_tab = cxgb_alloc_mem(size);
> > +       t->tid_tab = kvmalloc(size, GFP_KERNEL);
> >         if (!t->tid_tab)
> >                 return -ENOMEM;
> >
> > @@ -1218,7 +1197,7 @@ static int init_tid_tabs(struct tid_info *t, unsigned int ntids,
> >
> >  static void free_tid_maps(struct tid_info *t)
> >  {
> > -       cxgb_free_mem(t->tid_tab);
> > +       kvfree(t->tid_tab);
> >  }
> >
> >  static inline void add_adapter(struct adapter *adap)
> > diff --git a/drivers/net/ethernet/chelsio/cxgb3/l2t.c b/drivers/net/ethernet/chelsio/cxgb3/l2t.c
> > index 5f226eda8cd6..c9b06501ee0c 100644
> > --- a/drivers/net/ethernet/chelsio/cxgb3/l2t.c
> > +++ b/drivers/net/ethernet/chelsio/cxgb3/l2t.c
> > @@ -444,7 +444,7 @@ struct l2t_data *t3_init_l2t(unsigned int l2t_capacity)
> >         struct l2t_data *d;
> >         int i, size = sizeof(*d) + l2t_capacity * sizeof(struct l2t_entry);
> >
> > -       d = cxgb_alloc_mem(size);
> > +       d = kvmalloc(size, GFP_KERNEL);
> >         if (!d)
> >                 return NULL;
> >
> > diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> > index 6f951877430b..671695cb3c15 100644
> > --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> > +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> > @@ -881,27 +881,6 @@ static int setup_sge_queues(struct adapter *adap)
> >         return err;
> >  }
> >
> > -/*
> > - * Allocate a chunk of memory using kmalloc or, if that fails, vmalloc.
> > - * The allocated memory is cleared.
> > - */
> > -void *t4_alloc_mem(size_t size)
> > -{
> > -       void *p = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
> > -
> > -       if (!p)
> > -               p = vzalloc(size);
> > -       return p;
> > -}
> > -
> > -/*
> > - * Free memory allocated through alloc_mem().
> > - */
> > -void t4_free_mem(void *addr)
> > -{
> > -       kvfree(addr);
> > -}
> > -
> >  static u16 cxgb_select_queue(struct net_device *dev, struct sk_buff *skb,
> >                              void *accel_priv, select_queue_fallback_t fallback)
> >  {
> > @@ -1300,7 +1279,7 @@ static int tid_init(struct tid_info *t)
> >                max_ftids * sizeof(*t->ftid_tab) +
> >                ftid_bmap_size * sizeof(long);
> >
> > -       t->tid_tab = t4_alloc_mem(size);
> > +       t->tid_tab = kvmalloc(size, GFP_KERNEL);
> >         if (!t->tid_tab)
> >                 return -ENOMEM;
> >
> > @@ -3416,7 +3395,7 @@ static int adap_init0(struct adapter *adap)
> >                 /* allocate memory to read the header of the firmware on the
> >                  * card
> >                  */
> > -               card_fw = t4_alloc_mem(sizeof(*card_fw));
> > +               card_fw = kvmalloc(sizeof(*card_fw), GFP_KERNEL);
> >
> >                 /* Get FW from from /lib/firmware/ */
> >                 ret = request_firmware(&fw, fw_info->fw_mod_name,
> > @@ -3436,7 +3415,7 @@ static int adap_init0(struct adapter *adap)
> >
> >                 /* Cleaning up */
> >                 release_firmware(fw);
> > -               t4_free_mem(card_fw);
> > +               kvfree(card_fw);
> >
> >                 if (ret < 0)
> >                         goto bye;
> > @@ -4432,9 +4411,9 @@ static void free_some_resources(struct adapter *adapter)
> >  {
> >         unsigned int i;
> >
> > -       t4_free_mem(adapter->l2t);
> > +       kvfree(adapter->l2t);
> >         t4_cleanup_sched(adapter);
> > -       t4_free_mem(adapter->tids.tid_tab);
> > +       kvfree(adapter->tids.tid_tab);
> >         cxgb4_cleanup_tc_u32(adapter);
> >         kfree(adapter->sge.egr_map);
> >         kfree(adapter->sge.ingr_map);
> > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> > index 5886ad78058f..a5c1b815145e 100644
> > --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> > +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> > @@ -70,13 +70,10 @@ int mlx4_en_create_tx_ring(struct mlx4_en_priv *priv,
> >         ring->full_size = ring->size - HEADROOM - MAX_DESC_TXBBS;
> >
> >         tmp = size * sizeof(struct mlx4_en_tx_info);
> > -       ring->tx_info = kmalloc_node(tmp, GFP_KERNEL | __GFP_NOWARN, node);
> > +       ring->tx_info = kvmalloc_node(tmp, GFP_KERNEL, node);
> >         if (!ring->tx_info) {
> > -               ring->tx_info = vmalloc(tmp);
> > -               if (!ring->tx_info) {
> > -                       err = -ENOMEM;
> > -                       goto err_ring;
> > -               }
> > +               err = -ENOMEM;
> > +               goto err_ring;
> >         }
> >
> >         en_dbg(DRV, priv, "Allocated tx_info ring at addr:%p size:%d\n",
> > diff --git a/drivers/net/ethernet/mellanox/mlx4/mr.c b/drivers/net/ethernet/mellanox/mlx4/mr.c
> > index 395b5463cfd9..82354fd0a87e 100644
> > --- a/drivers/net/ethernet/mellanox/mlx4/mr.c
> > +++ b/drivers/net/ethernet/mellanox/mlx4/mr.c
> > @@ -115,12 +115,9 @@ static int mlx4_buddy_init(struct mlx4_buddy *buddy, int max_order)
> >
> >         for (i = 0; i <= buddy->max_order; ++i) {
> >                 s = BITS_TO_LONGS(1 << (buddy->max_order - i));
> > -               buddy->bits[i] = kcalloc(s, sizeof (long), GFP_KERNEL | __GFP_NOWARN);
> > -               if (!buddy->bits[i]) {
> > -                       buddy->bits[i] = vzalloc(s * sizeof(long));
> > -                       if (!buddy->bits[i])
> > -                               goto err_out_free;
> > -               }
> > +               buddy->bits[i] = kvzalloc(s * sizeof(long), GFP_KERNEL);
> > +               if (!buddy->bits[i])
> > +                       goto err_out_free;
> >         }
> >
> >         set_bit(0, buddy->bits[buddy->max_order]);
> > diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
> > index 0eedc49e0d47..3bd332b167d9 100644
> > --- a/drivers/nvdimm/dimm_devs.c
> > +++ b/drivers/nvdimm/dimm_devs.c
> > @@ -102,10 +102,7 @@ int nvdimm_init_config_data(struct nvdimm_drvdata *ndd)
> >                 return -ENXIO;
> >         }
> >
> > -       ndd->data = kmalloc(ndd->nsarea.config_size, GFP_KERNEL);
> > -       if (!ndd->data)
> > -               ndd->data = vmalloc(ndd->nsarea.config_size);
> > -
> > +       ndd->data = kvmalloc(ndd->nsarea.config_size, GFP_KERNEL);
> >         if (!ndd->data)
> >                 return -ENOMEM;
> >
> > diff --git a/drivers/staging/lustre/lnet/libcfs/linux/linux-mem.c b/drivers/staging/lustre/lnet/libcfs/linux/linux-mem.c
> > index a6a76a681ea9..8f638267e704 100644
> > --- a/drivers/staging/lustre/lnet/libcfs/linux/linux-mem.c
> > +++ b/drivers/staging/lustre/lnet/libcfs/linux/linux-mem.c
> > @@ -45,15 +45,6 @@ EXPORT_SYMBOL(libcfs_kvzalloc);
> >  void *libcfs_kvzalloc_cpt(struct cfs_cpt_table *cptab, int cpt, size_t size,
> >                           gfp_t flags)
> >  {
> > -       void *ret;
> > -
> > -       ret = kzalloc_node(size, flags | __GFP_NOWARN,
> > -                          cfs_cpt_spread_node(cptab, cpt));
> > -       if (!ret) {
> > -               WARN_ON(!(flags & (__GFP_FS | __GFP_HIGH)));
> > -               ret = vmalloc_node(size, cfs_cpt_spread_node(cptab, cpt));
> > -       }
> > -
> > -       return ret;
> > +       return kvzalloc_node(size, flags, cfs_cpt_spread_node(cptab, cpt));
> >  }
> >  EXPORT_SYMBOL(libcfs_kvzalloc_cpt);
> > diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
> > index 6890897a6f30..10f1ef582659 100644
> > --- a/drivers/xen/evtchn.c
> > +++ b/drivers/xen/evtchn.c
> > @@ -87,18 +87,6 @@ struct user_evtchn {
> >         bool enabled;
> >  };
> >
> > -static evtchn_port_t *evtchn_alloc_ring(unsigned int size)
> > -{
> > -       evtchn_port_t *ring;
> > -       size_t s = size * sizeof(*ring);
> > -
> > -       ring = kmalloc(s, GFP_KERNEL);
> > -       if (!ring)
> > -               ring = vmalloc(s);
> > -
> > -       return ring;
> > -}
> > -
> >  static void evtchn_free_ring(evtchn_port_t *ring)
> >  {
> >         kvfree(ring);
> > @@ -334,7 +322,7 @@ static int evtchn_resize_ring(struct per_user_data *u)
> >         else
> >                 new_size = 2 * u->ring_size;
> >
> > -       new_ring = evtchn_alloc_ring(new_size);
> > +       new_ring = kvmalloc(new_size * sizeof(*new_ring), GFP_KERNEL);
> >         if (!new_ring)
> >                 return -ENOMEM;
> >
> > diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> > index 146b2dc0d2cf..4fc9712d927d 100644
> > --- a/fs/btrfs/ctree.c
> > +++ b/fs/btrfs/ctree.c
> > @@ -5391,13 +5391,10 @@ int btrfs_compare_trees(struct btrfs_root *left_root,
> >                 goto out;
> >         }
> >
> > -       tmp_buf = kmalloc(fs_info->nodesize, GFP_KERNEL | __GFP_NOWARN);
> > +       tmp_buf = kvmalloc(fs_info->nodesize, GFP_KERNEL);
> >         if (!tmp_buf) {
> > -               tmp_buf = vmalloc(fs_info->nodesize);
> > -               if (!tmp_buf) {
> > -                       ret = -ENOMEM;
> > -                       goto out;
> > -               }
> > +               ret = -ENOMEM;
> > +               goto out;
> >         }
> >
> >         left_path->search_commit_root = 1;
> > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> > index 77dabfed3a5d..6f0b488c7428 100644
> > --- a/fs/btrfs/ioctl.c
> > +++ b/fs/btrfs/ioctl.c
> > @@ -3547,12 +3547,9 @@ static int btrfs_clone(struct inode *src, struct inode *inode,
> >         u64 last_dest_end = destoff;
> >
> >         ret = -ENOMEM;
> > -       buf = kmalloc(fs_info->nodesize, GFP_KERNEL | __GFP_NOWARN);
> > -       if (!buf) {
> > -               buf = vmalloc(fs_info->nodesize);
> > -               if (!buf)
> > -                       return ret;
> > -       }
> > +       buf = kvmalloc(fs_info->nodesize, GFP_KERNEL);
> > +       if (!buf)
> > +               return ret;
> >
> >         path = btrfs_alloc_path();
> >         if (!path) {
> > diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> > index d145ce804620..0621ca2a7b5d 100644
> > --- a/fs/btrfs/send.c
> > +++ b/fs/btrfs/send.c
> > @@ -6242,22 +6242,16 @@ long btrfs_ioctl_send(struct file *mnt_file, void __user *arg_)
> >         sctx->clone_roots_cnt = arg->clone_sources_count;
> >
> >         sctx->send_max_size = BTRFS_SEND_BUF_SIZE;
> > -       sctx->send_buf = kmalloc(sctx->send_max_size, GFP_KERNEL | __GFP_NOWARN);
> > +       sctx->send_buf = kvmalloc(sctx->send_max_size, GFP_KERNEL);
> >         if (!sctx->send_buf) {
> > -               sctx->send_buf = vmalloc(sctx->send_max_size);
> > -               if (!sctx->send_buf) {
> > -                       ret = -ENOMEM;
> > -                       goto out;
> > -               }
> > +               ret = -ENOMEM;
> > +               goto out;
> >         }
> >
> > -       sctx->read_buf = kmalloc(BTRFS_SEND_READ_SIZE, GFP_KERNEL | __GFP_NOWARN);
> > +       sctx->read_buf = kvmalloc(BTRFS_SEND_READ_SIZE, GFP_KERNEL);
> >         if (!sctx->read_buf) {
> > -               sctx->read_buf = vmalloc(BTRFS_SEND_READ_SIZE);
> > -               if (!sctx->read_buf) {
> > -                       ret = -ENOMEM;
> > -                       goto out;
> > -               }
> > +               ret = -ENOMEM;
> > +               goto out;
> >         }
> >
> >         sctx->pending_dir_moves = RB_ROOT;
> > @@ -6278,13 +6272,10 @@ long btrfs_ioctl_send(struct file *mnt_file, void __user *arg_)
> >         alloc_size = arg->clone_sources_count * sizeof(*arg->clone_sources);
> >
> >         if (arg->clone_sources_count) {
> > -               clone_sources_tmp = kmalloc(alloc_size, GFP_KERNEL | __GFP_NOWARN);
> > +               clone_sources_tmp = kvmalloc(alloc_size, GFP_KERNEL);
> >                 if (!clone_sources_tmp) {
> > -                       clone_sources_tmp = vmalloc(alloc_size);
> > -                       if (!clone_sources_tmp) {
> > -                               ret = -ENOMEM;
> > -                               goto out;
> > -                       }
> > +                       ret = -ENOMEM;
> > +                       goto out;
> >                 }
> >
> >                 ret = copy_from_user(clone_sources_tmp, arg->clone_sources,
> > diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> > index 045d30d26624..78b18acf33ba 100644
> > --- a/fs/ceph/file.c
> > +++ b/fs/ceph/file.c
> > @@ -74,12 +74,9 @@ dio_get_pages_alloc(const struct iov_iter *it, size_t nbytes,
> >         align = (unsigned long)(it->iov->iov_base + it->iov_offset) &
> >                 (PAGE_SIZE - 1);
> >         npages = calc_pages_for(align, nbytes);
> > -       pages = kmalloc(sizeof(*pages) * npages, GFP_KERNEL);
> > -       if (!pages) {
> > -               pages = vmalloc(sizeof(*pages) * npages);
> > -               if (!pages)
> > -                       return ERR_PTR(-ENOMEM);
> > -       }
> > +       pages = kvmalloc(sizeof(*pages) * npages, GFP_KERNEL);
> > +       if (!pages)
> > +               return ERR_PTR(-ENOMEM);
> 
> ceph hunk looks fine:
> 
> Acked-by: Ilya Dryomov <idryomov@gmail.com>

thanks!

[...]

> However I noticed that in some cases you've dropped the zeroing part:
> fq_codel_init() and hhf_zalloc() zeroed both k and v, and some others
> were inconsistent and zeroed only k.  Given that the fallback branch
> was probably dead, I'd keep the k behaviour.  Was that intentional?

No, that is an omission. Thanks for noticing. I will send the updated
patch.
-- 
Michal Hocko
SUSE Labs
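
[Editor's note] The zeroing question above is about helpers such as
cxgb_alloc_mem() and t4_alloc_mem(), which used kzalloc()/vzalloc(), being
replaced by plain kvmalloc(). The patch already uses kvzalloc() in the
mlx4/mr.c hunk. What follows is only a minimal userspace sketch of that
difference, not the kernel implementation: every model_* name is
hypothetical, malloc() stands in for both kmalloc() and vmalloc(), and fresh
memory is poisoned so that stale contents are visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Userspace model only. Every model_* name is hypothetical; none of this
 * is kernel API. malloc() stands in for both kmalloc() and vmalloc().
 */
#define MODEL_POISON 0xAA

/* Test knob: when true, the "physically contiguous" attempt fails. */
static bool model_fail_contiguous;

/* Fill fresh memory with a poison byte so stale contents are visible. */
static void *model_poisoned_alloc(size_t size)
{
	void *p = malloc(size);

	if (p)
		memset(p, MODEL_POISON, size);
	return p;
}

static void *model_kmalloc(size_t size)
{
	return model_fail_contiguous ? NULL : model_poisoned_alloc(size);
}

static void *model_vmalloc(size_t size)
{
	return model_poisoned_alloc(size);
}

/* Try the contiguous allocator first, then fall back; contents undefined. */
static void *model_kvmalloc(size_t size)
{
	void *p = model_kmalloc(size);

	if (!p)
		p = model_vmalloc(size);
	return p;
}

/* Same fallback, but the caller always gets zeroed memory. */
static void *model_kvzalloc(size_t size)
{
	void *p = model_kvmalloc(size);

	if (p)
		memset(p, 0, size);
	return p;
}

/* Returns true if every byte of buf equals value. */
static bool model_all_bytes(const void *buf, size_t size, unsigned char value)
{
	const unsigned char *b = buf;
	size_t i;

	for (i = 0; i < size; i++)
		if (b[i] != value)
			return false;
	return true;
}

/* Self-check: zeroing must hold on both the fast path and the fallback. */
static void model_selfcheck(size_t size)
{
	int pass;

	for (pass = 0; pass < 2; pass++) {
		void *z;

		model_fail_contiguous = (pass == 1);
		z = model_kvzalloc(size);
		assert(z && model_all_bytes(z, size, 0));
		free(z);
	}
	model_fail_contiguous = false;
}
```

In this model a caller that relied on the old zeroing helper and is switched
to model_kvmalloc() sees poison bytes instead of zeroes, on the fast path as
well as on the fallback path. That is why converting a kzalloc()/vzalloc()
pair needs the zeroing variant rather than kvmalloc().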



* Re: [PATCH 5/6] treewide: use kv[mz]alloc* rather than opencoded variants
@ 2017-01-12 17:18       ` Michal Hocko
From: Michal Hocko @ 2017-01-12 17:18 UTC (permalink / raw)
  To: Ilya Dryomov
  Cc: Andrew Morton, Vlastimil Babka, David Rientjes, Mel Gorman,
	Johannes Weiner, Al Viro, linux-mm, LKML, Martin Schwidefsky,
	Heiko Carstens, Herbert Xu, Anton Vorontsov, Colin Cross,
	Kees Cook, Tony Luck, Rafael J. Wysocki, Ben Skeggs,
	Kent Overstreet, Santosh Raspatur, Hariprasad S, Tariq Toukan,
	Yishai Hadas, Dan Williams, Oleg Drokin, Andreas Dilger,
	Boris Ostrovsky, David Sterba, Yan, Zheng, Alexei Starovoitov,
	Eric Dumazet, netdev

On Thu 12-01-17 17:54:34, Ilya Dryomov wrote:
> On Thu, Jan 12, 2017 at 4:37 PM, Michal Hocko <mhocko@kernel.org> wrote:
> > From: Michal Hocko <mhocko@suse.com>
> >
> > There are many code paths open-coding kvmalloc. Let's use the helper
> > instead. The main difference from kvmalloc is that those users usually
> > do not consider all the aspects of the memory allocator. E.g. allocation
> > requests < 64kB basically never fail and invoke the OOM killer to
> > satisfy the allocation. This sounds too disruptive for something that
> > has a reasonable fallback - vmalloc. On the other hand, those
> > requests might now fall back to vmalloc even when the memory allocator
> > would previously have succeeded after several more reclaim/compaction
> > attempts. There is no guarantee something like that happens though.
> >
> > This patch converts many of those places to kv[mz]alloc* helpers because
> > they are more conservative.
> >
> > Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> > Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> > Cc: Herbert Xu <herbert@gondor.apana.org.au>
> > Cc: Anton Vorontsov <anton@enomsg.org>
> > Cc: Colin Cross <ccross@android.com>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Tony Luck <tony.luck@intel.com>
> > Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> > Cc: Ben Skeggs <bskeggs@redhat.com>
> > Cc: Kent Overstreet <kent.overstreet@gmail.com>
> > Cc: Santosh Raspatur <santosh@chelsio.com>
> > Cc: Hariprasad S <hariprasad@chelsio.com>
> > Cc: Tariq Toukan <tariqt@mellanox.com>
> > Cc: Yishai Hadas <yishaih@mellanox.com>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: Oleg Drokin <oleg.drokin@intel.com>
> > Cc: Andreas Dilger <andreas.dilger@intel.com>
> > Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> > Cc: David Sterba <dsterba@suse.com>
> > Cc: "Yan, Zheng" <zyan@redhat.com>
> > Cc: Ilya Dryomov <idryomov@gmail.com>
> > Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> > Cc: Alexei Starovoitov <ast@kernel.org>
> > Cc: Eric Dumazet <eric.dumazet@gmail.com>
> > Cc: netdev@vger.kernel.org
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > ---
> >  arch/s390/kvm/kvm-s390.c                           | 10 ++-----
> >  crypto/lzo.c                                       |  4 +--
> >  drivers/acpi/apei/erst.c                           |  8 ++---
> >  drivers/char/agp/generic.c                         |  8 +----
> >  drivers/gpu/drm/nouveau/nouveau_gem.c              |  4 +--
> >  drivers/md/bcache/util.h                           | 12 ++------
> >  drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h    |  3 --
> >  drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c | 25 ++--------------
> >  drivers/net/ethernet/chelsio/cxgb3/l2t.c           |  2 +-
> >  drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c    | 31 ++++----------------
> >  drivers/net/ethernet/mellanox/mlx4/en_tx.c         |  9 ++----
> >  drivers/net/ethernet/mellanox/mlx4/mr.c            |  9 ++----
> >  drivers/nvdimm/dimm_devs.c                         |  5 +---
> >  .../staging/lustre/lnet/libcfs/linux/linux-mem.c   | 11 +------
> >  drivers/xen/evtchn.c                               | 14 +--------
> >  fs/btrfs/ctree.c                                   |  9 ++----
> >  fs/btrfs/ioctl.c                                   |  9 ++----
> >  fs/btrfs/send.c                                    | 27 ++++++-----------
> >  fs/ceph/file.c                                     |  9 ++----
> >  fs/select.c                                        |  5 +---
> >  fs/xattr.c                                         | 27 ++++++-----------
> >  kernel/bpf/hashtab.c                               | 11 ++-----
> >  lib/iov_iter.c                                     |  5 +---
> >  mm/frame_vector.c                                  |  5 +---
> >  net/ipv4/inet_hashtables.c                         |  6 +---
> >  net/ipv4/tcp_metrics.c                             |  5 +---
> >  net/mpls/af_mpls.c                                 |  5 +---
> >  net/netfilter/x_tables.c                           | 34 ++++++----------------
> >  net/netfilter/xt_recent.c                          |  5 +---
> >  net/sched/sch_choke.c                              |  5 +---
> >  net/sched/sch_fq_codel.c                           | 26 ++++-------------
> >  net/sched/sch_hhf.c                                | 33 ++++++---------------
> >  net/sched/sch_netem.c                              |  6 +---
> >  net/sched/sch_sfq.c                                |  6 +---
> >  security/keys/keyctl.c                             | 22 ++++----------
> >  35 files changed, 96 insertions(+), 319 deletions(-)
> >
> > diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> > index 4f74511015b8..e6bbb33d2956 100644
> > --- a/arch/s390/kvm/kvm-s390.c
> > +++ b/arch/s390/kvm/kvm-s390.c
> > @@ -1126,10 +1126,7 @@ static long kvm_s390_get_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
> >         if (args->count < 1 || args->count > KVM_S390_SKEYS_MAX)
> >                 return -EINVAL;
> >
> > -       keys = kmalloc_array(args->count, sizeof(uint8_t),
> > -                            GFP_KERNEL | __GFP_NOWARN);
> > -       if (!keys)
> > -               keys = vmalloc(sizeof(uint8_t) * args->count);
> > +       keys = kvmalloc(args->count * sizeof(uint8_t), GFP_KERNEL);
> >         if (!keys)
> >                 return -ENOMEM;
> >
> > @@ -1171,10 +1168,7 @@ static long kvm_s390_set_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
> >         if (args->count < 1 || args->count > KVM_S390_SKEYS_MAX)
> >                 return -EINVAL;
> >
> > -       keys = kmalloc_array(args->count, sizeof(uint8_t),
> > -                            GFP_KERNEL | __GFP_NOWARN);
> > -       if (!keys)
> > -               keys = vmalloc(sizeof(uint8_t) * args->count);
> > +       keys = kvmalloc(sizeof(uint8_t) * args->count, GFP_KERNEL);
> >         if (!keys)
> >                 return -ENOMEM;
> >
> > diff --git a/crypto/lzo.c b/crypto/lzo.c
> > index 168df784da84..218567d717d6 100644
> > --- a/crypto/lzo.c
> > +++ b/crypto/lzo.c
> > @@ -32,9 +32,7 @@ static void *lzo_alloc_ctx(struct crypto_scomp *tfm)
> >  {
> >         void *ctx;
> >
> > -       ctx = kmalloc(LZO1X_MEM_COMPRESS, GFP_KERNEL | __GFP_NOWARN);
> > -       if (!ctx)
> > -               ctx = vmalloc(LZO1X_MEM_COMPRESS);
> > +       ctx = kvmalloc(LZO1X_MEM_COMPRESS, GFP_KERNEL);
> >         if (!ctx)
> >                 return ERR_PTR(-ENOMEM);
> >
> > diff --git a/drivers/acpi/apei/erst.c b/drivers/acpi/apei/erst.c
> > index ec4f507b524f..a2898df61744 100644
> > --- a/drivers/acpi/apei/erst.c
> > +++ b/drivers/acpi/apei/erst.c
> > @@ -513,7 +513,7 @@ static int __erst_record_id_cache_add_one(void)
> >         if (i < erst_record_id_cache.len)
> >                 goto retry;
> >         if (erst_record_id_cache.len >= erst_record_id_cache.size) {
> > -               int new_size, alloc_size;
> > +               int new_size;
> >                 u64 *new_entries;
> >
> >                 new_size = erst_record_id_cache.size * 2;
> > @@ -524,11 +524,7 @@ static int __erst_record_id_cache_add_one(void)
> >                                 pr_warn(FW_WARN "too many record IDs!\n");
> >                         return 0;
> >                 }
> > -               alloc_size = new_size * sizeof(entries[0]);
> > -               if (alloc_size < PAGE_SIZE)
> > -                       new_entries = kmalloc(alloc_size, GFP_KERNEL);
> > -               else
> > -                       new_entries = vmalloc(alloc_size);
> > +               new_entries = kvmalloc(new_size * sizeof(entries[0]), GFP_KERNEL);
> >                 if (!new_entries)
> >                         return -ENOMEM;
> >                 memcpy(new_entries, entries,
> > diff --git a/drivers/char/agp/generic.c b/drivers/char/agp/generic.c
> > index f002fa5d1887..bdf418cac8ef 100644
> > --- a/drivers/char/agp/generic.c
> > +++ b/drivers/char/agp/generic.c
> > @@ -88,13 +88,7 @@ static int agp_get_key(void)
> >
> >  void agp_alloc_page_array(size_t size, struct agp_memory *mem)
> >  {
> > -       mem->pages = NULL;
> > -
> > -       if (size <= 2*PAGE_SIZE)
> > -               mem->pages = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
> > -       if (mem->pages == NULL) {
> > -               mem->pages = vmalloc(size);
> > -       }
> > +       mem->pages = kvmalloc(size, GFP_KERNEL);
> >  }
> >  EXPORT_SYMBOL(agp_alloc_page_array);
> >
> > diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
> > index 201b52b750dd..77dd73ff126f 100644
> > --- a/drivers/gpu/drm/nouveau/nouveau_gem.c
> > +++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
> > @@ -568,9 +568,7 @@ u_memcpya(uint64_t user, unsigned nmemb, unsigned size)
> >
> >         size *= nmemb;
> >
> > -       mem = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
> > -       if (!mem)
> > -               mem = vmalloc(size);
> > +       mem = kvmalloc(size, GFP_KERNEL);
> >         if (!mem)
> >                 return ERR_PTR(-ENOMEM);
> >
> > diff --git a/drivers/md/bcache/util.h b/drivers/md/bcache/util.h
> > index cf2cbc211d83..d00bcb64d3a8 100644
> > --- a/drivers/md/bcache/util.h
> > +++ b/drivers/md/bcache/util.h
> > @@ -43,11 +43,7 @@ struct closure;
> >         (heap)->used = 0;                                               \
> >         (heap)->size = (_size);                                         \
> >         _bytes = (heap)->size * sizeof(*(heap)->data);                  \
> > -       (heap)->data = NULL;                                            \
> > -       if (_bytes < KMALLOC_MAX_SIZE)                                  \
> > -               (heap)->data = kmalloc(_bytes, (gfp));                  \
> > -       if ((!(heap)->data) && ((gfp) & GFP_KERNEL))                    \
> > -               (heap)->data = vmalloc(_bytes);                         \
> > +       (heap)->data = kvmalloc(_bytes, (gfp) & GFP_KERNEL);            \
> >         (heap)->data;                                                   \
> >  })
> >
> > @@ -136,12 +132,8 @@ do {                                                                       \
> >                                                                         \
> >         (fifo)->mask = _allocated_size - 1;                             \
> >         (fifo)->front = (fifo)->back = 0;                               \
> > -       (fifo)->data = NULL;                                            \
> >                                                                         \
> > -       if (_bytes < KMALLOC_MAX_SIZE)                                  \
> > -               (fifo)->data = kmalloc(_bytes, (gfp));                  \
> > -       if ((!(fifo)->data) && ((gfp) & GFP_KERNEL))                    \
> > -               (fifo)->data = vmalloc(_bytes);                         \
> > +       (fifo)->data = kvmalloc(_bytes, (gfp) & GFP_KERNEL);            \
> >         (fifo)->data;                                                   \
> >  })
> >
> > diff --git a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h
> > index 920d918ed193..f04e81f33795 100644
> > --- a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h
> > +++ b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h
> > @@ -41,9 +41,6 @@
> >
> >  #define VALIDATE_TID 1
> >
> > -void *cxgb_alloc_mem(unsigned long size);
> > -void cxgb_free_mem(void *addr);
> > -
> >  /*
> >   * Map an ATID or STID to their entries in the corresponding TID tables.
> >   */
> > diff --git a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c
> > index 76684dcb874c..606d4a3ade04 100644
> > --- a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c
> > +++ b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c
> > @@ -1152,27 +1152,6 @@ static void cxgb_redirect(struct dst_entry *old, struct dst_entry *new,
> >  }
> >
> >  /*
> > - * Allocate a chunk of memory using kmalloc or, if that fails, vmalloc.
> > - * The allocated memory is cleared.
> > - */
> > -void *cxgb_alloc_mem(unsigned long size)
> > -{
> > -       void *p = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
> > -
> > -       if (!p)
> > -               p = vzalloc(size);
> > -       return p;
> > -}
> > -
> > -/*
> > - * Free memory allocated through t3_alloc_mem().
> > - */
> > -void cxgb_free_mem(void *addr)
> > -{
> > -       kvfree(addr);
> > -}
> > -
> > -/*
> >   * Allocate and initialize the TID tables.  Returns 0 on success.
> >   */
> >  static int init_tid_tabs(struct tid_info *t, unsigned int ntids,
> > @@ -1182,7 +1161,7 @@ static int init_tid_tabs(struct tid_info *t, unsigned int ntids,
> >         unsigned long size = ntids * sizeof(*t->tid_tab) +
> >             natids * sizeof(*t->atid_tab) + nstids * sizeof(*t->stid_tab);
> >
> > -       t->tid_tab = cxgb_alloc_mem(size);
> > +       t->tid_tab = kvmalloc(size, GFP_KERNEL);
> >         if (!t->tid_tab)
> >                 return -ENOMEM;
> >
> > @@ -1218,7 +1197,7 @@ static int init_tid_tabs(struct tid_info *t, unsigned int ntids,
> >
> >  static void free_tid_maps(struct tid_info *t)
> >  {
> > -       cxgb_free_mem(t->tid_tab);
> > +       kvfree(t->tid_tab);
> >  }
> >
> >  static inline void add_adapter(struct adapter *adap)
> > diff --git a/drivers/net/ethernet/chelsio/cxgb3/l2t.c b/drivers/net/ethernet/chelsio/cxgb3/l2t.c
> > index 5f226eda8cd6..c9b06501ee0c 100644
> > --- a/drivers/net/ethernet/chelsio/cxgb3/l2t.c
> > +++ b/drivers/net/ethernet/chelsio/cxgb3/l2t.c
> > @@ -444,7 +444,7 @@ struct l2t_data *t3_init_l2t(unsigned int l2t_capacity)
> >         struct l2t_data *d;
> >         int i, size = sizeof(*d) + l2t_capacity * sizeof(struct l2t_entry);
> >
> > -       d = cxgb_alloc_mem(size);
> > +       d = kvmalloc(size, GFP_KERNEL);
> >         if (!d)
> >                 return NULL;
> >
> > diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> > index 6f951877430b..671695cb3c15 100644
> > --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> > +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> > @@ -881,27 +881,6 @@ static int setup_sge_queues(struct adapter *adap)
> >         return err;
> >  }
> >
> > -/*
> > - * Allocate a chunk of memory using kmalloc or, if that fails, vmalloc.
> > - * The allocated memory is cleared.
> > - */
> > -void *t4_alloc_mem(size_t size)
> > -{
> > -       void *p = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
> > -
> > -       if (!p)
> > -               p = vzalloc(size);
> > -       return p;
> > -}
> > -
> > -/*
> > - * Free memory allocated through alloc_mem().
> > - */
> > -void t4_free_mem(void *addr)
> > -{
> > -       kvfree(addr);
> > -}
> > -
> >  static u16 cxgb_select_queue(struct net_device *dev, struct sk_buff *skb,
> >                              void *accel_priv, select_queue_fallback_t fallback)
> >  {
> > @@ -1300,7 +1279,7 @@ static int tid_init(struct tid_info *t)
> >                max_ftids * sizeof(*t->ftid_tab) +
> >                ftid_bmap_size * sizeof(long);
> >
> > -       t->tid_tab = t4_alloc_mem(size);
> > +       t->tid_tab = kvmalloc(size, GFP_KERNEL);
> >         if (!t->tid_tab)
> >                 return -ENOMEM;
> >
> > @@ -3416,7 +3395,7 @@ static int adap_init0(struct adapter *adap)
> >                 /* allocate memory to read the header of the firmware on the
> >                  * card
> >                  */
> > -               card_fw = t4_alloc_mem(sizeof(*card_fw));
> > +               card_fw = kvmalloc(sizeof(*card_fw), GFP_KERNEL);
> >
> >                 /* Get FW from from /lib/firmware/ */
> >                 ret = request_firmware(&fw, fw_info->fw_mod_name,
> > @@ -3436,7 +3415,7 @@ static int adap_init0(struct adapter *adap)
> >
> >                 /* Cleaning up */
> >                 release_firmware(fw);
> > -               t4_free_mem(card_fw);
> > +               kvfree(card_fw);
> >
> >                 if (ret < 0)
> >                         goto bye;
> > @@ -4432,9 +4411,9 @@ static void free_some_resources(struct adapter *adapter)
> >  {
> >         unsigned int i;
> >
> > -       t4_free_mem(adapter->l2t);
> > +       kvfree(adapter->l2t);
> >         t4_cleanup_sched(adapter);
> > -       t4_free_mem(adapter->tids.tid_tab);
> > +       kvfree(adapter->tids.tid_tab);
> >         cxgb4_cleanup_tc_u32(adapter);
> >         kfree(adapter->sge.egr_map);
> >         kfree(adapter->sge.ingr_map);
> > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> > index 5886ad78058f..a5c1b815145e 100644
> > --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> > +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> > @@ -70,13 +70,10 @@ int mlx4_en_create_tx_ring(struct mlx4_en_priv *priv,
> >         ring->full_size = ring->size - HEADROOM - MAX_DESC_TXBBS;
> >
> >         tmp = size * sizeof(struct mlx4_en_tx_info);
> > -       ring->tx_info = kmalloc_node(tmp, GFP_KERNEL | __GFP_NOWARN, node);
> > +       ring->tx_info = kvmalloc_node(tmp, GFP_KERNEL, node);
> >         if (!ring->tx_info) {
> > -               ring->tx_info = vmalloc(tmp);
> > -               if (!ring->tx_info) {
> > -                       err = -ENOMEM;
> > -                       goto err_ring;
> > -               }
> > +               err = -ENOMEM;
> > +               goto err_ring;
> >         }
> >
> >         en_dbg(DRV, priv, "Allocated tx_info ring at addr:%p size:%d\n",
> > diff --git a/drivers/net/ethernet/mellanox/mlx4/mr.c b/drivers/net/ethernet/mellanox/mlx4/mr.c
> > index 395b5463cfd9..82354fd0a87e 100644
> > --- a/drivers/net/ethernet/mellanox/mlx4/mr.c
> > +++ b/drivers/net/ethernet/mellanox/mlx4/mr.c
> > @@ -115,12 +115,9 @@ static int mlx4_buddy_init(struct mlx4_buddy *buddy, int max_order)
> >
> >         for (i = 0; i <= buddy->max_order; ++i) {
> >                 s = BITS_TO_LONGS(1 << (buddy->max_order - i));
> > -               buddy->bits[i] = kcalloc(s, sizeof (long), GFP_KERNEL | __GFP_NOWARN);
> > -               if (!buddy->bits[i]) {
> > -                       buddy->bits[i] = vzalloc(s * sizeof(long));
> > -                       if (!buddy->bits[i])
> > -                               goto err_out_free;
> > -               }
> > +               buddy->bits[i] = kvzalloc(s * sizeof(long), GFP_KERNEL);
> > +               if (!buddy->bits[i])
> > +                       goto err_out_free;
> >         }
> >
> >         set_bit(0, buddy->bits[buddy->max_order]);
> > diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
> > index 0eedc49e0d47..3bd332b167d9 100644
> > --- a/drivers/nvdimm/dimm_devs.c
> > +++ b/drivers/nvdimm/dimm_devs.c
> > @@ -102,10 +102,7 @@ int nvdimm_init_config_data(struct nvdimm_drvdata *ndd)
> >                 return -ENXIO;
> >         }
> >
> > -       ndd->data = kmalloc(ndd->nsarea.config_size, GFP_KERNEL);
> > -       if (!ndd->data)
> > -               ndd->data = vmalloc(ndd->nsarea.config_size);
> > -
> > +       ndd->data = kvmalloc(ndd->nsarea.config_size, GFP_KERNEL);
> >         if (!ndd->data)
> >                 return -ENOMEM;
> >
> > diff --git a/drivers/staging/lustre/lnet/libcfs/linux/linux-mem.c b/drivers/staging/lustre/lnet/libcfs/linux/linux-mem.c
> > index a6a76a681ea9..8f638267e704 100644
> > --- a/drivers/staging/lustre/lnet/libcfs/linux/linux-mem.c
> > +++ b/drivers/staging/lustre/lnet/libcfs/linux/linux-mem.c
> > @@ -45,15 +45,6 @@ EXPORT_SYMBOL(libcfs_kvzalloc);
> >  void *libcfs_kvzalloc_cpt(struct cfs_cpt_table *cptab, int cpt, size_t size,
> >                           gfp_t flags)
> >  {
> > -       void *ret;
> > -
> > -       ret = kzalloc_node(size, flags | __GFP_NOWARN,
> > -                          cfs_cpt_spread_node(cptab, cpt));
> > -       if (!ret) {
> > -               WARN_ON(!(flags & (__GFP_FS | __GFP_HIGH)));
> > -               ret = vmalloc_node(size, cfs_cpt_spread_node(cptab, cpt));
> > -       }
> > -
> > -       return ret;
> > +       return kvzalloc_node(size, flags, cfs_cpt_spread_node(cptab, cpt));
> >  }
> >  EXPORT_SYMBOL(libcfs_kvzalloc_cpt);
> > diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
> > index 6890897a6f30..10f1ef582659 100644
> > --- a/drivers/xen/evtchn.c
> > +++ b/drivers/xen/evtchn.c
> > @@ -87,18 +87,6 @@ struct user_evtchn {
> >         bool enabled;
> >  };
> >
> > -static evtchn_port_t *evtchn_alloc_ring(unsigned int size)
> > -{
> > -       evtchn_port_t *ring;
> > -       size_t s = size * sizeof(*ring);
> > -
> > -       ring = kmalloc(s, GFP_KERNEL);
> > -       if (!ring)
> > -               ring = vmalloc(s);
> > -
> > -       return ring;
> > -}
> > -
> >  static void evtchn_free_ring(evtchn_port_t *ring)
> >  {
> >         kvfree(ring);
> > @@ -334,7 +322,7 @@ static int evtchn_resize_ring(struct per_user_data *u)
> >         else
> >                 new_size = 2 * u->ring_size;
> >
> > -       new_ring = evtchn_alloc_ring(new_size);
> > +       new_ring = kvmalloc(new_size * sizeof(*new_ring), GFP_KERNEL);
> >         if (!new_ring)
> >                 return -ENOMEM;
> >
> > diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> > index 146b2dc0d2cf..4fc9712d927d 100644
> > --- a/fs/btrfs/ctree.c
> > +++ b/fs/btrfs/ctree.c
> > @@ -5391,13 +5391,10 @@ int btrfs_compare_trees(struct btrfs_root *left_root,
> >                 goto out;
> >         }
> >
> > -       tmp_buf = kmalloc(fs_info->nodesize, GFP_KERNEL | __GFP_NOWARN);
> > +       tmp_buf = kvmalloc(fs_info->nodesize, GFP_KERNEL);
> >         if (!tmp_buf) {
> > -               tmp_buf = vmalloc(fs_info->nodesize);
> > -               if (!tmp_buf) {
> > -                       ret = -ENOMEM;
> > -                       goto out;
> > -               }
> > +               ret = -ENOMEM;
> > +               goto out;
> >         }
> >
> >         left_path->search_commit_root = 1;
> > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> > index 77dabfed3a5d..6f0b488c7428 100644
> > --- a/fs/btrfs/ioctl.c
> > +++ b/fs/btrfs/ioctl.c
> > @@ -3547,12 +3547,9 @@ static int btrfs_clone(struct inode *src, struct inode *inode,
> >         u64 last_dest_end = destoff;
> >
> >         ret = -ENOMEM;
> > -       buf = kmalloc(fs_info->nodesize, GFP_KERNEL | __GFP_NOWARN);
> > -       if (!buf) {
> > -               buf = vmalloc(fs_info->nodesize);
> > -               if (!buf)
> > -                       return ret;
> > -       }
> > +       buf = kvmalloc(fs_info->nodesize, GFP_KERNEL);
> > +       if (!buf)
> > +               return ret;
> >
> >         path = btrfs_alloc_path();
> >         if (!path) {
> > diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> > index d145ce804620..0621ca2a7b5d 100644
> > --- a/fs/btrfs/send.c
> > +++ b/fs/btrfs/send.c
> > @@ -6242,22 +6242,16 @@ long btrfs_ioctl_send(struct file *mnt_file, void __user *arg_)
> >         sctx->clone_roots_cnt = arg->clone_sources_count;
> >
> >         sctx->send_max_size = BTRFS_SEND_BUF_SIZE;
> > -       sctx->send_buf = kmalloc(sctx->send_max_size, GFP_KERNEL | __GFP_NOWARN);
> > +       sctx->send_buf = kvmalloc(sctx->send_max_size, GFP_KERNEL);
> >         if (!sctx->send_buf) {
> > -               sctx->send_buf = vmalloc(sctx->send_max_size);
> > -               if (!sctx->send_buf) {
> > -                       ret = -ENOMEM;
> > -                       goto out;
> > -               }
> > +               ret = -ENOMEM;
> > +               goto out;
> >         }
> >
> > -       sctx->read_buf = kmalloc(BTRFS_SEND_READ_SIZE, GFP_KERNEL | __GFP_NOWARN);
> > +       sctx->read_buf = kvmalloc(BTRFS_SEND_READ_SIZE, GFP_KERNEL);
> >         if (!sctx->read_buf) {
> > -               sctx->read_buf = vmalloc(BTRFS_SEND_READ_SIZE);
> > -               if (!sctx->read_buf) {
> > -                       ret = -ENOMEM;
> > -                       goto out;
> > -               }
> > +               ret = -ENOMEM;
> > +               goto out;
> >         }
> >
> >         sctx->pending_dir_moves = RB_ROOT;
> > @@ -6278,13 +6272,10 @@ long btrfs_ioctl_send(struct file *mnt_file, void __user *arg_)
> >         alloc_size = arg->clone_sources_count * sizeof(*arg->clone_sources);
> >
> >         if (arg->clone_sources_count) {
> > -               clone_sources_tmp = kmalloc(alloc_size, GFP_KERNEL | __GFP_NOWARN);
> > +               clone_sources_tmp = kvmalloc(alloc_size, GFP_KERNEL);
> >                 if (!clone_sources_tmp) {
> > -                       clone_sources_tmp = vmalloc(alloc_size);
> > -                       if (!clone_sources_tmp) {
> > -                               ret = -ENOMEM;
> > -                               goto out;
> > -                       }
> > +                       ret = -ENOMEM;
> > +                       goto out;
> >                 }
> >
> >                 ret = copy_from_user(clone_sources_tmp, arg->clone_sources,
> > diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> > index 045d30d26624..78b18acf33ba 100644
> > --- a/fs/ceph/file.c
> > +++ b/fs/ceph/file.c
> > @@ -74,12 +74,9 @@ dio_get_pages_alloc(const struct iov_iter *it, size_t nbytes,
> >         align = (unsigned long)(it->iov->iov_base + it->iov_offset) &
> >                 (PAGE_SIZE - 1);
> >         npages = calc_pages_for(align, nbytes);
> > -       pages = kmalloc(sizeof(*pages) * npages, GFP_KERNEL);
> > -       if (!pages) {
> > -               pages = vmalloc(sizeof(*pages) * npages);
> > -               if (!pages)
> > -                       return ERR_PTR(-ENOMEM);
> > -       }
> > +       pages = kvmalloc(sizeof(*pages) * npages, GFP_KERNEL);
> > +       if (!pages)
> > +               return ERR_PTR(-ENOMEM);
> 
> ceph hunk looks fine:
> 
> Acked-by: Ilya Dryomov <idryomov@gmail.com>

thanks!

[...]

> However I noticed that in some cases you've dropped the zeroing part:
> fq_codel_init() and hhf_zalloc() zeroed both k and v, and some others
> were inconsistent and zeroed only k.  Given that the fallback branch
> was probably dead, I'd keep the k behaviour.  Was that intentional?

No, that is an omission. Thanks for noticing. I will send the updated
patch.
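
For readers following the zeroing question, here is a minimal userspace sketch of the difference. The names demo_kvmalloc(), demo_kvzalloc() and demo_all_zero() are hypothetical stand-ins built on malloc()/calloc(), not the kernel helpers. They only illustrate that replacing a kzalloc()/vzalloc() pair with a non-zeroing allocator loses the cleared-memory guarantee the old callers relied on.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for a non-zeroing allocator (kvmalloc-like):
 * the returned memory has indeterminate contents.
 */
void *demo_kvmalloc(size_t size)
{
	return malloc(size);
}

/*
 * Hypothetical stand-in for a zeroing allocator (kvzalloc-like):
 * the returned memory is guaranteed to be all zero bytes.
 */
void *demo_kvzalloc(size_t size)
{
	return calloc(1, size);
}

/* Return 1 if every byte of buf is zero, 0 otherwise. */
int demo_all_zero(const void *buf, size_t size)
{
	const unsigned char *p = buf;
	size_t i;

	for (i = 0; i < size; i++)
		if (p[i])
			return 0;
	return 1;
}
```

A caller that previously used a zeroing open-coded variant would need the zeroing helper after conversion to keep its behaviour.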
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 5/6] treewide: use kv[mz]alloc* rather than opencoded variants
  2017-01-12 15:37   ` Michal Hocko
  (?)
@ 2017-01-12 17:26     ` Kees Cook
  -1 siblings, 0 replies; 180+ messages in thread
From: Kees Cook @ 2017-01-12 17:26 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Vlastimil Babka, David Rientjes, Mel Gorman,
	Johannes Weiner, Al Viro, Linux-MM, LKML, Michal Hocko,
	Martin Schwidefsky, Heiko Carstens, Herbert Xu, Anton Vorontsov,
	Colin Cross, Tony Luck, Rafael J. Wysocki, Ben Skeggs,
	Kent Overstreet, Santosh Raspatur, Hariprasad S, Tariq Toukan,
	Yishai Hadas, Dan Williams, Oleg Drokin, Andreas Dilger,
	Boris Ostrovsky, David Sterba, Yan, Zheng, Ilya Dryomov,
	Alexei Starovoitov, Eric Dumazet, Network Development

On Thu, Jan 12, 2017 at 7:37 AM, Michal Hocko <mhocko@kernel.org> wrote:
> From: Michal Hocko <mhocko@suse.com>
>
> There are many code paths opencoding kvmalloc. Let's use the helper
> instead. The main difference to kvmalloc is that those users are usually
> not considering all the aspects of the memory allocator. E.g. allocation
> requests < 64kB are basically never failing and invoke OOM killer to
> satisfy the allocation. This sounds too disruptive for something that
> has a reasonable fallback - the vmalloc. On the other hand those
> requests might fallback to vmalloc even when the memory allocator would
> succeed after several more reclaim/compaction attempts previously. There
> is no guarantee something like that happens though.
>
> This patch converts many of those places to kv[mz]alloc* helpers because
> they are more conservative.
>
> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: Anton Vorontsov <anton@enomsg.org>
> Cc: Colin Cross <ccross@android.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> Cc: Ben Skeggs <bskeggs@redhat.com>
> Cc: Kent Overstreet <kent.overstreet@gmail.com>
> Cc: Santosh Raspatur <santosh@chelsio.com>
> Cc: Hariprasad S <hariprasad@chelsio.com>
> Cc: Tariq Toukan <tariqt@mellanox.com>
> Cc: Yishai Hadas <yishaih@mellanox.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Oleg Drokin <oleg.drokin@intel.com>
> Cc: Andreas Dilger <andreas.dilger@intel.com>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: David Sterba <dsterba@suse.com>
> Cc: "Yan, Zheng" <zyan@redhat.com>
> Cc: Ilya Dryomov <idryomov@gmail.com>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: netdev@vger.kernel.org
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  arch/s390/kvm/kvm-s390.c                           | 10 ++-----
>  crypto/lzo.c                                       |  4 +--
>  drivers/acpi/apei/erst.c                           |  8 ++---
>  drivers/char/agp/generic.c                         |  8 +----
>  drivers/gpu/drm/nouveau/nouveau_gem.c              |  4 +--
>  drivers/md/bcache/util.h                           | 12 ++------
>  drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h    |  3 --
>  drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c | 25 ++--------------
>  drivers/net/ethernet/chelsio/cxgb3/l2t.c           |  2 +-
>  drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c    | 31 ++++----------------
>  drivers/net/ethernet/mellanox/mlx4/en_tx.c         |  9 ++----
>  drivers/net/ethernet/mellanox/mlx4/mr.c            |  9 ++----
>  drivers/nvdimm/dimm_devs.c                         |  5 +---
>  .../staging/lustre/lnet/libcfs/linux/linux-mem.c   | 11 +------
>  drivers/xen/evtchn.c                               | 14 +--------
>  fs/btrfs/ctree.c                                   |  9 ++----
>  fs/btrfs/ioctl.c                                   |  9 ++----
>  fs/btrfs/send.c                                    | 27 ++++++-----------
>  fs/ceph/file.c                                     |  9 ++----
>  fs/select.c                                        |  5 +---
>  fs/xattr.c                                         | 27 ++++++-----------
>  kernel/bpf/hashtab.c                               | 11 ++-----
>  lib/iov_iter.c                                     |  5 +---
>  mm/frame_vector.c                                  |  5 +---
>  net/ipv4/inet_hashtables.c                         |  6 +---
>  net/ipv4/tcp_metrics.c                             |  5 +---
>  net/mpls/af_mpls.c                                 |  5 +---
>  net/netfilter/x_tables.c                           | 34 ++++++----------------
>  net/netfilter/xt_recent.c                          |  5 +---
>  net/sched/sch_choke.c                              |  5 +---
>  net/sched/sch_fq_codel.c                           | 26 ++++-------------
>  net/sched/sch_hhf.c                                | 33 ++++++---------------
>  net/sched/sch_netem.c                              |  6 +---
>  net/sched/sch_sfq.c                                |  6 +---
>  security/keys/keyctl.c                             | 22 ++++----------
>  35 files changed, 96 insertions(+), 319 deletions(-)
>
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 4f74511015b8..e6bbb33d2956 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -1126,10 +1126,7 @@ static long kvm_s390_get_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
>         if (args->count < 1 || args->count > KVM_S390_SKEYS_MAX)
>                 return -EINVAL;
>
> -       keys = kmalloc_array(args->count, sizeof(uint8_t),
> -                            GFP_KERNEL | __GFP_NOWARN);
> -       if (!keys)
> -               keys = vmalloc(sizeof(uint8_t) * args->count);
> +       keys = kvmalloc(args->count * sizeof(uint8_t), GFP_KERNEL);

Before doing this conversion, can we add a kvmalloc_array() API? This
conversion could allow for the reintroduction of integer overflow
flaws. (This particular situation isn't at risk since ->count is
checked, but I'd prefer we not create a risky set of examples for
using kvmalloc.)
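
A minimal sketch of the kind of helper being asked for here, written as plain userspace C so it compiles standalone. demo_kvmalloc_array() and demo_alloc() are hypothetical names, with malloc() standing in for the real allocator. The point is only the multiplication check, done before allocating in the same spirit as kmalloc_array().

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the allocator; malloc() is used only for illustration. */
void *demo_alloc(size_t size)
{
	return malloc(size);
}

/*
 * Allocate room for n elements of the given size. The request is
 * refused when n * size would wrap around, instead of silently
 * allocating a buffer that is too small.
 */
void *demo_kvmalloc_array(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;	/* n * size would overflow size_t */

	return demo_alloc(n * size);
}
```

With such a helper the s390 hunk above could pass args->count and sizeof(uint8_t) as separate arguments rather than multiplying them at the call site.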

Besides that: yes please. Less open coding. :)

-Kees

-- 
Kees Cook
Nexus Security

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 5/6] treewide: use kv[mz]alloc* rather than opencoded variants
@ 2017-01-12 17:26     ` Kees Cook
  0 siblings, 0 replies; 180+ messages in thread
From: Kees Cook @ 2017-01-12 17:26 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Vlastimil Babka, David Rientjes, Mel Gorman,
	Johannes Weiner, Al Viro, Linux-MM, LKML, Michal Hocko,
	Martin Schwidefsky, Heiko Carstens, Herbert Xu, Anton Vorontsov,
	Colin Cross, Tony Luck, Rafael J. Wysocki, Ben Skeggs,
	Kent Overstreet, Santosh Raspatur, Hariprasad S, Tariq Toukan,
	Yishai Hadas, Dan Williams, Oleg Drokin, Andreas Dilger,
	Boris Ostrovsky, David Sterba, Yan, Zheng, Ilya Dryomov,
	Alexei Starovoitov, Eric Dumazet, Network Development

On Thu, Jan 12, 2017 at 7:37 AM, Michal Hocko <mhocko@kernel.org> wrote:
> From: Michal Hocko <mhocko@suse.com>
>
> There are many code paths opencoding kvmalloc. Let's use the helper
> instead. The main difference to kvmalloc is that those users are usually
> not considering all the aspects of the memory allocator. E.g. allocation
> requests < 64kB are basically never failing and invoke OOM killer to
> satisfy the allocation. This sounds too disruptive for something that
> has a reasonable fallback - the vmalloc. On the other hand those
> requests might fallback to vmalloc even when the memory allocator would
> succeed after several more reclaim/compaction attempts previously. There
> is no guarantee something like that happens though.
>
> This patch converts many of those places to kv[mz]alloc* helpers because
> they are more conservative.
>
> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: Anton Vorontsov <anton@enomsg.org>
> Cc: Colin Cross <ccross@android.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> Cc: Ben Skeggs <bskeggs@redhat.com>
> Cc: Kent Overstreet <kent.overstreet@gmail.com>
> Cc: Santosh Raspatur <santosh@chelsio.com>
> Cc: Hariprasad S <hariprasad@chelsio.com>
> Cc: Tariq Toukan <tariqt@mellanox.com>
> Cc: Yishai Hadas <yishaih@mellanox.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Oleg Drokin <oleg.drokin@intel.com>
> Cc: Andreas Dilger <andreas.dilger@intel.com>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: David Sterba <dsterba@suse.com>
> Cc: "Yan, Zheng" <zyan@redhat.com>
> Cc: Ilya Dryomov <idryomov@gmail.com>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: netdev@vger.kernel.org
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  arch/s390/kvm/kvm-s390.c                           | 10 ++-----
>  crypto/lzo.c                                       |  4 +--
>  drivers/acpi/apei/erst.c                           |  8 ++---
>  drivers/char/agp/generic.c                         |  8 +----
>  drivers/gpu/drm/nouveau/nouveau_gem.c              |  4 +--
>  drivers/md/bcache/util.h                           | 12 ++------
>  drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h    |  3 --
>  drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c | 25 ++--------------
>  drivers/net/ethernet/chelsio/cxgb3/l2t.c           |  2 +-
>  drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c    | 31 ++++----------------
>  drivers/net/ethernet/mellanox/mlx4/en_tx.c         |  9 ++----
>  drivers/net/ethernet/mellanox/mlx4/mr.c            |  9 ++----
>  drivers/nvdimm/dimm_devs.c                         |  5 +---
>  .../staging/lustre/lnet/libcfs/linux/linux-mem.c   | 11 +------
>  drivers/xen/evtchn.c                               | 14 +--------
>  fs/btrfs/ctree.c                                   |  9 ++----
>  fs/btrfs/ioctl.c                                   |  9 ++----
>  fs/btrfs/send.c                                    | 27 ++++++-----------
>  fs/ceph/file.c                                     |  9 ++----
>  fs/select.c                                        |  5 +---
>  fs/xattr.c                                         | 27 ++++++-----------
>  kernel/bpf/hashtab.c                               | 11 ++-----
>  lib/iov_iter.c                                     |  5 +---
>  mm/frame_vector.c                                  |  5 +---
>  net/ipv4/inet_hashtables.c                         |  6 +---
>  net/ipv4/tcp_metrics.c                             |  5 +---
>  net/mpls/af_mpls.c                                 |  5 +---
>  net/netfilter/x_tables.c                           | 34 ++++++----------------
>  net/netfilter/xt_recent.c                          |  5 +---
>  net/sched/sch_choke.c                              |  5 +---
>  net/sched/sch_fq_codel.c                           | 26 ++++-------------
>  net/sched/sch_hhf.c                                | 33 ++++++---------------
>  net/sched/sch_netem.c                              |  6 +---
>  net/sched/sch_sfq.c                                |  6 +---
>  security/keys/keyctl.c                             | 22 ++++----------
>  35 files changed, 96 insertions(+), 319 deletions(-)
>
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 4f74511015b8..e6bbb33d2956 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -1126,10 +1126,7 @@ static long kvm_s390_get_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
>         if (args->count < 1 || args->count > KVM_S390_SKEYS_MAX)
>                 return -EINVAL;
>
> -       keys = kmalloc_array(args->count, sizeof(uint8_t),
> -                            GFP_KERNEL | __GFP_NOWARN);
> -       if (!keys)
> -               keys = vmalloc(sizeof(uint8_t) * args->count);
> +       keys = kvmalloc(args->count * sizeof(uint8_t), GFP_KERNEL);

Before doing this conversion, can we add a kvmalloc_array() API? This
conversion could allow for the reintroduction of integer overflow
flaws. (This particular situation isn't at risk since ->count is
checked, but I'd prefer we not create a risky set of examples for
using kvmalloc.)
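
To make the concern concrete, here is a rough userspace sketch of what an
overflow-checked array helper could look like. The name
kvmalloc_array_sketch() is hypothetical and malloc() merely stands in for
kvmalloc(); this is an illustration of the check, not a proposed kernel
implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical userspace stand-in for an overflow-checked array helper.
 * malloc() takes the place of kvmalloc(); name and signature are
 * assumptions made for this sketch only.
 */
static void *kvmalloc_array_sketch(size_t n, size_t size)
{
	/* Refuse the request if n * size would wrap around SIZE_MAX. */
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;

	return malloc(n * size);
}
```

With a helper of this shape the caller passes the count and the element
size separately, so the multiplication is checked in one place instead of
being open-coded at every call site.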

Besides that: yes please. Less open coding. :)

-Kees

-- 
Kees Cook
Nexus Security



* Re: [PATCH 5/6] treewide: use kv[mz]alloc* rather than opencoded variants
  2017-01-12 15:37   ` Michal Hocko
  (?)
@ 2017-01-12 17:29     ` Michal Hocko
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Hocko @ 2017-01-12 17:29 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, David Rientjes, Mel Gorman, Johannes Weiner,
	Al Viro, linux-mm, LKML, Martin Schwidefsky, Heiko Carstens,
	Herbert Xu, Anton Vorontsov, Colin Cross, Kees Cook, Tony Luck,
	Rafael J. Wysocki, Ben Skeggs, Kent Overstreet, Santosh Raspatur,
	Hariprasad S, Tariq Toukan, Yishai Hadas, Dan Williams,
	Oleg Drokin, Andreas Dilger, Boris Ostrovsky, David Sterba, Yan,
	Zheng, Ilya Dryomov, Alexei Starovoitov, Eric Dumazet, netdev

Ilya has noticed that I screwed up some k[zc]alloc conversions by not
using kvzalloc. This is an updated patch, with some acks collected along
the way.
---
From a7b89c6d0a3c685045e37740c8f97b065f37e0a4 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Wed, 4 Jan 2017 13:30:32 +0100
Subject: [PATCH] treewide: use kv[mz]alloc* rather than opencoded variants

There are many code paths open-coding kvmalloc. Let's use the helper
instead. The main difference from kvmalloc is that those users usually
do not consider all aspects of the memory allocator. E.g. allocation
requests < 64kB basically never fail and invoke the OOM killer to
satisfy the allocation. This sounds too disruptive for something that
has a reasonable fallback - vmalloc. On the other hand, with this change
those requests might fall back to vmalloc even when the memory allocator
would previously have succeeded after several more reclaim/compaction
attempts. There is no guarantee that something like that happens, though.

This patch converts many of those places to the kv[mz]alloc* helpers
because they are more conservative.
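
The policy described above can be modelled in userspace roughly as follows.
This is only a sketch under stated assumptions: sketch_kmalloc() and
sketch_vmalloc() are hypothetical stubs standing in for the kernel
allocators, the one-page threshold is assumed for illustration, and the
try_hard flag models "keep retrying and possibly invoke the OOM killer".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size, illustration only */

static int sketch_vmalloc_calls;	/* how often the fallback was used */
static bool sketch_last_try_hard;	/* last try_hard seen by the stub */

/* Stub: the "contiguous" allocator fails for anything above one page. */
static void *sketch_kmalloc(size_t size, bool try_hard)
{
	sketch_last_try_hard = try_hard;
	return size <= SKETCH_PAGE_SIZE ? malloc(size) : NULL;
}

/* Stub: the "virtually contiguous" fallback. */
static void *sketch_vmalloc(size_t size)
{
	sketch_vmalloc_calls++;
	return malloc(size);
}

static void *kvmalloc_sketch(size_t size)
{
	/* A fallback only makes sense for requests larger than one page. */
	bool has_fallback = size > SKETCH_PAGE_SIZE;
	/* Do not try hard when a fallback is available. */
	void *p = sketch_kmalloc(size, !has_fallback);

	if (p || !has_fallback)
		return p;
	return sketch_vmalloc(size);
}
```

The point of the model is the ordering: the first attempt is made cheap
whenever a fallback exists, so the fallback is actually reachable.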

Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Colin Cross <ccross@android.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Santosh Raspatur <santosh@chelsio.com>
Cc: Hariprasad S <hariprasad@chelsio.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Yishai Hadas <yishaih@mellanox.com>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: Andreas Dilger <andreas.dilger@intel.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: "Yan, Zheng" <zyan@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: netdev@vger.kernel.org
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> # KVM/s390
Acked-by: Dan Williams <dan.j.williams@intel.com> # nvdimm
Acked-by: David Sterba <dsterba@suse.com> # btrfs
Acked-by: Ilya Dryomov <idryomov@gmail.com> # Ceph
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/s390/kvm/kvm-s390.c                           | 10 ++-----
 crypto/lzo.c                                       |  4 +--
 drivers/acpi/apei/erst.c                           |  8 ++---
 drivers/char/agp/generic.c                         |  8 +----
 drivers/gpu/drm/nouveau/nouveau_gem.c              |  4 +--
 drivers/md/bcache/util.h                           | 12 ++------
 drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h    |  3 --
 drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c | 25 ++--------------
 drivers/net/ethernet/chelsio/cxgb3/l2t.c           |  2 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c    | 31 ++++----------------
 drivers/net/ethernet/mellanox/mlx4/en_tx.c         |  9 ++----
 drivers/net/ethernet/mellanox/mlx4/mr.c            |  9 ++----
 drivers/nvdimm/dimm_devs.c                         |  5 +---
 .../staging/lustre/lnet/libcfs/linux/linux-mem.c   | 11 +------
 drivers/xen/evtchn.c                               | 14 +--------
 fs/btrfs/ctree.c                                   |  9 ++----
 fs/btrfs/ioctl.c                                   |  9 ++----
 fs/btrfs/send.c                                    | 27 ++++++-----------
 fs/ceph/file.c                                     |  9 ++----
 fs/select.c                                        |  5 +---
 fs/xattr.c                                         | 27 ++++++-----------
 kernel/bpf/hashtab.c                               | 11 ++-----
 lib/iov_iter.c                                     |  5 +---
 mm/frame_vector.c                                  |  5 +---
 net/ipv4/inet_hashtables.c                         |  6 +---
 net/ipv4/tcp_metrics.c                             |  5 +---
 net/mpls/af_mpls.c                                 |  5 +---
 net/netfilter/x_tables.c                           | 34 ++++++----------------
 net/netfilter/xt_recent.c                          |  5 +---
 net/sched/sch_choke.c                              |  5 +---
 net/sched/sch_fq_codel.c                           | 26 ++++-------------
 net/sched/sch_hhf.c                                | 33 ++++++---------------
 net/sched/sch_netem.c                              |  6 +---
 net/sched/sch_sfq.c                                |  6 +---
 security/keys/keyctl.c                             | 22 ++++----------
 35 files changed, 96 insertions(+), 319 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 4f74511015b8..e6bbb33d2956 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -1126,10 +1126,7 @@ static long kvm_s390_get_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
 	if (args->count < 1 || args->count > KVM_S390_SKEYS_MAX)
 		return -EINVAL;
 
-	keys = kmalloc_array(args->count, sizeof(uint8_t),
-			     GFP_KERNEL | __GFP_NOWARN);
-	if (!keys)
-		keys = vmalloc(sizeof(uint8_t) * args->count);
+	keys = kvmalloc(args->count * sizeof(uint8_t), GFP_KERNEL);
 	if (!keys)
 		return -ENOMEM;
 
@@ -1171,10 +1168,7 @@ static long kvm_s390_set_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
 	if (args->count < 1 || args->count > KVM_S390_SKEYS_MAX)
 		return -EINVAL;
 
-	keys = kmalloc_array(args->count, sizeof(uint8_t),
-			     GFP_KERNEL | __GFP_NOWARN);
-	if (!keys)
-		keys = vmalloc(sizeof(uint8_t) * args->count);
+	keys = kvmalloc(sizeof(uint8_t) * args->count, GFP_KERNEL);
 	if (!keys)
 		return -ENOMEM;
 
diff --git a/crypto/lzo.c b/crypto/lzo.c
index 168df784da84..218567d717d6 100644
--- a/crypto/lzo.c
+++ b/crypto/lzo.c
@@ -32,9 +32,7 @@ static void *lzo_alloc_ctx(struct crypto_scomp *tfm)
 {
 	void *ctx;
 
-	ctx = kmalloc(LZO1X_MEM_COMPRESS, GFP_KERNEL | __GFP_NOWARN);
-	if (!ctx)
-		ctx = vmalloc(LZO1X_MEM_COMPRESS);
+	ctx = kvmalloc(LZO1X_MEM_COMPRESS, GFP_KERNEL);
 	if (!ctx)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/drivers/acpi/apei/erst.c b/drivers/acpi/apei/erst.c
index ec4f507b524f..a2898df61744 100644
--- a/drivers/acpi/apei/erst.c
+++ b/drivers/acpi/apei/erst.c
@@ -513,7 +513,7 @@ static int __erst_record_id_cache_add_one(void)
 	if (i < erst_record_id_cache.len)
 		goto retry;
 	if (erst_record_id_cache.len >= erst_record_id_cache.size) {
-		int new_size, alloc_size;
+		int new_size;
 		u64 *new_entries;
 
 		new_size = erst_record_id_cache.size * 2;
@@ -524,11 +524,7 @@ static int __erst_record_id_cache_add_one(void)
 				pr_warn(FW_WARN "too many record IDs!\n");
 			return 0;
 		}
-		alloc_size = new_size * sizeof(entries[0]);
-		if (alloc_size < PAGE_SIZE)
-			new_entries = kmalloc(alloc_size, GFP_KERNEL);
-		else
-			new_entries = vmalloc(alloc_size);
+		new_entries = kvmalloc(new_size * sizeof(entries[0]), GFP_KERNEL);
 		if (!new_entries)
 			return -ENOMEM;
 		memcpy(new_entries, entries,
diff --git a/drivers/char/agp/generic.c b/drivers/char/agp/generic.c
index f002fa5d1887..bdf418cac8ef 100644
--- a/drivers/char/agp/generic.c
+++ b/drivers/char/agp/generic.c
@@ -88,13 +88,7 @@ static int agp_get_key(void)
 
 void agp_alloc_page_array(size_t size, struct agp_memory *mem)
 {
-	mem->pages = NULL;
-
-	if (size <= 2*PAGE_SIZE)
-		mem->pages = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
-	if (mem->pages == NULL) {
-		mem->pages = vmalloc(size);
-	}
+	mem->pages = kvmalloc(size, GFP_KERNEL);
 }
 EXPORT_SYMBOL(agp_alloc_page_array);
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index 201b52b750dd..77dd73ff126f 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -568,9 +568,7 @@ u_memcpya(uint64_t user, unsigned nmemb, unsigned size)
 
 	size *= nmemb;
 
-	mem = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
-	if (!mem)
-		mem = vmalloc(size);
+	mem = kvmalloc(size, GFP_KERNEL);
 	if (!mem)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/drivers/md/bcache/util.h b/drivers/md/bcache/util.h
index cf2cbc211d83..d00bcb64d3a8 100644
--- a/drivers/md/bcache/util.h
+++ b/drivers/md/bcache/util.h
@@ -43,11 +43,7 @@ struct closure;
 	(heap)->used = 0;						\
 	(heap)->size = (_size);						\
 	_bytes = (heap)->size * sizeof(*(heap)->data);			\
-	(heap)->data = NULL;						\
-	if (_bytes < KMALLOC_MAX_SIZE)					\
-		(heap)->data = kmalloc(_bytes, (gfp));			\
-	if ((!(heap)->data) && ((gfp) & GFP_KERNEL))			\
-		(heap)->data = vmalloc(_bytes);				\
+	(heap)->data = kvmalloc(_bytes, (gfp) & GFP_KERNEL);		\
 	(heap)->data;							\
 })
 
@@ -136,12 +132,8 @@ do {									\
 									\
 	(fifo)->mask = _allocated_size - 1;				\
 	(fifo)->front = (fifo)->back = 0;				\
-	(fifo)->data = NULL;						\
 									\
-	if (_bytes < KMALLOC_MAX_SIZE)					\
-		(fifo)->data = kmalloc(_bytes, (gfp));			\
-	if ((!(fifo)->data) && ((gfp) & GFP_KERNEL))			\
-		(fifo)->data = vmalloc(_bytes);				\
+	(fifo)->data = kvmalloc(_bytes, (gfp) & GFP_KERNEL);		\
 	(fifo)->data;							\
 })
 
diff --git a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h
index 920d918ed193..f04e81f33795 100644
--- a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h
+++ b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h
@@ -41,9 +41,6 @@
 
 #define VALIDATE_TID 1
 
-void *cxgb_alloc_mem(unsigned long size);
-void cxgb_free_mem(void *addr);
-
 /*
  * Map an ATID or STID to their entries in the corresponding TID tables.
  */
diff --git a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c
index 76684dcb874c..4d80bccf9c01 100644
--- a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c
+++ b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c
@@ -1152,27 +1152,6 @@ static void cxgb_redirect(struct dst_entry *old, struct dst_entry *new,
 }
 
 /*
- * Allocate a chunk of memory using kmalloc or, if that fails, vmalloc.
- * The allocated memory is cleared.
- */
-void *cxgb_alloc_mem(unsigned long size)
-{
-	void *p = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
-
-	if (!p)
-		p = vzalloc(size);
-	return p;
-}
-
-/*
- * Free memory allocated through t3_alloc_mem().
- */
-void cxgb_free_mem(void *addr)
-{
-	kvfree(addr);
-}
-
-/*
  * Allocate and initialize the TID tables.  Returns 0 on success.
  */
 static int init_tid_tabs(struct tid_info *t, unsigned int ntids,
@@ -1182,7 +1161,7 @@ static int init_tid_tabs(struct tid_info *t, unsigned int ntids,
 	unsigned long size = ntids * sizeof(*t->tid_tab) +
 	    natids * sizeof(*t->atid_tab) + nstids * sizeof(*t->stid_tab);
 
-	t->tid_tab = cxgb_alloc_mem(size);
+	t->tid_tab = kvzalloc(size, GFP_KERNEL);
 	if (!t->tid_tab)
 		return -ENOMEM;
 
@@ -1218,7 +1197,7 @@ static int init_tid_tabs(struct tid_info *t, unsigned int ntids,
 
 static void free_tid_maps(struct tid_info *t)
 {
-	cxgb_free_mem(t->tid_tab);
+	kvfree(t->tid_tab);
 }
 
 static inline void add_adapter(struct adapter *adap)
diff --git a/drivers/net/ethernet/chelsio/cxgb3/l2t.c b/drivers/net/ethernet/chelsio/cxgb3/l2t.c
index 5f226eda8cd6..f5c92acd52b4 100644
--- a/drivers/net/ethernet/chelsio/cxgb3/l2t.c
+++ b/drivers/net/ethernet/chelsio/cxgb3/l2t.c
@@ -444,7 +444,7 @@ struct l2t_data *t3_init_l2t(unsigned int l2t_capacity)
 	struct l2t_data *d;
 	int i, size = sizeof(*d) + l2t_capacity * sizeof(struct l2t_entry);
 
-	d = cxgb_alloc_mem(size);
+	d = kvzalloc(size, GFP_KERNEL);
 	if (!d)
 		return NULL;
 
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 6f951877430b..a64c2a3d39fc 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -881,27 +881,6 @@ static int setup_sge_queues(struct adapter *adap)
 	return err;
 }
 
-/*
- * Allocate a chunk of memory using kmalloc or, if that fails, vmalloc.
- * The allocated memory is cleared.
- */
-void *t4_alloc_mem(size_t size)
-{
-	void *p = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
-
-	if (!p)
-		p = vzalloc(size);
-	return p;
-}
-
-/*
- * Free memory allocated through alloc_mem().
- */
-void t4_free_mem(void *addr)
-{
-	kvfree(addr);
-}
-
 static u16 cxgb_select_queue(struct net_device *dev, struct sk_buff *skb,
 			     void *accel_priv, select_queue_fallback_t fallback)
 {
@@ -1300,7 +1279,7 @@ static int tid_init(struct tid_info *t)
 	       max_ftids * sizeof(*t->ftid_tab) +
 	       ftid_bmap_size * sizeof(long);
 
-	t->tid_tab = t4_alloc_mem(size);
+	t->tid_tab = kvzalloc(size, GFP_KERNEL);
 	if (!t->tid_tab)
 		return -ENOMEM;
 
@@ -3416,7 +3395,7 @@ static int adap_init0(struct adapter *adap)
 		/* allocate memory to read the header of the firmware on the
 		 * card
 		 */
-		card_fw = t4_alloc_mem(sizeof(*card_fw));
+		card_fw = kvzalloc(sizeof(*card_fw), GFP_KERNEL);
 
 		/* Get FW from from /lib/firmware/ */
 		ret = request_firmware(&fw, fw_info->fw_mod_name,
@@ -3436,7 +3415,7 @@ static int adap_init0(struct adapter *adap)
 
 		/* Cleaning up */
 		release_firmware(fw);
-		t4_free_mem(card_fw);
+		kvfree(card_fw);
 
 		if (ret < 0)
 			goto bye;
@@ -4432,9 +4411,9 @@ static void free_some_resources(struct adapter *adapter)
 {
 	unsigned int i;
 
-	t4_free_mem(adapter->l2t);
+	kvfree(adapter->l2t);
 	t4_cleanup_sched(adapter);
-	t4_free_mem(adapter->tids.tid_tab);
+	kvfree(adapter->tids.tid_tab);
 	cxgb4_cleanup_tc_u32(adapter);
 	kfree(adapter->sge.egr_map);
 	kfree(adapter->sge.ingr_map);
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 5886ad78058f..a5c1b815145e 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -70,13 +70,10 @@ int mlx4_en_create_tx_ring(struct mlx4_en_priv *priv,
 	ring->full_size = ring->size - HEADROOM - MAX_DESC_TXBBS;
 
 	tmp = size * sizeof(struct mlx4_en_tx_info);
-	ring->tx_info = kmalloc_node(tmp, GFP_KERNEL | __GFP_NOWARN, node);
+	ring->tx_info = kvmalloc_node(tmp, GFP_KERNEL, node);
 	if (!ring->tx_info) {
-		ring->tx_info = vmalloc(tmp);
-		if (!ring->tx_info) {
-			err = -ENOMEM;
-			goto err_ring;
-		}
+		err = -ENOMEM;
+		goto err_ring;
 	}
 
 	en_dbg(DRV, priv, "Allocated tx_info ring at addr:%p size:%d\n",
diff --git a/drivers/net/ethernet/mellanox/mlx4/mr.c b/drivers/net/ethernet/mellanox/mlx4/mr.c
index 395b5463cfd9..82354fd0a87e 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mr.c
+++ b/drivers/net/ethernet/mellanox/mlx4/mr.c
@@ -115,12 +115,9 @@ static int mlx4_buddy_init(struct mlx4_buddy *buddy, int max_order)
 
 	for (i = 0; i <= buddy->max_order; ++i) {
 		s = BITS_TO_LONGS(1 << (buddy->max_order - i));
-		buddy->bits[i] = kcalloc(s, sizeof (long), GFP_KERNEL | __GFP_NOWARN);
-		if (!buddy->bits[i]) {
-			buddy->bits[i] = vzalloc(s * sizeof(long));
-			if (!buddy->bits[i])
-				goto err_out_free;
-		}
+		buddy->bits[i] = kvzalloc(s * sizeof(long), GFP_KERNEL);
+		if (!buddy->bits[i])
+			goto err_out_free;
 	}
 
 	set_bit(0, buddy->bits[buddy->max_order]);
diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
index 0eedc49e0d47..3bd332b167d9 100644
--- a/drivers/nvdimm/dimm_devs.c
+++ b/drivers/nvdimm/dimm_devs.c
@@ -102,10 +102,7 @@ int nvdimm_init_config_data(struct nvdimm_drvdata *ndd)
 		return -ENXIO;
 	}
 
-	ndd->data = kmalloc(ndd->nsarea.config_size, GFP_KERNEL);
-	if (!ndd->data)
-		ndd->data = vmalloc(ndd->nsarea.config_size);
-
+	ndd->data = kvmalloc(ndd->nsarea.config_size, GFP_KERNEL);
 	if (!ndd->data)
 		return -ENOMEM;
 
diff --git a/drivers/staging/lustre/lnet/libcfs/linux/linux-mem.c b/drivers/staging/lustre/lnet/libcfs/linux/linux-mem.c
index a6a76a681ea9..8f638267e704 100644
--- a/drivers/staging/lustre/lnet/libcfs/linux/linux-mem.c
+++ b/drivers/staging/lustre/lnet/libcfs/linux/linux-mem.c
@@ -45,15 +45,6 @@ EXPORT_SYMBOL(libcfs_kvzalloc);
 void *libcfs_kvzalloc_cpt(struct cfs_cpt_table *cptab, int cpt, size_t size,
 			  gfp_t flags)
 {
-	void *ret;
-
-	ret = kzalloc_node(size, flags | __GFP_NOWARN,
-			   cfs_cpt_spread_node(cptab, cpt));
-	if (!ret) {
-		WARN_ON(!(flags & (__GFP_FS | __GFP_HIGH)));
-		ret = vmalloc_node(size, cfs_cpt_spread_node(cptab, cpt));
-	}
-
-	return ret;
+	return kvzalloc_node(size, flags, cfs_cpt_spread_node(cptab, cpt));
 }
 EXPORT_SYMBOL(libcfs_kvzalloc_cpt);
diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
index 6890897a6f30..10f1ef582659 100644
--- a/drivers/xen/evtchn.c
+++ b/drivers/xen/evtchn.c
@@ -87,18 +87,6 @@ struct user_evtchn {
 	bool enabled;
 };
 
-static evtchn_port_t *evtchn_alloc_ring(unsigned int size)
-{
-	evtchn_port_t *ring;
-	size_t s = size * sizeof(*ring);
-
-	ring = kmalloc(s, GFP_KERNEL);
-	if (!ring)
-		ring = vmalloc(s);
-
-	return ring;
-}
-
 static void evtchn_free_ring(evtchn_port_t *ring)
 {
 	kvfree(ring);
@@ -334,7 +322,7 @@ static int evtchn_resize_ring(struct per_user_data *u)
 	else
 		new_size = 2 * u->ring_size;
 
-	new_ring = evtchn_alloc_ring(new_size);
+	new_ring = kvmalloc(new_size * sizeof(*new_ring), GFP_KERNEL);
 	if (!new_ring)
 		return -ENOMEM;
 
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 146b2dc0d2cf..4fc9712d927d 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -5391,13 +5391,10 @@ int btrfs_compare_trees(struct btrfs_root *left_root,
 		goto out;
 	}
 
-	tmp_buf = kmalloc(fs_info->nodesize, GFP_KERNEL | __GFP_NOWARN);
+	tmp_buf = kvmalloc(fs_info->nodesize, GFP_KERNEL);
 	if (!tmp_buf) {
-		tmp_buf = vmalloc(fs_info->nodesize);
-		if (!tmp_buf) {
-			ret = -ENOMEM;
-			goto out;
-		}
+		ret = -ENOMEM;
+		goto out;
 	}
 
 	left_path->search_commit_root = 1;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 77dabfed3a5d..6f0b488c7428 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3547,12 +3547,9 @@ static int btrfs_clone(struct inode *src, struct inode *inode,
 	u64 last_dest_end = destoff;
 
 	ret = -ENOMEM;
-	buf = kmalloc(fs_info->nodesize, GFP_KERNEL | __GFP_NOWARN);
-	if (!buf) {
-		buf = vmalloc(fs_info->nodesize);
-		if (!buf)
-			return ret;
-	}
+	buf = kvmalloc(fs_info->nodesize, GFP_KERNEL);
+	if (!buf)
+		return ret;
 
 	path = btrfs_alloc_path();
 	if (!path) {
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index d145ce804620..0621ca2a7b5d 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -6242,22 +6242,16 @@ long btrfs_ioctl_send(struct file *mnt_file, void __user *arg_)
 	sctx->clone_roots_cnt = arg->clone_sources_count;
 
 	sctx->send_max_size = BTRFS_SEND_BUF_SIZE;
-	sctx->send_buf = kmalloc(sctx->send_max_size, GFP_KERNEL | __GFP_NOWARN);
+	sctx->send_buf = kvmalloc(sctx->send_max_size, GFP_KERNEL);
 	if (!sctx->send_buf) {
-		sctx->send_buf = vmalloc(sctx->send_max_size);
-		if (!sctx->send_buf) {
-			ret = -ENOMEM;
-			goto out;
-		}
+		ret = -ENOMEM;
+		goto out;
 	}
 
-	sctx->read_buf = kmalloc(BTRFS_SEND_READ_SIZE, GFP_KERNEL | __GFP_NOWARN);
+	sctx->read_buf = kvmalloc(BTRFS_SEND_READ_SIZE, GFP_KERNEL);
 	if (!sctx->read_buf) {
-		sctx->read_buf = vmalloc(BTRFS_SEND_READ_SIZE);
-		if (!sctx->read_buf) {
-			ret = -ENOMEM;
-			goto out;
-		}
+		ret = -ENOMEM;
+		goto out;
 	}
 
 	sctx->pending_dir_moves = RB_ROOT;
@@ -6278,13 +6272,10 @@ long btrfs_ioctl_send(struct file *mnt_file, void __user *arg_)
 	alloc_size = arg->clone_sources_count * sizeof(*arg->clone_sources);
 
 	if (arg->clone_sources_count) {
-		clone_sources_tmp = kmalloc(alloc_size, GFP_KERNEL | __GFP_NOWARN);
+		clone_sources_tmp = kvmalloc(alloc_size, GFP_KERNEL);
 		if (!clone_sources_tmp) {
-			clone_sources_tmp = vmalloc(alloc_size);
-			if (!clone_sources_tmp) {
-				ret = -ENOMEM;
-				goto out;
-			}
+			ret = -ENOMEM;
+			goto out;
 		}
 
 		ret = copy_from_user(clone_sources_tmp, arg->clone_sources,
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 045d30d26624..78b18acf33ba 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -74,12 +74,9 @@ dio_get_pages_alloc(const struct iov_iter *it, size_t nbytes,
 	align = (unsigned long)(it->iov->iov_base + it->iov_offset) &
 		(PAGE_SIZE - 1);
 	npages = calc_pages_for(align, nbytes);
-	pages = kmalloc(sizeof(*pages) * npages, GFP_KERNEL);
-	if (!pages) {
-		pages = vmalloc(sizeof(*pages) * npages);
-		if (!pages)
-			return ERR_PTR(-ENOMEM);
-	}
+	pages = kvmalloc(sizeof(*pages) * npages, GFP_KERNEL);
+	if (!pages)
+		return ERR_PTR(-ENOMEM);
 
 	for (idx = 0; idx < npages; ) {
 		size_t start;
diff --git a/fs/select.c b/fs/select.c
index 305c0daf5d67..9e8e1189eb99 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -586,10 +586,7 @@ int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp,
 			goto out_nofds;
 
 		alloc_size = 6 * size;
-		bits = kmalloc(alloc_size, GFP_KERNEL|__GFP_NOWARN);
-		if (!bits && alloc_size > PAGE_SIZE)
-			bits = vmalloc(alloc_size);
-
+		bits = kvmalloc(alloc_size, GFP_KERNEL);
 		if (!bits)
 			goto out_nofds;
 	}
diff --git a/fs/xattr.c b/fs/xattr.c
index 7e3317cf4045..967542e1521b 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -431,12 +431,9 @@ setxattr(struct dentry *d, const char __user *name, const void __user *value,
 	if (size) {
 		if (size > XATTR_SIZE_MAX)
 			return -E2BIG;
-		kvalue = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
-		if (!kvalue) {
-			kvalue = vmalloc(size);
-			if (!kvalue)
-				return -ENOMEM;
-		}
+		kvalue = kvmalloc(size, GFP_KERNEL);
+		if (!kvalue)
+			return -ENOMEM;
 		if (copy_from_user(kvalue, value, size)) {
 			error = -EFAULT;
 			goto out;
@@ -528,12 +525,9 @@ getxattr(struct dentry *d, const char __user *name, void __user *value,
 	if (size) {
 		if (size > XATTR_SIZE_MAX)
 			size = XATTR_SIZE_MAX;
-		kvalue = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
-		if (!kvalue) {
-			kvalue = vmalloc(size);
-			if (!kvalue)
-				return -ENOMEM;
-		}
+		kvalue = kvzalloc(size, GFP_KERNEL);
+		if (!kvalue)
+			return -ENOMEM;
 	}
 
 	error = vfs_getxattr(d, kname, kvalue, size);
@@ -611,12 +605,9 @@ listxattr(struct dentry *d, char __user *list, size_t size)
 	if (size) {
 		if (size > XATTR_LIST_MAX)
 			size = XATTR_LIST_MAX;
-		klist = kmalloc(size, __GFP_NOWARN | GFP_KERNEL);
-		if (!klist) {
-			klist = vmalloc(size);
-			if (!klist)
-				return -ENOMEM;
-		}
+		klist = kvmalloc(size, GFP_KERNEL);
+		if (!klist)
+			return -ENOMEM;
 	}
 
 	error = vfs_listxattr(d, klist, size);
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 34debc1a9641..4ca30a951bbc 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -320,14 +320,9 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 		goto free_htab;
 
 	err = -ENOMEM;
-	htab->buckets = kmalloc_array(htab->n_buckets, sizeof(struct bucket),
-				      GFP_USER | __GFP_NOWARN);
-
-	if (!htab->buckets) {
-		htab->buckets = vmalloc(htab->n_buckets * sizeof(struct bucket));
-		if (!htab->buckets)
-			goto free_htab;
-	}
+	htab->buckets = kvmalloc(htab->n_buckets * sizeof(struct bucket), GFP_USER);
+	if (!htab->buckets)
+		goto free_htab;
 
 	for (i = 0; i < htab->n_buckets; i++) {
 		INIT_HLIST_HEAD(&htab->buckets[i].head);
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 25f572303801..45c17b5562b5 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -957,10 +957,7 @@ EXPORT_SYMBOL(iov_iter_get_pages);
 
 static struct page **get_pages_array(size_t n)
 {
-	struct page **p = kmalloc(n * sizeof(struct page *), GFP_KERNEL);
-	if (!p)
-		p = vmalloc(n * sizeof(struct page *));
-	return p;
+	return kvmalloc(n * sizeof(struct page *), GFP_KERNEL);
 }
 
 static ssize_t pipe_get_pages_alloc(struct iov_iter *i,
diff --git a/mm/frame_vector.c b/mm/frame_vector.c
index db77dcb38afd..72ebec18629c 100644
--- a/mm/frame_vector.c
+++ b/mm/frame_vector.c
@@ -200,10 +200,7 @@ struct frame_vector *frame_vector_create(unsigned int nr_frames)
 	 * Avoid higher order allocations, use vmalloc instead. It should
 	 * be rare anyway.
 	 */
-	if (size <= PAGE_SIZE)
-		vec = kmalloc(size, GFP_KERNEL);
-	else
-		vec = vmalloc(size);
+	vec = kvmalloc(size, GFP_KERNEL);
 	if (!vec)
 		return NULL;
 	vec->nr_allocated = nr_frames;
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index ca97835bfec4..a46a9fd8b540 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -687,11 +687,7 @@ int inet_ehash_locks_alloc(struct inet_hashinfo *hashinfo)
 		/* no more locks than number of hash buckets */
 		nblocks = min(nblocks, hashinfo->ehash_mask + 1);
 
-		hashinfo->ehash_locks =	kmalloc_array(nblocks, locksz,
-						      GFP_KERNEL | __GFP_NOWARN);
-		if (!hashinfo->ehash_locks)
-			hashinfo->ehash_locks = vmalloc(nblocks * locksz);
-
+		hashinfo->ehash_locks = kvmalloc(nblocks * locksz, GFP_KERNEL);
 		if (!hashinfo->ehash_locks)
 			return -ENOMEM;
 
diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
index d46f4d5b1c62..39b2166d3be8 100644
--- a/net/ipv4/tcp_metrics.c
+++ b/net/ipv4/tcp_metrics.c
@@ -1155,10 +1155,7 @@ static int __net_init tcp_net_metrics_init(struct net *net)
 	tcp_metrics_hash_log = order_base_2(slots);
 	size = sizeof(struct tcpm_hash_bucket) << tcp_metrics_hash_log;
 
-	tcp_metrics_hash = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
-	if (!tcp_metrics_hash)
-		tcp_metrics_hash = vzalloc(size);
-
+	tcp_metrics_hash = kvzalloc(size, GFP_KERNEL);
 	if (!tcp_metrics_hash)
 		return -ENOMEM;
 
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 15fe97644ffe..a0c82ef74389 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -1525,10 +1525,7 @@ static int resize_platform_label_table(struct net *net, size_t limit)
 	unsigned index;
 
 	if (size) {
-		labels = kzalloc(size, GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY);
-		if (!labels)
-			labels = vzalloc(size);
-
+		labels = kvzalloc(size, GFP_KERNEL);
 		if (!labels)
 			goto nolabels;
 	}
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index a011322a027d..cdc55d5ee4ad 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -712,17 +712,11 @@ EXPORT_SYMBOL(xt_check_entry_offsets);
  */
 unsigned int *xt_alloc_entry_offsets(unsigned int size)
 {
-	unsigned int *off;
-
-	off = kcalloc(size, sizeof(unsigned int), GFP_KERNEL | __GFP_NOWARN);
-
-	if (off)
-		return off;
-
 	if (size < (SIZE_MAX / sizeof(unsigned int)))
-		off = vmalloc(size * sizeof(unsigned int));
+		return kvzalloc(size * sizeof(unsigned int), GFP_KERNEL);
+
+	return NULL;
 
-	return off;
 }
 EXPORT_SYMBOL(xt_alloc_entry_offsets);
 
@@ -956,15 +950,9 @@ struct xt_table_info *xt_alloc_table_info(unsigned int size)
 	if ((SMP_ALIGN(size) >> PAGE_SHIFT) + 2 > totalram_pages)
 		return NULL;
 
-	if (sz <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
-		info = kmalloc(sz, GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY);
-	if (!info) {
-		info = __vmalloc(sz, GFP_KERNEL | __GFP_NOWARN |
-				     __GFP_NORETRY | __GFP_HIGHMEM,
-				 PAGE_KERNEL);
-		if (!info)
-			return NULL;
-	}
+	info = kvmalloc(sz, GFP_KERNEL);
+	if (!info)
+		return NULL;
 	memset(info, 0, sizeof(*info));
 	info->size = size;
 	return info;
@@ -1066,7 +1054,7 @@ static int xt_jumpstack_alloc(struct xt_table_info *i)
 
 	size = sizeof(void **) * nr_cpu_ids;
 	if (size > PAGE_SIZE)
-		i->jumpstack = vzalloc(size);
+		i->jumpstack = kvzalloc(size, GFP_KERNEL);
 	else
 		i->jumpstack = kzalloc(size, GFP_KERNEL);
 	if (i->jumpstack == NULL)
@@ -1088,12 +1076,8 @@ static int xt_jumpstack_alloc(struct xt_table_info *i)
 	 */
 	size = sizeof(void *) * i->stacksize * 2u;
 	for_each_possible_cpu(cpu) {
-		if (size > PAGE_SIZE)
-			i->jumpstack[cpu] = vmalloc_node(size,
-				cpu_to_node(cpu));
-		else
-			i->jumpstack[cpu] = kmalloc_node(size,
-				GFP_KERNEL, cpu_to_node(cpu));
+		i->jumpstack[cpu] = kvmalloc_node(size, GFP_KERNEL,
+			cpu_to_node(cpu));
 		if (i->jumpstack[cpu] == NULL)
 			/*
 			 * Freeing will be done later on by the callers. The
diff --git a/net/netfilter/xt_recent.c b/net/netfilter/xt_recent.c
index 1d89a4eaf841..d6aa8f63ed2e 100644
--- a/net/netfilter/xt_recent.c
+++ b/net/netfilter/xt_recent.c
@@ -388,10 +388,7 @@ static int recent_mt_check(const struct xt_mtchk_param *par,
 	}
 
 	sz = sizeof(*t) + sizeof(t->iphash[0]) * ip_list_hash_size;
-	if (sz <= PAGE_SIZE)
-		t = kzalloc(sz, GFP_KERNEL);
-	else
-		t = vzalloc(sz);
+	t = kvzalloc(sz, GFP_KERNEL);
 	if (t == NULL) {
 		ret = -ENOMEM;
 		goto out;
diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c
index 3b6d5bd69101..30d6a39fd2c8 100644
--- a/net/sched/sch_choke.c
+++ b/net/sched/sch_choke.c
@@ -431,10 +431,7 @@ static int choke_change(struct Qdisc *sch, struct nlattr *opt)
 	if (mask != q->tab_mask) {
 		struct sk_buff **ntab;
 
-		ntab = kcalloc(mask + 1, sizeof(struct sk_buff *),
-			       GFP_KERNEL | __GFP_NOWARN);
-		if (!ntab)
-			ntab = vzalloc((mask + 1) * sizeof(struct sk_buff *));
+		ntab = kvzalloc((mask + 1) * sizeof(struct sk_buff *), GFP_KERNEL);
 		if (!ntab)
 			return -ENOMEM;
 
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index a5ea0e9b6be4..c580f0d406c2 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -449,27 +449,13 @@ static int fq_codel_change(struct Qdisc *sch, struct nlattr *opt)
 	return 0;
 }
 
-static void *fq_codel_zalloc(size_t sz)
-{
-	void *ptr = kzalloc(sz, GFP_KERNEL | __GFP_NOWARN);
-
-	if (!ptr)
-		ptr = vzalloc(sz);
-	return ptr;
-}
-
-static void fq_codel_free(void *addr)
-{
-	kvfree(addr);
-}
-
 static void fq_codel_destroy(struct Qdisc *sch)
 {
 	struct fq_codel_sched_data *q = qdisc_priv(sch);
 
 	tcf_destroy_chain(&q->filter_list);
-	fq_codel_free(q->backlogs);
-	fq_codel_free(q->flows);
+	kvfree(q->backlogs);
+	kvfree(q->flows);
 }
 
 static int fq_codel_init(struct Qdisc *sch, struct nlattr *opt)
@@ -497,13 +483,13 @@ static int fq_codel_init(struct Qdisc *sch, struct nlattr *opt)
 	}
 
 	if (!q->flows) {
-		q->flows = fq_codel_zalloc(q->flows_cnt *
-					   sizeof(struct fq_codel_flow));
+		q->flows = kvzalloc(q->flows_cnt *
+					   sizeof(struct fq_codel_flow), GFP_KERNEL);
 		if (!q->flows)
 			return -ENOMEM;
-		q->backlogs = fq_codel_zalloc(q->flows_cnt * sizeof(u32));
+		q->backlogs = kvzalloc(q->flows_cnt * sizeof(u32), GFP_KERNEL);
 		if (!q->backlogs) {
-			fq_codel_free(q->flows);
+			kvfree(q->flows);
 			return -ENOMEM;
 		}
 		for (i = 0; i < q->flows_cnt; i++) {
diff --git a/net/sched/sch_hhf.c b/net/sched/sch_hhf.c
index e3d0458af17b..2454055c737e 100644
--- a/net/sched/sch_hhf.c
+++ b/net/sched/sch_hhf.c
@@ -467,29 +467,14 @@ static void hhf_reset(struct Qdisc *sch)
 		rtnl_kfree_skbs(skb, skb);
 }
 
-static void *hhf_zalloc(size_t sz)
-{
-	void *ptr = kzalloc(sz, GFP_KERNEL | __GFP_NOWARN);
-
-	if (!ptr)
-		ptr = vzalloc(sz);
-
-	return ptr;
-}
-
-static void hhf_free(void *addr)
-{
-	kvfree(addr);
-}
-
 static void hhf_destroy(struct Qdisc *sch)
 {
 	int i;
 	struct hhf_sched_data *q = qdisc_priv(sch);
 
 	for (i = 0; i < HHF_ARRAYS_CNT; i++) {
-		hhf_free(q->hhf_arrays[i]);
-		hhf_free(q->hhf_valid_bits[i]);
+		kvfree(q->hhf_arrays[i]);
+		kvfree(q->hhf_valid_bits[i]);
 	}
 
 	for (i = 0; i < HH_FLOWS_CNT; i++) {
@@ -503,7 +488,7 @@ static void hhf_destroy(struct Qdisc *sch)
 			kfree(flow);
 		}
 	}
-	hhf_free(q->hh_flows);
+	kvfree(q->hh_flows);
 }
 
 static const struct nla_policy hhf_policy[TCA_HHF_MAX + 1] = {
@@ -609,8 +594,8 @@ static int hhf_init(struct Qdisc *sch, struct nlattr *opt)
 
 	if (!q->hh_flows) {
 		/* Initialize heavy-hitter flow table. */
-		q->hh_flows = hhf_zalloc(HH_FLOWS_CNT *
-					 sizeof(struct list_head));
+		q->hh_flows = kvzalloc(HH_FLOWS_CNT *
+					 sizeof(struct list_head), GFP_KERNEL);
 		if (!q->hh_flows)
 			return -ENOMEM;
 		for (i = 0; i < HH_FLOWS_CNT; i++)
@@ -624,8 +609,8 @@ static int hhf_init(struct Qdisc *sch, struct nlattr *opt)
 
 		/* Initialize heavy-hitter filter arrays. */
 		for (i = 0; i < HHF_ARRAYS_CNT; i++) {
-			q->hhf_arrays[i] = hhf_zalloc(HHF_ARRAYS_LEN *
-						      sizeof(u32));
+			q->hhf_arrays[i] = kvzalloc(HHF_ARRAYS_LEN *
+						      sizeof(u32), GFP_KERNEL);
 			if (!q->hhf_arrays[i]) {
 				hhf_destroy(sch);
 				return -ENOMEM;
@@ -635,8 +620,8 @@ static int hhf_init(struct Qdisc *sch, struct nlattr *opt)
 
 		/* Initialize valid bits of heavy-hitter filter arrays. */
 		for (i = 0; i < HHF_ARRAYS_CNT; i++) {
-			q->hhf_valid_bits[i] = hhf_zalloc(HHF_ARRAYS_LEN /
-							  BITS_PER_BYTE);
+			q->hhf_valid_bits[i] = kvzalloc(HHF_ARRAYS_LEN /
+							  BITS_PER_BYTE, GFP_KERNEL);
 			if (!q->hhf_valid_bits[i]) {
 				hhf_destroy(sch);
 				return -ENOMEM;
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index bcfadfdea8e0..08a3d2af1792 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -692,15 +692,11 @@ static int get_dist_table(struct Qdisc *sch, const struct nlattr *attr)
 	spinlock_t *root_lock;
 	struct disttable *d;
 	int i;
-	size_t s;
 
 	if (n > NETEM_DIST_MAX)
 		return -EINVAL;
 
-	s = sizeof(struct disttable) + n * sizeof(s16);
-	d = kmalloc(s, GFP_KERNEL | __GFP_NOWARN);
-	if (!d)
-		d = vmalloc(s);
+	d = kvmalloc(sizeof(struct disttable) + n * sizeof(s16), GFP_KERNEL);
 	if (!d)
 		return -ENOMEM;
 
diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 7f195ed4d568..5d70cd6a032d 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -684,11 +684,7 @@ static int sfq_change(struct Qdisc *sch, struct nlattr *opt)
 
 static void *sfq_alloc(size_t sz)
 {
-	void *ptr = kmalloc(sz, GFP_KERNEL | __GFP_NOWARN);
-
-	if (!ptr)
-		ptr = vmalloc(sz);
-	return ptr;
+	return kvmalloc(sz, GFP_KERNEL);
 }
 
 static void sfq_free(void *addr)
diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c
index 38c00e867bda..a5c21f05ece4 100644
--- a/security/keys/keyctl.c
+++ b/security/keys/keyctl.c
@@ -99,14 +99,9 @@ SYSCALL_DEFINE5(add_key, const char __user *, _type,
 
 	if (_payload) {
 		ret = -ENOMEM;
-		payload = kmalloc(plen, GFP_KERNEL | __GFP_NOWARN);
-		if (!payload) {
-			if (plen <= PAGE_SIZE)
-				goto error2;
-			payload = vmalloc(plen);
-			if (!payload)
-				goto error2;
-		}
+		payload = kvmalloc(plen, GFP_KERNEL);
+		if (!payload)
+			goto error2;
 
 		ret = -EFAULT;
 		if (copy_from_user(payload, _payload, plen) != 0)
@@ -1064,14 +1059,9 @@ long keyctl_instantiate_key_common(key_serial_t id,
 
 	if (from) {
 		ret = -ENOMEM;
-		payload = kmalloc(plen, GFP_KERNEL);
-		if (!payload) {
-			if (plen <= PAGE_SIZE)
-				goto error;
-			payload = vmalloc(plen);
-			if (!payload)
-				goto error;
-		}
+		payload = kvmalloc(plen, GFP_KERNEL);
+		if (!payload)
+			goto error;
 
 		ret = -EFAULT;
 		if (!copy_from_iter_full(payload, plen, from))
-- 
2.11.0

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH 5/6] treewide: use kv[mz]alloc* rather than opencoded variants
@ 2017-01-12 17:29     ` Michal Hocko
From: Michal Hocko @ 2017-01-12 17:29 UTC
  To: Andrew Morton
  Cc: Vlastimil Babka, David Rientjes, Mel Gorman, Johannes Weiner,
	Al Viro, linux-mm, LKML, Martin Schwidefsky, Heiko Carstens,
	Herbert Xu, Anton Vorontsov, Colin Cross, Kees Cook, Tony Luck,
	Rafael J. Wysocki, Ben Skeggs, Kent Overstreet, Santosh Raspatur,
	Hariprasad S, Tariq Toukan, Yishai Hadas, Dan Williams,
	Oleg Drokin, Andreas Dilger

Ilya has noticed that I've screwed up some k[zc]alloc conversions and
didn't use kvzalloc for them. This is an updated patch with some acks
collected on the way.
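
For readers who want the zeroing problem spelled out, here is a minimal
userspace sketch (not kernel code). Every sketch_* name is a hypothetical
stand-in invented for this illustration: sketch_alloc() plays the role of
a non-zeroing helper such as kvmalloc(), sketch_zalloc() plays the role of
kvzalloc(), and sketch_zalloc_array() shows the overflow check that an
array-style caller may want to keep when it switches to passing n * size.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-ins; none of these are kernel functions. */

/* Non-zeroing allocation. The 0xaa fill simulates stale memory contents. */
void *sketch_alloc(size_t size)
{
	void *p = malloc(size);

	if (p)
		memset(p, 0xaa, size);
	return p;
}

/* Zeroing allocation: what a converted k[zc]alloc caller still expects. */
void *sketch_zalloc(size_t size)
{
	void *p = sketch_alloc(size);

	if (p)
		memset(p, 0, size);
	return p;
}

/* Array variant that refuses n * size when the product would overflow. */
void *sketch_zalloc_array(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;
	return sketch_zalloc(n * size);
}

/* Usage: a caller that relied on zeroed memory must use the zeroing variant. */
void sketch_demo(void)
{
	unsigned char *table = sketch_zalloc_array(16, sizeof(unsigned int));

	assert(table != NULL);
	assert(table[0] == 0);
	free(table);
}
```

A conversion that replaces a zeroing allocation with sketch_alloc() would
hand the caller the 0xaa pattern instead of zeroes, which is the class of
mistake this updated patch is meant to fix.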
---
From a7b89c6d0a3c685045e37740c8f97b065f37e0a4 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Wed, 4 Jan 2017 13:30:32 +0100
Subject: [PATCH] treewide: use kv[mz]alloc* rather than opencoded variants

There are many code paths opencoding kvmalloc. Let's use the helper
instead. The main difference to kvmalloc is that those users are usually
not considering all the aspects of the memory allocator. E.g. allocation
requests <= 32kB (with 4kB pages) basically never fail and invoke the OOM
killer to satisfy the allocation. This sounds too disruptive for something
that has a reasonable fallback - the vmalloc. On the other hand, with the
helper those requests might fall back to vmalloc even when the memory
allocator would previously have succeeded after several more
reclaim/compaction attempts. There is no guarantee something like that
happens though.

This patch converts many of those places to kv[mz]alloc* helpers because
they are more conservative.
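
The difference described above can be illustrated with a minimal
userspace sketch (not kernel code). All sketch_* names are hypothetical
stand-ins: sketch_contig_alloc() models kmalloc, sketch_virt_alloc()
models vmalloc, and the one-page threshold and the "do not retry hard"
hint are assumptions about how the helper behaves rather than a copy of
its implementation. The model is also simplified in that every failed
attempt that was allowed to retry hard counts as disruptive, whereas the
real allocator only behaves that way for small orders.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096u

enum sketch_backend { SKETCH_CONTIG = 1, SKETCH_VIRT = 2 };

/* Small header in front of each buffer so the free side can tell them apart. */
union sketch_hdr {
	struct { enum sketch_backend backend; } info;
	max_align_t align;	/* keeps the payload suitably aligned */
};

/* Largest "physically contiguous" block the model pretends is available. */
size_t sketch_contig_limit = SKETCH_PAGE_SIZE;
/* Counts attempts that would have triggered heavy reclaim or the OOM killer. */
unsigned int sketch_disruptive_retries;

void *sketch_raw_alloc(size_t size, enum sketch_backend backend)
{
	union sketch_hdr *h = malloc(sizeof(*h) + size);

	if (!h)
		return NULL;
	h->info.backend = backend;
	return h + 1;
}

/* Models kmalloc: fails when no contiguous block of that size exists. */
void *sketch_contig_alloc(size_t size, int may_retry_hard)
{
	if (size <= sketch_contig_limit)
		return sketch_raw_alloc(size, SKETCH_CONTIG);
	if (may_retry_hard)
		sketch_disruptive_retries++;
	return NULL;
}

/* Models vmalloc: no contiguity requirement, so it succeeds in this model. */
void *sketch_virt_alloc(size_t size)
{
	return sketch_raw_alloc(size, SKETCH_VIRT);
}

/* The opencoded pattern: retry hard first, fall back only afterwards. */
void *sketch_opencoded(size_t size)
{
	void *p = sketch_contig_alloc(size, 1);

	if (!p)
		p = sketch_virt_alloc(size);
	return p;
}

/* The helper pattern: give up early on larger requests, then fall back. */
void *sketch_kvmalloc(size_t size)
{
	int may_retry_hard = size <= SKETCH_PAGE_SIZE;
	void *p = sketch_contig_alloc(size, may_retry_hard);

	/* A fallback makes no sense for sub-page requests in this model. */
	if (p || size <= SKETCH_PAGE_SIZE)
		return p;
	return sketch_virt_alloc(size);
}

enum sketch_backend sketch_backend_of(const void *p)
{
	return ((const union sketch_hdr *)p - 1)->info.backend;
}

/* One free routine for both backends, in the spirit of kvfree(). */
void sketch_kvfree(void *p)
{
	union sketch_hdr *h;

	if (!p)
		return;
	h = (union sketch_hdr *)p - 1;
	assert(h->info.backend == SKETCH_CONTIG || h->info.backend == SKETCH_VIRT);
	free(h);
}
```

With sketch_contig_limit set to one page, a four-page request ends up in
the fallback backend either way, but only sketch_opencoded() bumps
sketch_disruptive_retries on the way there.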

Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Colin Cross <ccross@android.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Santosh Raspatur <santosh@chelsio.com>
Cc: Hariprasad S <hariprasad@chelsio.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Yishai Hadas <yishaih@mellanox.com>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: Andreas Dilger <andreas.dilger@intel.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: "Yan, Zheng" <zyan@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: netdev@vger.kernel.org
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> # KVM/s390
Acked-by: Dan Williams <dan.j.williams@intel.com> # nvdimm
Acked-by: David Sterba <dsterba@suse.com> # btrfs
Acked-by: Ilya Dryomov <idryomov@gmail.com> # Ceph
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/s390/kvm/kvm-s390.c                           | 10 ++-----
 crypto/lzo.c                                       |  4 +--
 drivers/acpi/apei/erst.c                           |  8 ++---
 drivers/char/agp/generic.c                         |  8 +----
 drivers/gpu/drm/nouveau/nouveau_gem.c              |  4 +--
 drivers/md/bcache/util.h                           | 12 ++------
 drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h    |  3 --
 drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c | 25 ++--------------
 drivers/net/ethernet/chelsio/cxgb3/l2t.c           |  2 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c    | 31 ++++----------------
 drivers/net/ethernet/mellanox/mlx4/en_tx.c         |  9 ++----
 drivers/net/ethernet/mellanox/mlx4/mr.c            |  9 ++----
 drivers/nvdimm/dimm_devs.c                         |  5 +---
 .../staging/lustre/lnet/libcfs/linux/linux-mem.c   | 11 +------
 drivers/xen/evtchn.c                               | 14 +--------
 fs/btrfs/ctree.c                                   |  9 ++----
 fs/btrfs/ioctl.c                                   |  9 ++----
 fs/btrfs/send.c                                    | 27 ++++++-----------
 fs/ceph/file.c                                     |  9 ++----
 fs/select.c                                        |  5 +---
 fs/xattr.c                                         | 27 ++++++-----------
 kernel/bpf/hashtab.c                               | 11 ++-----
 lib/iov_iter.c                                     |  5 +---
 mm/frame_vector.c                                  |  5 +---
 net/ipv4/inet_hashtables.c                         |  6 +---
 net/ipv4/tcp_metrics.c                             |  5 +---
 net/mpls/af_mpls.c                                 |  5 +---
 net/netfilter/x_tables.c                           | 34 ++++++----------------
 net/netfilter/xt_recent.c                          |  5 +---
 net/sched/sch_choke.c                              |  5 +---
 net/sched/sch_fq_codel.c                           | 26 ++++-------------
 net/sched/sch_hhf.c                                | 33 ++++++---------------
 net/sched/sch_netem.c                              |  6 +---
 net/sched/sch_sfq.c                                |  6 +---
 security/keys/keyctl.c                             | 22 ++++----------
 35 files changed, 96 insertions(+), 319 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 4f74511015b8..e6bbb33d2956 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -1126,10 +1126,7 @@ static long kvm_s390_get_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
 	if (args->count < 1 || args->count > KVM_S390_SKEYS_MAX)
 		return -EINVAL;
 
-	keys = kmalloc_array(args->count, sizeof(uint8_t),
-			     GFP_KERNEL | __GFP_NOWARN);
-	if (!keys)
-		keys = vmalloc(sizeof(uint8_t) * args->count);
+	keys = kvmalloc(args->count * sizeof(uint8_t), GFP_KERNEL);
 	if (!keys)
 		return -ENOMEM;
 
@@ -1171,10 +1168,7 @@ static long kvm_s390_set_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
 	if (args->count < 1 || args->count > KVM_S390_SKEYS_MAX)
 		return -EINVAL;
 
-	keys = kmalloc_array(args->count, sizeof(uint8_t),
-			     GFP_KERNEL | __GFP_NOWARN);
-	if (!keys)
-		keys = vmalloc(sizeof(uint8_t) * args->count);
+	keys = kvmalloc(sizeof(uint8_t) * args->count, GFP_KERNEL);
 	if (!keys)
 		return -ENOMEM;
 
diff --git a/crypto/lzo.c b/crypto/lzo.c
index 168df784da84..218567d717d6 100644
--- a/crypto/lzo.c
+++ b/crypto/lzo.c
@@ -32,9 +32,7 @@ static void *lzo_alloc_ctx(struct crypto_scomp *tfm)
 {
 	void *ctx;
 
-	ctx = kmalloc(LZO1X_MEM_COMPRESS, GFP_KERNEL | __GFP_NOWARN);
-	if (!ctx)
-		ctx = vmalloc(LZO1X_MEM_COMPRESS);
+	ctx = kvmalloc(LZO1X_MEM_COMPRESS, GFP_KERNEL);
 	if (!ctx)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/drivers/acpi/apei/erst.c b/drivers/acpi/apei/erst.c
index ec4f507b524f..a2898df61744 100644
--- a/drivers/acpi/apei/erst.c
+++ b/drivers/acpi/apei/erst.c
@@ -513,7 +513,7 @@ static int __erst_record_id_cache_add_one(void)
 	if (i < erst_record_id_cache.len)
 		goto retry;
 	if (erst_record_id_cache.len >= erst_record_id_cache.size) {
-		int new_size, alloc_size;
+		int new_size;
 		u64 *new_entries;
 
 		new_size = erst_record_id_cache.size * 2;
@@ -524,11 +524,7 @@ static int __erst_record_id_cache_add_one(void)
 				pr_warn(FW_WARN "too many record IDs!\n");
 			return 0;
 		}
-		alloc_size = new_size * sizeof(entries[0]);
-		if (alloc_size < PAGE_SIZE)
-			new_entries = kmalloc(alloc_size, GFP_KERNEL);
-		else
-			new_entries = vmalloc(alloc_size);
+		new_entries = kvmalloc(new_size * sizeof(entries[0]), GFP_KERNEL);
 		if (!new_entries)
 			return -ENOMEM;
 		memcpy(new_entries, entries,
diff --git a/drivers/char/agp/generic.c b/drivers/char/agp/generic.c
index f002fa5d1887..bdf418cac8ef 100644
--- a/drivers/char/agp/generic.c
+++ b/drivers/char/agp/generic.c
@@ -88,13 +88,7 @@ static int agp_get_key(void)
 
 void agp_alloc_page_array(size_t size, struct agp_memory *mem)
 {
-	mem->pages = NULL;
-
-	if (size <= 2*PAGE_SIZE)
-		mem->pages = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
-	if (mem->pages == NULL) {
-		mem->pages = vmalloc(size);
-	}
+	mem->pages = kvmalloc(size, GFP_KERNEL);
 }
 EXPORT_SYMBOL(agp_alloc_page_array);
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index 201b52b750dd..77dd73ff126f 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -568,9 +568,7 @@ u_memcpya(uint64_t user, unsigned nmemb, unsigned size)
 
 	size *= nmemb;
 
-	mem = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
-	if (!mem)
-		mem = vmalloc(size);
+	mem = kvmalloc(size, GFP_KERNEL);
 	if (!mem)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/drivers/md/bcache/util.h b/drivers/md/bcache/util.h
index cf2cbc211d83..d00bcb64d3a8 100644
--- a/drivers/md/bcache/util.h
+++ b/drivers/md/bcache/util.h
@@ -43,11 +43,7 @@ struct closure;
 	(heap)->used = 0;						\
 	(heap)->size = (_size);						\
 	_bytes = (heap)->size * sizeof(*(heap)->data);			\
-	(heap)->data = NULL;						\
-	if (_bytes < KMALLOC_MAX_SIZE)					\
-		(heap)->data = kmalloc(_bytes, (gfp));			\
-	if ((!(heap)->data) && ((gfp) & GFP_KERNEL))			\
-		(heap)->data = vmalloc(_bytes);				\
+	(heap)->data = kvmalloc(_bytes, (gfp) & GFP_KERNEL);		\
 	(heap)->data;							\
 })
 
@@ -136,12 +132,8 @@ do {									\
 									\
 	(fifo)->mask = _allocated_size - 1;				\
 	(fifo)->front = (fifo)->back = 0;				\
-	(fifo)->data = NULL;						\
 									\
-	if (_bytes < KMALLOC_MAX_SIZE)					\
-		(fifo)->data = kmalloc(_bytes, (gfp));			\
-	if ((!(fifo)->data) && ((gfp) & GFP_KERNEL))			\
-		(fifo)->data = vmalloc(_bytes);				\
+	(fifo)->data = kvmalloc(_bytes, (gfp) & GFP_KERNEL);		\
 	(fifo)->data;							\
 })
 
diff --git a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h
index 920d918ed193..f04e81f33795 100644
--- a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h
+++ b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h
@@ -41,9 +41,6 @@
 
 #define VALIDATE_TID 1
 
-void *cxgb_alloc_mem(unsigned long size);
-void cxgb_free_mem(void *addr);
-
 /*
  * Map an ATID or STID to their entries in the corresponding TID tables.
  */
diff --git a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c
index 76684dcb874c..4d80bccf9c01 100644
--- a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c
+++ b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c
@@ -1152,27 +1152,6 @@ static void cxgb_redirect(struct dst_entry *old, struct dst_entry *new,
 }
 
 /*
- * Allocate a chunk of memory using kmalloc or, if that fails, vmalloc.
- * The allocated memory is cleared.
- */
-void *cxgb_alloc_mem(unsigned long size)
-{
-	void *p = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
-
-	if (!p)
-		p = vzalloc(size);
-	return p;
-}
-
-/*
- * Free memory allocated through t3_alloc_mem().
- */
-void cxgb_free_mem(void *addr)
-{
-	kvfree(addr);
-}
-
-/*
  * Allocate and initialize the TID tables.  Returns 0 on success.
  */
 static int init_tid_tabs(struct tid_info *t, unsigned int ntids,
@@ -1182,7 +1161,7 @@ static int init_tid_tabs(struct tid_info *t, unsigned int ntids,
 	unsigned long size = ntids * sizeof(*t->tid_tab) +
 	    natids * sizeof(*t->atid_tab) + nstids * sizeof(*t->stid_tab);
 
-	t->tid_tab = cxgb_alloc_mem(size);
+	t->tid_tab = kvzalloc(size, GFP_KERNEL);
 	if (!t->tid_tab)
 		return -ENOMEM;
 
@@ -1218,7 +1197,7 @@ static int init_tid_tabs(struct tid_info *t, unsigned int ntids,
 
 static void free_tid_maps(struct tid_info *t)
 {
-	cxgb_free_mem(t->tid_tab);
+	kvfree(t->tid_tab);
 }
 
 static inline void add_adapter(struct adapter *adap)
diff --git a/drivers/net/ethernet/chelsio/cxgb3/l2t.c b/drivers/net/ethernet/chelsio/cxgb3/l2t.c
index 5f226eda8cd6..f5c92acd52b4 100644
--- a/drivers/net/ethernet/chelsio/cxgb3/l2t.c
+++ b/drivers/net/ethernet/chelsio/cxgb3/l2t.c
@@ -444,7 +444,7 @@ struct l2t_data *t3_init_l2t(unsigned int l2t_capacity)
 	struct l2t_data *d;
 	int i, size = sizeof(*d) + l2t_capacity * sizeof(struct l2t_entry);
 
-	d = cxgb_alloc_mem(size);
+	d = kvzalloc(size, GFP_KERNEL);
 	if (!d)
 		return NULL;
 
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 6f951877430b..a64c2a3d39fc 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -881,27 +881,6 @@ static int setup_sge_queues(struct adapter *adap)
 	return err;
 }
 
-/*
- * Allocate a chunk of memory using kmalloc or, if that fails, vmalloc.
- * The allocated memory is cleared.
- */
-void *t4_alloc_mem(size_t size)
-{
-	void *p = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
-
-	if (!p)
-		p = vzalloc(size);
-	return p;
-}
-
-/*
- * Free memory allocated through alloc_mem().
- */
-void t4_free_mem(void *addr)
-{
-	kvfree(addr);
-}
-
 static u16 cxgb_select_queue(struct net_device *dev, struct sk_buff *skb,
 			     void *accel_priv, select_queue_fallback_t fallback)
 {
@@ -1300,7 +1279,7 @@ static int tid_init(struct tid_info *t)
 	       max_ftids * sizeof(*t->ftid_tab) +
 	       ftid_bmap_size * sizeof(long);
 
-	t->tid_tab = t4_alloc_mem(size);
+	t->tid_tab = kvzalloc(size, GFP_KERNEL);
 	if (!t->tid_tab)
 		return -ENOMEM;
 
@@ -3416,7 +3395,7 @@ static int adap_init0(struct adapter *adap)
 		/* allocate memory to read the header of the firmware on the
 		 * card
 		 */
-		card_fw = t4_alloc_mem(sizeof(*card_fw));
+		card_fw = kvzalloc(sizeof(*card_fw), GFP_KERNEL);
 
 		/* Get FW from from /lib/firmware/ */
 		ret = request_firmware(&fw, fw_info->fw_mod_name,
@@ -3436,7 +3415,7 @@ static int adap_init0(struct adapter *adap)
 
 		/* Cleaning up */
 		release_firmware(fw);
-		t4_free_mem(card_fw);
+		kvfree(card_fw);
 
 		if (ret < 0)
 			goto bye;
@@ -4432,9 +4411,9 @@ static void free_some_resources(struct adapter *adapter)
 {
 	unsigned int i;
 
-	t4_free_mem(adapter->l2t);
+	kvfree(adapter->l2t);
 	t4_cleanup_sched(adapter);
-	t4_free_mem(adapter->tids.tid_tab);
+	kvfree(adapter->tids.tid_tab);
 	cxgb4_cleanup_tc_u32(adapter);
 	kfree(adapter->sge.egr_map);
 	kfree(adapter->sge.ingr_map);
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 5886ad78058f..a5c1b815145e 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -70,13 +70,10 @@ int mlx4_en_create_tx_ring(struct mlx4_en_priv *priv,
 	ring->full_size = ring->size - HEADROOM - MAX_DESC_TXBBS;
 
 	tmp = size * sizeof(struct mlx4_en_tx_info);
-	ring->tx_info = kmalloc_node(tmp, GFP_KERNEL | __GFP_NOWARN, node);
+	ring->tx_info = kvmalloc_node(tmp, GFP_KERNEL, node);
 	if (!ring->tx_info) {
-		ring->tx_info = vmalloc(tmp);
-		if (!ring->tx_info) {
-			err = -ENOMEM;
-			goto err_ring;
-		}
+		err = -ENOMEM;
+		goto err_ring;
 	}
 
 	en_dbg(DRV, priv, "Allocated tx_info ring at addr:%p size:%d\n",
diff --git a/drivers/net/ethernet/mellanox/mlx4/mr.c b/drivers/net/ethernet/mellanox/mlx4/mr.c
index 395b5463cfd9..82354fd0a87e 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mr.c
+++ b/drivers/net/ethernet/mellanox/mlx4/mr.c
@@ -115,12 +115,9 @@ static int mlx4_buddy_init(struct mlx4_buddy *buddy, int max_order)
 
 	for (i = 0; i <= buddy->max_order; ++i) {
 		s = BITS_TO_LONGS(1 << (buddy->max_order - i));
-		buddy->bits[i] = kcalloc(s, sizeof (long), GFP_KERNEL | __GFP_NOWARN);
-		if (!buddy->bits[i]) {
-			buddy->bits[i] = vzalloc(s * sizeof(long));
-			if (!buddy->bits[i])
-				goto err_out_free;
-		}
+		buddy->bits[i] = kvzalloc(s * sizeof(long), GFP_KERNEL);
+		if (!buddy->bits[i])
+			goto err_out_free;
 	}
 
 	set_bit(0, buddy->bits[buddy->max_order]);
diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
index 0eedc49e0d47..3bd332b167d9 100644
--- a/drivers/nvdimm/dimm_devs.c
+++ b/drivers/nvdimm/dimm_devs.c
@@ -102,10 +102,7 @@ int nvdimm_init_config_data(struct nvdimm_drvdata *ndd)
 		return -ENXIO;
 	}
 
-	ndd->data = kmalloc(ndd->nsarea.config_size, GFP_KERNEL);
-	if (!ndd->data)
-		ndd->data = vmalloc(ndd->nsarea.config_size);
-
+	ndd->data = kvmalloc(ndd->nsarea.config_size, GFP_KERNEL);
 	if (!ndd->data)
 		return -ENOMEM;
 
diff --git a/drivers/staging/lustre/lnet/libcfs/linux/linux-mem.c b/drivers/staging/lustre/lnet/libcfs/linux/linux-mem.c
index a6a76a681ea9..8f638267e704 100644
--- a/drivers/staging/lustre/lnet/libcfs/linux/linux-mem.c
+++ b/drivers/staging/lustre/lnet/libcfs/linux/linux-mem.c
@@ -45,15 +45,6 @@ EXPORT_SYMBOL(libcfs_kvzalloc);
 void *libcfs_kvzalloc_cpt(struct cfs_cpt_table *cptab, int cpt, size_t size,
 			  gfp_t flags)
 {
-	void *ret;
-
-	ret = kzalloc_node(size, flags | __GFP_NOWARN,
-			   cfs_cpt_spread_node(cptab, cpt));
-	if (!ret) {
-		WARN_ON(!(flags & (__GFP_FS | __GFP_HIGH)));
-		ret = vmalloc_node(size, cfs_cpt_spread_node(cptab, cpt));
-	}
-
-	return ret;
+	return kvzalloc_node(size, flags, cfs_cpt_spread_node(cptab, cpt));
 }
 EXPORT_SYMBOL(libcfs_kvzalloc_cpt);
diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
index 6890897a6f30..10f1ef582659 100644
--- a/drivers/xen/evtchn.c
+++ b/drivers/xen/evtchn.c
@@ -87,18 +87,6 @@ struct user_evtchn {
 	bool enabled;
 };
 
-static evtchn_port_t *evtchn_alloc_ring(unsigned int size)
-{
-	evtchn_port_t *ring;
-	size_t s = size * sizeof(*ring);
-
-	ring = kmalloc(s, GFP_KERNEL);
-	if (!ring)
-		ring = vmalloc(s);
-
-	return ring;
-}
-
 static void evtchn_free_ring(evtchn_port_t *ring)
 {
 	kvfree(ring);
@@ -334,7 +322,7 @@ static int evtchn_resize_ring(struct per_user_data *u)
 	else
 		new_size = 2 * u->ring_size;
 
-	new_ring = evtchn_alloc_ring(new_size);
+	new_ring = kvmalloc(new_size * sizeof(*new_ring), GFP_KERNEL);
 	if (!new_ring)
 		return -ENOMEM;
 
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 146b2dc0d2cf..4fc9712d927d 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -5391,13 +5391,10 @@ int btrfs_compare_trees(struct btrfs_root *left_root,
 		goto out;
 	}
 
-	tmp_buf = kmalloc(fs_info->nodesize, GFP_KERNEL | __GFP_NOWARN);
+	tmp_buf = kvmalloc(fs_info->nodesize, GFP_KERNEL);
 	if (!tmp_buf) {
-		tmp_buf = vmalloc(fs_info->nodesize);
-		if (!tmp_buf) {
-			ret = -ENOMEM;
-			goto out;
-		}
+		ret = -ENOMEM;
+		goto out;
 	}
 
 	left_path->search_commit_root = 1;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 77dabfed3a5d..6f0b488c7428 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3547,12 +3547,9 @@ static int btrfs_clone(struct inode *src, struct inode *inode,
 	u64 last_dest_end = destoff;
 
 	ret = -ENOMEM;
-	buf = kmalloc(fs_info->nodesize, GFP_KERNEL | __GFP_NOWARN);
-	if (!buf) {
-		buf = vmalloc(fs_info->nodesize);
-		if (!buf)
-			return ret;
-	}
+	buf = kvmalloc(fs_info->nodesize, GFP_KERNEL);
+	if (!buf)
+		return ret;
 
 	path = btrfs_alloc_path();
 	if (!path) {
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index d145ce804620..0621ca2a7b5d 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -6242,22 +6242,16 @@ long btrfs_ioctl_send(struct file *mnt_file, void __user *arg_)
 	sctx->clone_roots_cnt = arg->clone_sources_count;
 
 	sctx->send_max_size = BTRFS_SEND_BUF_SIZE;
-	sctx->send_buf = kmalloc(sctx->send_max_size, GFP_KERNEL | __GFP_NOWARN);
+	sctx->send_buf = kvmalloc(sctx->send_max_size, GFP_KERNEL);
 	if (!sctx->send_buf) {
-		sctx->send_buf = vmalloc(sctx->send_max_size);
-		if (!sctx->send_buf) {
-			ret = -ENOMEM;
-			goto out;
-		}
+		ret = -ENOMEM;
+		goto out;
 	}
 
-	sctx->read_buf = kmalloc(BTRFS_SEND_READ_SIZE, GFP_KERNEL | __GFP_NOWARN);
+	sctx->read_buf = kvmalloc(BTRFS_SEND_READ_SIZE, GFP_KERNEL);
 	if (!sctx->read_buf) {
-		sctx->read_buf = vmalloc(BTRFS_SEND_READ_SIZE);
-		if (!sctx->read_buf) {
-			ret = -ENOMEM;
-			goto out;
-		}
+		ret = -ENOMEM;
+		goto out;
 	}
 
 	sctx->pending_dir_moves = RB_ROOT;
@@ -6278,13 +6272,10 @@ long btrfs_ioctl_send(struct file *mnt_file, void __user *arg_)
 	alloc_size = arg->clone_sources_count * sizeof(*arg->clone_sources);
 
 	if (arg->clone_sources_count) {
-		clone_sources_tmp = kmalloc(alloc_size, GFP_KERNEL | __GFP_NOWARN);
+		clone_sources_tmp = kvmalloc(alloc_size, GFP_KERNEL);
 		if (!clone_sources_tmp) {
-			clone_sources_tmp = vmalloc(alloc_size);
-			if (!clone_sources_tmp) {
-				ret = -ENOMEM;
-				goto out;
-			}
+			ret = -ENOMEM;
+			goto out;
 		}
 
 		ret = copy_from_user(clone_sources_tmp, arg->clone_sources,
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 045d30d26624..78b18acf33ba 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -74,12 +74,9 @@ dio_get_pages_alloc(const struct iov_iter *it, size_t nbytes,
 	align = (unsigned long)(it->iov->iov_base + it->iov_offset) &
 		(PAGE_SIZE - 1);
 	npages = calc_pages_for(align, nbytes);
-	pages = kmalloc(sizeof(*pages) * npages, GFP_KERNEL);
-	if (!pages) {
-		pages = vmalloc(sizeof(*pages) * npages);
-		if (!pages)
-			return ERR_PTR(-ENOMEM);
-	}
+	pages = kvmalloc(sizeof(*pages) * npages, GFP_KERNEL);
+	if (!pages)
+		return ERR_PTR(-ENOMEM);
 
 	for (idx = 0; idx < npages; ) {
 		size_t start;
diff --git a/fs/select.c b/fs/select.c
index 305c0daf5d67..9e8e1189eb99 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -586,10 +586,7 @@ int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp,
 			goto out_nofds;
 
 		alloc_size = 6 * size;
-		bits = kmalloc(alloc_size, GFP_KERNEL|__GFP_NOWARN);
-		if (!bits && alloc_size > PAGE_SIZE)
-			bits = vmalloc(alloc_size);
-
+		bits = kvmalloc(alloc_size, GFP_KERNEL);
 		if (!bits)
 			goto out_nofds;
 	}
diff --git a/fs/xattr.c b/fs/xattr.c
index 7e3317cf4045..967542e1521b 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -431,12 +431,9 @@ setxattr(struct dentry *d, const char __user *name, const void __user *value,
 	if (size) {
 		if (size > XATTR_SIZE_MAX)
 			return -E2BIG;
-		kvalue = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
-		if (!kvalue) {
-			kvalue = vmalloc(size);
-			if (!kvalue)
-				return -ENOMEM;
-		}
+		kvalue = kvmalloc(size, GFP_KERNEL);
+		if (!kvalue)
+			return -ENOMEM;
 		if (copy_from_user(kvalue, value, size)) {
 			error = -EFAULT;
 			goto out;
@@ -528,12 +525,9 @@ getxattr(struct dentry *d, const char __user *name, void __user *value,
 	if (size) {
 		if (size > XATTR_SIZE_MAX)
 			size = XATTR_SIZE_MAX;
-		kvalue = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
-		if (!kvalue) {
-			kvalue = vmalloc(size);
-			if (!kvalue)
-				return -ENOMEM;
-		}
+		kvalue = kvzalloc(size, GFP_KERNEL);
+		if (!kvalue)
+			return -ENOMEM;
 	}
 
 	error = vfs_getxattr(d, kname, kvalue, size);
@@ -611,12 +605,9 @@ listxattr(struct dentry *d, char __user *list, size_t size)
 	if (size) {
 		if (size > XATTR_LIST_MAX)
 			size = XATTR_LIST_MAX;
-		klist = kmalloc(size, __GFP_NOWARN | GFP_KERNEL);
-		if (!klist) {
-			klist = vmalloc(size);
-			if (!klist)
-				return -ENOMEM;
-		}
+		klist = kvmalloc(size, GFP_KERNEL);
+		if (!klist)
+			return -ENOMEM;
 	}
 
 	error = vfs_listxattr(d, klist, size);
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 34debc1a9641..4ca30a951bbc 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -320,14 +320,9 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 		goto free_htab;
 
 	err = -ENOMEM;
-	htab->buckets = kmalloc_array(htab->n_buckets, sizeof(struct bucket),
-				      GFP_USER | __GFP_NOWARN);
-
-	if (!htab->buckets) {
-		htab->buckets = vmalloc(htab->n_buckets * sizeof(struct bucket));
-		if (!htab->buckets)
-			goto free_htab;
-	}
+	htab->buckets = kvmalloc(htab->n_buckets * sizeof(struct bucket), GFP_USER);
+	if (!htab->buckets)
+		goto free_htab;
 
 	for (i = 0; i < htab->n_buckets; i++) {
 		INIT_HLIST_HEAD(&htab->buckets[i].head);
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 25f572303801..45c17b5562b5 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -957,10 +957,7 @@ EXPORT_SYMBOL(iov_iter_get_pages);
 
 static struct page **get_pages_array(size_t n)
 {
-	struct page **p = kmalloc(n * sizeof(struct page *), GFP_KERNEL);
-	if (!p)
-		p = vmalloc(n * sizeof(struct page *));
-	return p;
+	return kvmalloc(n * sizeof(struct page *), GFP_KERNEL);
 }
 
 static ssize_t pipe_get_pages_alloc(struct iov_iter *i,
diff --git a/mm/frame_vector.c b/mm/frame_vector.c
index db77dcb38afd..72ebec18629c 100644
--- a/mm/frame_vector.c
+++ b/mm/frame_vector.c
@@ -200,10 +200,7 @@ struct frame_vector *frame_vector_create(unsigned int nr_frames)
 	 * Avoid higher order allocations, use vmalloc instead. It should
 	 * be rare anyway.
 	 */
-	if (size <= PAGE_SIZE)
-		vec = kmalloc(size, GFP_KERNEL);
-	else
-		vec = vmalloc(size);
+	vec = kvmalloc(size, GFP_KERNEL);
 	if (!vec)
 		return NULL;
 	vec->nr_allocated = nr_frames;
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index ca97835bfec4..a46a9fd8b540 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -687,11 +687,7 @@ int inet_ehash_locks_alloc(struct inet_hashinfo *hashinfo)
 		/* no more locks than number of hash buckets */
 		nblocks = min(nblocks, hashinfo->ehash_mask + 1);
 
-		hashinfo->ehash_locks =	kmalloc_array(nblocks, locksz,
-						      GFP_KERNEL | __GFP_NOWARN);
-		if (!hashinfo->ehash_locks)
-			hashinfo->ehash_locks = vmalloc(nblocks * locksz);
-
+		hashinfo->ehash_locks = kvmalloc(nblocks * locksz, GFP_KERNEL);
 		if (!hashinfo->ehash_locks)
 			return -ENOMEM;
 
diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
index d46f4d5b1c62..39b2166d3be8 100644
--- a/net/ipv4/tcp_metrics.c
+++ b/net/ipv4/tcp_metrics.c
@@ -1155,10 +1155,7 @@ static int __net_init tcp_net_metrics_init(struct net *net)
 	tcp_metrics_hash_log = order_base_2(slots);
 	size = sizeof(struct tcpm_hash_bucket) << tcp_metrics_hash_log;
 
-	tcp_metrics_hash = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
-	if (!tcp_metrics_hash)
-		tcp_metrics_hash = vzalloc(size);
-
+	tcp_metrics_hash = kvzalloc(size, GFP_KERNEL);
 	if (!tcp_metrics_hash)
 		return -ENOMEM;
 
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 15fe97644ffe..a0c82ef74389 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -1525,10 +1525,7 @@ static int resize_platform_label_table(struct net *net, size_t limit)
 	unsigned index;
 
 	if (size) {
-		labels = kzalloc(size, GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY);
-		if (!labels)
-			labels = vzalloc(size);
-
+		labels = kvzalloc(size, GFP_KERNEL);
 		if (!labels)
 			goto nolabels;
 	}
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index a011322a027d..cdc55d5ee4ad 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -712,17 +712,11 @@ EXPORT_SYMBOL(xt_check_entry_offsets);
  */
 unsigned int *xt_alloc_entry_offsets(unsigned int size)
 {
-	unsigned int *off;
-
-	off = kcalloc(size, sizeof(unsigned int), GFP_KERNEL | __GFP_NOWARN);
-
-	if (off)
-		return off;
-
 	if (size < (SIZE_MAX / sizeof(unsigned int)))
-		off = vmalloc(size * sizeof(unsigned int));
+		return kvzalloc(size * sizeof(unsigned int), GFP_KERNEL);
+
+	return NULL;
 
-	return off;
 }
 EXPORT_SYMBOL(xt_alloc_entry_offsets);
 
@@ -956,15 +950,9 @@ struct xt_table_info *xt_alloc_table_info(unsigned int size)
 	if ((SMP_ALIGN(size) >> PAGE_SHIFT) + 2 > totalram_pages)
 		return NULL;
 
-	if (sz <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
-		info = kmalloc(sz, GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY);
-	if (!info) {
-		info = __vmalloc(sz, GFP_KERNEL | __GFP_NOWARN |
-				     __GFP_NORETRY | __GFP_HIGHMEM,
-				 PAGE_KERNEL);
-		if (!info)
-			return NULL;
-	}
+	info = kvmalloc(sz, GFP_KERNEL);
+	if (!info)
+		return NULL;
 	memset(info, 0, sizeof(*info));
 	info->size = size;
 	return info;
@@ -1066,7 +1054,7 @@ static int xt_jumpstack_alloc(struct xt_table_info *i)
 
 	size = sizeof(void **) * nr_cpu_ids;
 	if (size > PAGE_SIZE)
-		i->jumpstack = vzalloc(size);
+		i->jumpstack = kvzalloc(size, GFP_KERNEL);
 	else
 		i->jumpstack = kzalloc(size, GFP_KERNEL);
 	if (i->jumpstack == NULL)
@@ -1088,12 +1076,8 @@ static int xt_jumpstack_alloc(struct xt_table_info *i)
 	 */
 	size = sizeof(void *) * i->stacksize * 2u;
 	for_each_possible_cpu(cpu) {
-		if (size > PAGE_SIZE)
-			i->jumpstack[cpu] = vmalloc_node(size,
-				cpu_to_node(cpu));
-		else
-			i->jumpstack[cpu] = kmalloc_node(size,
-				GFP_KERNEL, cpu_to_node(cpu));
+		i->jumpstack[cpu] = kvmalloc_node(size, GFP_KERNEL,
+			cpu_to_node(cpu));
 		if (i->jumpstack[cpu] == NULL)
 			/*
 			 * Freeing will be done later on by the callers. The
diff --git a/net/netfilter/xt_recent.c b/net/netfilter/xt_recent.c
index 1d89a4eaf841..d6aa8f63ed2e 100644
--- a/net/netfilter/xt_recent.c
+++ b/net/netfilter/xt_recent.c
@@ -388,10 +388,7 @@ static int recent_mt_check(const struct xt_mtchk_param *par,
 	}
 
 	sz = sizeof(*t) + sizeof(t->iphash[0]) * ip_list_hash_size;
-	if (sz <= PAGE_SIZE)
-		t = kzalloc(sz, GFP_KERNEL);
-	else
-		t = vzalloc(sz);
+	t = kvzalloc(sz, GFP_KERNEL);
 	if (t == NULL) {
 		ret = -ENOMEM;
 		goto out;
diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c
index 3b6d5bd69101..30d6a39fd2c8 100644
--- a/net/sched/sch_choke.c
+++ b/net/sched/sch_choke.c
@@ -431,10 +431,7 @@ static int choke_change(struct Qdisc *sch, struct nlattr *opt)
 	if (mask != q->tab_mask) {
 		struct sk_buff **ntab;
 
-		ntab = kcalloc(mask + 1, sizeof(struct sk_buff *),
-			       GFP_KERNEL | __GFP_NOWARN);
-		if (!ntab)
-			ntab = vzalloc((mask + 1) * sizeof(struct sk_buff *));
+		ntab = kvzalloc((mask + 1) * sizeof(struct sk_buff *), GFP_KERNEL);
 		if (!ntab)
 			return -ENOMEM;
 
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index a5ea0e9b6be4..c580f0d406c2 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -449,27 +449,13 @@ static int fq_codel_change(struct Qdisc *sch, struct nlattr *opt)
 	return 0;
 }
 
-static void *fq_codel_zalloc(size_t sz)
-{
-	void *ptr = kzalloc(sz, GFP_KERNEL | __GFP_NOWARN);
-
-	if (!ptr)
-		ptr = vzalloc(sz);
-	return ptr;
-}
-
-static void fq_codel_free(void *addr)
-{
-	kvfree(addr);
-}
-
 static void fq_codel_destroy(struct Qdisc *sch)
 {
 	struct fq_codel_sched_data *q = qdisc_priv(sch);
 
 	tcf_destroy_chain(&q->filter_list);
-	fq_codel_free(q->backlogs);
-	fq_codel_free(q->flows);
+	kvfree(q->backlogs);
+	kvfree(q->flows);
 }
 
 static int fq_codel_init(struct Qdisc *sch, struct nlattr *opt)
@@ -497,13 +483,13 @@ static int fq_codel_init(struct Qdisc *sch, struct nlattr *opt)
 	}
 
 	if (!q->flows) {
-		q->flows = fq_codel_zalloc(q->flows_cnt *
-					   sizeof(struct fq_codel_flow));
+		q->flows = kvzalloc(q->flows_cnt *
+					   sizeof(struct fq_codel_flow), GFP_KERNEL);
 		if (!q->flows)
 			return -ENOMEM;
-		q->backlogs = fq_codel_zalloc(q->flows_cnt * sizeof(u32));
+		q->backlogs = kvzalloc(q->flows_cnt * sizeof(u32), GFP_KERNEL);
 		if (!q->backlogs) {
-			fq_codel_free(q->flows);
+			kvfree(q->flows);
 			return -ENOMEM;
 		}
 		for (i = 0; i < q->flows_cnt; i++) {
diff --git a/net/sched/sch_hhf.c b/net/sched/sch_hhf.c
index e3d0458af17b..2454055c737e 100644
--- a/net/sched/sch_hhf.c
+++ b/net/sched/sch_hhf.c
@@ -467,29 +467,14 @@ static void hhf_reset(struct Qdisc *sch)
 		rtnl_kfree_skbs(skb, skb);
 }
 
-static void *hhf_zalloc(size_t sz)
-{
-	void *ptr = kzalloc(sz, GFP_KERNEL | __GFP_NOWARN);
-
-	if (!ptr)
-		ptr = vzalloc(sz);
-
-	return ptr;
-}
-
-static void hhf_free(void *addr)
-{
-	kvfree(addr);
-}
-
 static void hhf_destroy(struct Qdisc *sch)
 {
 	int i;
 	struct hhf_sched_data *q = qdisc_priv(sch);
 
 	for (i = 0; i < HHF_ARRAYS_CNT; i++) {
-		hhf_free(q->hhf_arrays[i]);
-		hhf_free(q->hhf_valid_bits[i]);
+		kvfree(q->hhf_arrays[i]);
+		kvfree(q->hhf_valid_bits[i]);
 	}
 
 	for (i = 0; i < HH_FLOWS_CNT; i++) {
@@ -503,7 +488,7 @@ static void hhf_destroy(struct Qdisc *sch)
 			kfree(flow);
 		}
 	}
-	hhf_free(q->hh_flows);
+	kvfree(q->hh_flows);
 }
 
 static const struct nla_policy hhf_policy[TCA_HHF_MAX + 1] = {
@@ -609,8 +594,8 @@ static int hhf_init(struct Qdisc *sch, struct nlattr *opt)
 
 	if (!q->hh_flows) {
 		/* Initialize heavy-hitter flow table. */
-		q->hh_flows = hhf_zalloc(HH_FLOWS_CNT *
-					 sizeof(struct list_head));
+		q->hh_flows = kvzalloc(HH_FLOWS_CNT *
+					 sizeof(struct list_head), GFP_KERNEL);
 		if (!q->hh_flows)
 			return -ENOMEM;
 		for (i = 0; i < HH_FLOWS_CNT; i++)
@@ -624,8 +609,8 @@ static int hhf_init(struct Qdisc *sch, struct nlattr *opt)
 
 		/* Initialize heavy-hitter filter arrays. */
 		for (i = 0; i < HHF_ARRAYS_CNT; i++) {
-			q->hhf_arrays[i] = hhf_zalloc(HHF_ARRAYS_LEN *
-						      sizeof(u32));
+			q->hhf_arrays[i] = kvzalloc(HHF_ARRAYS_LEN *
+						      sizeof(u32), GFP_KERNEL);
 			if (!q->hhf_arrays[i]) {
 				hhf_destroy(sch);
 				return -ENOMEM;
@@ -635,8 +620,8 @@ static int hhf_init(struct Qdisc *sch, struct nlattr *opt)
 
 		/* Initialize valid bits of heavy-hitter filter arrays. */
 		for (i = 0; i < HHF_ARRAYS_CNT; i++) {
-			q->hhf_valid_bits[i] = hhf_zalloc(HHF_ARRAYS_LEN /
-							  BITS_PER_BYTE);
+			q->hhf_valid_bits[i] = kvzalloc(HHF_ARRAYS_LEN /
+							  BITS_PER_BYTE, GFP_KERNEL);
 			if (!q->hhf_valid_bits[i]) {
 				hhf_destroy(sch);
 				return -ENOMEM;
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index bcfadfdea8e0..08a3d2af1792 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -692,15 +692,11 @@ static int get_dist_table(struct Qdisc *sch, const struct nlattr *attr)
 	spinlock_t *root_lock;
 	struct disttable *d;
 	int i;
-	size_t s;
 
 	if (n > NETEM_DIST_MAX)
 		return -EINVAL;
 
-	s = sizeof(struct disttable) + n * sizeof(s16);
-	d = kmalloc(s, GFP_KERNEL | __GFP_NOWARN);
-	if (!d)
-		d = vmalloc(s);
+	d = kvmalloc(sizeof(struct disttable) + n * sizeof(s16), GFP_KERNEL);
 	if (!d)
 		return -ENOMEM;
 
diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 7f195ed4d568..5d70cd6a032d 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -684,11 +684,7 @@ static int sfq_change(struct Qdisc *sch, struct nlattr *opt)
 
 static void *sfq_alloc(size_t sz)
 {
-	void *ptr = kmalloc(sz, GFP_KERNEL | __GFP_NOWARN);
-
-	if (!ptr)
-		ptr = vmalloc(sz);
-	return ptr;
+	return kvmalloc(sz, GFP_KERNEL);
 }
 
 static void sfq_free(void *addr)
diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c
index 38c00e867bda..a5c21f05ece4 100644
--- a/security/keys/keyctl.c
+++ b/security/keys/keyctl.c
@@ -99,14 +99,9 @@ SYSCALL_DEFINE5(add_key, const char __user *, _type,
 
 	if (_payload) {
 		ret = -ENOMEM;
-		payload = kmalloc(plen, GFP_KERNEL | __GFP_NOWARN);
-		if (!payload) {
-			if (plen <= PAGE_SIZE)
-				goto error2;
-			payload = vmalloc(plen);
-			if (!payload)
-				goto error2;
-		}
+		payload = kvmalloc(plen, GFP_KERNEL);
+		if (!payload)
+			goto error2;
 
 		ret = -EFAULT;
 		if (copy_from_user(payload, _payload, plen) != 0)
@@ -1064,14 +1059,9 @@ long keyctl_instantiate_key_common(key_serial_t id,
 
 	if (from) {
 		ret = -ENOMEM;
-		payload = kmalloc(plen, GFP_KERNEL);
-		if (!payload) {
-			if (plen <= PAGE_SIZE)
-				goto error;
-			payload = vmalloc(plen);
-			if (!payload)
-				goto error;
-		}
+		payload = kvmalloc(plen, GFP_KERNEL);
+		if (!payload)
+			goto error;
 
 		ret = -EFAULT;
 		if (!copy_from_iter_full(payload, plen, from))
-- 
2.11.0

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 180+ messages in thread

* Re: [PATCH 5/6] treewide: use kv[mz]alloc* rather than opencoded variants
@ 2017-01-12 17:29     ` Michal Hocko
  0 siblings, 0 replies; 180+ messages in thread
From: Michal Hocko @ 2017-01-12 17:29 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, David Rientjes, Mel Gorman, Johannes Weiner,
	Al Viro, linux-mm, LKML, Martin Schwidefsky, Heiko Carstens,
	Herbert Xu, Anton Vorontsov, Colin Cross, Kees Cook, Tony Luck,
	Rafael J. Wysocki, Ben Skeggs, Kent Overstreet, Santosh Raspatur,
	Hariprasad S, Tariq Toukan, Yishai Hadas, Dan Williams,
	Oleg Drokin, Andreas Dilger, Boris Ostrovsky, David Sterba, Yan,
	Zheng, Ilya Dryomov, Alexei Starovoitov, Eric Dumazet, netdev

Ilya has noticed that I've screwed up some k[zc]alloc conversions and
didn't use kvzalloc. This is an updated patch with some acks
collected on the way.
---

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 5/6] treewide: use kv[mz]alloc* rather than opencoded variants
  2017-01-12 17:26     ` Kees Cook
  (?)
@ 2017-01-12 17:37       ` Michal Hocko
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Hocko @ 2017-01-12 17:37 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andrew Morton, Vlastimil Babka, David Rientjes, Mel Gorman,
	Johannes Weiner, Al Viro, Linux-MM, LKML, Martin Schwidefsky,
	Heiko Carstens, Herbert Xu, Anton Vorontsov, Colin Cross,
	Tony Luck, Rafael J. Wysocki, Ben Skeggs, Kent Overstreet,
	Santosh Raspatur, Hariprasad S, Tariq Toukan, Yishai Hadas,
	Dan Williams, Oleg Drokin, Andreas Dilger, Boris Ostrovsky,
	David Sterba, Yan, Zheng, Ilya Dryomov, Alexei Starovoitov,
	Eric Dumazet, Network Development

On Thu 12-01-17 09:26:09, Kees Cook wrote:
> On Thu, Jan 12, 2017 at 7:37 AM, Michal Hocko <mhocko@kernel.org> wrote:
[...]
> > diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> > index 4f74511015b8..e6bbb33d2956 100644
> > --- a/arch/s390/kvm/kvm-s390.c
> > +++ b/arch/s390/kvm/kvm-s390.c
> > @@ -1126,10 +1126,7 @@ static long kvm_s390_get_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
> >         if (args->count < 1 || args->count > KVM_S390_SKEYS_MAX)
> >                 return -EINVAL;
> >
> > -       keys = kmalloc_array(args->count, sizeof(uint8_t),
> > -                            GFP_KERNEL | __GFP_NOWARN);
> > -       if (!keys)
> > -               keys = vmalloc(sizeof(uint8_t) * args->count);
> > +       keys = kvmalloc(args->count * sizeof(uint8_t), GFP_KERNEL);
> 
> Before doing this conversion, can we add a kvmalloc_array() API? This
> conversion could allow for the reintroduction of integer overflow
> flaws. (This particular situation isn't at risk since ->count is
> checked, but I'd prefer we not create a risky set of examples for
> using kvmalloc.)

Well, I am not opposed to kvmalloc_array but I would argue that this
conversion cannot introduce new overflow issues. The code would have
to be broken already because kmalloc_array checks for the overflow
but the vmalloc fallback doesn't...

If there is general interest in this API I can add it.
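
For reference, the overflow check under discussion is small. The
following is a minimal userspace sketch, not kernel code:
kvmalloc_array_sketch() is a hypothetical name and malloc() stands in
for the kmalloc/vmalloc pair. It refuses any n * size that would wrap
around size_t, which is the guarantee kmalloc_array gives and a bare
vmalloc(n * size) does not.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Userspace model of an overflow-checked array allocation.
 * Hypothetical helper: the name and the malloc() backend are assumptions
 * made for illustration; this is not the kernel implementation.
 */
static void *kvmalloc_array_sketch(size_t n, size_t size)
{
	/* n * size would wrap around SIZE_MAX: fail instead of under-allocating */
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;

	return malloc(n * size);
}
```

A caller would pass the element count and element size separately,
e.g. kvmalloc_array_sketch(count, sizeof(uint8_t)), rather than
multiplying them at the call site.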

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 5/6] treewide: use kv[mz]alloc* rather than opencoded variants
  2017-01-12 15:37   ` Michal Hocko
  (?)
@ 2017-01-12 20:14     ` Boris Ostrovsky
  -1 siblings, 0 replies; 180+ messages in thread
From: Boris Ostrovsky @ 2017-01-12 20:14 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: Vlastimil Babka, David Rientjes, Mel Gorman, Johannes Weiner,
	Al Viro, linux-mm, LKML, Michal Hocko, Martin Schwidefsky,
	Heiko Carstens, Herbert Xu, Anton Vorontsov, Colin Cross,
	Kees Cook, Tony Luck, Rafael J. Wysocki, Ben Skeggs,
	Kent Overstreet, Santosh Raspatur, Hariprasad S, Tariq Toukan,
	Yishai Hadas, Dan Williams, Oleg Drokin, Andreas Dilger,
	David Sterba, Yan, Zheng, Ilya Dryomov, Alexei Starovoitov,
	Eric Dumazet, netdev


> diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
> index 6890897a6f30..10f1ef582659 100644
> --- a/drivers/xen/evtchn.c
> +++ b/drivers/xen/evtchn.c
> @@ -87,18 +87,6 @@ struct user_evtchn {
>  	bool enabled;
>  };
>  
> -static evtchn_port_t *evtchn_alloc_ring(unsigned int size)
> -{
> -	evtchn_port_t *ring;
> -	size_t s = size * sizeof(*ring);
> -
> -	ring = kmalloc(s, GFP_KERNEL);
> -	if (!ring)
> -		ring = vmalloc(s);
> -
> -	return ring;
> -}
> -
>  static void evtchn_free_ring(evtchn_port_t *ring)
>  {
>  	kvfree(ring);
> @@ -334,7 +322,7 @@ static int evtchn_resize_ring(struct per_user_data *u)
>  	else
>  		new_size = 2 * u->ring_size;
>  
> -	new_ring = evtchn_alloc_ring(new_size);
> +	new_ring = kvmalloc(new_size * sizeof(*new_ring), GFP_KERNEL);
>  	if (!new_ring)
>  		return -ENOMEM;
>  

Xen bits:

Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 5/6] treewide: use kv[mz]alloc* rather than opencoded variants
  2017-01-12 15:37   ` Michal Hocko
  (?)
@ 2017-01-13  1:11     ` Dilger, Andreas
  -1 siblings, 0 replies; 180+ messages in thread
From: Dilger, Andreas @ 2017-01-13  1:11 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Vlastimil Babka, David Rientjes, Mel Gorman,
	Johannes Weiner, Al Viro, linux-mm, LKML, Michal Hocko,
	Martin Schwidefsky, Heiko Carstens, Herbert Xu, Anton Vorontsov,
	Colin Cross, Kees Cook, Luck, Tony, Rafael J. Wysocki,
	Ben Skeggs, Kent Overstreet, Santosh Raspatur, Hariprasad S,
	Tariq Toukan, Yishai Hadas, Williams, Dan J, Drokin, Oleg,
	Boris Ostrovsky, David Sterba, Yan, Zheng, Ilya Dryomov,
	Alexei Starovoitov, Eric Dumazet, netdev


> On Jan 12, 2017, at 08:37, Michal Hocko <mhocko@kernel.org> wrote:
> 
> From: Michal Hocko <mhocko@suse.com>
> 
> There are many code paths opencoding kvmalloc. Let's use the helper
> instead. The main difference to kvmalloc is that those users are usually
> not considering all the aspects of the memory allocator. E.g. allocation
> requests < 64kB are basically never failing and invoke OOM killer to
> satisfy the allocation. This sounds too disruptive for something that
> has a reasonable fallback - the vmalloc. On the other hand those
> requests might fallback to vmalloc even when the memory allocator would
> succeed after several more reclaim/compaction attempts previously. There
> is no guarantee something like that happens though.
> 
> This patch converts many of those places to kv[mz]alloc* helpers because
> they are more conservative.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Lustre part can be
Acked-by: Andreas Dilger <andreas.dilger@intel.com>
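
The policy described in the quoted changelog can be sketched in plain
C. This is a userspace model under stated assumptions: the SK_ flag
values, the 4kB page size and the function names are hypothetical
stand-ins, not the kernel's definitions. It shows only the decision
logic: requests larger than a page make a first attempt that neither
retries hard nor warns, and only those requests may fall back to
vmalloc.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel definitions; the values are illustrative. */
#define SK_PAGE_SIZE   4096u
#define SK_GFP_KERNEL  0x1u
#define SK_GFP_NORETRY 0x2u	/* give up early instead of invoking the OOM killer */
#define SK_GFP_NOWARN  0x4u	/* no allocation failure warning */

/* Flags used for the first (kmalloc-like) attempt. */
static unsigned int sk_kmalloc_flags(size_t size, unsigned int flags)
{
	/* A fallback exists for larger requests, so keep the first attempt cheap. */
	if (size > SK_PAGE_SIZE)
		flags |= SK_GFP_NORETRY | SK_GFP_NOWARN;
	return flags;
}

/* Whether a failed first attempt may be retried through vmalloc. */
static bool sk_may_use_vmalloc(size_t size)
{
	/* Assumption: vmalloc works in whole pages, so sub-page requests gain nothing. */
	return size > SK_PAGE_SIZE;
}
```

Keeping the two decisions in one place is the point of the helper: the
open-coded callers each picked their own flags and size threshold.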

[snip]

> diff --git a/drivers/staging/lustre/lnet/libcfs/linux/linux-mem.c b/drivers/staging/lustre/lnet/libcfs/linux/linux-mem.c
> index a6a76a681ea9..8f638267e704 100644
> --- a/drivers/staging/lustre/lnet/libcfs/linux/linux-mem.c
> +++ b/drivers/staging/lustre/lnet/libcfs/linux/linux-mem.c
> @@ -45,15 +45,6 @@ EXPORT_SYMBOL(libcfs_kvzalloc);
> void *libcfs_kvzalloc_cpt(struct cfs_cpt_table *cptab, int cpt, size_t size,
> 			  gfp_t flags)
> {
> -	void *ret;
> -
> -	ret = kzalloc_node(size, flags | __GFP_NOWARN,
> -			   cfs_cpt_spread_node(cptab, cpt));
> -	if (!ret) {
> -		WARN_ON(!(flags & (__GFP_FS | __GFP_HIGH)));
> -		ret = vmalloc_node(size, cfs_cpt_spread_node(cptab, cpt));
> -	}
> -
> -	return ret;
> +	return kvzalloc_node(size, flags, cfs_cpt_spread_node(cptab, cpt));
> }
> EXPORT_SYMBOL(libcfs_kvzalloc_cpt);

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 2/6] mm: support __GFP_REPEAT in kvmalloc_node for >=64kB
  2017-01-12 15:37   ` Michal Hocko
@ 2017-01-14  2:42     ` Tetsuo Handa
  -1 siblings, 0 replies; 180+ messages in thread
From: Tetsuo Handa @ 2017-01-14  2:42 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: Vlastimil Babka, David Rientjes, Mel Gorman, Johannes Weiner,
	Al Viro, linux-mm, LKML, Michal Hocko, Michael S. Tsirkin

On 2017/01/13 0:37, Michal Hocko wrote:
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 5dc34653274a..105cd04c7414 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -797,12 +797,9 @@ static int vhost_net_open(struct inode *inode, struct file *f)
>  	struct vhost_virtqueue **vqs;
>  	int i;
>  
> -	n = kmalloc(sizeof *n, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT);
> -	if (!n) {
> -		n = vmalloc(sizeof *n);
> -		if (!n)
> -			return -ENOMEM;
> -	}
> +	n = kvmalloc(sizeof *n, GFP_KERNEL | __GFP_REPEAT);

An opportunity to standardize as sizeof(*n) like other allocations.

> diff --git a/mm/util.c b/mm/util.c
> index 7e0c240b5760..9306244b9f41 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -333,7 +333,8 @@ EXPORT_SYMBOL(vm_mmap);
>   * Uses kmalloc to get the memory but if the allocation fails then falls back
>   * to the vmalloc allocator. Use kvfree for freeing the memory.
>   *
> - * Reclaim modifiers - __GFP_NORETRY, __GFP_REPEAT and __GFP_NOFAIL are not supported
> + * Reclaim modifiers - __GFP_NORETRY and __GFP_NOFAIL are not supported. __GFP_REPEAT
> + * is supported only for large (>64kB) allocations

Isn't this ">32kB" (i.e. __GFP_REPEAT is supported for 64kB allocation) ?

>   */
>  void *kvmalloc_node(size_t size, gfp_t flags, int node)
>  {
> @@ -350,8 +351,18 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
>  	 * Make sure that larger requests are not too disruptive - no OOM
>  	 * killer and no allocation failure warnings as we have a fallback
>  	 */
> -	if (size > PAGE_SIZE)
> -		kmalloc_flags |= __GFP_NORETRY | __GFP_NOWARN;
> +	if (size > PAGE_SIZE) {
> +		kmalloc_flags |= __GFP_NOWARN;
> +
> +		/*
> +		 * We have to override __GFP_REPEAT by __GFP_NORETRY for !costly
> +		 * requests because there is no other way to tell the allocator
> +		 * that we want to fail rather than retry endlessly.
> +		 */
> +		if (!(kmalloc_flags & __GFP_REPEAT) ||
> +				(size <= PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
> +			kmalloc_flags |= __GFP_NORETRY;
> +	}
>  
>  	ret = kmalloc_node(size, kmalloc_flags, node);
>  
> 
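
The flag selection in the quoted hunk can be modeled in userspace. This is only a sketch: it assumes 4kB pages and a PAGE_ALLOC_COSTLY_ORDER of 3, and the MODEL_* names and bit values are made up for illustration (they are not the kernel's gfp encoding).

```c
#include <stddef.h>

/* Illustrative stand-ins; the real definitions live in the kernel headers. */
#define MODEL_PAGE_SHIFT	12	/* assumption: 4kB pages */
#define MODEL_PAGE_SIZE		((size_t)1 << MODEL_PAGE_SHIFT)
#define MODEL_COSTLY_ORDER	3	/* PAGE_ALLOC_COSTLY_ORDER */

/* Hypothetical flag bits for this model only. */
#define MODEL_GFP_NOWARN	0x1u
#define MODEL_GFP_NORETRY	0x2u
#define MODEL_GFP_REPEAT	0x4u

/*
 * Mirror the quoted hunk: given the caller's flags and the request size,
 * return the flags that would be passed to the kmalloc attempt.
 */
static unsigned int model_kmalloc_flags(size_t size, unsigned int flags)
{
	unsigned int kmalloc_flags = flags;

	if (size > MODEL_PAGE_SIZE) {
		kmalloc_flags |= MODEL_GFP_NOWARN;

		/* No __GFP_REPEAT, or a !costly request: fail fast. */
		if (!(kmalloc_flags & MODEL_GFP_REPEAT) ||
		    size <= (MODEL_PAGE_SIZE << MODEL_COSTLY_ORDER))
			kmalloc_flags |= MODEL_GFP_NORETRY;
	}

	return kmalloc_flags;
}
```

With these assumptions the cutoff is 4096 << 3 = 32768 bytes: a 32kB request that passes the REPEAT bit still gets NORETRY added, while a 64kB request keeps REPEAT alone (plus NOWARN). That matches the ">32kB" reading of the comment.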

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 5/6] treewide: use kv[mz]alloc* rather than opencoded variants
  2017-01-12 17:29     ` Michal Hocko
@ 2017-01-14  3:01       ` Tetsuo Handa
  -1 siblings, 0 replies; 180+ messages in thread
From: Tetsuo Handa @ 2017-01-14  3:01 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton; +Cc: linux-mm, LKML

On 2017/01/13 2:29, Michal Hocko wrote:
> Ilya has noticed that I've screwed up some k[zc]alloc conversions and
> didn't use the kvzalloc. This is an updated patch with some acks
> collected on the way
> ---
> From a7b89c6d0a3c685045e37740c8f97b065f37e0a4 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Wed, 4 Jan 2017 13:30:32 +0100
> Subject: [PATCH] treewide: use kv[mz]alloc* rather than opencoded variants
> 
> There are many code paths opencoding kvmalloc. Let's use the helper
> instead. The main difference to kvmalloc is that those users are usually
> not considering all the aspects of the memory allocator. E.g. allocation
> requests < 64kB are basically never failing and invoke OOM killer to

Isn't this "requests <= 32kB" because allocation requests for 33kB will be
rounded up to 64kB?

Same for "smaller than 64kB" in PATCH 6/6. But strictly speaking, isn't
it bogus to refer to an actual size, because PAGE_SIZE is not always 4096?

---------- arch/ia64/include/asm/page.h ----------
/*
 * PAGE_SHIFT determines the actual kernel page size.
 */
#if defined(CONFIG_IA64_PAGE_SIZE_4KB)
# define PAGE_SHIFT     12
#elif defined(CONFIG_IA64_PAGE_SIZE_8KB)
# define PAGE_SHIFT     13
#elif defined(CONFIG_IA64_PAGE_SIZE_16KB)
# define PAGE_SHIFT     14
#elif defined(CONFIG_IA64_PAGE_SIZE_64KB)
# define PAGE_SHIFT     16
#else
# error Unsupported page size!
#endif

#define PAGE_SIZE               (__IA64_UL_CONST(1) << PAGE_SHIFT)


> satisfy the allocation. This sounds too disruptive for something that
> has a reasonable fallback - the vmalloc. On the other hand those
> requests might fallback to vmalloc even when the memory allocator would
> succeed after several more reclaim/compaction attempts previously. There
> is no guarantee something like that happens though.
> 
> This patch converts many of those places to kv[mz]alloc* helpers because
> they are more conservative.
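
The rounding and page-size points above can be checked with a small userspace sketch. It assumes the page allocator hands out power-of-two numbers of pages and that PAGE_ALLOC_COSTLY_ORDER is 3; model_get_order() and model_costly_limit() are hypothetical helpers written for this illustration, not kernel functions.

```c
#include <stddef.h>

#define MODEL_COSTLY_ORDER 3	/* PAGE_ALLOC_COSTLY_ORDER in the kernel */

/*
 * Smallest order such that (page size << order) covers size bytes.
 * Assumes size > 0.
 */
static unsigned int model_get_order(size_t size, unsigned int page_shift)
{
	size_t page_size = (size_t)1 << page_shift;
	size_t pages = (size + page_size - 1) >> page_shift;
	unsigned int order = 0;

	while (((size_t)1 << order) < pages)
		order++;

	return order;
}

/* Largest request, in bytes, that is still !costly for a given page shift. */
static size_t model_costly_limit(unsigned int page_shift)
{
	return ((size_t)1 << page_shift) << MODEL_COSTLY_ORDER;
}
```

With PAGE_SHIFT 12, a 32kB request is order 3 and a 33kB request is order 4 (64kB), so "<= 32kB" is the accurate bound for 4kB pages. With PAGE_SHIFT 16 the same limit becomes 512kB, which is why a fixed byte count only makes sense when the page size is stated.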

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 2/6] mm: support __GFP_REPEAT in kvmalloc_node for >=64kB
  2017-01-14  2:42     ` Tetsuo Handa
@ 2017-01-14  8:45       ` Michal Hocko
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Hocko @ 2017-01-14  8:45 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Andrew Morton, Vlastimil Babka, David Rientjes, Mel Gorman,
	Johannes Weiner, Al Viro, linux-mm, LKML, Michael S. Tsirkin

On Sat 14-01-17 11:42:09, Tetsuo Handa wrote:
> On 2017/01/13 0:37, Michal Hocko wrote:
[...]
> > diff --git a/mm/util.c b/mm/util.c
> > index 7e0c240b5760..9306244b9f41 100644
> > --- a/mm/util.c
> > +++ b/mm/util.c
> > @@ -333,7 +333,8 @@ EXPORT_SYMBOL(vm_mmap);
> >   * Uses kmalloc to get the memory but if the allocation fails then falls back
> >   * to the vmalloc allocator. Use kvfree for freeing the memory.
> >   *
> > - * Reclaim modifiers - __GFP_NORETRY, __GFP_REPEAT and __GFP_NOFAIL are not supported
> > + * Reclaim modifiers - __GFP_NORETRY and __GFP_NOFAIL are not supported. __GFP_REPEAT
> > + * is supported only for large (>64kB) allocations
> 
> Isn't this ">32kB" (i.e. __GFP_REPEAT is supported for 64kB allocation) ?

True, I will update the patch to use >32kB

Thanks!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 5/6] treewide: use kv[mz]alloc* rather than opencoded variants
  2017-01-14  3:01       ` Tetsuo Handa
@ 2017-01-14  8:49         ` Michal Hocko
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Hocko @ 2017-01-14  8:49 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: Andrew Morton, linux-mm, LKML

On Sat 14-01-17 12:01:50, Tetsuo Handa wrote:
> On 2017/01/13 2:29, Michal Hocko wrote:
> > Ilya has noticed that I've screwed up some k[zc]alloc conversions and
> > didn't use the kvzalloc. This is an updated patch with some acks
> > collected on the way
> > ---
> > From a7b89c6d0a3c685045e37740c8f97b065f37e0a4 Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@suse.com>
> > Date: Wed, 4 Jan 2017 13:30:32 +0100
> > Subject: [PATCH] treewide: use kv[mz]alloc* rather than opencoded variants
> > 
> > There are many code paths opencoding kvmalloc. Let's use the helper
> > instead. The main difference to kvmalloc is that those users are usually
> > not considering all the aspects of the memory allocator. E.g. allocation
> > requests < 64kB are basically never failing and invoke OOM killer to
> 
> Isn't this "requests <= 32kB" because allocation requests for 33kB will be
> rounded up to 64kB?

Yes

> Same for "smaller than 64kB" in PATCH 6/6. But strictly speaking, isn't
> it bogus to refer to an actual size, because PAGE_SIZE is not always 4096?

This is just an example and I didn't want to pull in
PAGE_ALLOC_COSTLY_ORDER here. So I've instead fixed the wording to:
"
E.g. allocation requests <= 32kB (with 4kB pages) are basically never
failing and invoke OOM killer to satisfy the allocation.
"
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 5/6] treewide: use kv[mz]alloc* rather than opencoded variants
  2017-01-12 15:37   ` Michal Hocko
@ 2017-01-14 10:56     ` Leon Romanovsky
  -1 siblings, 0 replies; 180+ messages in thread
From: Leon Romanovsky @ 2017-01-14 10:56 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Vlastimil Babka, David Rientjes, Mel Gorman,
	Johannes Weiner, Al Viro, linux-mm, LKML, Michal Hocko,
	Martin Schwidefsky, Heiko Carstens, Herbert Xu, Anton Vorontsov,
	Colin Cross, Kees Cook, Tony Luck, Rafael J. Wysocki, Ben Skeggs,
	Kent Overstreet, Santosh Raspatur, Hariprasad S, Tariq Toukan,
	Yishai Hadas, Dan Williams, Oleg Drokin, Andreas Dilger,
	Boris Ostrovsky, David Sterba, Yan, Zheng, Ilya Dryomov,
	Alexei Starovoitov, Eric Dumazet, netdev

On Thu, Jan 12, 2017 at 04:37:16PM +0100, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
>
> There are many code paths opencoding kvmalloc. Let's use the helper
> instead. The main difference to kvmalloc is that those users are usually
> not considering all the aspects of the memory allocator. E.g. allocation
> requests < 64kB are basically never failing and invoke OOM killer to
> satisfy the allocation. This sounds too disruptive for something that
> has a reasonable fallback - the vmalloc. On the other hand those
> requests might fallback to vmalloc even when the memory allocator would
> succeed after several more reclaim/compaction attempts previously. There
> is no guarantee something like that happens though.
>
> This patch converts many of those places to kv[mz]alloc* helpers because
> they are more conservative.
>
> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: Anton Vorontsov <anton@enomsg.org>
> Cc: Colin Cross <ccross@android.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> Cc: Ben Skeggs <bskeggs@redhat.com>
> Cc: Kent Overstreet <kent.overstreet@gmail.com>
> Cc: Santosh Raspatur <santosh@chelsio.com>
> Cc: Hariprasad S <hariprasad@chelsio.com>
> Cc: Tariq Toukan <tariqt@mellanox.com>
> Cc: Yishai Hadas <yishaih@mellanox.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Oleg Drokin <oleg.drokin@intel.com>
> Cc: Andreas Dilger <andreas.dilger@intel.com>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: David Sterba <dsterba@suse.com>
> Cc: "Yan, Zheng" <zyan@redhat.com>
> Cc: Ilya Dryomov <idryomov@gmail.com>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: netdev@vger.kernel.org
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  arch/s390/kvm/kvm-s390.c                           | 10 ++-----
>  crypto/lzo.c                                       |  4 +--
>  drivers/acpi/apei/erst.c                           |  8 ++---
>  drivers/char/agp/generic.c                         |  8 +----
>  drivers/gpu/drm/nouveau/nouveau_gem.c              |  4 +--
>  drivers/md/bcache/util.h                           | 12 ++------
>  drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h    |  3 --
>  drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c | 25 ++--------------
>  drivers/net/ethernet/chelsio/cxgb3/l2t.c           |  2 +-
>  drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c    | 31 ++++----------------
>  drivers/net/ethernet/mellanox/mlx4/en_tx.c         |  9 ++----
>  drivers/net/ethernet/mellanox/mlx4/mr.c            |  9 ++----
>  drivers/nvdimm/dimm_devs.c                         |  5 +---
>  .../staging/lustre/lnet/libcfs/linux/linux-mem.c   | 11 +------
>  drivers/xen/evtchn.c                               | 14 +--------
>  fs/btrfs/ctree.c                                   |  9 ++----
>  fs/btrfs/ioctl.c                                   |  9 ++----
>  fs/btrfs/send.c                                    | 27 ++++++-----------
>  fs/ceph/file.c                                     |  9 ++----
>  fs/select.c                                        |  5 +---
>  fs/xattr.c                                         | 27 ++++++-----------
>  kernel/bpf/hashtab.c                               | 11 ++-----
>  lib/iov_iter.c                                     |  5 +---
>  mm/frame_vector.c                                  |  5 +---
>  net/ipv4/inet_hashtables.c                         |  6 +---
>  net/ipv4/tcp_metrics.c                             |  5 +---
>  net/mpls/af_mpls.c                                 |  5 +---
>  net/netfilter/x_tables.c                           | 34 ++++++----------------
>  net/netfilter/xt_recent.c                          |  5 +---
>  net/sched/sch_choke.c                              |  5 +---
>  net/sched/sch_fq_codel.c                           | 26 ++++-------------
>  net/sched/sch_hhf.c                                | 33 ++++++---------------
>  net/sched/sch_netem.c                              |  6 +---
>  net/sched/sch_sfq.c                                |  6 +---
>  security/keys/keyctl.c                             | 22 ++++----------
>  35 files changed, 96 insertions(+), 319 deletions(-)

Hi Michal,

I don't see mlx5_vzalloc in the changed list. Any reason why you skipped it?

static inline void *mlx5_vzalloc(unsigned long size)
{
	void *rtn;

	rtn = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
	if (!rtn)
		rtn = vzalloc(size);
	return rtn;
}

Thanks

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/6] mm: introduce kv[mz]alloc helpers
  2017-01-12 15:37   ` Michal Hocko
@ 2017-01-16  4:34     ` John Hubbard
  -1 siblings, 0 replies; 180+ messages in thread
From: John Hubbard @ 2017-01-16  4:34 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: Vlastimil Babka, David Rientjes, Mel Gorman, Johannes Weiner,
	Al Viro, linux-mm, LKML, Michal Hocko, Anatoly Stepanov,
	Paolo Bonzini, Mike Snitzer, Michael S. Tsirkin,
	Theodore Ts'o



On 01/12/2017 07:37 AM, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
>
> Using kmalloc with the vmalloc fallback for larger allocations is a
> common pattern in the kernel code. Yet we do not have any common helper
> for that and so users have invented their own helpers. Some of them are
> really creative when doing so. Let's just add kv[mz]alloc and make sure
> it is implemented properly. This implementation makes sure not to create
> heavy memory pressure for > PAGE_SIZE requests (__GFP_NORETRY) and also
> not to warn about allocation failures. This also rules out the OOM
> killer, as vmalloc is a more appropriate fallback than a disruptive
> user-visible action.
>
> This patch also changes some existing users and removes helpers which
> are specific for them. In some cases this is not possible (e.g.
> ext4_kvmalloc, libcfs_kvzalloc, __aa_kvmalloc) because those seem to be
> broken and require GFP_NO{FS,IO} context, which is not vmalloc compatible
> in general (note that the page table allocation is GFP_KERNEL). Those
> need to be fixed separately.
>
> apparmor has already claimed kv[mz]alloc, so remove those and use
> __aa_kvmalloc instead to prevent naming clashes.
>
> Changes since v3
> - add ipc_alloc
>
> Changes since v2
> - s@WARN_ON@WARN_ON_ONCE@ as per Vlastimil
> - do not fallback to vmalloc for size = PAGE_SIZE as per Vlastimil
>
> Changes since v1
> - define __vmalloc_node_flags for CONFIG_MMU=n
>
> Cc: Anatoly Stepanov <astepanov@cloudlinux.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Mike Snitzer <snitzer@redhat.com>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: "Theodore Ts'o" <tytso@mit.edu>
> Reviewed-by: Andreas Dilger <adilger@dilger.ca> # ext4 part
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  arch/x86/kvm/lapic.c                 |  4 ++--
>  arch/x86/kvm/page_track.c            |  4 ++--
>  arch/x86/kvm/x86.c                   |  4 ++--
>  drivers/md/dm-stats.c                |  7 +-----
>  fs/ext4/mballoc.c                    |  2 +-
>  fs/ext4/super.c                      |  4 ++--
>  fs/f2fs/f2fs.h                       | 20 -----------------
>  fs/f2fs/file.c                       |  4 ++--
>  fs/f2fs/segment.c                    | 14 ++++++------
>  fs/seq_file.c                        | 16 +-------------
>  include/linux/kvm_host.h             |  2 --
>  include/linux/mm.h                   | 14 ++++++++++++
>  include/linux/vmalloc.h              |  1 +
>  ipc/util.c                           |  7 +-----
>  mm/nommu.c                           |  5 +++++
>  mm/util.c                            | 42 ++++++++++++++++++++++++++++++++++++
>  mm/vmalloc.c                         |  2 +-
>  security/apparmor/apparmorfs.c       |  2 +-
>  security/apparmor/include/apparmor.h | 10 ---------
>  security/apparmor/match.c            |  2 +-
>  virt/kvm/kvm_main.c                  | 18 +++-------------
>  21 files changed, 89 insertions(+), 95 deletions(-)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 5fe290c1b7d8..daf114c3b8ad 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -167,8 +167,8 @@ static void recalculate_apic_map(struct kvm *kvm)
>  		if (kvm_apic_present(vcpu))
>  			max_id = max(max_id, kvm_apic_id(vcpu->arch.apic));
>
> -	new = kvm_kvzalloc(sizeof(struct kvm_apic_map) +
> -	                   sizeof(struct kvm_lapic *) * ((u64)max_id + 1));
> +	new = kvzalloc(sizeof(struct kvm_apic_map) +
> +	                   sizeof(struct kvm_lapic *) * ((u64)max_id + 1), GFP_KERNEL);
>
>  	if (!new)
>  		goto out;
> diff --git a/arch/x86/kvm/page_track.c b/arch/x86/kvm/page_track.c
> index 4a1c13eaa518..d46663e655b0 100644
> --- a/arch/x86/kvm/page_track.c
> +++ b/arch/x86/kvm/page_track.c
> @@ -38,8 +38,8 @@ int kvm_page_track_create_memslot(struct kvm_memory_slot *slot,
>  	int  i;
>
>  	for (i = 0; i < KVM_PAGE_TRACK_MAX; i++) {
> -		slot->arch.gfn_track[i] = kvm_kvzalloc(npages *
> -					    sizeof(*slot->arch.gfn_track[i]));
> +		slot->arch.gfn_track[i] = kvzalloc(npages *
> +					    sizeof(*slot->arch.gfn_track[i]), GFP_KERNEL);
>  		if (!slot->arch.gfn_track[i])
>  			goto track_free;
>  	}
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 51ccfe08e32f..ba55bc338f25 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -8082,13 +8082,13 @@ int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
>  				      slot->base_gfn, level) + 1;
>
>  		slot->arch.rmap[i] =
> -			kvm_kvzalloc(lpages * sizeof(*slot->arch.rmap[i]));
> +			kvzalloc(lpages * sizeof(*slot->arch.rmap[i]), GFP_KERNEL);
>  		if (!slot->arch.rmap[i])
>  			goto out_free;
>  		if (i == 0)
>  			continue;
>
> -		linfo = kvm_kvzalloc(lpages * sizeof(*linfo));
> +		linfo = kvzalloc(lpages * sizeof(*linfo), GFP_KERNEL);
>  		if (!linfo)
>  			goto out_free;
>
> diff --git a/drivers/md/dm-stats.c b/drivers/md/dm-stats.c
> index 38b05f23b96c..674f9a1686f7 100644
> --- a/drivers/md/dm-stats.c
> +++ b/drivers/md/dm-stats.c
> @@ -146,12 +146,7 @@ static void *dm_kvzalloc(size_t alloc_size, int node)
>  	if (!claim_shared_memory(alloc_size))
>  		return NULL;
>
> -	if (alloc_size <= KMALLOC_MAX_SIZE) {
> -		p = kzalloc_node(alloc_size, GFP_KERNEL | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN, node);
> -		if (p)
> -			return p;
> -	}
> -	p = vzalloc_node(alloc_size, node);
> +	p = kvzalloc_node(alloc_size, GFP_KERNEL | __GFP_NOMEMALLOC, node);
>  	if (p)
>  		return p;
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index d9fd184b049e..31a761dd76f5 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -2381,7 +2381,7 @@ int ext4_mb_alloc_groupinfo(struct super_block *sb, ext4_group_t ngroups)
>  		return 0;
>
>  	size = roundup_pow_of_two(sizeof(*sbi->s_group_info) * size);
> -	new_groupinfo = ext4_kvzalloc(size, GFP_KERNEL);
> +	new_groupinfo = kvzalloc(size, GFP_KERNEL);
>  	if (!new_groupinfo) {
>  		ext4_msg(sb, KERN_ERR, "can't allocate buddy meta group");
>  		return -ENOMEM;
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 66845a08a87a..c65fe19a2a4f 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -2116,7 +2116,7 @@ int ext4_alloc_flex_bg_array(struct super_block *sb, ext4_group_t ngroup)
>  		return 0;
>
>  	size = roundup_pow_of_two(size * sizeof(struct flex_groups));
> -	new_groups = ext4_kvzalloc(size, GFP_KERNEL);
> +	new_groups = kvzalloc(size, GFP_KERNEL);
>  	if (!new_groups) {
>  		ext4_msg(sb, KERN_ERR, "not enough memory for %d flex groups",
>  			 size / (int) sizeof(struct flex_groups));
> @@ -3850,7 +3850,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
>  			goto failed_mount;
>  		}
>  	}
> -	sbi->s_group_desc = ext4_kvmalloc(db_count *
> +	sbi->s_group_desc = kvmalloc(db_count *
>  					  sizeof(struct buffer_head *),
>  					  GFP_KERNEL);
>  	if (sbi->s_group_desc == NULL) {
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 2da8c3aa0ce5..4130df0a8e64 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -1929,26 +1929,6 @@ static inline void *f2fs_kmalloc(struct f2fs_sb_info *sbi,
>  	return kmalloc(size, flags);
>  }
>
> -static inline void *f2fs_kvmalloc(size_t size, gfp_t flags)
> -{
> -	void *ret;
> -
> -	ret = kmalloc(size, flags | __GFP_NOWARN);
> -	if (!ret)
> -		ret = __vmalloc(size, flags, PAGE_KERNEL);
> -	return ret;
> -}
> -
> -static inline void *f2fs_kvzalloc(size_t size, gfp_t flags)
> -{
> -	void *ret;
> -
> -	ret = kzalloc(size, flags | __GFP_NOWARN);
> -	if (!ret)
> -		ret = __vmalloc(size, flags | __GFP_ZERO, PAGE_KERNEL);
> -	return ret;
> -}
> -
>  #define get_inode_mode(i) \
>  	((is_inode_flag_set(i, FI_ACL_MODE)) ? \
>  	 (F2FS_I(i)->i_acl_mode) : ((i)->i_mode))
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index 49f10dce817d..fb2e0c156135 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -1013,11 +1013,11 @@ static int __exchange_data_block(struct inode *src_inode,
>  	while (len) {
>  		olen = min((pgoff_t)4 * ADDRS_PER_BLOCK, len);
>
> -		src_blkaddr = f2fs_kvzalloc(sizeof(block_t) * olen, GFP_KERNEL);
> +		src_blkaddr = kvzalloc(sizeof(block_t) * olen, GFP_KERNEL);
>  		if (!src_blkaddr)
>  			return -ENOMEM;
>
> -		do_replace = f2fs_kvzalloc(sizeof(int) * olen, GFP_KERNEL);
> +		do_replace = kvzalloc(sizeof(int) * olen, GFP_KERNEL);
>  		if (!do_replace) {
>  			kvfree(src_blkaddr);
>  			return -ENOMEM;
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index 0738f48293cc..c50c883bfc1a 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -2286,13 +2286,13 @@ static int build_sit_info(struct f2fs_sb_info *sbi)
>
>  	SM_I(sbi)->sit_info = sit_i;
>
> -	sit_i->sentries = f2fs_kvzalloc(MAIN_SEGS(sbi) *
> +	sit_i->sentries = kvzalloc(MAIN_SEGS(sbi) *
>  					sizeof(struct seg_entry), GFP_KERNEL);
>  	if (!sit_i->sentries)
>  		return -ENOMEM;
>
>  	bitmap_size = f2fs_bitmap_size(MAIN_SEGS(sbi));
> -	sit_i->dirty_sentries_bitmap = f2fs_kvzalloc(bitmap_size, GFP_KERNEL);
> +	sit_i->dirty_sentries_bitmap = kvzalloc(bitmap_size, GFP_KERNEL);
>  	if (!sit_i->dirty_sentries_bitmap)
>  		return -ENOMEM;
>
> @@ -2318,7 +2318,7 @@ static int build_sit_info(struct f2fs_sb_info *sbi)
>  		return -ENOMEM;
>
>  	if (sbi->segs_per_sec > 1) {
> -		sit_i->sec_entries = f2fs_kvzalloc(MAIN_SECS(sbi) *
> +		sit_i->sec_entries = kvzalloc(MAIN_SECS(sbi) *
>  					sizeof(struct sec_entry), GFP_KERNEL);
>  		if (!sit_i->sec_entries)
>  			return -ENOMEM;
> @@ -2364,12 +2364,12 @@ static int build_free_segmap(struct f2fs_sb_info *sbi)
>  	SM_I(sbi)->free_info = free_i;
>
>  	bitmap_size = f2fs_bitmap_size(MAIN_SEGS(sbi));
> -	free_i->free_segmap = f2fs_kvmalloc(bitmap_size, GFP_KERNEL);
> +	free_i->free_segmap = kvmalloc(bitmap_size, GFP_KERNEL);
>  	if (!free_i->free_segmap)
>  		return -ENOMEM;
>
>  	sec_bitmap_size = f2fs_bitmap_size(MAIN_SECS(sbi));
> -	free_i->free_secmap = f2fs_kvmalloc(sec_bitmap_size, GFP_KERNEL);
> +	free_i->free_secmap = kvmalloc(sec_bitmap_size, GFP_KERNEL);
>  	if (!free_i->free_secmap)
>  		return -ENOMEM;
>
> @@ -2537,7 +2537,7 @@ static int init_victim_secmap(struct f2fs_sb_info *sbi)
>  	struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
>  	unsigned int bitmap_size = f2fs_bitmap_size(MAIN_SECS(sbi));
>
> -	dirty_i->victim_secmap = f2fs_kvzalloc(bitmap_size, GFP_KERNEL);
> +	dirty_i->victim_secmap = kvzalloc(bitmap_size, GFP_KERNEL);
>  	if (!dirty_i->victim_secmap)
>  		return -ENOMEM;
>  	return 0;
> @@ -2559,7 +2559,7 @@ static int build_dirty_segmap(struct f2fs_sb_info *sbi)
>  	bitmap_size = f2fs_bitmap_size(MAIN_SEGS(sbi));
>
>  	for (i = 0; i < NR_DIRTY_TYPE; i++) {
> -		dirty_i->dirty_segmap[i] = f2fs_kvzalloc(bitmap_size, GFP_KERNEL);
> +		dirty_i->dirty_segmap[i] = kvzalloc(bitmap_size, GFP_KERNEL);
>  		if (!dirty_i->dirty_segmap[i])
>  			return -ENOMEM;
>  	}
> diff --git a/fs/seq_file.c b/fs/seq_file.c
> index ca69fb99e41a..dc7c2be963ed 100644
> --- a/fs/seq_file.c
> +++ b/fs/seq_file.c
> @@ -25,21 +25,7 @@ static void seq_set_overflow(struct seq_file *m)
>
>  static void *seq_buf_alloc(unsigned long size)
>  {
> -	void *buf;
> -	gfp_t gfp = GFP_KERNEL;
> -
> -	/*
> -	 * For high order allocations, use __GFP_NORETRY to avoid oom-killing -
> -	 * it's better to fall back to vmalloc() than to kill things.  For small
> -	 * allocations, just use GFP_KERNEL which will oom kill, thus no need
> -	 * for vmalloc fallback.
> -	 */
> -	if (size > PAGE_SIZE)
> -		gfp |= __GFP_NORETRY | __GFP_NOWARN;
> -	buf = kmalloc(size, gfp);
> -	if (!buf && size > PAGE_SIZE)
> -		buf = vmalloc(size);
> -	return buf;
> +	return kvmalloc(size, GFP_KERNEL);
>  }
>
>  /**
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 1c5190dab2c1..00e6f93d1ee0 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -768,8 +768,6 @@ void kvm_arch_check_processor_compat(void *rtn);
>  int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
>  int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
>
> -void *kvm_kvzalloc(unsigned long size);
> -
>  #ifndef __KVM_HAVE_ARCH_VM_ALLOC
>  static inline struct kvm *kvm_arch_alloc_vm(void)
>  {
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index fe6b4036664a..55fd570c3e1e 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -484,6 +484,20 @@ static inline int is_vmalloc_or_module_addr(const void *x)
>  }
>  #endif
>
> +extern void *kvmalloc_node(size_t size, gfp_t flags, int node);
> +static inline void *kvmalloc(size_t size, gfp_t flags)
> +{
> +	return kvmalloc_node(size, flags, NUMA_NO_NODE);
> +}
> +static inline void *kvzalloc_node(size_t size, gfp_t flags, int node)
> +{
> +	return kvmalloc_node(size, flags | __GFP_ZERO, node);
> +}
> +static inline void *kvzalloc(size_t size, gfp_t flags)
> +{
> +	return kvmalloc(size, flags | __GFP_ZERO);
> +}
> +
>  extern void kvfree(const void *addr);
>
>  static inline atomic_t *compound_mapcount_ptr(struct page *page)
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index d68edffbf142..46991ad3ddd5 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -80,6 +80,7 @@ extern void *__vmalloc_node_range(unsigned long size, unsigned long align,
>  			unsigned long start, unsigned long end, gfp_t gfp_mask,
>  			pgprot_t prot, unsigned long vm_flags, int node,
>  			const void *caller);
> +extern void *__vmalloc_node_flags(unsigned long size, int node, gfp_t flags);
>
>  extern void vfree(const void *addr);
>  extern void vfree_atomic(const void *addr);
> diff --git a/ipc/util.c b/ipc/util.c
> index 798cad18dd87..74c2adc62086 100644
> --- a/ipc/util.c
> +++ b/ipc/util.c
> @@ -403,12 +403,7 @@ void ipc_rmid(struct ipc_ids *ids, struct kern_ipc_perm *ipcp)
>   */
>  void *ipc_alloc(int size)
>  {
> -	void *out;
> -	if (size > PAGE_SIZE)
> -		out = vmalloc(size);
> -	else
> -		out = kmalloc(size, GFP_KERNEL);
> -	return out;
> +	return kvmalloc(size, GFP_KERNEL);
>  }
>
>  /**
> diff --git a/mm/nommu.c b/mm/nommu.c
> index 24f9f5f39145..f1927890f75e 100644
> --- a/mm/nommu.c
> +++ b/mm/nommu.c
> @@ -236,6 +236,11 @@ void *__vmalloc(unsigned long size, gfp_t gfp_mask, pgprot_t prot)
>  }
>  EXPORT_SYMBOL(__vmalloc);
>
> +void *__vmalloc_node_flags(unsigned long size, int node, gfp_t flags)
> +{
> +	return __vmalloc(size, flags, PAGE_KERNEL);
> +}
> +
>  void *vmalloc_user(unsigned long size)
>  {
>  	void *ret;
> diff --git a/mm/util.c b/mm/util.c
> index 3cb2164f4099..7e0c240b5760 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -324,6 +324,48 @@ unsigned long vm_mmap(struct file *file, unsigned long addr,
>  }
>  EXPORT_SYMBOL(vm_mmap);
>
> +/**
> + * kvmalloc_node - allocate contiguous memory from SLAB with vmalloc fallback

Hi Michal,

How about this wording instead:

kvmalloc_node - attempt to allocate physically contiguous memory, but upon failure, fall back to 
non-contiguous (vmalloc) allocation.


> + * @size: size of the request.
> + * @flags: gfp mask for the allocation - must be compatible (superset) with GFP_KERNEL.
> + * @node: numa node to allocate from
> + *
> + * Uses kmalloc to get the memory but if the allocation fails then falls back
> + * to the vmalloc allocator. Use kvfree for freeing the memory.
> + *
> + * Reclaim modifiers - __GFP_NORETRY, __GFP_REPEAT and __GFP_NOFAIL are not supported

Is that "Reclaim modifiers" line still true, or is it a leftover from an earlier approach? I am 
having trouble reconciling it with the rest of the patchset, because:

a) the flags argument below is effectively passed on to either kmalloc_node (possibly adding, but 
not removing flags), or to __vmalloc_node_flags.

b) In patch 6/6, you are in fact passing in __GFP_REPEAT to the wrappers (kvzalloc, for example), 
and again, only adding, not removing flags.
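
To make (a) concrete, here is a minimal user-space sketch of how I read the flag flow in the 
quoted kvmalloc_node. The SKETCH_ names and bit values are made up for illustration; they are not 
the kernel's definitions, and nothing here calls a real allocator.

```c
#include <assert.h>	/* only so the sketch can be checked by hand */
#include <stddef.h>

/* Hypothetical stand-ins; the real gfp bits are defined by the kernel. */
typedef unsigned int sketch_gfp_t;
#define SKETCH_GFP_KERNEL	0x01u
#define SKETCH_GFP_NORETRY	0x02u
#define SKETCH_GFP_NOWARN	0x04u
#define SKETCH_GFP_REPEAT	0x08u
#define SKETCH_PAGE_SIZE	4096u

/*
 * Flags the kmalloc attempt would see: the caller's flags, with two more
 * OR-ed in for requests above one page. Nothing is ever masked out.
 */
static sketch_gfp_t sketch_kmalloc_flags(size_t size, sketch_gfp_t flags)
{
	if (size > SKETCH_PAGE_SIZE)
		flags |= SKETCH_GFP_NORETRY | SKETCH_GFP_NOWARN;
	return flags;
}

/* Flags the vmalloc fallback would see: the caller's flags, unchanged. */
static sketch_gfp_t sketch_vmalloc_flags(sketch_gfp_t flags)
{
	return flags;
}
```

If that reading is right, a two-page request made with __GFP_REPEAT reaches kmalloc with both the 
REPEAT and NORETRY bits set, and reaches the vmalloc fallback with REPEAT alone.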


> + */
> +void *kvmalloc_node(size_t size, gfp_t flags, int node)
> +{
> +	gfp_t kmalloc_flags = flags;
> +	void *ret;
> +
> +	/*
> +	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
> +	 * so the given set of flags has to be compatible.
> +	 */
> +	WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL);
> +
> +	/*
> +	 * Make sure that larger requests are not too disruptive - no OOM
> +	 * killer and no allocation failure warnings as we have a fallback
> +	 */
> +	if (size > PAGE_SIZE)
> +		kmalloc_flags |= __GFP_NORETRY | __GFP_NOWARN;
> +
> +	ret = kmalloc_node(size, kmalloc_flags, node);

Along those lines (dealing with larger requests), is there any value in picking some threshold 
value, and going straight to vmalloc if size is greater than that threshold?  It's less flexible and 
might even require occasional maintenance over the years, but it would save some time on *some* 
systems in some cases...OK, I think I just talked myself out of the whole idea. But I still want to 
put the question out there, because I think others may also ask it, and I'd like to hear a more 
experienced opinion.
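
For the sake of discussion, the threshold idea might look roughly like this as a user-space model. 
SKETCH_DIRECT_VMALLOC_THRESHOLD is a number I made up, and the enum exists only to show which path 
would be tried first; none of it is in the patch.

```c
#include <assert.h>	/* only so the sketch can be checked by hand */
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u
/* Hypothetical cutoff; choosing a good value is exactly the maintenance cost. */
#define SKETCH_DIRECT_VMALLOC_THRESHOLD (64u * SKETCH_PAGE_SIZE)

enum sketch_path {
	SKETCH_KMALLOC_ONLY,		/* up to one page: no vmalloc fallback */
	SKETCH_KMALLOC_THEN_VMALLOC,	/* what the patch does above PAGE_SIZE */
	SKETCH_VMALLOC_DIRECT,		/* the hypothetical shortcut */
};

/* Decide which allocation path a request of the given size would take. */
static enum sketch_path sketch_pick_path(size_t size)
{
	if (size <= SKETCH_PAGE_SIZE)
		return SKETCH_KMALLOC_ONLY;
	if (size > SKETCH_DIRECT_VMALLOC_THRESHOLD)
		return SKETCH_VMALLOC_DIRECT;
	return SKETCH_KMALLOC_THEN_VMALLOC;
}
```

The only thing the shortcut would save is one kmalloc attempt that is already marked NORETRY, 
which is probably why the idea does not hold up well.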

(This patchset caught my eye because we have something just like it in an out-of-tree driver, so 
this would be nice.)

thanks,
john h


> +
> +	/*
> +	 * It doesn't really make sense to fallback to vmalloc for sub page
> +	 * requests
> +	 */
> +	if (ret || size <= PAGE_SIZE)
> +		return ret;
> +
> +	return __vmalloc_node_flags(size, node, flags);
> +}
> +EXPORT_SYMBOL(kvmalloc_node);
> +
>  void kvfree(const void *addr)
>  {
>  	if (is_vmalloc_addr(addr))
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 3ca82d44edd3..1039b1230889 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1757,7 +1757,7 @@ void *__vmalloc(unsigned long size, gfp_t gfp_mask, pgprot_t prot)
>  }
>  EXPORT_SYMBOL(__vmalloc);
>
> -static inline void *__vmalloc_node_flags(unsigned long size,
> +void *__vmalloc_node_flags(unsigned long size,
>  					int node, gfp_t flags)
>  {
>  	return __vmalloc_node(size, 1, flags, PAGE_KERNEL,
> diff --git a/security/apparmor/apparmorfs.c b/security/apparmor/apparmorfs.c
> index 5923d5665209..83789a03379f 100644
> --- a/security/apparmor/apparmorfs.c
> +++ b/security/apparmor/apparmorfs.c
> @@ -100,7 +100,7 @@ static char *aa_simple_write_to_buffer(int op, const char __user *userbuf,
>  		return ERR_PTR(-EACCES);
>
>  	/* freed by caller to simple_write_to_buffer */
> -	data = kvmalloc(alloc_size);
> +	data = __aa_kvmalloc(alloc_size, 0);
>  	if (data == NULL)
>  		return ERR_PTR(-ENOMEM);
>
> diff --git a/security/apparmor/include/apparmor.h b/security/apparmor/include/apparmor.h
> index 5d721e990876..c88fb0ebc756 100644
> --- a/security/apparmor/include/apparmor.h
> +++ b/security/apparmor/include/apparmor.h
> @@ -68,16 +68,6 @@ char *aa_split_fqname(char *args, char **ns_name);
>  void aa_info_message(const char *str);
>  void *__aa_kvmalloc(size_t size, gfp_t flags);
>
> -static inline void *kvmalloc(size_t size)
> -{
> -	return __aa_kvmalloc(size, 0);
> -}
> -
> -static inline void *kvzalloc(size_t size)
> -{
> -	return __aa_kvmalloc(size, __GFP_ZERO);
> -}
> -
>  /* returns 0 if kref not incremented */
>  static inline int kref_get_not0(struct kref *kref)
>  {
> diff --git a/security/apparmor/match.c b/security/apparmor/match.c
> index 3f900fcca8fb..55f6ae0067a3 100644
> --- a/security/apparmor/match.c
> +++ b/security/apparmor/match.c
> @@ -61,7 +61,7 @@ static struct table_header *unpack_table(char *blob, size_t bsize)
>  	if (bsize < tsize)
>  		goto out;
>
> -	table = kvzalloc(tsize);
> +	table = __aa_kvmalloc(tsize, __GFP_ZERO);
>  	if (table) {
>  		table->td_id = th.td_id;
>  		table->td_flags = th.td_flags;
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 482612b4e496..dbfe0e79232d 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -502,7 +502,7 @@ static struct kvm_memslots *kvm_alloc_memslots(void)
>  	int i;
>  	struct kvm_memslots *slots;
>
> -	slots = kvm_kvzalloc(sizeof(struct kvm_memslots));
> +	slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
>  	if (!slots)
>  		return NULL;
>
> @@ -685,18 +685,6 @@ static struct kvm *kvm_create_vm(unsigned long type)
>  	return ERR_PTR(r);
>  }
>
> -/*
> - * Avoid using vmalloc for a small buffer.
> - * Should not be used when the size is statically known.
> - */
> -void *kvm_kvzalloc(unsigned long size)
> -{
> -	if (size > PAGE_SIZE)
> -		return vzalloc(size);
> -	else
> -		return kzalloc(size, GFP_KERNEL);
> -}
> -
>  static void kvm_destroy_devices(struct kvm *kvm)
>  {
>  	struct kvm_device *dev, *tmp;
> @@ -775,7 +763,7 @@ static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
>  {
>  	unsigned long dirty_bytes = 2 * kvm_dirty_bitmap_bytes(memslot);
>
> -	memslot->dirty_bitmap = kvm_kvzalloc(dirty_bytes);
> +	memslot->dirty_bitmap = kvzalloc(dirty_bytes, GFP_KERNEL);
>  	if (!memslot->dirty_bitmap)
>  		return -ENOMEM;
>
> @@ -995,7 +983,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
>  			goto out_free;
>  	}
>
> -	slots = kvm_kvzalloc(sizeof(struct kvm_memslots));
> +	slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
>  	if (!slots)
>  		goto out_free;
>  	memcpy(slots, __kvm_memslots(kvm, as_id), sizeof(struct kvm_memslots));
> --
> 2.11.0
>

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/6] mm: introduce kv[mz]alloc helpers
@ 2017-01-16  4:34     ` John Hubbard
  0 siblings, 0 replies; 180+ messages in thread
From: John Hubbard @ 2017-01-16  4:34 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: Vlastimil Babka, David Rientjes, Mel Gorman, Johannes Weiner,
	Al Viro, linux-mm, LKML, Michal Hocko, Anatoly Stepanov,
	Paolo Bonzini, Mike Snitzer, Michael S. Tsirkin,
	Theodore Ts'o



On 01/12/2017 07:37 AM, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
>
> Using kmalloc with the vmalloc fallback for larger allocations is a
> common pattern in the kernel code. Yet we do not have any common helper
> for that and so users have invented their own helpers. Some of them are
> really creative when doing so. Let's just add kv[mz]alloc and make sure
> it is implemented properly. This implementation makes sure not to create
> large memory pressure for > PAGE_SIZE requests (__GFP_NORETRY) and also
> not to warn about allocation failures. This also rules out the OOM
> killer, as vmalloc is a more appropriate fallback than a disruptive
> user-visible action.
>
> This patch also changes some existing users and removes helpers which
> are specific to them. In some cases this is not possible (e.g.
> ext4_kvmalloc, libcfs_kvzalloc, __aa_kvmalloc) because those seem to be
> broken and require GFP_NO{FS,IO} context which is not vmalloc compatible
> in general (note that the page table allocation is GFP_KERNEL). Those
> need to be fixed separately.
>
> apparmor has already claimed kv[mz]alloc so remove those and use
> __aa_kvmalloc instead to prevent naming clashes.
>
> Changes since v3
> - add ipc_alloc
>
> Changes since v2
> - s@WARN_ON@WARN_ON_ONCE@ as per Vlastimil
> - do not fallback to vmalloc for size = PAGE_SIZE as per Vlastimil
>
> Changes since v1
> - define __vmalloc_node_flags for CONFIG_MMU=n
>
> Cc: Anatoly Stepanov <astepanov@cloudlinux.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Mike Snitzer <snitzer@redhat.com>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: "Theodore Ts'o" <tytso@mit.edu>
> Reviewed-by: Andreas Dilger <adilger@dilger.ca> # ext4 part
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  arch/x86/kvm/lapic.c                 |  4 ++--
>  arch/x86/kvm/page_track.c            |  4 ++--
>  arch/x86/kvm/x86.c                   |  4 ++--
>  drivers/md/dm-stats.c                |  7 +-----
>  fs/ext4/mballoc.c                    |  2 +-
>  fs/ext4/super.c                      |  4 ++--
>  fs/f2fs/f2fs.h                       | 20 -----------------
>  fs/f2fs/file.c                       |  4 ++--
>  fs/f2fs/segment.c                    | 14 ++++++------
>  fs/seq_file.c                        | 16 +-------------
>  include/linux/kvm_host.h             |  2 --
>  include/linux/mm.h                   | 14 ++++++++++++
>  include/linux/vmalloc.h              |  1 +
>  ipc/util.c                           |  7 +-----
>  mm/nommu.c                           |  5 +++++
>  mm/util.c                            | 42 ++++++++++++++++++++++++++++++++++++
>  mm/vmalloc.c                         |  2 +-
>  security/apparmor/apparmorfs.c       |  2 +-
>  security/apparmor/include/apparmor.h | 10 ---------
>  security/apparmor/match.c            |  2 +-
>  virt/kvm/kvm_main.c                  | 18 +++-------------
>  21 files changed, 89 insertions(+), 95 deletions(-)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 5fe290c1b7d8..daf114c3b8ad 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -167,8 +167,8 @@ static void recalculate_apic_map(struct kvm *kvm)
>  		if (kvm_apic_present(vcpu))
>  			max_id = max(max_id, kvm_apic_id(vcpu->arch.apic));
>
> -	new = kvm_kvzalloc(sizeof(struct kvm_apic_map) +
> -	                   sizeof(struct kvm_lapic *) * ((u64)max_id + 1));
> +	new = kvzalloc(sizeof(struct kvm_apic_map) +
> +	                   sizeof(struct kvm_lapic *) * ((u64)max_id + 1), GFP_KERNEL);
>
>  	if (!new)
>  		goto out;
> diff --git a/arch/x86/kvm/page_track.c b/arch/x86/kvm/page_track.c
> index 4a1c13eaa518..d46663e655b0 100644
> --- a/arch/x86/kvm/page_track.c
> +++ b/arch/x86/kvm/page_track.c
> @@ -38,8 +38,8 @@ int kvm_page_track_create_memslot(struct kvm_memory_slot *slot,
>  	int  i;
>
>  	for (i = 0; i < KVM_PAGE_TRACK_MAX; i++) {
> -		slot->arch.gfn_track[i] = kvm_kvzalloc(npages *
> -					    sizeof(*slot->arch.gfn_track[i]));
> +		slot->arch.gfn_track[i] = kvzalloc(npages *
> +					    sizeof(*slot->arch.gfn_track[i]), GFP_KERNEL);
>  		if (!slot->arch.gfn_track[i])
>  			goto track_free;
>  	}
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 51ccfe08e32f..ba55bc338f25 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -8082,13 +8082,13 @@ int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
>  				      slot->base_gfn, level) + 1;
>
>  		slot->arch.rmap[i] =
> -			kvm_kvzalloc(lpages * sizeof(*slot->arch.rmap[i]));
> +			kvzalloc(lpages * sizeof(*slot->arch.rmap[i]), GFP_KERNEL);
>  		if (!slot->arch.rmap[i])
>  			goto out_free;
>  		if (i == 0)
>  			continue;
>
> -		linfo = kvm_kvzalloc(lpages * sizeof(*linfo));
> +		linfo = kvzalloc(lpages * sizeof(*linfo), GFP_KERNEL);
>  		if (!linfo)
>  			goto out_free;
>
> diff --git a/drivers/md/dm-stats.c b/drivers/md/dm-stats.c
> index 38b05f23b96c..674f9a1686f7 100644
> --- a/drivers/md/dm-stats.c
> +++ b/drivers/md/dm-stats.c
> @@ -146,12 +146,7 @@ static void *dm_kvzalloc(size_t alloc_size, int node)
>  	if (!claim_shared_memory(alloc_size))
>  		return NULL;
>
> -	if (alloc_size <= KMALLOC_MAX_SIZE) {
> -		p = kzalloc_node(alloc_size, GFP_KERNEL | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN, node);
> -		if (p)
> -			return p;
> -	}
> -	p = vzalloc_node(alloc_size, node);
> +	p = kvzalloc_node(alloc_size, GFP_KERNEL | __GFP_NOMEMALLOC, node);
>  	if (p)
>  		return p;
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index d9fd184b049e..31a761dd76f5 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -2381,7 +2381,7 @@ int ext4_mb_alloc_groupinfo(struct super_block *sb, ext4_group_t ngroups)
>  		return 0;
>
>  	size = roundup_pow_of_two(sizeof(*sbi->s_group_info) * size);
> -	new_groupinfo = ext4_kvzalloc(size, GFP_KERNEL);
> +	new_groupinfo = kvzalloc(size, GFP_KERNEL);
>  	if (!new_groupinfo) {
>  		ext4_msg(sb, KERN_ERR, "can't allocate buddy meta group");
>  		return -ENOMEM;
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 66845a08a87a..c65fe19a2a4f 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -2116,7 +2116,7 @@ int ext4_alloc_flex_bg_array(struct super_block *sb, ext4_group_t ngroup)
>  		return 0;
>
>  	size = roundup_pow_of_two(size * sizeof(struct flex_groups));
> -	new_groups = ext4_kvzalloc(size, GFP_KERNEL);
> +	new_groups = kvzalloc(size, GFP_KERNEL);
>  	if (!new_groups) {
>  		ext4_msg(sb, KERN_ERR, "not enough memory for %d flex groups",
>  			 size / (int) sizeof(struct flex_groups));
> @@ -3850,7 +3850,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
>  			goto failed_mount;
>  		}
>  	}
> -	sbi->s_group_desc = ext4_kvmalloc(db_count *
> +	sbi->s_group_desc = kvmalloc(db_count *
>  					  sizeof(struct buffer_head *),
>  					  GFP_KERNEL);
>  	if (sbi->s_group_desc == NULL) {
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 2da8c3aa0ce5..4130df0a8e64 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -1929,26 +1929,6 @@ static inline void *f2fs_kmalloc(struct f2fs_sb_info *sbi,
>  	return kmalloc(size, flags);
>  }
>
> -static inline void *f2fs_kvmalloc(size_t size, gfp_t flags)
> -{
> -	void *ret;
> -
> -	ret = kmalloc(size, flags | __GFP_NOWARN);
> -	if (!ret)
> -		ret = __vmalloc(size, flags, PAGE_KERNEL);
> -	return ret;
> -}
> -
> -static inline void *f2fs_kvzalloc(size_t size, gfp_t flags)
> -{
> -	void *ret;
> -
> -	ret = kzalloc(size, flags | __GFP_NOWARN);
> -	if (!ret)
> -		ret = __vmalloc(size, flags | __GFP_ZERO, PAGE_KERNEL);
> -	return ret;
> -}
> -
>  #define get_inode_mode(i) \
>  	((is_inode_flag_set(i, FI_ACL_MODE)) ? \
>  	 (F2FS_I(i)->i_acl_mode) : ((i)->i_mode))
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index 49f10dce817d..fb2e0c156135 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -1013,11 +1013,11 @@ static int __exchange_data_block(struct inode *src_inode,
>  	while (len) {
>  		olen = min((pgoff_t)4 * ADDRS_PER_BLOCK, len);
>
> -		src_blkaddr = f2fs_kvzalloc(sizeof(block_t) * olen, GFP_KERNEL);
> +		src_blkaddr = kvzalloc(sizeof(block_t) * olen, GFP_KERNEL);
>  		if (!src_blkaddr)
>  			return -ENOMEM;
>
> -		do_replace = f2fs_kvzalloc(sizeof(int) * olen, GFP_KERNEL);
> +		do_replace = kvzalloc(sizeof(int) * olen, GFP_KERNEL);
>  		if (!do_replace) {
>  			kvfree(src_blkaddr);
>  			return -ENOMEM;
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index 0738f48293cc..c50c883bfc1a 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -2286,13 +2286,13 @@ static int build_sit_info(struct f2fs_sb_info *sbi)
>
>  	SM_I(sbi)->sit_info = sit_i;
>
> -	sit_i->sentries = f2fs_kvzalloc(MAIN_SEGS(sbi) *
> +	sit_i->sentries = kvzalloc(MAIN_SEGS(sbi) *
>  					sizeof(struct seg_entry), GFP_KERNEL);
>  	if (!sit_i->sentries)
>  		return -ENOMEM;
>
>  	bitmap_size = f2fs_bitmap_size(MAIN_SEGS(sbi));
> -	sit_i->dirty_sentries_bitmap = f2fs_kvzalloc(bitmap_size, GFP_KERNEL);
> +	sit_i->dirty_sentries_bitmap = kvzalloc(bitmap_size, GFP_KERNEL);
>  	if (!sit_i->dirty_sentries_bitmap)
>  		return -ENOMEM;
>
> @@ -2318,7 +2318,7 @@ static int build_sit_info(struct f2fs_sb_info *sbi)
>  		return -ENOMEM;
>
>  	if (sbi->segs_per_sec > 1) {
> -		sit_i->sec_entries = f2fs_kvzalloc(MAIN_SECS(sbi) *
> +		sit_i->sec_entries = kvzalloc(MAIN_SECS(sbi) *
>  					sizeof(struct sec_entry), GFP_KERNEL);
>  		if (!sit_i->sec_entries)
>  			return -ENOMEM;
> @@ -2364,12 +2364,12 @@ static int build_free_segmap(struct f2fs_sb_info *sbi)
>  	SM_I(sbi)->free_info = free_i;
>
>  	bitmap_size = f2fs_bitmap_size(MAIN_SEGS(sbi));
> -	free_i->free_segmap = f2fs_kvmalloc(bitmap_size, GFP_KERNEL);
> +	free_i->free_segmap = kvmalloc(bitmap_size, GFP_KERNEL);
>  	if (!free_i->free_segmap)
>  		return -ENOMEM;
>
>  	sec_bitmap_size = f2fs_bitmap_size(MAIN_SECS(sbi));
> -	free_i->free_secmap = f2fs_kvmalloc(sec_bitmap_size, GFP_KERNEL);
> +	free_i->free_secmap = kvmalloc(sec_bitmap_size, GFP_KERNEL);
>  	if (!free_i->free_secmap)
>  		return -ENOMEM;
>
> @@ -2537,7 +2537,7 @@ static int init_victim_secmap(struct f2fs_sb_info *sbi)
>  	struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
>  	unsigned int bitmap_size = f2fs_bitmap_size(MAIN_SECS(sbi));
>
> -	dirty_i->victim_secmap = f2fs_kvzalloc(bitmap_size, GFP_KERNEL);
> +	dirty_i->victim_secmap = kvzalloc(bitmap_size, GFP_KERNEL);
>  	if (!dirty_i->victim_secmap)
>  		return -ENOMEM;
>  	return 0;
> @@ -2559,7 +2559,7 @@ static int build_dirty_segmap(struct f2fs_sb_info *sbi)
>  	bitmap_size = f2fs_bitmap_size(MAIN_SEGS(sbi));
>
>  	for (i = 0; i < NR_DIRTY_TYPE; i++) {
> -		dirty_i->dirty_segmap[i] = f2fs_kvzalloc(bitmap_size, GFP_KERNEL);
> +		dirty_i->dirty_segmap[i] = kvzalloc(bitmap_size, GFP_KERNEL);
>  		if (!dirty_i->dirty_segmap[i])
>  			return -ENOMEM;
>  	}
> diff --git a/fs/seq_file.c b/fs/seq_file.c
> index ca69fb99e41a..dc7c2be963ed 100644
> --- a/fs/seq_file.c
> +++ b/fs/seq_file.c
> @@ -25,21 +25,7 @@ static void seq_set_overflow(struct seq_file *m)
>
>  static void *seq_buf_alloc(unsigned long size)
>  {
> -	void *buf;
> -	gfp_t gfp = GFP_KERNEL;
> -
> -	/*
> -	 * For high order allocations, use __GFP_NORETRY to avoid oom-killing -
> -	 * it's better to fall back to vmalloc() than to kill things.  For small
> -	 * allocations, just use GFP_KERNEL which will oom kill, thus no need
> -	 * for vmalloc fallback.
> -	 */
> -	if (size > PAGE_SIZE)
> -		gfp |= __GFP_NORETRY | __GFP_NOWARN;
> -	buf = kmalloc(size, gfp);
> -	if (!buf && size > PAGE_SIZE)
> -		buf = vmalloc(size);
> -	return buf;
> +	return kvmalloc(size, GFP_KERNEL);
>  }
>
>  /**
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 1c5190dab2c1..00e6f93d1ee0 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -768,8 +768,6 @@ void kvm_arch_check_processor_compat(void *rtn);
>  int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
>  int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
>
> -void *kvm_kvzalloc(unsigned long size);
> -
>  #ifndef __KVM_HAVE_ARCH_VM_ALLOC
>  static inline struct kvm *kvm_arch_alloc_vm(void)
>  {
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index fe6b4036664a..55fd570c3e1e 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -484,6 +484,20 @@ static inline int is_vmalloc_or_module_addr(const void *x)
>  }
>  #endif
>
> +extern void *kvmalloc_node(size_t size, gfp_t flags, int node);
> +static inline void *kvmalloc(size_t size, gfp_t flags)
> +{
> +	return kvmalloc_node(size, flags, NUMA_NO_NODE);
> +}
> +static inline void *kvzalloc_node(size_t size, gfp_t flags, int node)
> +{
> +	return kvmalloc_node(size, flags | __GFP_ZERO, node);
> +}
> +static inline void *kvzalloc(size_t size, gfp_t flags)
> +{
> +	return kvmalloc(size, flags | __GFP_ZERO);
> +}
> +
>  extern void kvfree(const void *addr);
>
>  static inline atomic_t *compound_mapcount_ptr(struct page *page)
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index d68edffbf142..46991ad3ddd5 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -80,6 +80,7 @@ extern void *__vmalloc_node_range(unsigned long size, unsigned long align,
>  			unsigned long start, unsigned long end, gfp_t gfp_mask,
>  			pgprot_t prot, unsigned long vm_flags, int node,
>  			const void *caller);
> +extern void *__vmalloc_node_flags(unsigned long size, int node, gfp_t flags);
>
>  extern void vfree(const void *addr);
>  extern void vfree_atomic(const void *addr);
> diff --git a/ipc/util.c b/ipc/util.c
> index 798cad18dd87..74c2adc62086 100644
> --- a/ipc/util.c
> +++ b/ipc/util.c
> @@ -403,12 +403,7 @@ void ipc_rmid(struct ipc_ids *ids, struct kern_ipc_perm *ipcp)
>   */
>  void *ipc_alloc(int size)
>  {
> -	void *out;
> -	if (size > PAGE_SIZE)
> -		out = vmalloc(size);
> -	else
> -		out = kmalloc(size, GFP_KERNEL);
> -	return out;
> +	return kvmalloc(size, GFP_KERNEL);
>  }
>
>  /**
> diff --git a/mm/nommu.c b/mm/nommu.c
> index 24f9f5f39145..f1927890f75e 100644
> --- a/mm/nommu.c
> +++ b/mm/nommu.c
> @@ -236,6 +236,11 @@ void *__vmalloc(unsigned long size, gfp_t gfp_mask, pgprot_t prot)
>  }
>  EXPORT_SYMBOL(__vmalloc);
>
> +void *__vmalloc_node_flags(unsigned long size, int node, gfp_t flags)
> +{
> +	return __vmalloc(size, flags, PAGE_KERNEL);
> +}
> +
>  void *vmalloc_user(unsigned long size)
>  {
>  	void *ret;
> diff --git a/mm/util.c b/mm/util.c
> index 3cb2164f4099..7e0c240b5760 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -324,6 +324,48 @@ unsigned long vm_mmap(struct file *file, unsigned long addr,
>  }
>  EXPORT_SYMBOL(vm_mmap);
>
> +/**
> + * kvmalloc_node - allocate contiguous memory from SLAB with vmalloc fallback

Hi Michal,

How about this wording instead:

kvmalloc_node - attempt to allocate physically contiguous memory, but upon failure, fall back to 
non-contiguous (vmalloc) allocation.


> + * @size: size of the request.
> + * @flags: gfp mask for the allocation - must be compatible (superset) with GFP_KERNEL.
> + * @node: numa node to allocate from
> + *
> + * Uses kmalloc to get the memory but if the allocation fails then falls back
> + * to the vmalloc allocator. Use kvfree for freeing the memory.
> + *
> + * Reclaim modifiers - __GFP_NORETRY, __GFP_REPEAT and __GFP_NOFAIL are not supported

Is that "Reclaim modifiers" line still true, or is it a leftover from an earlier approach? I am 
having trouble reconciling it with the rest of the patchset, because:

a) the flags argument below is effectively passed on to either kmalloc_node (possibly adding, but 
not removing flags), or to __vmalloc_node_flags.

b) In patch 6/6, you are in fact passing in __GFP_REPEAT to the wrappers (kvzalloc, for example), 
and again, only adding, not removing flags.


> + */
> +void *kvmalloc_node(size_t size, gfp_t flags, int node)
> +{
> +	gfp_t kmalloc_flags = flags;
> +	void *ret;
> +
> +	/*
> +	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
> +	 * so the given set of flags has to be compatible.
> +	 */
> +	WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL);
> +
> +	/*
> +	 * Make sure that larger requests are not too disruptive - no OOM
> +	 * killer and no allocation failure warnings as we have a fallback
> +	 */
> +	if (size > PAGE_SIZE)
> +		kmalloc_flags |= __GFP_NORETRY | __GFP_NOWARN;
> +
> +	ret = kmalloc_node(size, kmalloc_flags, node);

Along those lines (dealing with larger requests), is there any value in picking some threshold 
value, and going straight to vmalloc if size is greater than that threshold?  It's less flexible and 
might even require occasional maintenance over the years, but it would save some time on *some* 
systems in some cases...OK, I think I just talked myself out of the whole idea. But I still want to 
put the question out there, because I think others may also ask it, and I'd like to hear a more 
experienced opinion.

(This patchset caught my eye because we have something just like it in an out-of-tree driver, so 
this would be nice.)

thanks,
john h


> +
> +	/*
> +	 * It doesn't really make sense to fallback to vmalloc for sub page
> +	 * requests
> +	 */
> +	if (ret || size <= PAGE_SIZE)
> +		return ret;
> +
> +	return __vmalloc_node_flags(size, node, flags);
> +}
> +EXPORT_SYMBOL(kvmalloc_node);
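
For readers following the control flow, the decision logic of the quoted
kvmalloc_node() can be sketched in userspace roughly as below. This is a model
under stated assumptions, not the kernel code: the MODEL_* constants,
model_kmalloc() and the model_kmalloc_fails knob are made up for illustration,
and malloc() stands in for both allocators.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the real values live in the kernel headers. */
#define MODEL_PAGE_SIZE   4096u
#define MODEL_GFP_KERNEL  0x01u
#define MODEL_GFP_NORETRY 0x02u
#define MODEL_GFP_NOWARN  0x04u

enum model_source { SRC_NONE, SRC_KMALLOC, SRC_VMALLOC };

struct model_result {
	void *ptr;                  /* memory handed back, or NULL */
	enum model_source source;   /* which allocator satisfied the request */
	unsigned int kmalloc_flags; /* flags the contiguous attempt was given */
};

/* Test knob: when true, the contiguous ("kmalloc") attempt fails. */
static bool model_kmalloc_fails;

static void *model_kmalloc(size_t size)
{
	return model_kmalloc_fails ? NULL : malloc(size);
}

static struct model_result model_kvmalloc(size_t size, unsigned int flags)
{
	struct model_result res = { NULL, SRC_NONE, flags };

	/* The helper expects a mask that is a superset of GFP_KERNEL. */
	assert((flags & MODEL_GFP_KERNEL) == MODEL_GFP_KERNEL);

	/* Only requests with a fallback get the "fail quickly and quietly" bits. */
	if (size > MODEL_PAGE_SIZE)
		res.kmalloc_flags |= MODEL_GFP_NORETRY | MODEL_GFP_NOWARN;

	res.ptr = model_kmalloc(size);
	if (res.ptr) {
		res.source = SRC_KMALLOC;
		return res;
	}

	/* Sub-page requests never fall back; the failure is returned as-is. */
	if (size <= MODEL_PAGE_SIZE)
		return res;

	/* The fallback is called with the caller's original flags. */
	res.ptr = malloc(size); /* stands in for __vmalloc_node_flags() */
	if (res.ptr)
		res.source = SRC_VMALLOC;
	return res;
}
```

With the knob set, a 100-byte request comes back NULL with no fallback, while
a two-page request is served by the vmalloc stand-in after a kmalloc attempt
that carried the extra NORETRY and NOWARN bits.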
> +
>  void kvfree(const void *addr)
>  {
>  	if (is_vmalloc_addr(addr))
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 3ca82d44edd3..1039b1230889 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1757,7 +1757,7 @@ void *__vmalloc(unsigned long size, gfp_t gfp_mask, pgprot_t prot)
>  }
>  EXPORT_SYMBOL(__vmalloc);
>
> -static inline void *__vmalloc_node_flags(unsigned long size,
> +void *__vmalloc_node_flags(unsigned long size,
>  					int node, gfp_t flags)
>  {
>  	return __vmalloc_node(size, 1, flags, PAGE_KERNEL,
> diff --git a/security/apparmor/apparmorfs.c b/security/apparmor/apparmorfs.c
> index 5923d5665209..83789a03379f 100644
> --- a/security/apparmor/apparmorfs.c
> +++ b/security/apparmor/apparmorfs.c
> @@ -100,7 +100,7 @@ static char *aa_simple_write_to_buffer(int op, const char __user *userbuf,
>  		return ERR_PTR(-EACCES);
>
>  	/* freed by caller to simple_write_to_buffer */
> -	data = kvmalloc(alloc_size);
> +	data = __aa_kvmalloc(alloc_size, 0);
>  	if (data == NULL)
>  		return ERR_PTR(-ENOMEM);
>
> diff --git a/security/apparmor/include/apparmor.h b/security/apparmor/include/apparmor.h
> index 5d721e990876..c88fb0ebc756 100644
> --- a/security/apparmor/include/apparmor.h
> +++ b/security/apparmor/include/apparmor.h
> @@ -68,16 +68,6 @@ char *aa_split_fqname(char *args, char **ns_name);
>  void aa_info_message(const char *str);
>  void *__aa_kvmalloc(size_t size, gfp_t flags);
>
> -static inline void *kvmalloc(size_t size)
> -{
> -	return __aa_kvmalloc(size, 0);
> -}
> -
> -static inline void *kvzalloc(size_t size)
> -{
> -	return __aa_kvmalloc(size, __GFP_ZERO);
> -}
> -
>  /* returns 0 if kref not incremented */
>  static inline int kref_get_not0(struct kref *kref)
>  {
> diff --git a/security/apparmor/match.c b/security/apparmor/match.c
> index 3f900fcca8fb..55f6ae0067a3 100644
> --- a/security/apparmor/match.c
> +++ b/security/apparmor/match.c
> @@ -61,7 +61,7 @@ static struct table_header *unpack_table(char *blob, size_t bsize)
>  	if (bsize < tsize)
>  		goto out;
>
> -	table = kvzalloc(tsize);
> +	table = __aa_kvmalloc(tsize, __GFP_ZERO);
>  	if (table) {
>  		table->td_id = th.td_id;
>  		table->td_flags = th.td_flags;
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 482612b4e496..dbfe0e79232d 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -502,7 +502,7 @@ static struct kvm_memslots *kvm_alloc_memslots(void)
>  	int i;
>  	struct kvm_memslots *slots;
>
> -	slots = kvm_kvzalloc(sizeof(struct kvm_memslots));
> +	slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
>  	if (!slots)
>  		return NULL;
>
> @@ -685,18 +685,6 @@ static struct kvm *kvm_create_vm(unsigned long type)
>  	return ERR_PTR(r);
>  }
>
> -/*
> - * Avoid using vmalloc for a small buffer.
> - * Should not be used when the size is statically known.
> - */
> -void *kvm_kvzalloc(unsigned long size)
> -{
> -	if (size > PAGE_SIZE)
> -		return vzalloc(size);
> -	else
> -		return kzalloc(size, GFP_KERNEL);
> -}
> -
>  static void kvm_destroy_devices(struct kvm *kvm)
>  {
>  	struct kvm_device *dev, *tmp;
> @@ -775,7 +763,7 @@ static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
>  {
>  	unsigned long dirty_bytes = 2 * kvm_dirty_bitmap_bytes(memslot);
>
> -	memslot->dirty_bitmap = kvm_kvzalloc(dirty_bytes);
> +	memslot->dirty_bitmap = kvzalloc(dirty_bytes, GFP_KERNEL);
>  	if (!memslot->dirty_bitmap)
>  		return -ENOMEM;
>
> @@ -995,7 +983,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
>  			goto out_free;
>  	}
>
> -	slots = kvm_kvzalloc(sizeof(struct kvm_memslots));
> +	slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
>  	if (!slots)
>  		goto out_free;
>  	memcpy(slots, __kvm_memslots(kvm, as_id), sizeof(struct kvm_memslots));
> --
> 2.11.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 5/6] treewide: use kv[mz]alloc* rather than opencoded variants
  2017-01-14 10:56     ` Leon Romanovsky
  (?)
@ 2017-01-16  7:33       ` Michal Hocko
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Hocko @ 2017-01-16  7:33 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Andrew Morton, Vlastimil Babka, David Rientjes, Mel Gorman,
	Johannes Weiner, Al Viro, linux-mm, LKML, Martin Schwidefsky,
	Heiko Carstens, Herbert Xu, Anton Vorontsov, Colin Cross,
	Kees Cook, Tony Luck, Rafael J. Wysocki, Ben Skeggs,
	Kent Overstreet, Santosh Raspatur, Hariprasad S, Tariq Toukan,
	Yishai Hadas, Dan Williams, Oleg Drokin, Andreas Dilger,
	Boris Ostrovsky, David Sterba, Yan, Zheng, Ilya Dryomov,
	Alexei Starovoitov, Eric Dumazet, netdev

On Sat 14-01-17 12:56:32, Leon Romanovsky wrote:
[...]
> Hi Michal,
> 
> I don't see mlx5_vzalloc in the changed list. Any reason why you skipped it?
> 
> static inline void *mlx5_vzalloc(unsigned long size)
> {
>         void *rtn;
>
>         rtn = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
>         if (!rtn)
>                 rtn = vzalloc(size);
>         return rtn;
> }

No reason to skip it, I just didn't see it. I will fold the following in
if you are OK with it
---
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index cdd2bd62f86d..5e6063170e48 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -874,12 +874,7 @@ static inline u16 cmdif_rev(struct mlx5_core_dev *dev)
 
 static inline void *mlx5_vzalloc(unsigned long size)
 {
-	void *rtn;
-
-	rtn = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
-	if (!rtn)
-		rtn = vzalloc(size);
-	return rtn;
+	return kvzalloc(size, GFP_KERNEL);
 }
 
 static inline u32 mlx5_base_mkey(const u32 key)

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 180+ messages in thread

* Re: [PATCH 5/6] treewide: use kv[mz]alloc* rather than opencoded variants
  2017-01-12 15:37   ` Michal Hocko
  (?)
@ 2017-01-16  8:18     ` Tariq Toukan
  -1 siblings, 0 replies; 180+ messages in thread
From: Tariq Toukan @ 2017-01-16  8:18 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: Vlastimil Babka, David Rientjes, Mel Gorman, Johannes Weiner,
	Al Viro, linux-mm, LKML, Michal Hocko, Martin Schwidefsky,
	Heiko Carstens, Herbert Xu, Anton Vorontsov, Colin Cross,
	Kees Cook, Tony Luck, Rafael J. Wysocki, Ben Skeggs,
	Kent Overstreet, Santosh Raspatur, Hariprasad S, Yishai Hadas,
	Dan Williams, Oleg Drokin, Andreas Dilger, Boris Ostrovsky,
	David Sterba, Yan, Zheng, Ilya Dryomov, Alexei Starovoitov,
	Eric Dumazet, netdev



On 12/01/2017 5:37 PM, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
>
> There are many code paths opencoding kvmalloc. Let's use the helper
> instead. The main difference to kvmalloc is that those users are usually
> not considering all the aspects of the memory allocator. E.g. allocation
> requests < 64kB are basically never failing and invoke OOM killer to
> satisfy the allocation. This sounds too disruptive for something that
> has a reasonable fallback - the vmalloc. On the other hand those
> requests might fallback to vmalloc even when the memory allocator would
> succeed after several more reclaim/compaction attempts previously. There
> is no guarantee something like that happens though.
>
> This patch converts many of those places to kv[mz]alloc* helpers because
> they are more conservative.
>
> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: Anton Vorontsov <anton@enomsg.org>
> Cc: Colin Cross <ccross@android.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> Cc: Ben Skeggs <bskeggs@redhat.com>
> Cc: Kent Overstreet <kent.overstreet@gmail.com>
> Cc: Santosh Raspatur <santosh@chelsio.com>
> Cc: Hariprasad S <hariprasad@chelsio.com>
> Cc: Tariq Toukan <tariqt@mellanox.com>
> Cc: Yishai Hadas <yishaih@mellanox.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Oleg Drokin <oleg.drokin@intel.com>
> Cc: Andreas Dilger <andreas.dilger@intel.com>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: David Sterba <dsterba@suse.com>
> Cc: "Yan, Zheng" <zyan@redhat.com>
> Cc: Ilya Dryomov <idryomov@gmail.com>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: netdev@vger.kernel.org
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
Acked-by: Tariq Toukan <tariqt@mellanox.com>
For the mlx4 parts.

Regards.
Tariq

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 5/6] treewide: use kv[mz]alloc* rather than opencoded variants
  2017-01-16  7:33       ` Michal Hocko
@ 2017-01-16  8:28         ` Leon Romanovsky
  -1 siblings, 0 replies; 180+ messages in thread
From: Leon Romanovsky @ 2017-01-16  8:28 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Vlastimil Babka, David Rientjes, Mel Gorman,
	Johannes Weiner, Al Viro, linux-mm, LKML, Martin Schwidefsky,
	Heiko Carstens, Herbert Xu, Anton Vorontsov, Colin Cross,
	Kees Cook, Tony Luck, Rafael J. Wysocki, Ben Skeggs,
	Kent Overstreet, Santosh Raspatur, Hariprasad S, Tariq Toukan,
	Yishai Hadas, Dan Williams, Oleg Drokin, Andreas Dilger,
	Boris Ostrovsky, David Sterba, Yan, Zheng, Ilya Dryomov,
	Alexei Starovoitov, Eric Dumazet, netdev

[-- Attachment #1: Type: text/plain, Size: 1552 bytes --]

On Mon, Jan 16, 2017 at 08:33:11AM +0100, Michal Hocko wrote:
> On Sat 14-01-17 12:56:32, Leon Romanovsky wrote:
> [...]
> > Hi Michal,
> >
> > I don't see mlx5_vzalloc in the changed list. Any reason why you skipped it?
> >
> > static inline void *mlx5_vzalloc(unsigned long size)
> > {
> >         void *rtn;
> >
> >         rtn = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
> >         if (!rtn)
> >                 rtn = vzalloc(size);
> >         return rtn;
> > }
>
> No reason to skip it, I just didn't see it. I will fold the following in
> if you are OK with it

Sure, no problem.
Once the patch set is accepted, we (Mellanox) will get rid of mlx5_vzalloc().

Thanks


> ---
> diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
> index cdd2bd62f86d..5e6063170e48 100644
> --- a/include/linux/mlx5/driver.h
> +++ b/include/linux/mlx5/driver.h
> @@ -874,12 +874,7 @@ static inline u16 cmdif_rev(struct mlx5_core_dev *dev)
>
>  static inline void *mlx5_vzalloc(unsigned long size)
>  {
> -	void *rtn;
> -
> -	rtn = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
> -	if (!rtn)
> -		rtn = vzalloc(size);
> -	return rtn;
> +	return kvzalloc(size, GFP_KERNEL);
>  }
>
>  static inline u32 mlx5_base_mkey(const u32 key)
>
> --
> Michal Hocko
> SUSE Labs

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/6] mm: introduce kv[mz]alloc helpers
  2017-01-16  4:34     ` John Hubbard
@ 2017-01-16  8:47       ` Michal Hocko
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Hocko @ 2017-01-16  8:47 UTC (permalink / raw)
  To: John Hubbard
  Cc: Andrew Morton, Vlastimil Babka, David Rientjes, Mel Gorman,
	Johannes Weiner, Al Viro, linux-mm, LKML, Anatoly Stepanov,
	Paolo Bonzini, Mike Snitzer, Michael S. Tsirkin,
	Theodore Ts'o

On Sun 15-01-17 20:34:13, John Hubbard wrote:
> 
> 
> On 01/12/2017 07:37 AM, Michal Hocko wrote:
[...]
> > diff --git a/mm/util.c b/mm/util.c
> > index 3cb2164f4099..7e0c240b5760 100644
> > --- a/mm/util.c
> > +++ b/mm/util.c
> > @@ -324,6 +324,48 @@ unsigned long vm_mmap(struct file *file, unsigned long addr,
> >  }
> >  EXPORT_SYMBOL(vm_mmap);
> > 
> > +/**
> > + * kvmalloc_node - allocate contiguous memory from SLAB with vmalloc fallback
> 
> Hi Michal,
> 
> How about this wording instead:
> 
> kvmalloc_node - attempt to allocate physically contiguous memory, but upon
> failure, fall back to non-contiguous (vmalloc) allocation.

OK, why not.
 
> > + * @size: size of the request.
> > + * @flags: gfp mask for the allocation - must be compatible (superset) with GFP_KERNEL.
> > + * @node: numa node to allocate from
> > + *
> > + * Uses kmalloc to get the memory but if the allocation fails then falls back
> > + * to the vmalloc allocator. Use kvfree for freeing the memory.
> > + *
> > + * Reclaim modifiers - __GFP_NORETRY, __GFP_REPEAT and __GFP_NOFAIL are not supported
> 
> Is that "Reclaim modifiers" line still true, or is it a leftover from an
> earlier approach? I am having trouble reconciling it with the rest of the
> patchset, because:
> 
> a) the flags argument below is effectively passed on to either kmalloc_node
> (possibly adding, but not removing flags), or to __vmalloc_node_flags.

The above only says those are _unsupported_ - in other words, the behavior
is not defined. Even if the flags are passed down to kmalloc or vmalloc,
it doesn't mean they are honored.  Remember that vmalloc uses
some hardcoded GFP_KERNEL allocations.  So while I could be really
strict about this and mask away these flags, I doubt this is worth the
additional code.
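
For illustration only, the "strict" alternative mentioned above could look
roughly like the sketch below: either detect the unsupported reclaim modifiers
or strip them before the mask is used. The MODEL_* bit values and both helper
names are hypothetical and exist only in this example; nothing in the series
adds such helpers.

```c
#include <stdbool.h>

/* Hypothetical bit values; the real ones are defined by the kernel. */
#define MODEL_GFP_KERNEL  0x01u
#define MODEL_GFP_NORETRY 0x02u
#define MODEL_GFP_REPEAT  0x08u
#define MODEL_GFP_NOFAIL  0x10u
#define MODEL_GFP_ZERO    0x20u

/* Reclaim modifiers the kerneldoc describes as not supported. */
#define MODEL_UNSUPPORTED \
	(MODEL_GFP_NORETRY | MODEL_GFP_REPEAT | MODEL_GFP_NOFAIL)

/* True if the caller passed any modifier with undefined behavior. */
static bool model_has_unsupported(unsigned int flags)
{
	return (flags & MODEL_UNSUPPORTED) != 0;
}

/* Drop the unsupported modifiers and keep every other bit untouched. */
static unsigned int model_strip_unsupported(unsigned int flags)
{
	return flags & ~MODEL_UNSUPPORTED;
}
```

Either helper is only a couple of lines, but the function would have to call
one on every allocation, which is the extra code being weighed against simply
documenting the flags as unsupported.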
 
> b) In patch 6/6, you are in fact passing in __GFP_REPEAT to the wrappers
> (kvzalloc, for example), and again, only adding, not removing flags.

Patch 2 adds support for __GFP_REPEAT and updates the above line as
well.
 
> > + */
> > +void *kvmalloc_node(size_t size, gfp_t flags, int node)
> > +{
> > +	gfp_t kmalloc_flags = flags;
> > +	void *ret;
> > +
> > +	/*
> > +	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
> > +	 * so the given set of flags has to be compatible.
> > +	 */
> > +	WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL);
> > +
> > +	/*
> > +	 * Make sure that larger requests are not too disruptive - no OOM
> > +	 * killer and no allocation failure warnings as we have a fallback
> > +	 */
> > +	if (size > PAGE_SIZE)
> > +		kmalloc_flags |= __GFP_NORETRY | __GFP_NOWARN;
> > +
> > +	ret = kmalloc_node(size, kmalloc_flags, node);
> 
> Along those lines (dealing with larger requests), is there any value in
> picking some threshold value, and going straight to vmalloc if size is
> greater than that threshold?

I am not a fan of thresholds. PAGE_ALLOC_COSTLY_ORDER, which is
used internally by the page allocator, has turned out to be a major pain.
I do not want to repeat the same mistake here. Besides that, you
could hardly find a "one size fits all" value, so it would have to be a part of
the API. If we ever grow users who would really like to do something
like that, then a specialized API should be added.
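
To make the trade-off concrete, here is a sketch of what a caller-supplied
threshold might look like if such a specialized API were ever wanted. Nothing
like this exists in the series; model_pick_path(), the enum and the
MODEL_PAGE_SIZE constant are all hypothetical, and the function only models
which allocator would be tried, not the allocation itself.

```c
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u /* hypothetical; stands in for PAGE_SIZE */

enum model_path {
	PATH_KMALLOC_ONLY,         /* sub-page request, no fallback */
	PATH_KMALLOC_THEN_VMALLOC, /* behavior of the proposed helper */
	PATH_VMALLOC_ONLY          /* skip the contiguous attempt entirely */
};

/*
 * Decide which allocator(s) a request would try. A threshold of 0 means
 * "no threshold" and reproduces the decision the proposed helper makes.
 */
static enum model_path model_pick_path(size_t size, size_t threshold)
{
	if (size <= MODEL_PAGE_SIZE)
		return PATH_KMALLOC_ONLY;
	if (threshold != 0 && size > threshold)
		return PATH_VMALLOC_ONLY;
	return PATH_KMALLOC_THEN_VMALLOC;
}
```

The threshold has to be an argument because no single constant suits every
caller, which is why it would end up as part of the API rather than as an
internal detail.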

> It's less flexible and might even require
> occasional maintenance over the years, but it would save some time on *some*
> systems in some cases...OK, I think I just talked myself out of the whole
> idea. But I still want to put the question out there, because I think others
> may also ask it, and I'd like to hear a more experienced opinion.


-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/6] mm: introduce kv[mz]alloc helpers
  2017-01-16  8:47       ` Michal Hocko
@ 2017-01-16 19:09         ` John Hubbard
  -1 siblings, 0 replies; 180+ messages in thread
From: John Hubbard @ 2017-01-16 19:09 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Vlastimil Babka, David Rientjes, Mel Gorman,
	Johannes Weiner, Al Viro, linux-mm, LKML, Anatoly Stepanov,
	Paolo Bonzini, Mike Snitzer, Michael S. Tsirkin,
	Theodore Ts'o



On 01/16/2017 12:47 AM, Michal Hocko wrote:
> On Sun 15-01-17 20:34:13, John Hubbard wrote:
>>
>>
>> On 01/12/2017 07:37 AM, Michal Hocko wrote:
> [...]
>>> diff --git a/mm/util.c b/mm/util.c
>>> index 3cb2164f4099..7e0c240b5760 100644
>>> --- a/mm/util.c
>>> +++ b/mm/util.c
>>> @@ -324,6 +324,48 @@ unsigned long vm_mmap(struct file *file, unsigned long addr,
>>>  }
>>>  EXPORT_SYMBOL(vm_mmap);
>>>
>>> +/**
>>> + * kvmalloc_node - allocate contiguous memory from SLAB with vmalloc fallback
>>
>> Hi Michal,
>>
>> How about this wording instead:
>>
>> kvmalloc_node - attempt to allocate physically contiguous memory, but upon
>> failure, fall back to non-contiguous (vmalloc) allocation.
>
> OK, why not.
>
>>> + * @size: size of the request.
>>> + * @flags: gfp mask for the allocation - must be compatible (superset) with GFP_KERNEL.
>>> + * @node: numa node to allocate from
>>> + *
>>> + * Uses kmalloc to get the memory but if the allocation fails then falls back
>>> + * to the vmalloc allocator. Use kvfree for freeing the memory.
>>> + *
>>> + * Reclaim modifiers - __GFP_NORETRY, __GFP_REPEAT and __GFP_NOFAIL are not supported
>>
>> Is that "Reclaim modifiers" line still true, or is it a leftover from an
>> earlier approach? I am having trouble reconciling it with rest of the
>> patchset, because:
>>
>> a) the flags argument below is effectively passed on to either kmalloc_node
>> (possibly adding, but not removing flags), or to __vmalloc_node_flags.
>
> The above only says those are _unsupported_ - in other words the behavior
> is not defined. Even if flags are passed down to kmalloc resp. vmalloc
> it doesn't mean they are used that way.  Remember that vmalloc uses
> some hardcoded GFP_KERNEL allocations.  So while I could be really
> strict about this and mask away these flags I doubt this is worth the
> additional code.

I do wonder about passing those flags through to kmalloc. Maybe it is worth stripping out 
__GFP_NORETRY and __GFP_NOFAIL, after all. It provides some insulation from any future changes to 
the implementation of kmalloc, and it also makes the documentation more believable.

>
>> b) In patch 6/6, you are in fact passing in __GFP_REPEAT to the wrappers
>> (kvzalloc, for example), and again, only adding, not removing flags.
>
> Patch 2 adds support for __GFP_REPEAT and updates the above line as
> well.

OK, I see.

>
>>> + */
>>> +void *kvmalloc_node(size_t size, gfp_t flags, int node)
>>> +{
>>> +	gfp_t kmalloc_flags = flags;
>>> +	void *ret;
>>> +
>>> +	/*
>>> +	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g. page tables)
>>> +	 * so the given set of flags has to be compatible.
>>> +	 */
>>> +	WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL);
>>> +
>>> +	/*
>>> +	 * Make sure that larger requests are not too disruptive - no OOM
>>> +	 * killer and no allocation failure warnings as we have a fallback
>>> +	 */
>>> +	if (size > PAGE_SIZE)
>>> +		kmalloc_flags |= __GFP_NORETRY | __GFP_NOWARN;
>>> +
>>> +	ret = kmalloc_node(size, kmalloc_flags, node);
>>
>> Along those lines (dealing with larger requests), is there any value in
>> picking some threshold value, and going straight to vmalloc if size is
>> greater than that threshold?
>
> I am not a fan of thresholds. PAGE_ALLOC_COSTLY_ORDER which is
> internally used by the page allocator has turned out to be a major pain.
> I do not want to repeat the same mistake again here. Besides that you
> could hardly find a "one size fits all" value so it would have to be a part of
> the API. If we ever grow users who would really like to do something
> like that then a specialized API should be added.

Thanks for explaining, and the note about the pain of dealing with PAGE_ALLOC_COSTLY_ORDER is 
especially interesting. Sounds good, then.

thanks
john h

>
>> It's less flexible and might even require
>> occasional maintenance over the years, but it would save some time on *some*
>> systems in some cases...OK, I think I just talked myself out of the whole
>> idea. But I still want to put the question out there, because I think others
>> may also ask it, and I'd like to hear a more experienced opinion.
>
>
> --
> Michal Hocko
> SUSE Labs
>

^ permalink raw reply	[flat|nested] 180+ messages in thread
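
Editorial sketch: the stripping suggested in the message above (masking the unsupported reclaim modifiers out of the caller's flags before they reach either allocator) would amount to a single mask operation. This is not what the posted patch does; it is a minimal illustration, with hypothetical MODEL_ names and invented bit values, of what the suggestion would look like.

```c
#include <assert.h>

/* Hypothetical bit values for illustration; the kernel defines its own. */
#define MODEL_GFP_KERNEL  0x01u
#define MODEL_GFP_ZERO    0x02u
#define MODEL_GFP_NORETRY 0x10u
#define MODEL_GFP_NOFAIL  0x40u

/* Reclaim modifiers that the quoted kernel-doc comment lists as unsupported. */
#define MODEL_UNSUPPORTED (MODEL_GFP_NORETRY | MODEL_GFP_NOFAIL)

/* Drop unsupported modifiers from the caller's mask, keep everything else. */
static unsigned int model_strip_unsupported(unsigned int flags)
{
	return flags & ~MODEL_UNSUPPORTED;
}

/* Small self-check; call it from main() to exercise the helper. */
static void model_strip_examples(void)
{
	unsigned int in = MODEL_GFP_KERNEL | MODEL_GFP_ZERO | MODEL_GFP_NOFAIL;

	/* The unsupported modifier is removed, the other bits survive. */
	assert(model_strip_unsupported(in) == (MODEL_GFP_KERNEL | MODEL_GFP_ZERO));
	/* A mask without unsupported modifiers passes through unchanged. */
	assert(model_strip_unsupported(MODEL_GFP_KERNEL) == MODEL_GFP_KERNEL);
}
```

In the quoted hunk the helper itself adds __GFP_NORETRY for requests above one page, so a mask like this would have to be applied to the caller-supplied flags before that step, not after it.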

* Re: [PATCH 1/6] mm: introduce kv[mz]alloc helpers
  2017-01-16 19:09         ` John Hubbard
@ 2017-01-16 19:40           ` Michal Hocko
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Hocko @ 2017-01-16 19:40 UTC (permalink / raw)
  To: John Hubbard
  Cc: Andrew Morton, Vlastimil Babka, David Rientjes, Mel Gorman,
	Johannes Weiner, Al Viro, linux-mm, LKML, Anatoly Stepanov,
	Paolo Bonzini, Mike Snitzer, Michael S. Tsirkin,
	Theodore Ts'o

On Mon 16-01-17 11:09:37, John Hubbard wrote:
> 
> 
> On 01/16/2017 12:47 AM, Michal Hocko wrote:
> > On Sun 15-01-17 20:34:13, John Hubbard wrote:
[...]
> > > Is that "Reclaim modifiers" line still true, or is it a leftover from an
> > > earlier approach? I am having trouble reconciling it with rest of the
> > > patchset, because:
> > > 
> > > a) the flags argument below is effectively passed on to either kmalloc_node
> > > (possibly adding, but not removing flags), or to __vmalloc_node_flags.
> > 
> > The above only says those are _unsupported_ - in other words the behavior
> > is not defined. Even if flags are passed down to kmalloc resp. vmalloc
> > it doesn't mean they are used that way.  Remember that vmalloc uses
> > some hardcoded GFP_KERNEL allocations.  So while I could be really
> > strict about this and mask away these flags I doubt this is worth the
> > additional code.
> 
> I do wonder about passing those flags through to kmalloc. Maybe it is worth
> stripping out __GFP_NORETRY and __GFP_NOFAIL, after all. It provides some
> insulation from any future changes to the implementation of kmalloc, and it
> also makes the documentation more believable.

I am not really convinced that we should take extra steps for these
flags. There are no existing users for those flags and new users should
follow the documentation.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 180+ messages in thread
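
Editorial sketch: the one flag rule the quoted hunk does check at run time is that @flags must be a superset of GFP_KERNEL, via WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL). The bit test can be tried in user space as below. The MODEL_ names and bit positions are hypothetical; the composition of the two masks mirrors the kernel's GFP_KERNEL (__GFP_RECLAIM | __GFP_IO | __GFP_FS) and GFP_NOFS (__GFP_RECLAIM | __GFP_IO).

```c
#include <assert.h>

/* Hypothetical bit positions; only the mask composition mirrors the kernel. */
#define MODEL_RECLAIM 0x1u
#define MODEL_IO      0x2u
#define MODEL_FS      0x4u
#define MODEL_ZERO    0x8u

#define MODEL_GFP_KERNEL (MODEL_RECLAIM | MODEL_IO | MODEL_FS)
#define MODEL_GFP_NOFS   (MODEL_RECLAIM | MODEL_IO)

/* Returns 1 when every bit of MODEL_GFP_KERNEL is present in flags. */
static int model_is_kernel_superset(unsigned int flags)
{
	return (flags & MODEL_GFP_KERNEL) == MODEL_GFP_KERNEL;
}

/* Small self-check; call it from main() to exercise the test. */
static void model_superset_examples(void)
{
	/* Extra bits on top of the full kernel mask are fine. */
	assert(model_is_kernel_superset(MODEL_GFP_KERNEL | MODEL_ZERO));
	/* A mask missing the FS bit is not a superset; the helper would warn. */
	assert(!model_is_kernel_superset(MODEL_GFP_NOFS));
}
```

The check only catches masks that are weaker than GFP_KERNEL. It says nothing about extra modifiers such as __GFP_NORETRY or __GFP_NOFAIL, which is why those are covered by documentation alone in this thread.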

* Re: [PATCH 1/6] mm: introduce kv[mz]alloc helpers
  2017-01-16 19:40           ` Michal Hocko
@ 2017-01-16 21:15             ` John Hubbard
  -1 siblings, 0 replies; 180+ messages in thread
From: John Hubbard @ 2017-01-16 21:15 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Vlastimil Babka, David Rientjes, Mel Gorman,
	Johannes Weiner, Al Viro, linux-mm, LKML, Anatoly Stepanov,
	Paolo Bonzini, Mike Snitzer, Michael S. Tsirkin,
	Theodore Ts'o



On 01/16/2017 11:40 AM, Michal Hocko wrote:
> On Mon 16-01-17 11:09:37, John Hubbard wrote:
>>
>>
>> On 01/16/2017 12:47 AM, Michal Hocko wrote:
>>> On Sun 15-01-17 20:34:13, John Hubbard wrote:
> [...]
>>>> Is that "Reclaim modifiers" line still true, or is it a leftover from an
>>>> earlier approach? I am having trouble reconciling it with rest of the
>>>> patchset, because:
>>>>
>>>> a) the flags argument below is effectively passed on to either kmalloc_node
>>>> (possibly adding, but not removing flags), or to __vmalloc_node_flags.
>>>
>>> The above only says those are _unsupported_ - in other words the behavior
>>> is not defined. Even if flags are passed down to kmalloc resp. vmalloc
>>> it doesn't mean they are used that way.  Remember that vmalloc uses
>>> some hardcoded GFP_KERNEL allocations.  So while I could be really
>>> strict about this and mask away these flags I doubt this is worth the
>>> additional code.
>>
>> I do wonder about passing those flags through to kmalloc. Maybe it is worth
>> stripping out __GFP_NORETRY and __GFP_NOFAIL, after all. It provides some
>> insulation from any future changes to the implementation of kmalloc, and it
>> also makes the documentation more believable.
>
> I am not really convinced that we should take extra steps for these
> flags. There are no existing users for those flags and new users should
> follow the documentation.

OK, let's just fortify the documentation ever so slightly, then, so that users are more likely to do 
the right thing. How's this sound:

* Reclaim modifiers - __GFP_NORETRY and __GFP_NOFAIL are not supported. (Even
* though the current implementation passes the flags on through to kmalloc and
* vmalloc, that is done for efficiency and to avoid unnecessary code. The caller
* should not pass in these flags.)
*
* __GFP_REPEAT is supported, but only for large (>64kB) allocations.


? Or is that documentation overkill?

thanks
john h

>
> --
> Michal Hocko
> SUSE Labs
>

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/6] mm: introduce kv[mz]alloc helpers
  2017-01-16 21:15             ` John Hubbard
@ 2017-01-16 21:48               ` Michal Hocko
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Hocko @ 2017-01-16 21:48 UTC (permalink / raw)
  To: John Hubbard
  Cc: Andrew Morton, Vlastimil Babka, David Rientjes, Mel Gorman,
	Johannes Weiner, Al Viro, linux-mm, LKML, Anatoly Stepanov,
	Paolo Bonzini, Mike Snitzer, Michael S. Tsirkin,
	Theodore Ts'o

On Mon 16-01-17 13:15:08, John Hubbard wrote:
> 
> 
> On 01/16/2017 11:40 AM, Michal Hocko wrote:
> > On Mon 16-01-17 11:09:37, John Hubbard wrote:
> > > 
> > > 
> > > On 01/16/2017 12:47 AM, Michal Hocko wrote:
> > > > On Sun 15-01-17 20:34:13, John Hubbard wrote:
> > [...]
> > > > > Is that "Reclaim modifiers" line still true, or is it a leftover from an
> > > > > earlier approach? I am having trouble reconciling it with rest of the
> > > > > patchset, because:
> > > > > 
> > > > > a) the flags argument below is effectively passed on to either kmalloc_node
> > > > > (possibly adding, but not removing flags), or to __vmalloc_node_flags.
> > > > 
> > > > The above only says those are _unsupported_ - in other words the behavior
> > > > is not defined. Even if flags are passed down to kmalloc resp. vmalloc
> > > > it doesn't mean they are used that way.  Remember that vmalloc uses
> > > > some hardcoded GFP_KERNEL allocations.  So while I could be really
> > > > strict about this and mask away these flags I doubt this is worth the
> > > > additional code.
> > > 
> > > I do wonder about passing those flags through to kmalloc. Maybe it is worth
> > > stripping out __GFP_NORETRY and __GFP_NOFAIL, after all. It provides some
> > > insulation from any future changes to the implementation of kmalloc, and it
> > > also makes the documentation more believable.
> > 
> > I am not really convinced that we should take extra steps for these
> > flags. There are no existing users for those flags and new users should
> > follow the documentation.
> 
> OK, let's just fortify the documentation ever so slightly, then, so that
> users are more likely to do the right thing. How's this sound:
> 
> * Reclaim modifiers - __GFP_NORETRY and __GFP_NOFAIL are not supported. (Even
> * though the current implementation passes the flags on through to kmalloc and
> * vmalloc, that is done for efficiency and to avoid unnecessary code. The caller
> * should not pass in these flags.)
> *
> * __GFP_REPEAT is supported, but only for large (>64kB) allocations.
> 
> 
> ? Or is that documentation overkill?

Dunno, it sounds like overkill to me. It is telling more than
necessary. If we want to be so vocal about gfp flags then we would have
to say much more I suspect. E.g. what about __GFP_HIGHMEM? This flag is
supported for vmalloc while unsupported for kmalloc. I am pretty sure
there would be other gfp flags to consider and then this would grow
boringly large and uninteresting to the point where people simply stop
reading it. Let's just be as simple as possible.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/6] mm: introduce kv[mz]alloc helpers
  2017-01-16 21:48               ` Michal Hocko
@ 2017-01-16 21:57                 ` John Hubbard
  -1 siblings, 0 replies; 180+ messages in thread
From: John Hubbard @ 2017-01-16 21:57 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Vlastimil Babka, David Rientjes, Mel Gorman,
	Johannes Weiner, Al Viro, linux-mm, LKML, Anatoly Stepanov,
	Paolo Bonzini, Mike Snitzer, Michael S. Tsirkin,
	Theodore Ts'o



On 01/16/2017 01:48 PM, Michal Hocko wrote:
> On Mon 16-01-17 13:15:08, John Hubbard wrote:
>>
>>
>> On 01/16/2017 11:40 AM, Michal Hocko wrote:
>>> On Mon 16-01-17 11:09:37, John Hubbard wrote:
>>>>
>>>>
>>>> On 01/16/2017 12:47 AM, Michal Hocko wrote:
>>>>> On Sun 15-01-17 20:34:13, John Hubbard wrote:
>>> [...]
>>>>>> Is that "Reclaim modifiers" line still true, or is it a leftover from an
>>>>>> earlier approach? I am having trouble reconciling it with rest of the
>>>>>> patchset, because:
>>>>>>
>>>>>> a) the flags argument below is effectively passed on to either kmalloc_node
>>>>>> (possibly adding, but not removing flags), or to __vmalloc_node_flags.
>>>>>
>>>>> The above only says those are _unsupported_ - in other words the behavior
>>>>> is not defined. Even if flags are passed down to kmalloc resp. vmalloc
>>>>> it doesn't mean they are used that way.  Remember that vmalloc uses
>>>>> some hardcoded GFP_KERNEL allocations.  So while I could be really
>>>>> strict about this and mask away these flags I doubt this is worth the
>>>>> additional code.
>>>>
>>>> I do wonder about passing those flags through to kmalloc. Maybe it is worth
>>>> stripping out __GFP_NORETRY and __GFP_NOFAIL, after all. It provides some
>>>> insulation from any future changes to the implementation of kmalloc, and it
>>>> also makes the documentation more believable.
>>>
>>> I am not really convinced that we should take extra steps for these
>>> flags. There are no existing users for those flags and new users should
>>> follow the documentation.
>>
>> OK, let's just fortify the documentation ever so slightly, then, so that
>> users are more likely to do the right thing. How's this sound:
>>
>> * Reclaim modifiers - __GFP_NORETRY and __GFP_NOFAIL are not supported. (Even
>> * though the current implementation passes the flags on through to kmalloc and
>> * vmalloc, that is done for efficiency and to avoid unnecessary code. The caller
>> * should not pass in these flags.)
>> *
>> * __GFP_REPEAT is supported, but only for large (>64kB) allocations.
>>
>>
>> ? Or is that documentation overkill?
>
> Dunno, it sounds like overkill to me. It is telling more than
> necessary. If we want to be so vocal about gfp flags then we would have
> to say much more I suspect. E.g. what about __GFP_HIGHMEM? This flag is
> supported for vmalloc while unsupported for kmalloc. I am pretty sure
> there would be other gfp flags to consider and then this would grow
> boringly large and uninteresting to the point where people simply stop
> reading it. Let's just be as simple as possible.

Agreed, on the simplicity point: simple and clear is ideal. But here, it's merely short, and not 
quite simple. :)  People will look at that short bit of documentation, and then notice that the 
flags are, in fact, all passed right on through down to both kmalloc_node and __vmalloc_node_flags.

If you don't want too much documentation, then I'd be inclined to say something higher-level, about 
the intent, rather than mentioning those two flags directly. Because as it stands, the documentation 
contradicts what the code does.

Sorry to go on and on about such a minor point. I'll let it go after this last note.

> --
> Michal Hocko
> SUSE Labs
>

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/6] mm: introduce kv[mz]alloc helpers
  2017-01-16 21:57                 ` John Hubbard
@ 2017-01-17  7:51                   ` Michal Hocko
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Hocko @ 2017-01-17  7:51 UTC (permalink / raw)
  To: John Hubbard
  Cc: Andrew Morton, Vlastimil Babka, David Rientjes, Mel Gorman,
	Johannes Weiner, Al Viro, linux-mm, LKML, Anatoly Stepanov,
	Paolo Bonzini, Mike Snitzer, Michael S. Tsirkin,
	Theodore Ts'o

On Mon 16-01-17 13:57:43, John Hubbard wrote:
> 
> 
> On 01/16/2017 01:48 PM, Michal Hocko wrote:
> > On Mon 16-01-17 13:15:08, John Hubbard wrote:
> > > 
> > > 
> > > On 01/16/2017 11:40 AM, Michal Hocko wrote:
> > > > On Mon 16-01-17 11:09:37, John Hubbard wrote:
> > > > > 
> > > > > 
> > > > > On 01/16/2017 12:47 AM, Michal Hocko wrote:
> > > > > > On Sun 15-01-17 20:34:13, John Hubbard wrote:
> > > > [...]
> > > > > > > Is that "Reclaim modifiers" line still true, or is it a leftover from an
> > > > > > > earlier approach? I am having trouble reconciling it with rest of the
> > > > > > > patchset, because:
> > > > > > > 
> > > > > > > a) the flags argument below is effectively passed on to either kmalloc_node
> > > > > > > (possibly adding, but not removing flags), or to __vmalloc_node_flags.
> > > > > > 
> > > > > > > The above only says those are _unsupported_ - in other words the behavior
> > > > > > is not defined. Even if flags are passed down to kmalloc resp. vmalloc
> > > > > > it doesn't mean they are used that way.  Remember that vmalloc uses
> > > > > > some hardcoded GFP_KERNEL allocations.  So while I could be really
> > > > > > strict about this and mask away these flags I doubt this is worth the
> > > > > > additional code.
> > > > > 
> > > > > I do wonder about passing those flags through to kmalloc. Maybe it is worth
> > > > > stripping out __GFP_NORETRY and __GFP_NOFAIL, after all. It provides some
> > > > > insulation from any future changes to the implementation of kmalloc, and it
> > > > > also makes the documentation more believable.
> > > > 
> > > > > I am not really convinced that we should take extra steps for these
> > > > flags. There are no existing users for those flags and new users should
> > > > follow the documentation.
> > > 
> > > OK, let's just fortify the documentation ever so slightly, then, so that
> > > users are more likely to do the right thing. How's this sound:
> > > 
> > > * Reclaim modifiers - __GFP_NORETRY and __GFP_NOFAIL are not supported. (Even
> > > * though the current implementation passes the flags on through to kmalloc and
> > > * vmalloc, that is done for efficiency and to avoid unnecessary code. The caller
> > > * should not pass in these flags.)
> > > *
> > > * __GFP_REPEAT is supported, but only for large (>64kB) allocations.
> > > 
> > > 
> > > ? Or is that documentation overkill?
> > 
> > Dunno, it sounds like an overkill to me. It is telling more than
> > necessary. If we want to be so vocal about gfp flags then we would have
> > to say much more I suspect. E.g. what about __GFP_HIGHMEM? This flag is
> > supported for vmalloc while unsupported for kmalloc. I am pretty sure
> > there would be other gfp flags to consider and then this would grow
> > > boringly large and uninteresting to the point when people simply stop
> > reading it. Let's just be as simple as possible.
> 
> Agreed, on the simplicity point: simple and clear is ideal. But here, it's
> merely short, and not quite simple. :)  People will look at that short bit
> of documentation, and then notice that the flags are, in fact, all passed
> right on through down to both kmalloc_node and __vmalloc_node_flags.
> 
> If you don't want too much documentation, then I'd be inclined to say
> something higher-level, about the intent, rather than mentioning those two
> flags directly. Because as it stands, the documentation contradicts what the
> code does.

Feel free to suggest a better wording. I am, of course, open to any
changes.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/6] mm: introduce kv[mz]alloc helpers
  2017-01-17  7:51                   ` Michal Hocko
@ 2017-01-18  5:59                     ` John Hubbard
  -1 siblings, 0 replies; 180+ messages in thread
From: John Hubbard @ 2017-01-18  5:59 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Vlastimil Babka, David Rientjes, Mel Gorman,
	Johannes Weiner, Al Viro, linux-mm, LKML, Anatoly Stepanov,
	Paolo Bonzini, Mike Snitzer, Michael S. Tsirkin,
	Theodore Ts'o


On 01/16/2017 11:51 PM, Michal Hocko wrote:
> On Mon 16-01-17 13:57:43, John Hubbard wrote:
>>
>>
>> On 01/16/2017 01:48 PM, Michal Hocko wrote:
>>> On Mon 16-01-17 13:15:08, John Hubbard wrote:
>>>>
>>>>
>>>> On 01/16/2017 11:40 AM, Michal Hocko wrote:
>>>>> On Mon 16-01-17 11:09:37, John Hubbard wrote:
>>>>>>
>>>>>>
>>>>>> On 01/16/2017 12:47 AM, Michal Hocko wrote:
>>>>>>> On Sun 15-01-17 20:34:13, John Hubbard wrote:
>>>>> [...]
>>>>>>>> Is that "Reclaim modifiers" line still true, or is it a leftover from an
>>>>>>>> earlier approach? I am having trouble reconciling it with rest of the
>>>>>>>> patchset, because:
>>>>>>>>
>>>>>>>> a) the flags argument below is effectively passed on to either kmalloc_node
>>>>>>>> (possibly adding, but not removing flags), or to __vmalloc_node_flags.
>>>>>>>
>>>>>>> The above only says those are _unsupported_ - in other words the behavior
>>>>>>> is not defined. Even if flags are passed down to kmalloc resp. vmalloc
>>>>>>> it doesn't mean they are used that way.  Remember that vmalloc uses
>>>>>>> some hardcoded GFP_KERNEL allocations.  So while I could be really
>>>>>>> strict about this and mask away these flags I doubt this is worth the
>>>>>>> additional code.
>>>>>>
>>>>>> I do wonder about passing those flags through to kmalloc. Maybe it is worth
>>>>>> stripping out __GFP_NORETRY and __GFP_NOFAIL, after all. It provides some
>>>>>> insulation from any future changes to the implementation of kmalloc, and it
>>>>>> also makes the documentation more believable.
>>>>>
>>>>> I am not really convinced that we should take extra steps for these
>>>>> flags. There are no existing users for those flags and new users should
>>>>> follow the documentation.
>>>>
>>>> OK, let's just fortify the documentation ever so slightly, then, so that
>>>> users are more likely to do the right thing. How's this sound:
>>>>
>>>> * Reclaim modifiers - __GFP_NORETRY and __GFP_NOFAIL are not supported. (Even
>>>> * though the current implementation passes the flags on through to kmalloc and
>>>> * vmalloc, that is done for efficiency and to avoid unnecessary code. The caller
>>>> * should not pass in these flags.)
>>>> *
>>>> * __GFP_REPEAT is supported, but only for large (>64kB) allocations.
>>>>
>>>>
>>>> ? Or is that documentation overkill?
>>>
>>> Dunno, it sounds like an overkill to me. It is telling more than
>>> necessary. If we want to be so vocal about gfp flags then we would have
>>> to say much more I suspect. E.g. what about __GFP_HIGHMEM? This flag is
>>> supported for vmalloc while unsupported for kmalloc. I am pretty sure
>>> there would be other gfp flags to consider and then this would grow
>>> boringly large and uninteresting to the point when people simply stop
>>> reading it. Let's just be as simple as possible.
>>
>> Agreed, on the simplicity point: simple and clear is ideal. But here, it's
>> merely short, and not quite simple. :)  People will look at that short bit
>> of documentation, and then notice that the flags are, in fact, all passed
>> right on through down to both kmalloc_node and __vmalloc_node_flags.
>>
>> If you don't want too much documentation, then I'd be inclined to say
>> something higher-level, about the intent, rather than mentioning those two
>> flags directly. Because as it stands, the documentation contradicts what the
>> code does.
>
> Feel free to suggest a better wording. I am, of course, open to any
> changes.

OK, here's the best I've got, I tried to keep it concise, but (as you suspected) I'm not sure it's 
actually any better than the original:

  * Reclaim modifiers - __GFP_NORETRY and __GFP_NOFAIL should not be passed in.
  * Passing in __GFP_REPEAT is supported, but note that it is ignored for small
  * (<=64KB) allocations, during the kmalloc attempt. __GFP_REPEAT is fully
  * honored for all allocation sizes during the second part: the vmalloc attempt.
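
The stricter alternative raised earlier in the thread, masking the two flags rather than only documenting them, could look roughly like the following. This is a sketch under stated assumptions: the helper names and the `MOCK_GFP_*` bit values are hypothetical, and the thread indicates that the posted patch deliberately does not do this.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel gfp bits; the values are made up. */
typedef unsigned int gfp_t;
#define MOCK_GFP_KERNEL  0x01u
#define MOCK_GFP_NORETRY 0x02u
#define MOCK_GFP_NOFAIL  0x04u
#define MOCK_GFP_REPEAT  0x10u

/* The two reclaim modifiers the proposed comment says not to pass in. */
#define MOCK_KV_UNSUPPORTED (MOCK_GFP_NORETRY | MOCK_GFP_NOFAIL)

/* True when the caller followed the documented contract. */
static bool mock_kv_flags_supported(gfp_t flags)
{
	return (flags & MOCK_KV_UNSUPPORTED) == 0;
}

/* Drop the unsupported bits and leave everything else untouched. */
static gfp_t mock_kv_strip_unsupported(gfp_t flags)
{
	gfp_t out = flags & ~MOCK_KV_UNSUPPORTED;

	assert(mock_kv_flags_supported(out));
	return out;
}
```

A helper built this way would make the documentation and the code agree by construction, at the cost of the extra masking code that was judged not worthwhile earlier in the thread.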


>
> --
> Michal Hocko
> SUSE Labs
>

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/6] mm: introduce kv[mz]alloc helpers
  2017-01-18  5:59                     ` John Hubbard
@ 2017-01-18  8:21                       ` Michal Hocko
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Hocko @ 2017-01-18  8:21 UTC (permalink / raw)
  To: John Hubbard
  Cc: Andrew Morton, Vlastimil Babka, David Rientjes, Mel Gorman,
	Johannes Weiner, Al Viro, linux-mm, LKML, Anatoly Stepanov,
	Paolo Bonzini, Mike Snitzer, Michael S. Tsirkin,
	Theodore Ts'o

On Tue 17-01-17 21:59:13, John Hubbard wrote:
> 
> On 01/16/2017 11:51 PM, Michal Hocko wrote:
> > On Mon 16-01-17 13:57:43, John Hubbard wrote:
> > > 
> > > 
> > > On 01/16/2017 01:48 PM, Michal Hocko wrote:
> > > > On Mon 16-01-17 13:15:08, John Hubbard wrote:
> > > > > 
> > > > > 
> > > > > On 01/16/2017 11:40 AM, Michal Hocko wrote:
> > > > > > On Mon 16-01-17 11:09:37, John Hubbard wrote:
> > > > > > > 
> > > > > > > 
> > > > > > > On 01/16/2017 12:47 AM, Michal Hocko wrote:
> > > > > > > > On Sun 15-01-17 20:34:13, John Hubbard wrote:
> > > > > > [...]
> > > > > > > > > Is that "Reclaim modifiers" line still true, or is it a leftover from an
> > > > > > > > > earlier approach? I am having trouble reconciling it with rest of the
> > > > > > > > > patchset, because:
> > > > > > > > > 
> > > > > > > > > a) the flags argument below is effectively passed on to either kmalloc_node
> > > > > > > > > (possibly adding, but not removing flags), or to __vmalloc_node_flags.
> > > > > > > > 
> > > > > > > > > The above only says those are _unsupported_ - in other words the behavior
> > > > > > > > is not defined. Even if flags are passed down to kmalloc resp. vmalloc
> > > > > > > > it doesn't mean they are used that way.  Remember that vmalloc uses
> > > > > > > > some hardcoded GFP_KERNEL allocations.  So while I could be really
> > > > > > > > strict about this and mask away these flags I doubt this is worth the
> > > > > > > > additional code.
> > > > > > > 
> > > > > > > I do wonder about passing those flags through to kmalloc. Maybe it is worth
> > > > > > > stripping out __GFP_NORETRY and __GFP_NOFAIL, after all. It provides some
> > > > > > > insulation from any future changes to the implementation of kmalloc, and it
> > > > > > > also makes the documentation more believable.
> > > > > > 
> > > > > > > I am not really convinced that we should take extra steps for these
> > > > > > flags. There are no existing users for those flags and new users should
> > > > > > follow the documentation.
> > > > > 
> > > > > OK, let's just fortify the documentation ever so slightly, then, so that
> > > > > users are more likely to do the right thing. How's this sound:
> > > > > 
> > > > > * Reclaim modifiers - __GFP_NORETRY and __GFP_NOFAIL are not supported. (Even
> > > > > * though the current implementation passes the flags on through to kmalloc and
> > > > > * vmalloc, that is done for efficiency and to avoid unnecessary code. The caller
> > > > > * should not pass in these flags.)
> > > > > *
> > > > > * __GFP_REPEAT is supported, but only for large (>64kB) allocations.
> > > > > 
> > > > > 
> > > > > ? Or is that documentation overkill?
> > > > 
> > > > Dunno, it sounds like an overkill to me. It is telling more than
> > > > necessary. If we want to be so vocal about gfp flags then we would have
> > > > to say much more I suspect. E.g. what about __GFP_HIGHMEM? This flag is
> > > > supported for vmalloc while unsupported for kmalloc. I am pretty sure
> > > > there would be other gfp flags to consider and then this would grow
> > > > > boringly large and uninteresting to the point when people simply stop
> > > > reading it. Let's just be as simple as possible.
> > > 
> > > Agreed, on the simplicity point: simple and clear is ideal. But here, it's
> > > merely short, and not quite simple. :)  People will look at that short bit
> > > of documentation, and then notice that the flags are, in fact, all passed
> > > right on through down to both kmalloc_node and __vmalloc_node_flags.
> > > 
> > > If you don't want too much documentation, then I'd be inclined to say
> > > something higher-level, about the intent, rather than mentioning those two
> > > flags directly. Because as it stands, the documentation contradicts what the
> > > code does.
> > 
> > Feel free to suggest a better wording. I am, of course, open to any
> > changes.
> 
> OK, here's the best I've got, I tried to keep it concise, but (as you
> suspected) I'm not sure it's actually any better than the original:
> 
>  * Reclaim modifiers - __GFP_NORETRY and __GFP_NOFAIL should not be passed in.
>  * Passing in __GFP_REPEAT is supported, but note that it is ignored for small
>  * (<=64KB) allocations, during the kmalloc attempt. 

> __GFP_REPEAT is fully
>  * honored for all allocation sizes during the second part: the vmalloc attempt.

This is not true, to be really precise, because vmalloc doesn't respect
the given gfp mask all the way down (look at the pte initialization).
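
A minimal userspace sketch of that point, assuming a mock in which the data pages are allocated with the caller's mask while the page-table allocation uses a hardcoded one. All names and bit values here are hypothetical, and the real vmalloc paths are considerably more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel gfp bits; the values are made up. */
typedef unsigned int gfp_t;
#define MOCK_GFP_KERNEL 0x01u
#define MOCK_GFP_REPEAT 0x10u

/* Records which mask each internal allocation step actually used. */
struct mock_vmalloc_trace {
	gfp_t data_page_flags;
	gfp_t page_table_flags;
};

/* Page-table setup: ignores the caller's mask and uses a fixed one. */
static void mock_alloc_page_tables(struct mock_vmalloc_trace *t)
{
	t->page_table_flags = MOCK_GFP_KERNEL;
}

/* Mock vmalloc: only the data pages see the caller-supplied flags. */
static void mock_vmalloc(struct mock_vmalloc_trace *t, gfp_t flags)
{
	assert(t != NULL);
	t->data_page_flags = flags;
	mock_alloc_page_tables(t);
}
```

In this mock, a caller passing `MOCK_GFP_REPEAT` would find the bit recorded for the data pages but absent from the page-table step, so a claim that the flag is "fully honored" on the vmalloc side would not hold.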
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/6] mm: introduce kv[mz]alloc helpers
  2017-01-18  8:21                       ` Michal Hocko
@ 2017-01-19  8:37                         ` John Hubbard
  -1 siblings, 0 replies; 180+ messages in thread
From: John Hubbard @ 2017-01-19  8:37 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Vlastimil Babka, David Rientjes, Mel Gorman,
	Johannes Weiner, Al Viro, linux-mm, LKML, Anatoly Stepanov,
	Paolo Bonzini, Mike Snitzer, Michael S. Tsirkin,
	Theodore Ts'o



On 01/18/2017 12:21 AM, Michal Hocko wrote:
> On Tue 17-01-17 21:59:13, John Hubbard wrote:
>>
>> On 01/16/2017 11:51 PM, Michal Hocko wrote:
>>> On Mon 16-01-17 13:57:43, John Hubbard wrote:
>>>>
>>>>
>>>> On 01/16/2017 01:48 PM, Michal Hocko wrote:
>>>>> On Mon 16-01-17 13:15:08, John Hubbard wrote:
>>>>>>
>>>>>>
>>>>>> On 01/16/2017 11:40 AM, Michal Hocko wrote:
>>>>>>> On Mon 16-01-17 11:09:37, John Hubbard wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 01/16/2017 12:47 AM, Michal Hocko wrote:
>>>>>>>>> On Sun 15-01-17 20:34:13, John Hubbard wrote:
>>>>>>> [...]
>>>>>>>>>> Is that "Reclaim modifiers" line still true, or is it a leftover from an
>>>>>>>>>> earlier approach? I am having trouble reconciling it with rest of the
>>>>>>>>>> patchset, because:
>>>>>>>>>>
>>>>>>>>>> a) the flags argument below is effectively passed on to either kmalloc_node
>>>>>>>>>> (possibly adding, but not removing flags), or to __vmalloc_node_flags.
>>>>>>>>>
>>>>>>>>> The above only says those are _unsupported_ - in other words the behavior
>>>>>>>>> is not defined. Even if flags are passed down to kmalloc resp. vmalloc
>>>>>>>>> it doesn't mean they are used that way.  Remember that vmalloc uses
>>>>>>>>> some hardcoded GFP_KERNEL allocations.  So while I could be really
>>>>>>>>> strict about this and mask away these flags I doubt this is worth the
>>>>>>>>> additional code.
>>>>>>>>
>>>>>>>> I do wonder about passing those flags through to kmalloc. Maybe it is worth
>>>>>>>> stripping out __GFP_NORETRY and __GFP_NOFAIL, after all. It provides some
>>>>>>>> insulation from any future changes to the implementation of kmalloc, and it
>>>>>>>> also makes the documentation more believable.
>>>>>>>
>>>>>>> I am not really convinced that we should take extra steps for these
>>>>>>> flags. There are no existing users for those flags and new users should
>>>>>>> follow the documentation.
>>>>>>
>>>>>> OK, let's just fortify the documentation ever so slightly, then, so that
>>>>>> users are more likely to do the right thing. How's this sound:
>>>>>>
>>>>>> * Reclaim modifiers - __GFP_NORETRY and __GFP_NOFAIL are not supported. (Even
>>>>>> * though the current implementation passes the flags on through to kmalloc and
>>>>>> * vmalloc, that is done for efficiency and to avoid unnecessary code. The caller
>>>>>> * should not pass in these flags.)
>>>>>> *
>>>>>> * __GFP_REPEAT is supported, but only for large (>64kB) allocations.
>>>>>>
>>>>>>
>>>>>> ? Or is that documentation overkill?
>>>>>
>>>>> Dunno, it sounds like an overkill to me. It is telling more than
>>>>> necessary. If we want to be so vocal about gfp flags then we would have
>>>>> to say much more I suspect. E.g. what about __GFP_HIGHMEM? This flag is
>>>>> supported for vmalloc while unsupported for kmalloc. I am pretty sure
>>>>> there would be other gfp flags to consider and then this would grow
>>>>> boringly large and uninteresting to the point when people simply stop
>>>>> reading it. Let's just be as simple as possible.
>>>>
>>>> Agreed, on the simplicity point: simple and clear is ideal. But here, it's
>>>> merely short, and not quite simple. :)  People will look at that short bit
>>>> of documentation, and then notice that the flags are, in fact, all passed
>>>> right on through down to both kmalloc_node and __vmalloc_node_flags.
>>>>
>>>> If you don't want too much documentation, then I'd be inclined to say
>>>> something higher-level, about the intent, rather than mentioning those two
>>>> flags directly. Because as it stands, the documentation contradicts what the
>>>> code does.
>>>
>>> Feel free to suggest a better wording. I am, of course, open to any
>>> changes.
>>
>> OK, here's the best I've got, I tried to keep it concise, but (as you
>> suspected) I'm not sure it's actually any better than the original:
>>
>>  * Reclaim modifiers - __GFP_NORETRY and __GFP_NOFAIL should not be passed in.
>>  * Passing in __GFP_REPEAT is supported, but note that it is ignored for small
>>  * (<=64KB) allocations, during the kmalloc attempt.
>
>> __GFP_REPEAT is fully
>>  * honored for  all allocation sizes during the second part: the vmalloc attempt.
>
> this is not true to be really precise because vmalloc doesn't respect
> the given gfp mask all the way down (look at the pte initialization).
>

I'm having some difficulty in locating that pte initialization part, am I on the 
wrong code path? Here's what I checked, before making the claim about __GFP_REPEAT 
being honored:

kvmalloc_node
   __vmalloc_node_flags
     __vmalloc_node
       __vmalloc_node_range
         __vmalloc_area_node
             alloc_pages_node
               __alloc_pages_node
                 __alloc_pages
                   __alloc_pages_nodemask
                     __alloc_pages_slowpath


...and __alloc_pages_slowpath does the __GFP_REPEAT handling:

     /*
      * Do not retry costly high order allocations unless they are
      * __GFP_REPEAT
      */
     if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_REPEAT))
         goto nopage;

thanks,
john h

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/6] mm: introduce kv[mz]alloc helpers
  2017-01-19  8:37                         ` John Hubbard
@ 2017-01-19  8:45                           ` Michal Hocko
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Hocko @ 2017-01-19  8:45 UTC (permalink / raw)
  To: John Hubbard
  Cc: Andrew Morton, Vlastimil Babka, David Rientjes, Mel Gorman,
	Johannes Weiner, Al Viro, linux-mm, LKML, Anatoly Stepanov,
	Paolo Bonzini, Mike Snitzer, Michael S. Tsirkin,
	Theodore Ts'o

On Thu 19-01-17 00:37:08, John Hubbard wrote:
> 
> 
> On 01/18/2017 12:21 AM, Michal Hocko wrote:
> > On Tue 17-01-17 21:59:13, John Hubbard wrote:
[...]
> > >  * Reclaim modifiers - __GFP_NORETRY and __GFP_NOFAIL should not be passed in.
> > >  * Passing in __GFP_REPEAT is supported, but note that it is ignored for small
> > >  * (<=64KB) allocations, during the kmalloc attempt.
> > 
> > > __GFP_REPEAT is fully
> > >  * honored for  all allocation sizes during the second part: the vmalloc attempt.
> > 
> > this is not true to be really precise because vmalloc doesn't respect
> > the given gfp mask all the way down (look at the pte initialization).
> > 
> 
> I'm having some difficulty in locating that pte initialization part, am I on
> the wrong code path? Here's what I checked, before making the claim about
> __GFP_REPEAT being honored:
> 
> kvmalloc_node
>   __vmalloc_node_flags
>     __vmalloc_node
>       __vmalloc_node_range
>         __vmalloc_area_node
	    map_vm_area
	      vmap_page_range
	        vmap_page_range_noflush
		  vmap_pud_range
		    pud_alloc
		      __pud_alloc
		        pud_alloc_one

The pud is allocated there without the caller's gfp mask, and the same
pattern repeats on the pmd and pte levels. This is, btw., one of the
reasons why vmalloc with gfp flags is tricky!

moreover
>             alloc_pages_node

this is order-0 request so...

>               __alloc_pages_node
>                 __alloc_pages
>                   __alloc_pages_nodemask
>                     __alloc_pages_slowpath
> 
> 
> ...and __alloc_pages_slowpath does the __GFP_REPEAT handling:
> 
>     /*
>      * Do not retry costly high order allocations unless they are
>      * __GFP_REPEAT
>      */
>     if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_REPEAT))
>         goto nopage;

... this doesn't apply


-- 
Michal Hocko
SUSE Labs
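
The two observations in this reply can be modelled in a few lines of
userspace C. This is only a rough sketch, not kernel code: sketch_gfp_t,
the bit values and every sketch_* helper are invented for illustration.
Only the costly-order limit of 3 and the shape of the quoted slowpath
check are taken from the discussion above.

```c
#include <assert.h>

typedef unsigned int sketch_gfp_t;

/* Hypothetical bit values, chosen only for this sketch. */
#define SKETCH_GFP_KERNEL	0x1u
#define SKETCH_GFP_REPEAT	0x2u
#define SKETCH_COSTLY_ORDER	3	/* same value as PAGE_ALLOC_COSTLY_ORDER */

/* Same shape as the quoted check: bail out only for costly orders without REPEAT. */
static int sketch_gives_up_early(unsigned int order, sketch_gfp_t gfp)
{
	return order > SKETCH_COSTLY_ORDER && !(gfp & SKETCH_GFP_REPEAT);
}

/*
 * Models a page-table level allocation (pud/pmd/pte) on the mapping path:
 * the caller's mask is not passed down, so a fixed mask is used instead.
 */
static sketch_gfp_t sketch_page_table_mask(sketch_gfp_t caller_gfp)
{
	(void)caller_gfp;	/* deliberately ignored in this model */
	return SKETCH_GFP_KERNEL;
}

/* Walks through both observations; aborts if the model disagrees. */
static void sketch_selfcheck(void)
{
	sketch_gfp_t gfp = SKETCH_GFP_KERNEL | SKETCH_GFP_REPEAT;

	/* The REPEAT bit never reaches the page-table allocations. */
	assert(!(sketch_page_table_mask(gfp) & SKETCH_GFP_REPEAT));

	/* Data pages are order-0, so the check is false with or without REPEAT. */
	assert(!sketch_gives_up_early(0, gfp));
	assert(!sketch_gives_up_early(0, SKETCH_GFP_KERNEL));
}
```

In this model the flag only changes the outcome for orders above 3, which
the vmalloc path never requests for its data pages.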

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/6] mm: introduce kv[mz]alloc helpers
  2017-01-19  8:45                           ` Michal Hocko
@ 2017-01-19  9:09                             ` John Hubbard
  -1 siblings, 0 replies; 180+ messages in thread
From: John Hubbard @ 2017-01-19  9:09 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Vlastimil Babka, David Rientjes, Mel Gorman,
	Johannes Weiner, Al Viro, linux-mm, LKML, Anatoly Stepanov,
	Paolo Bonzini, Mike Snitzer, Michael S. Tsirkin,
	Theodore Ts'o

On 01/19/2017 12:45 AM, Michal Hocko wrote:
> On Thu 19-01-17 00:37:08, John Hubbard wrote:
>>
>>
>> On 01/18/2017 12:21 AM, Michal Hocko wrote:
>>> On Tue 17-01-17 21:59:13, John Hubbard wrote:
> [...]
>>>>  * Reclaim modifiers - __GFP_NORETRY and __GFP_NOFAIL should not be passed in.
>>>>  * Passing in __GFP_REPEAT is supported, but note that it is ignored for small
>>>>  * (<=64KB) allocations, during the kmalloc attempt.
>>>
>>>> __GFP_REPEAT is fully
>>>>  * honored for  all allocation sizes during the second part: the vmalloc attempt.
>>>
>>> this is not true to be really precise because vmalloc doesn't respect
>>> the given gfp mask all the way down (look at the pte initialization).
>>>
>>
>> I'm having some difficulty in locating that pte initialization part, am I on
>> the wrong code path? Here's what I checked, before making the claim about
>> __GFP_REPEAT being honored:
>>
>> kvmalloc_node
>>   __vmalloc_node_flags
>>     __vmalloc_node
>>       __vmalloc_node_range
>>         __vmalloc_area_node
> 	    map_vm_area
> 	      vmap_page_range
> 	        vmap_page_range_noflush
> 		  vmap_pud_range
> 		    pud_alloc
> 		      __pud_alloc
> 		        pud_alloc_one
>
> pud will be allocated but the same pattern repeats on the pmd and pte
> levels. This is btw. one of the reasons why vmalloc with gfp flags is
> tricky!

Yes, I see that now, thank you for explaining, much appreciated. The flags are left 
way behind in the code path.

So that leaves us with maybe this for documentation?

  * Reclaim modifiers - __GFP_NORETRY and __GFP_NOFAIL should not be passed in.
  * Passing in __GFP_REPEAT is supported, and will cause the following behavior:
  * for larger (>64KB) allocations, the first part (kmalloc) will do some
  * retrying, before falling back to vmalloc.


>
> moreover
>>             alloc_pages_node
>
> this is order-0 request so...
>
>>               __alloc_pages_node
>>                 __alloc_pages
>>                   __alloc_pages_nodemask
>>                     __alloc_pages_slowpath
>>
>>
>> ...and __alloc_pages_slowpath does the __GFP_REPEAT handling:
>>
>>     /*
>>      * Do not retry costly high order allocations unless they are
>>      * __GFP_REPEAT
>>      */
>>     if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_REPEAT))
>>         goto nopage;
>
> ... this doesn't apply
>

yes, true.

thanks
john h

>

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/6] mm: introduce kv[mz]alloc helpers
  2017-01-19  9:09                             ` John Hubbard
@ 2017-01-19  9:56                               ` Michal Hocko
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Hocko @ 2017-01-19  9:56 UTC (permalink / raw)
  To: John Hubbard
  Cc: Andrew Morton, Vlastimil Babka, David Rientjes, Mel Gorman,
	Johannes Weiner, Al Viro, linux-mm, LKML, Anatoly Stepanov,
	Paolo Bonzini, Mike Snitzer, Michael S. Tsirkin,
	Theodore Ts'o

On Thu 19-01-17 01:09:35, John Hubbard wrote:
[...]
> So that leaves us with maybe this for documentation?
> 
>  * Reclaim modifiers - __GFP_NORETRY and __GFP_NOFAIL should not be passed in.
>  * Passing in __GFP_REPEAT is supported, and will cause the following behavior:
>  * for larger (>64KB) allocations, the first part (kmalloc) will do some
>  * retrying, before falling back to vmalloc.

I am worried this is just too vague. It doesn't really help the user
decide whether "do some retrying" is what they really want or need.

So I would rather see the following.
"
 * Reclaim modifiers - __GFP_NORETRY and __GFP_NOFAIL are not supported. __GFP_REPEAT
 * is supported only for large (>32kB) allocations and it should be used when using
 * kmalloc is preferable because vmalloc fallback has visible performance drawbacks.
"

I would also add
"
Any use of gfp flags outside of GFP_KERNEL should be consulted with mm people.
"

Does it sound any better?
-- 
Michal Hocko
SUSE Labs
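
The ">32kB" figure in the proposed comment follows from the costly-order
limit. The arithmetic can be sketched in userspace C, assuming 4 KiB
pages and a limit of 3; the sketch_* names are invented for this
illustration, and the threshold would differ on configurations with
larger pages.

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions for illustration: 4 KiB pages and a costly-order limit of 3. */
#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PAGE_SIZE	((size_t)1 << SKETCH_PAGE_SHIFT)
#define SKETCH_COSTLY_ORDER	3

/*
 * Smallest order such that (page size << order) >= size.
 * Intended for small illustrative sizes only.
 */
static unsigned int sketch_size_to_order(size_t size)
{
	unsigned int order = 0;

	assert(size > 0);
	while ((SKETCH_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Non-zero when a request of this size needs an order above the limit. */
static int sketch_is_costly(size_t size)
{
	return sketch_size_to_order(size) > SKETCH_COSTLY_ORDER;
}
```

Under these assumptions a 32 KiB request still fits in order 3, while one
byte more needs order 4 and is therefore treated as costly.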

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/6] mm: introduce kv[mz]alloc helpers
  2017-01-19  9:56                               ` Michal Hocko
@ 2017-01-19 21:28                                 ` John Hubbard
  -1 siblings, 0 replies; 180+ messages in thread
From: John Hubbard @ 2017-01-19 21:28 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Vlastimil Babka, David Rientjes, Mel Gorman,
	Johannes Weiner, Al Viro, linux-mm, LKML, Anatoly Stepanov,
	Paolo Bonzini, Mike Snitzer, Michael S. Tsirkin,
	Theodore Ts'o

On 01/19/2017 01:56 AM, Michal Hocko wrote:
> On Thu 19-01-17 01:09:35, John Hubbard wrote:
> [...]
>> So that leaves us with maybe this for documentation?
>>
>>  * Reclaim modifiers - __GFP_NORETRY and __GFP_NOFAIL should not be passed in.
>>  * Passing in __GFP_REPEAT is supported, and will cause the following behavior:
>>  * for larger (>64KB) allocations, the first part (kmalloc) will do some
>>  * retrying, before falling back to vmalloc.
>
> I am worried this is just too vague. It doesn't really help the user
> decide whether "do some retrying" is what they really want or need.
>
> So I would rather see the following.
> "
>  * Reclaim modifiers - __GFP_NORETRY and __GFP_NOFAIL are not supported. __GFP_REPEAT
>  * is supported only for large (>32kB) allocations and it should be used when using
>  * kmalloc is preferable because vmalloc fallback has visible performance drawbacks.
> "
>
> I would also add
> "
> Any use of gfp flags outside of GFP_KERNEL should be consulted with mm people.
> "
>
> Does it sound any better?

Yes, that is good. I like that it helps guide the user. Here's some proposed optional grammar 
tweaks, but even without these, the above is understandable, so either way, I'm happy now:

  * Reclaim modifiers - __GFP_NORETRY and __GFP_NOFAIL are not supported. __GFP_REPEAT
  * is supported only for large (>32kB) allocations, and it should be used only if
  * kmalloc is preferable to the vmalloc fallback, due to visible performance drawbacks.
  *
  * Please consult with mm people before using any gfp flags other than GFP_KERNEL.

thanks
john h


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 5/6] treewide: use kv[mz]alloc* rather than opencoded variants
  2017-01-12 17:37       ` Michal Hocko
  (?)
@ 2017-01-20 13:41         ` Vlastimil Babka
  -1 siblings, 0 replies; 180+ messages in thread
From: Vlastimil Babka @ 2017-01-20 13:41 UTC (permalink / raw)
  To: Michal Hocko, Kees Cook
  Cc: Andrew Morton, David Rientjes, Mel Gorman, Johannes Weiner,
	Al Viro, Linux-MM, LKML, Martin Schwidefsky, Heiko Carstens,
	Herbert Xu, Anton Vorontsov, Colin Cross, Tony Luck,
	Rafael J. Wysocki, Ben Skeggs, Kent Overstreet, Santosh Raspatur,
	Hariprasad S, Tariq Toukan, Yishai Hadas, Dan Williams,
	Oleg Drokin, Andreas Dilger, Boris Ostrovsky, David Sterba, Yan,
	Zheng, Ilya Dryomov, Alexei Starovoitov, Eric Dumazet,
	Network Development

On 01/12/2017 06:37 PM, Michal Hocko wrote:
> On Thu 12-01-17 09:26:09, Kees Cook wrote:
>> On Thu, Jan 12, 2017 at 7:37 AM, Michal Hocko <mhocko@kernel.org> wrote:
> [...]
>>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>>> index 4f74511015b8..e6bbb33d2956 100644
>>> --- a/arch/s390/kvm/kvm-s390.c
>>> +++ b/arch/s390/kvm/kvm-s390.c
>>> @@ -1126,10 +1126,7 @@ static long kvm_s390_get_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
>>>         if (args->count < 1 || args->count > KVM_S390_SKEYS_MAX)
>>>                 return -EINVAL;
>>>
>>> -       keys = kmalloc_array(args->count, sizeof(uint8_t),
>>> -                            GFP_KERNEL | __GFP_NOWARN);
>>> -       if (!keys)
>>> -               keys = vmalloc(sizeof(uint8_t) * args->count);
>>> +       keys = kvmalloc(args->count * sizeof(uint8_t), GFP_KERNEL);
>>
>> Before doing this conversion, can we add a kvmalloc_array() API? This
>> conversion could allow for the reintroduction of integer overflow
>> flaws. (This particular situation isn't at risk since ->count is
>> checked, but I'd prefer we not create a risky set of examples for
>> using kvmalloc.)
> 
> Well, I am not opposed to kvmalloc_array but I would argue that this
> conversion cannot introduce new overflow issues. The code would have
> to be broken already because, even though kmalloc_array checks for the
> overflow, the vmalloc fallback doesn't...

Yeah I agree, but if some of the places were really wrong, after the
conversion we won't see them anymore.

> If there is a general interest for this API I can add it.

I think it would be better, yes.
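
For readers following the overflow argument, a minimal userspace sketch
(plain C, not kernel code) may help. The helper names open_coded_bytes()
and product_wraps() are made up for illustration; only the comparison
against SIZE_MAX follows the kind of check an array helper performs.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Byte count as an open-coded caller would compute it.
 * Unsigned multiplication silently wraps modulo SIZE_MAX + 1.
 */
static size_t open_coded_bytes(size_t n, size_t elem_size)
{
	return n * elem_size;
}

/* Nonzero when n * elem_size cannot be represented in a size_t. */
static int product_wraps(size_t n, size_t elem_size)
{
	return elem_size != 0 && n > SIZE_MAX / elem_size;
}
```

With n = SIZE_MAX / 8 + 2 and 8-byte elements, open_coded_bytes() returns
8, so an open-coded caller would receive an 8-byte buffer while believing
it holds n elements. product_wraps() flags exactly that case.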

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 5/6] treewide: use kv[mz]alloc* rather than opencoded variants
  2017-01-20 13:41         ` Vlastimil Babka
  (?)
@ 2017-01-24 15:00           ` Michal Hocko
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Hocko @ 2017-01-24 15:00 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Kees Cook, Andrew Morton, David Rientjes, Mel Gorman,
	Johannes Weiner, Al Viro, Linux-MM, LKML, Martin Schwidefsky,
	Heiko Carstens, Herbert Xu, Anton Vorontsov, Colin Cross,
	Tony Luck, Rafael J. Wysocki, Ben Skeggs, Kent Overstreet,
	Santosh Raspatur, Hariprasad S, Tariq Toukan, Yishai Hadas,
	Dan Williams, Oleg Drokin, Andreas Dilger, Boris Ostrovsky,
	David Sterba, Yan, Zheng, Ilya Dryomov, Alexei Starovoitov,
	Eric Dumazet, Network Development

On Fri 20-01-17 14:41:37, Vlastimil Babka wrote:
> On 01/12/2017 06:37 PM, Michal Hocko wrote:
> > On Thu 12-01-17 09:26:09, Kees Cook wrote:
> >> On Thu, Jan 12, 2017 at 7:37 AM, Michal Hocko <mhocko@kernel.org> wrote:
> > [...]
> >>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> >>> index 4f74511015b8..e6bbb33d2956 100644
> >>> --- a/arch/s390/kvm/kvm-s390.c
> >>> +++ b/arch/s390/kvm/kvm-s390.c
> >>> @@ -1126,10 +1126,7 @@ static long kvm_s390_get_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
> >>>         if (args->count < 1 || args->count > KVM_S390_SKEYS_MAX)
> >>>                 return -EINVAL;
> >>>
> >>> -       keys = kmalloc_array(args->count, sizeof(uint8_t),
> >>> -                            GFP_KERNEL | __GFP_NOWARN);
> >>> -       if (!keys)
> >>> -               keys = vmalloc(sizeof(uint8_t) * args->count);
> >>> +       keys = kvmalloc(args->count * sizeof(uint8_t), GFP_KERNEL);
> >>
> >> Before doing this conversion, can we add a kvmalloc_array() API? This
> >> conversion could allow for the reintroduction of integer overflow
> >> flaws. (This particular situation isn't at risk since ->count is
> >> checked, but I'd prefer we not create a risky set of examples for
> >> using kvmalloc.)
> > 
> > Well, I am not opposed to kvmalloc_array but I would argue that this
> > conversion cannot introduce new overflow issues. The code would have
> > to be broken already because, even though kmalloc_array checks for the
> > overflow, the vmalloc fallback doesn't...
> 
> Yeah I agree, but if some of the places were really wrong, after the
> conversion we won't see them anymore.
> 
> > If there is a general interest for this API I can add it.
> 
> I think it would be better, yes.

OK, fair enough. I will fold the following into the original patch. I
was a little bit reluctant to create kvcalloc, so I've made the original
callers more verbose and added | __GFP_ZERO. To be honest I do not
really like how kcalloc...
---
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index e6bbb33d2956..aa558dce6bb4 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -1126,7 +1126,7 @@ static long kvm_s390_get_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
 	if (args->count < 1 || args->count > KVM_S390_SKEYS_MAX)
 		return -EINVAL;
 
-	keys = kvmalloc(args->count * sizeof(uint8_t), GFP_KERNEL);
+	keys = kvmalloc_array(args->count, sizeof(uint8_t), GFP_KERNEL);
 	if (!keys)
 		return -ENOMEM;
 
@@ -1168,7 +1168,7 @@ static long kvm_s390_set_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
 	if (args->count < 1 || args->count > KVM_S390_SKEYS_MAX)
 		return -EINVAL;
 
-	keys = kvmalloc(sizeof(uint8_t) * args->count, GFP_KERNEL);
+	keys = kvmalloc_array(args->count, sizeof(uint8_t), GFP_KERNEL);
 	if (!keys)
 		return -ENOMEM;
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/mr.c b/drivers/net/ethernet/mellanox/mlx4/mr.c
index 82354fd0a87e..6583d4601480 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mr.c
+++ b/drivers/net/ethernet/mellanox/mlx4/mr.c
@@ -115,7 +115,7 @@ static int mlx4_buddy_init(struct mlx4_buddy *buddy, int max_order)
 
 	for (i = 0; i <= buddy->max_order; ++i) {
 		s = BITS_TO_LONGS(1 << (buddy->max_order - i));
-		buddy->bits[i] = kvzalloc(s * sizeof(long), GFP_KERNEL);
+		buddy->bits[i] = kvmalloc_array(s, sizeof(long), GFP_KERNEL | __GFP_ZERO);
 		if (!buddy->bits[i])
 			goto err_out_free;
 	}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 55fd570c3e1e..22c6e81d0c16 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -498,6 +498,14 @@ static inline void *kvzalloc(size_t size, gfp_t flags)
 	return kvmalloc(size, flags | __GFP_ZERO);
 }
 
+static inline void *kvmalloc_array(size_t n, size_t size, gfp_t flags)
+{
+	if (size != 0 && n > SIZE_MAX / size)
+		return NULL;
+
+	return kvmalloc(n * size, flags);
+}
+
 extern void kvfree(const void *addr);
 
 static inline atomic_t *compound_mapcount_ptr(struct page *page)
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 4ca30a951bbc..58ec07946fe6 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -320,7 +320,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 		goto free_htab;
 
 	err = -ENOMEM;
-	htab->buckets = kvmalloc(htab->n_buckets * sizeof(struct bucket), GFP_USER);
+	htab->buckets = kvmalloc_array(htab->n_buckets, sizeof(struct bucket), GFP_USER);
 	if (!htab->buckets)
 		goto free_htab;
 
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 45c17b5562b5..8f9caf095172 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -957,7 +957,7 @@ EXPORT_SYMBOL(iov_iter_get_pages);
 
 static struct page **get_pages_array(size_t n)
 {
-	return kvmalloc(n * sizeof(struct page *), GFP_KERNEL);
+	return kvmalloc_array(n, sizeof(struct page *), GFP_KERNEL);
 }
 
 static ssize_t pipe_get_pages_alloc(struct iov_iter *i,
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index a46a9fd8b540..0c4848bd86c4 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -687,7 +687,7 @@ int inet_ehash_locks_alloc(struct inet_hashinfo *hashinfo)
 		/* no more locks than number of hash buckets */
 		nblocks = min(nblocks, hashinfo->ehash_mask + 1);
 
-		hashinfo->ehash_locks = kvmalloc(nblocks * locksz, GFP_KERNEL);
+		hashinfo->ehash_locks = kvmalloc_array(nblocks, locksz, GFP_KERNEL);
 		if (!hashinfo->ehash_locks)
 			return -ENOMEM;
 
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index cdc55d5ee4ad..eca16612b1ae 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -712,10 +712,7 @@ EXPORT_SYMBOL(xt_check_entry_offsets);
  */
 unsigned int *xt_alloc_entry_offsets(unsigned int size)
 {
-	if (size < (SIZE_MAX / sizeof(unsigned int)))
-		return kvzalloc(size * sizeof(unsigned int), GFP_KERNEL);
-
-	return NULL;
+	return kvmalloc_array(size, sizeof(unsigned int), GFP_KERNEL | __GFP_ZERO);
 
 }
 EXPORT_SYMBOL(xt_alloc_entry_offsets);
diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c
index 30d6a39fd2c8..47cbfae44898 100644
--- a/net/sched/sch_choke.c
+++ b/net/sched/sch_choke.c
@@ -431,7 +431,7 @@ static int choke_change(struct Qdisc *sch, struct nlattr *opt)
 	if (mask != q->tab_mask) {
 		struct sk_buff **ntab;
 
-		ntab = kvzalloc((mask + 1) * sizeof(struct sk_buff *), GFP_KERNEL);
+		ntab = kvmalloc_array((mask + 1), sizeof(struct sk_buff *), GFP_KERNEL | __GFP_ZERO);
 		if (!ntab)
 			return -ENOMEM;
 
-- 
Michal Hocko
SUSE Labs
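
As a rough illustration of the helper added to include/linux/mm.h in the
patch above, here is a userspace model that can be compiled and run
outside the kernel. It is only a sketch under stated assumptions:
model_kvmalloc(), model_kvmalloc_array() and MODEL_GFP_ZERO are
hypothetical names, malloc() stands in for kvmalloc(), and a single flag
bit stands in for __GFP_ZERO. None of the kmalloc/vmalloc fallback
behaviour is modelled.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_GFP_ZERO 0x1u	/* hypothetical stand-in for __GFP_ZERO */

/* Userspace stand-in for kvmalloc(): plain malloc(), optionally zeroed. */
static void *model_kvmalloc(size_t size, unsigned int flags)
{
	void *p = malloc(size);

	if (p && (flags & MODEL_GFP_ZERO))
		memset(p, 0, size);
	return p;
}

/* Refuse the request when n * size would wrap, as in the mm.h hunk. */
static void *model_kvmalloc_array(size_t n, size_t size, unsigned int flags)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;

	return model_kvmalloc(n * size, flags);
}
```

A caller that previously used a zeroing allocation passes the zero flag
explicitly, which is how the converted callers in the patch spell it with
GFP_KERNEL | __GFP_ZERO. A request whose element count would wrap comes
back as NULL rather than as an undersized buffer.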

^ permalink raw reply related	[flat|nested] 180+ messages in thread

* Re: [PATCH 0/6 v3] kvmalloc
  2017-01-12 15:37 ` Michal Hocko
@ 2017-01-24 15:17   ` Michal Hocko
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Hocko @ 2017-01-24 15:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, David Rientjes, Mel Gorman, Johannes Weiner,
	Al Viro, linux-mm, LKML, Alexei Starovoitov, Anatoly Stepanov,
	Andreas Dilger, Andreas Dilger, Anton Vorontsov, Ben Skeggs,
	Boris Ostrovsky, Colin Cross, Dan Williams, David Sterba,
	Eric Dumazet, Eric Dumazet, Hariprasad S, Heiko Carstens,
	Herbert Xu, Ilya Dryomov, Kees Cook, Kent Overstreet,
	Martin Schwidefsky, Michael S. Tsirkin, Mike Snitzer,
	Oleg Drokin, Paolo Bonzini, Rafael J. Wysocki, Santosh Raspatur,
	Tariq Toukan, Theodore Ts'o, Tom Herbert, Tony Luck, Yan,
	Zheng, Yishai Hadas

On Thu 12-01-17 16:37:11, Michal Hocko wrote:
> Hi,
> this was previously posted as a single patch [1], but more has since been
> built on top of it. It turned out that there are users who would like to
> have __GFP_REPEAT semantics. This is currently implemented for costly
> (>64kB) requests. Doing the same for smaller requests would require
> redefining the __GFP_REPEAT semantics in the page allocator, which is out
> of scope of this series.
> 
> There are many open coded kmalloc with vmalloc fallback instances in
> the tree.  Most of them are not careful enough or simply do not care
> about the underlying semantic of the kmalloc/page allocator which means
> that a) some vmalloc fallbacks are basically unreachable because the
> kmalloc part will keep retrying until it succeeds b) the page allocator
> can invoke really disruptive steps like the OOM killer to move forward
> which doesn't sound appropriate when we consider that the vmalloc
> fallback is available.
> 
> As can be seen, implementing kvmalloc requires quite an intimate
> knowledge of the page allocator and the memory reclaim internals, which
> strongly suggests that a helper should be implemented in the memory
> subsystem proper.
> 
> Most callers I could find have been converted to use the helper instead.
> This is patch 5. There are some more relying on __GFP_REPEAT in the
> networking stack which I have converted as well but considering we do
> not have support for __GFP_REPEAT for requests smaller than 64kB I
> have marked it RFC.

Are there any more comments? I would really appreciate hearing from the
networking folks before I resubmit the series.

Thanks!

> [1] http://lkml.kernel.org/r/20170102133700.1734-1-mhocko@kernel.org
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 2/6] mm: support __GFP_REPEAT in kvmalloc_node for >=64kB
  2017-01-12 15:37   ` Michal Hocko
@ 2017-01-24 15:40     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 180+ messages in thread
From: Michael S. Tsirkin @ 2017-01-24 15:40 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Vlastimil Babka, David Rientjes, Mel Gorman,
	Johannes Weiner, Al Viro, linux-mm, LKML, Michal Hocko

On Thu, Jan 12, 2017 at 04:37:13PM +0100, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> vhost code uses __GFP_REPEAT when allocating vhost_virtqueue resp.
> vhost_vsock because it would really like to prefer kmalloc to the
> vmalloc fallback - see 23cc5a991c7a ("vhost-net: extend device
> allocation to vmalloc") for more context. Michael Tsirkin has also
> noted:
> "
> __GFP_REPEAT overhead is during allocation time.  Using vmalloc means all
> accesses are slowed down.  Allocation is not on data path, accesses are.
> "
> 
> The same applies to other vhost_kvzalloc users.
> 
> Let's teach kvmalloc_node to handle __GFP_REPEAT properly. There are two
> things to be careful about. First, we should keep the OOM killer from
> being invoked, so we have to use __GFP_NORETRY by default. Second, we
> have to override __GFP_REPEAT for !costly order requests, because
> __GFP_REPEAT is ignored for !costly orders.
> 
> Supporting __GFP_REPEAT-like semantics for !costly requests is possible,
> but it would require changes in the page allocator. This is out of scope
> of this patch.
> 
> This patch shouldn't introduce any functional change.
> 
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Assuming the new APIs are upstream I see no reason
not to use them in vhost. For vhost bits:

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  drivers/vhost/net.c   |  9 +++------
>  drivers/vhost/vhost.c | 15 +++------------
>  drivers/vhost/vsock.c |  9 +++------
>  mm/util.c             | 17 ++++++++++++++---
>  4 files changed, 23 insertions(+), 27 deletions(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 5dc34653274a..105cd04c7414 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -797,12 +797,9 @@ static int vhost_net_open(struct inode *inode, struct file *f)
>  	struct vhost_virtqueue **vqs;
>  	int i;
>  
> -	n = kmalloc(sizeof *n, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT);
> -	if (!n) {
> -		n = vmalloc(sizeof *n);
> -		if (!n)
> -			return -ENOMEM;
> -	}
> +	n = kvmalloc(sizeof *n, GFP_KERNEL | __GFP_REPEAT);
> +	if (!n)
> +		return -ENOMEM;
>  	vqs = kmalloc(VHOST_NET_VQ_MAX * sizeof(*vqs), GFP_KERNEL);
>  	if (!vqs) {
>  		kvfree(n);
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index d6432603880c..d2bf8a41f55e 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -515,18 +515,9 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
>  }
>  EXPORT_SYMBOL_GPL(vhost_dev_set_owner);
>  
> -static void *vhost_kvzalloc(unsigned long size)
> -{
> -	void *n = kzalloc(size, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT);
> -
> -	if (!n)
> -		n = vzalloc(size);
> -	return n;
> -}
> -
>  struct vhost_umem *vhost_dev_reset_owner_prepare(void)
>  {
> -	return vhost_kvzalloc(sizeof(struct vhost_umem));
> +	return kvzalloc(sizeof(struct vhost_umem), GFP_KERNEL);
>  }
>  EXPORT_SYMBOL_GPL(vhost_dev_reset_owner_prepare);
>  
> @@ -1190,7 +1181,7 @@ EXPORT_SYMBOL_GPL(vhost_vq_access_ok);
>  
>  static struct vhost_umem *vhost_umem_alloc(void)
>  {
> -	struct vhost_umem *umem = vhost_kvzalloc(sizeof(*umem));
> +	struct vhost_umem *umem = kvzalloc(sizeof(*umem), GFP_KERNEL);
>  
>  	if (!umem)
>  		return NULL;
> @@ -1216,7 +1207,7 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
>  		return -EOPNOTSUPP;
>  	if (mem.nregions > max_mem_regions)
>  		return -E2BIG;
> -	newmem = vhost_kvzalloc(size + mem.nregions * sizeof(*m->regions));
> +	newmem = kvzalloc(size + mem.nregions * sizeof(*m->regions), GFP_KERNEL);
>  	if (!newmem)
>  		return -ENOMEM;
>  
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index bbbf588540ed..7e0159867553 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -455,12 +455,9 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
>  	/* This struct is large and allocation could fail, fall back to vmalloc
>  	 * if there is no other way.
>  	 */
> -	vsock = kzalloc(sizeof(*vsock), GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT);
> -	if (!vsock) {
> -		vsock = vmalloc(sizeof(*vsock));
> -		if (!vsock)
> -			return -ENOMEM;
> -	}
> +	vsock = kvmalloc(sizeof(*vsock), GFP_KERNEL | __GFP_REPEAT);
> +	if (!vsock)
> +		return -ENOMEM;
>  
>  	vqs = kmalloc_array(ARRAY_SIZE(vsock->vqs), sizeof(*vqs), GFP_KERNEL);
>  	if (!vqs) {
> diff --git a/mm/util.c b/mm/util.c
> index 7e0c240b5760..9306244b9f41 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -333,7 +333,8 @@ EXPORT_SYMBOL(vm_mmap);
>   * Uses kmalloc to get the memory but if the allocation fails then falls back
>   * to the vmalloc allocator. Use kvfree for freeing the memory.
>   *
> - * Reclaim modifiers - __GFP_NORETRY, __GFP_REPEAT and __GFP_NOFAIL are not supported
> + * Reclaim modifiers - __GFP_NORETRY and __GFP_NOFAIL are not supported. __GFP_REPEAT
> + * is supported only for large (>64kB) allocations
>   */
>  void *kvmalloc_node(size_t size, gfp_t flags, int node)
>  {
> @@ -350,8 +351,18 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
>  	 * Make sure that larger requests are not too disruptive - no OOM
>  	 * killer and no allocation failure warnings as we have a fallback
>  	 */
> -	if (size > PAGE_SIZE)
> -		kmalloc_flags |= __GFP_NORETRY | __GFP_NOWARN;
> +	if (size > PAGE_SIZE) {
> +		kmalloc_flags |= __GFP_NOWARN;
> +
> +		/*
> +		 * We have to override __GFP_REPEAT by __GFP_NORETRY for !costly
> +		 * requests because there is no other way to tell the allocator
> +		 * that we want to fail rather than retry endlessly.
> +		 */
> +		if (!(kmalloc_flags & __GFP_REPEAT) ||
> +				(size <= PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
> +			kmalloc_flags |= __GFP_NORETRY;
> +	}
>  
>  	ret = kmalloc_node(size, kmalloc_flags, node);
>  
> -- 
> 2.11.0
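
The flag handling in the quoted mm/util.c hunk can be sketched in userspace
roughly as follows. This is only a minimal sketch, not the kernel code: the
SK_* names and their bit values are hypothetical stand-ins for the real gfp
flags, and 4kB pages with a costly order of 3 are assumed. Under those
assumptions the cut-off PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER works out to
32kB, so the 64kB figure in the subject and comment would only hold for
larger pages.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's gfp bits; values are illustrative. */
#define SK_GFP_NOWARN   0x1u
#define SK_GFP_NORETRY  0x2u
#define SK_GFP_REPEAT   0x4u

#define SK_PAGE_SIZE               4096u /* assumption: 4kB pages */
#define SK_PAGE_ALLOC_COSTLY_ORDER 3u    /* assumption: costly order of 3 */

/* Mirrors the flag adjustment done before the kmalloc attempt in the hunk. */
unsigned int sk_kmalloc_flags(size_t size, unsigned int flags)
{
	unsigned int kmalloc_flags = flags;

	if (size > SK_PAGE_SIZE) {
		/* A fallback exists, so never warn about a failed attempt. */
		kmalloc_flags |= SK_GFP_NOWARN;

		/*
		 * Fail fast unless the caller asked for __GFP_REPEAT and the
		 * request is large enough for that flag to be honoured.
		 */
		if (!(kmalloc_flags & SK_GFP_REPEAT) ||
		    size <= ((size_t)SK_PAGE_SIZE << SK_PAGE_ALLOC_COSTLY_ORDER))
			kmalloc_flags |= SK_GFP_NORETRY;
	}

	return kmalloc_flags;
}

/* Usage: a few representative cases, spelled out as assertions. */
void sk_kmalloc_flags_examples(void)
{
	/* Up to one page: flags pass through untouched. */
	assert(sk_kmalloc_flags(4096, 0) == 0);
	/* Above one page, no repeat flag: fail fast and quietly. */
	assert(sk_kmalloc_flags(8192, 0) == (SK_GFP_NOWARN | SK_GFP_NORETRY));
	/* Repeat flag on a !costly size is overridden by the no-retry flag. */
	assert(sk_kmalloc_flags(32768, SK_GFP_REPEAT) ==
	       (SK_GFP_REPEAT | SK_GFP_NOWARN | SK_GFP_NORETRY));
	/* Repeat flag on a costly size is honoured. */
	assert(sk_kmalloc_flags(32769, SK_GFP_REPEAT) ==
	       (SK_GFP_REPEAT | SK_GFP_NOWARN));
}
```

Note that in the !costly case the repeat bit is left set and the no-retry bit
is added on top, which is what the hunk does as well.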

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 0/6 v3] kvmalloc
  2017-01-24 15:17   ` Michal Hocko
@ 2017-01-24 16:00     ` Eric Dumazet
  -1 siblings, 0 replies; 180+ messages in thread
From: Eric Dumazet @ 2017-01-24 16:00 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Vlastimil Babka, David Rientjes, Mel Gorman,
	Johannes Weiner, Al Viro, linux-mm, LKML, Alexei Starovoitov,
	Anatoly Stepanov, Andreas Dilger, Andreas Dilger,
	Anton Vorontsov, Ben Skeggs, Boris Ostrovsky, Colin Cross,
	Dan Williams, David Sterba, Eric Dumazet, Hariprasad S,
	Heiko Carstens, Herbert Xu, Ilya Dryomov, Kees Cook,
	Kent Overstreet, Martin Schwidefsky, Michael S. Tsirkin,
	Mike Snitzer, Oleg Drokin, Paolo Bonzini, Rafael J. Wysocki,
	Santosh Raspatur, Tariq Toukan, Theodore Ts'o, Tom Herbert,
	Tony Luck, Yan, Zheng, Yishai Hadas

On Tue, 2017-01-24 at 16:17 +0100, Michal Hocko wrote:
> On Thu 12-01-17 16:37:11, Michal Hocko wrote:

> Are there any more comments? I would really appreciate hearing from
> networking folks before I resubmit the series.

I do not see any issues right now.

I am happy to see this thing finally coming, after years of
resistance ;)

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 0/6 v3] kvmalloc
  2017-01-24 15:17   ` Michal Hocko
@ 2017-01-24 19:17     ` Alexei Starovoitov
  -1 siblings, 0 replies; 180+ messages in thread
From: Alexei Starovoitov @ 2017-01-24 19:17 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Vlastimil Babka, David Rientjes, Mel Gorman,
	Johannes Weiner, Al Viro, linux-mm, LKML, Alexei Starovoitov,
	Anatoly Stepanov, Andreas Dilger, Andreas Dilger,
	Anton Vorontsov, Ben Skeggs, Boris Ostrovsky, Colin Cross,
	Dan Williams, David Sterba, Eric Dumazet, Eric Dumazet,
	Hariprasad S, Heiko Carstens, Herbert Xu, Ilya Dryomov,
	Kees Cook, Kent Overstreet, Martin Schwidefsky,
	Michael S. Tsirkin, Mike Snitzer, Oleg Drokin, Paolo Bonzini,
	Rafael J. Wysocki, Santosh Raspatur, Tariq Toukan,
	Theodore Ts'o, Tom Herbert, Tony Luck, Yan, Zheng,
	Yishai Hadas, Daniel Borkmann

On Tue, Jan 24, 2017 at 04:17:52PM +0100, Michal Hocko wrote:
> On Thu 12-01-17 16:37:11, Michal Hocko wrote:
> > Hi,
> > this has been previously posted as a single patch [1] but later on more
> > built on top. It turned out that there are users who would like to have
> > __GFP_REPEAT semantic. This is currently implemented for costly >64kB
> > requests. Doing the same for smaller requests would require to redefine
> > __GFP_REPEAT semantic in the page allocator which is out of scope of
> > this series.
> > 
> > There are many open coded kmalloc with vmalloc fallback instances in
> > the tree.  Most of them are not careful enough or simply do not care
> > about the underlying semantic of the kmalloc/page allocator which means
> > that a) some vmalloc fallbacks are basically unreachable because the
> > kmalloc part will keep retrying until it succeeds b) the page allocator
> > can invoke really disruptive steps like the OOM killer to move forward
> > which doesn't sound appropriate when we consider that the vmalloc
> > fallback is available.
> > 
> > As can be seen, implementing kvmalloc requires quite intimate
> > knowledge of the page allocator and the memory reclaim internals which
> > strongly suggests that a helper should be implemented in the memory
> > subsystem proper.
> > 
> > Most callers I could find have been converted to use the helper instead.
> > This is patch 5. There are some more relying on __GFP_REPEAT in the
> > networking stack which I have converted as well but considering we do
> > not have a support for __GFP_REPEAT for requests smaller than 64kB I
> > have marked it RFC.
> 
> Are there any more comments? I would really appreciate hearing from
> networking folks before I resubmit the series.

While this patchset was baking, the bpf side switched to bpf_map_area_alloc(),
which fixes the missing __GFP_NORETRY issue that we had to fix quickly.
See commit d407bd25a204 ("bpf: don't trigger OOM killer under pressure with map alloc");
it covers all kmalloc/vmalloc pairs instead of just one place as in this set.
So please rebase and switch bpf_map_area_alloc() to use kvmalloc().
Thanks

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 5/6] treewide: use kv[mz]alloc* rather than opencoded variants
  2017-01-24 15:00           ` Michal Hocko
  (?)
@ 2017-01-25 11:15             ` Vlastimil Babka
  -1 siblings, 0 replies; 180+ messages in thread
From: Vlastimil Babka @ 2017-01-25 11:15 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Kees Cook, Andrew Morton, David Rientjes, Mel Gorman,
	Johannes Weiner, Al Viro, Linux-MM, LKML, Martin Schwidefsky,
	Heiko Carstens, Herbert Xu, Anton Vorontsov, Colin Cross,
	Tony Luck, Rafael J. Wysocki, Ben Skeggs, Kent Overstreet,
	Santosh Raspatur, Hariprasad S, Tariq Toukan, Yishai Hadas,
	Dan Williams, Oleg Drokin, Andreas Dilger, Boris Ostrovsky,
	David Sterba, Yan, Zheng, Ilya Dryomov, Alexei Starovoitov,
	Eric Dumazet, Network Development

On 01/24/2017 04:00 PM, Michal Hocko wrote:
>> > Well, I am not opposed to kvmalloc_array but I would argue that this
>> > conversion cannot introduce new overflow issues. The code would have
>> > to be broken already because, even though kmalloc_array checks for
>> > overflow, the vmalloc fallback doesn't...
>>
>> Yeah I agree, but if some of the places were really wrong, after the
>> conversion we won't see them anymore.
>>
>> > If there is a general interest for this API I can add it.
>>
>> I think it would be better, yes.
>
> OK, fair enough. I will fold the following into the original patch. I
> was a little bit reluctant to create kvcalloc so I've made the original
> callers more talkative and added | __GFP_ZERO.

Fair enough,

> To be honest I do not
> really like how kcalloc...

how kcalloc what?

[...]
> diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
> index cdc55d5ee4ad..eca16612b1ae 100644
> --- a/net/netfilter/x_tables.c
> +++ b/net/netfilter/x_tables.c
> @@ -712,10 +712,7 @@ EXPORT_SYMBOL(xt_check_entry_offsets);
>   */
>  unsigned int *xt_alloc_entry_offsets(unsigned int size)
>  {
> -	if (size < (SIZE_MAX / sizeof(unsigned int)))
> -		return kvzalloc(size * sizeof(unsigned int), GFP_KERNEL);
> -
> -	return NULL;
> +	return kvmalloc_array(size * sizeof(unsigned int), GFP_KERNEL | __GFP_ZERO);

This one wouldn't compile.
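
Presumably this is because the quoted call passes a single pre-multiplied
size, whereas an array allocator in the style of kmalloc_array takes the
element count and the element size as separate arguments so that it can
check the multiplication itself. A minimal userspace sketch of that shape,
assuming malloc/calloc in place of the kernel allocators (sk_alloc_array and
sk_alloc_entry_offsets are hypothetical names, not kernel API):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical overflow-checked array allocator: returns NULL instead of
 * allocating a short buffer when n * size would wrap around SIZE_MAX.
 */
void *sk_alloc_array(size_t n, size_t size, int zero)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;

	return zero ? calloc(n, size) : malloc(n * size);
}

/* Count and element size are passed separately, never pre-multiplied. */
unsigned int *sk_alloc_entry_offsets(unsigned int count)
{
	return sk_alloc_array(count, sizeof(unsigned int), 1);
}

/* Usage: the overflow case is rejected and the zeroed case really is zeroed. */
void sk_alloc_array_examples(void)
{
	unsigned int *offsets = sk_alloc_entry_offsets(8);

	if (offsets) {
		assert(offsets[0] == 0 && offsets[7] == 0);
		free(offsets);
	}

	/* SIZE_MAX elements of 2 bytes cannot be represented in a size_t. */
	assert(sk_alloc_array(SIZE_MAX, 2, 0) == NULL);
}
```

With the check inside the helper, the open-coded SIZE_MAX comparison removed
in the hunk above is no longer needed at the call site.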

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 5/6] treewide: use kv[mz]alloc* rather than opencoded variants
  2017-01-25 11:15             ` Vlastimil Babka
  (?)
@ 2017-01-25 13:09               ` Michal Hocko
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Hocko @ 2017-01-25 13:09 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Kees Cook, Andrew Morton, David Rientjes, Mel Gorman,
	Johannes Weiner, Al Viro, Linux-MM, LKML, Martin Schwidefsky,
	Heiko Carstens, Herbert Xu, Anton Vorontsov, Colin Cross,
	Tony Luck, Rafael J. Wysocki, Ben Skeggs, Kent Overstreet,
	Santosh Raspatur, Hariprasad S, Tariq Toukan, Yishai Hadas,
	Dan Williams, Oleg Drokin, Andreas Dilger, Boris Ostrovsky,
	David Sterba, Yan, Zheng, Ilya Dryomov, Alexei Starovoitov,
	Eric Dumazet, Network Development

On Wed 25-01-17 12:15:59, Vlastimil Babka wrote:
> On 01/24/2017 04:00 PM, Michal Hocko wrote:
> > > > Well, I am not opposed to kvmalloc_array but I would argue that this
> > > > conversion cannot introduce new overflow issues. The code would have
> > > > to be broken already because, even though kmalloc_array checks for
> > > > overflow, the vmalloc fallback doesn't...
> > > 
> > > Yeah I agree, but if some of the places were really wrong, after the
> > > conversion we won't see them anymore.
> > > 
> > > > If there is a general interest for this API I can add it.
> > > 
> > > I think it would be better, yes.
> > 
> > OK, fair enough. I will fold the following into the original patch. I
> > was a little bit reluctant to create kvcalloc so I've made the original
> > callers more talkative and added | __GFP_ZERO.
> 
> Fair enough,
> 
> > To be honest I do not
> > really like how kcalloc...
> 
> how kcalloc what?

How kcalloc hides the __GFP_ZERO, while its name doesn't reflect that.
 
> [...]
> > diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
> > index cdc55d5ee4ad..eca16612b1ae 100644
> > --- a/net/netfilter/x_tables.c
> > +++ b/net/netfilter/x_tables.c
> > @@ -712,10 +712,7 @@ EXPORT_SYMBOL(xt_check_entry_offsets);
> >   */
> >  unsigned int *xt_alloc_entry_offsets(unsigned int size)
> >  {
> > -	if (size < (SIZE_MAX / sizeof(unsigned int)))
> > -		return kvzalloc(size * sizeof(unsigned int), GFP_KERNEL);
> > -
> > -	return NULL;
> > +	return kvmalloc_array(size * sizeof(unsigned int), GFP_KERNEL | __GFP_ZERO);
> 
> This one wouldn't compile.

fixed, thanks!
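
For reference, both the overflow concern and the compile failure come down
to the calling convention: kmalloc_array() takes the element count and the
element size as separate arguments so that it can check the multiplication
itself. Below is a minimal userspace sketch of that idea, assuming
kvmalloc_array follows the same (n, size, flags) convention.
checked_array_zalloc() and alloc_entry_offsets_model() are hypothetical
names used only for illustration, and malloc()/memset() stand in for the
kernel allocator and __GFP_ZERO.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical userspace model of an overflow-checked, zeroing array
 * allocator. The count and the element size are passed separately so
 * the helper can reject a product that would wrap around.
 */
static void *checked_array_zalloc(size_t n, size_t size)
{
	void *p;

	/* Caller contract: element sizes come from sizeof(), never zero. */
	assert(size != 0);

	/* n * size would overflow size_t: fail instead of under-allocating. */
	if (n > SIZE_MAX / size)
		return NULL;

	p = malloc(n * size);
	if (p)
		memset(p, 0, n * size); /* stands in for __GFP_ZERO */
	return p;
}

/* Model of the converted caller: no open-coded SIZE_MAX check is needed. */
static unsigned int *alloc_entry_offsets_model(size_t n)
{
	return checked_array_zalloc(n, sizeof(unsigned int));
}
```

Passing the already multiplied size as a single argument, as in the hunk
quoted above, would defeat the check because the wrap-around happens before
the helper ever sees the values.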

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 0/6 v3] kvmalloc
  2017-01-24 19:17     ` Alexei Starovoitov
@ 2017-01-25 13:10       ` Michal Hocko
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Hocko @ 2017-01-25 13:10 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrew Morton, Vlastimil Babka, David Rientjes, Mel Gorman,
	Johannes Weiner, Al Viro, linux-mm, LKML, Alexei Starovoitov,
	Anatoly Stepanov, Andreas Dilger, Andreas Dilger,
	Anton Vorontsov, Ben Skeggs, Boris Ostrovsky, Colin Cross,
	Dan Williams, David Sterba, Eric Dumazet, Eric Dumazet,
	Hariprasad S, Heiko Carstens, Herbert Xu, Ilya Dryomov,
	Kees Cook, Kent Overstreet, Martin Schwidefsky,
	Michael S. Tsirkin, Mike Snitzer, Oleg Drokin, Paolo Bonzini,
	Rafael J. Wysocki, Santosh Raspatur, Tariq Toukan,
	Theodore Ts'o, Tom Herbert, Tony Luck, Yan, Zheng,
	Yishai Hadas, Daniel Borkmann

On Tue 24-01-17 11:17:21, Alexei Starovoitov wrote:
> On Tue, Jan 24, 2017 at 04:17:52PM +0100, Michal Hocko wrote:
> > On Thu 12-01-17 16:37:11, Michal Hocko wrote:
> > > Hi,
> > > this has been previously posted as a single patch [1] but later on more
> > > was built on top. It turned out that there are users who would like to
> > > have __GFP_REPEAT semantics. This is currently implemented for costly
> > > (>64kB) requests. Doing the same for smaller requests would require
> > > redefining the __GFP_REPEAT semantics in the page allocator, which is out
> > > of scope of this series.
> > > 
> > > There are many open coded kmalloc with vmalloc fallback instances in
> > > the tree.  Most of them are not careful enough or simply do not care
> > > about the underlying semantics of the kmalloc/page allocator, which means
> > > that a) some vmalloc fallbacks are basically unreachable because the
> > > kmalloc part will keep retrying until it succeeds b) the page allocator
> > > can invoke really disruptive steps like the OOM killer to move forward,
> > > which doesn't sound appropriate when we consider that the vmalloc
> > > fallback is available.
> > > 
> > > As can be seen, implementing kvmalloc requires quite an intimate
> > > knowledge of the page allocator and the memory reclaim internals, which
> > > strongly suggests that a helper should be implemented in the memory
> > > subsystem proper.
> > > 
> > > Most callers I could find have been converted to use the helper instead.
> > > This is patch 5. There are some more relying on __GFP_REPEAT in the
> > > networking stack which I have converted as well but considering we do
> > > not have support for __GFP_REPEAT for requests smaller than 64kB I
> > > have marked it RFC.
> > 
> > Are there any more comments? I would really appreciate hearing from
> > networking folks before I resubmit the series.
> 
> while this patchset was baking the bpf side switched to use bpf_map_area_alloc()
> which fixes the issue with missing __GFP_NORETRY that we had to fix quickly.
> See commit d407bd25a204 ("bpf: don't trigger OOM killer under pressure with map alloc")
> it covers all kmalloc/vmalloc pairs instead of just one place as in this set.
> So please rebase and switch bpf_map_area_alloc() to use kvmalloc().

OK, will do. Thanks for the heads up.


-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 0/6 v3] kvmalloc
  2017-01-24 16:00     ` Eric Dumazet
@ 2017-01-25 13:10       ` Michal Hocko
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Hocko @ 2017-01-25 13:10 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Andrew Morton, Vlastimil Babka, David Rientjes, Mel Gorman,
	Johannes Weiner, Al Viro, linux-mm, LKML, Alexei Starovoitov,
	Anatoly Stepanov, Andreas Dilger, Andreas Dilger,
	Anton Vorontsov, Ben Skeggs, Boris Ostrovsky, Colin Cross,
	Dan Williams, David Sterba, Eric Dumazet, Hariprasad S,
	Heiko Carstens, Herbert Xu, Ilya Dryomov, Kees Cook,
	Kent Overstreet, Martin Schwidefsky, Michael S. Tsirkin,
	Mike Snitzer, Oleg Drokin, Paolo Bonzini, Rafael J. Wysocki,
	Santosh Raspatur, Tariq Toukan, Theodore Ts'o, Tom Herbert,
	Tony Luck, Yan, Zheng, Yishai Hadas

On Tue 24-01-17 08:00:26, Eric Dumazet wrote:
> On Tue, 2017-01-24 at 16:17 +0100, Michal Hocko wrote:
> > On Thu 12-01-17 16:37:11, Michal Hocko wrote:
> 
> > Are there any more comments? I would really appreciate hearing from
> > networking folks before I resubmit the series.
> 
> I do not see any issues right now.
> 
> I am happy to see this thing finally coming, after years of
> resistance ;)

OK, so I will repost the series and ask Andrew for inclusion
after it passes my compile test battery after the rebase.
 
Thanks!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 0/6 v3] kvmalloc
  2017-01-25 13:10       ` Michal Hocko
@ 2017-01-25 13:21         ` Michal Hocko
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Hocko @ 2017-01-25 13:21 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrew Morton, Vlastimil Babka, David Rientjes, Mel Gorman,
	Johannes Weiner, Al Viro, linux-mm, LKML, Alexei Starovoitov,
	Anatoly Stepanov, Andreas Dilger, Andreas Dilger,
	Anton Vorontsov, Ben Skeggs, Boris Ostrovsky, Colin Cross,
	Dan Williams, David Sterba, Eric Dumazet, Eric Dumazet,
	Hariprasad S, Heiko Carstens, Herbert Xu, Ilya Dryomov,
	Kees Cook, Kent Overstreet, Martin Schwidefsky,
	Michael S. Tsirkin, Mike Snitzer, Oleg Drokin, Paolo Bonzini,
	Rafael J. Wysocki, Santosh Raspatur, Tariq Toukan,
	Theodore Ts'o, Tom Herbert, Tony Luck, Yan, Zheng,
	Yishai Hadas, Daniel Borkmann

On Wed 25-01-17 14:10:06, Michal Hocko wrote:
> On Tue 24-01-17 11:17:21, Alexei Starovoitov wrote:
> > On Tue, Jan 24, 2017 at 04:17:52PM +0100, Michal Hocko wrote:
> > > On Thu 12-01-17 16:37:11, Michal Hocko wrote:
> > > > Hi,
> > > > this has been previously posted as a single patch [1] but later on more
> > > > was built on top. It turned out that there are users who would like to
> > > > have __GFP_REPEAT semantics. This is currently implemented for costly
> > > > (>64kB) requests. Doing the same for smaller requests would require
> > > > redefining the __GFP_REPEAT semantics in the page allocator, which is
> > > > out of scope of this series.
> > > > 
> > > > There are many open coded kmalloc with vmalloc fallback instances in
> > > > the tree.  Most of them are not careful enough or simply do not care
> > > > about the underlying semantics of the kmalloc/page allocator, which
> > > > means that a) some vmalloc fallbacks are basically unreachable because
> > > > the kmalloc part will keep retrying until it succeeds b) the page
> > > > allocator can invoke really disruptive steps like the OOM killer to
> > > > move forward, which doesn't sound appropriate when we consider that
> > > > the vmalloc fallback is available.
> > > > 
> > > > As can be seen, implementing kvmalloc requires quite an intimate
> > > > knowledge of the page allocator and the memory reclaim internals, which
> > > > strongly suggests that a helper should be implemented in the memory
> > > > subsystem proper.
> > > > 
> > > > Most callers I could find have been converted to use the helper instead.
> > > > This is patch 5. There are some more relying on __GFP_REPEAT in the
> > > > networking stack which I have converted as well but considering we do
> > > > not have support for __GFP_REPEAT for requests smaller than 64kB I
> > > > have marked it RFC.
> > > 
> > > Are there any more comments? I would really appreciate hearing from
> > > networking folks before I resubmit the series.
> > 
> > while this patchset was baking the bpf side switched to use bpf_map_area_alloc()
> > which fixes the issue with missing __GFP_NORETRY that we had to fix quickly.
> > See commit d407bd25a204 ("bpf: don't trigger OOM killer under pressure with map alloc")
> > it covers all kmalloc/vmalloc pairs instead of just one place as in this set.
> > So please rebase and switch bpf_map_area_alloc() to use kvmalloc().
> 
> OK, will do. Thanks for the heads up.

Just for the record, I will fold the following into patch 1
---
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 19b6129eab23..8697f43cf93c 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -53,21 +53,7 @@ void bpf_register_map_type(struct bpf_map_type_list *tl)
 
 void *bpf_map_area_alloc(size_t size)
 {
-	/* We definitely need __GFP_NORETRY, so OOM killer doesn't
-	 * trigger under memory pressure as we really just want to
-	 * fail instead.
-	 */
-	const gfp_t flags = __GFP_NOWARN | __GFP_NORETRY | __GFP_ZERO;
-	void *area;
-
-	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
-		area = kmalloc(size, GFP_USER | flags);
-		if (area != NULL)
-			return area;
-	}
-
-	return __vmalloc(size, GFP_KERNEL | __GFP_HIGHMEM | flags,
-			 PAGE_KERNEL);
+	return kvzalloc(size, GFP_USER);
 }
 
 void bpf_map_area_free(void *area)

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 180+ messages in thread

* Re: [PATCH 5/6] treewide: use kv[mz]alloc* rather than opencoded variants
  2017-01-25 13:09               ` Michal Hocko
  (?)
@ 2017-01-25 13:40                 ` Ilya Dryomov
  -1 siblings, 0 replies; 180+ messages in thread
From: Ilya Dryomov @ 2017-01-25 13:40 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Vlastimil Babka, Kees Cook, Andrew Morton, David Rientjes,
	Mel Gorman, Johannes Weiner, Al Viro, Linux-MM, LKML,
	Martin Schwidefsky, Heiko Carstens, Herbert Xu, Anton Vorontsov,
	Colin Cross, Tony Luck, Rafael J. Wysocki, Ben Skeggs,
	Kent Overstreet, Santosh Raspatur, Hariprasad S, Tariq Toukan,
	Yishai Hadas, Dan Williams, Oleg Drokin, Andreas Dilger,
	Boris Ostrovsky, David Sterba, Yan, Zheng, Alexei Starovoitov,
	Eric Dumazet, Network Development

On Wed, Jan 25, 2017 at 2:09 PM, Michal Hocko <mhocko@kernel.org> wrote:
> On Wed 25-01-17 12:15:59, Vlastimil Babka wrote:
>> On 01/24/2017 04:00 PM, Michal Hocko wrote:
>> > > > Well, I am not opposed to kvmalloc_array but I would argue that this
>> > > > conversion cannot introduce new overflow issues. The code would have
>> > > > to be broken already because kmalloc_array checks for the
>> > > > overflow but the vmalloc fallback doesn't...
>> > >
>> > > Yeah I agree, but if some of the places were really wrong, after the
>> > > conversion we won't see them anymore.
>> > >
>> > > > If there is a general interest for this API I can add it.
>> > >
>> > > I think it would be better, yes.
>> >
>> > OK, fair enough. I will fold the following into the original patch. I
>> > was a little bit reluctant to create kvcalloc so I've made the original
>> > callers more talkative and added | __GFP_ZERO.
>>
>> Fair enough,
>>
>> > To be honest I do not
>> > really like how kcalloc...
>>
>> how kcalloc what?
>
> how kcalloc hides __GFP_ZERO and the name doesn't reflect that.

The userspace calloc() is specified to zero memory, so I'd say the name
does reflect that.

Thanks,

                Ilya

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/6] mm: introduce kv[mz]alloc helpers
  2017-01-12 15:37   ` Michal Hocko
@ 2017-01-26 12:09     ` Michal Hocko
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Hocko @ 2017-01-26 12:09 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, David Rientjes, Mel Gorman, Johannes Weiner,
	Al Viro, linux-mm, LKML, Anatoly Stepanov, Paolo Bonzini,
	Mike Snitzer, Michael S. Tsirkin, Theodore Ts'o,
	Andreas Dilger

On Thu 12-01-17 16:37:12, Michal Hocko wrote:
[...]
> +void *kvmalloc_node(size_t size, gfp_t flags, int node)
> +{
> +	gfp_t kmalloc_flags = flags;
> +	void *ret;
> +
> +	/*
> +	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
> +	 * so the given set of flags has to be compatible.
> +	 */
> +	WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL);
> +
> +	/*
> +	 * Make sure that larger requests are not too disruptive - no OOM
> +	 * killer and no allocation failure warnings as we have a fallback
> +	 */
> +	if (size > PAGE_SIZE)
> +		kmalloc_flags |= __GFP_NORETRY | __GFP_NOWARN;
> +
> +	ret = kmalloc_node(size, kmalloc_flags, node);
> +
> +	/*
> +	 * It doesn't really make sense to fallback to vmalloc for sub page
> +	 * requests
> +	 */
> +	if (ret || size <= PAGE_SIZE)
> +		return ret;
> +
> +	return __vmalloc_node_flags(size, node, flags);
> +}
> +EXPORT_SYMBOL(kvmalloc_node);

While discussing the bpf change I've realized that the vmalloc fallback
doesn't request __GFP_HIGHMEM. So I've updated the patch to do so. All
the current users request it, except for f2fs_kv[zm]alloc, which seems
to have forgotten or not known about the flag. In the next step, I would
like to check whether we actually have any __vmalloc* user which would
strictly refuse __GFP_HIGHMEM, because I do not really see any reason for
that, and if there is none then I would simply pull __GFP_HIGHMEM handling
into the vmalloc.
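
To make the flag handling concrete, here is a minimal userspace sketch of
the decisions kvmalloc_node makes, based on the v3 code quoted above. It
assumes the updated helper simply ORs __GFP_HIGHMEM into the flags passed
to the vmalloc fallback. The MODEL_* constants, struct kv_plan and
kv_plan_for() are hypothetical; the flag values are arbitrary stand-ins,
not the kernel's, and the real GFP_KERNEL is a multi-bit mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in flag bits; the values are arbitrary and NOT the kernel's. */
#define MODEL_GFP_KERNEL   0x01u
#define MODEL_GFP_NORETRY  0x02u
#define MODEL_GFP_NOWARN   0x04u
#define MODEL_GFP_HIGHMEM  0x08u
#define MODEL_PAGE_SIZE    4096u /* assumed 4 KiB pages */

struct kv_plan {
	unsigned int kmalloc_flags; /* flags for the first, kmalloc attempt */
	unsigned int vmalloc_flags; /* flags for the fallback, if allowed */
	bool may_fallback;          /* false for requests <= one page */
};

static struct kv_plan kv_plan_for(size_t size, unsigned int flags)
{
	struct kv_plan plan = {
		.kmalloc_flags = flags,
		.vmalloc_flags = flags | MODEL_GFP_HIGHMEM,
		.may_fallback = size > MODEL_PAGE_SIZE,
	};

	/* The kernel only warns once here; the model asserts instead. */
	assert((flags & MODEL_GFP_KERNEL) == MODEL_GFP_KERNEL);

	/* Larger requests must not be disruptive: a fallback exists. */
	if (size > MODEL_PAGE_SIZE)
		plan.kmalloc_flags |= MODEL_GFP_NORETRY | MODEL_GFP_NOWARN;

	return plan;
}
```

Only the kmalloc attempt gets the extra no-retry and no-warn flags; the
fallback sees the caller's flags plus the highmem bit, and a request of
exactly one page never falls back.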

So before I resend the full series again, can I keep acks with the
following?

From e53038d7fc947519830b6d29a20ec1ff66ae53f0 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Thu, 8 Dec 2016 09:19:32 +0100
Subject: [PATCH] mm: introduce kv[mz]alloc helpers

Using kmalloc with the vmalloc fallback for larger allocations is a
common pattern in the kernel code. Yet we do not have any common helper
for that and so users have invented their own helpers. Some of them are
really creative when doing so. Let's just add kv[mz]alloc and make sure
it is implemented properly. This implementation makes sure not to
generate large memory pressure for > PAGE_SIZE requests (__GFP_NORETRY)
and also not to warn about allocation failures. This also rules out the
OOM killer, as vmalloc is a more appropriate fallback than a disruptive,
user-visible action.
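
The resulting control flow can be modelled in userspace roughly as follows.
This is a sketch, not the kernel code: model_kvmalloc_path() and the MODEL_*
names are hypothetical, the outcome of the kmalloc attempt is passed in as a
parameter, and the vmalloc fallback is assumed to succeed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size */

enum model_path {
	MODEL_PATH_KMALLOC,	/* contiguous allocation succeeded */
	MODEL_PATH_VMALLOC,	/* fell back to vmalloc */
	MODEL_PATH_FAIL,	/* returned NULL without a fallback */
};

/*
 * Decide which allocator ends up serving a request.
 * @kmalloc_ok simulates the outcome of the kmalloc attempt.
 */
static enum model_path model_kvmalloc_path(size_t size, bool kmalloc_ok)
{
	if (kmalloc_ok)
		return MODEL_PATH_KMALLOC;

	/* Requests of at most one page never fall back to vmalloc. */
	if (size <= MODEL_PAGE_SIZE)
		return MODEL_PATH_FAIL;

	return MODEL_PATH_VMALLOC;
}

/* Usage: the three possible outcomes. */
static void model_path_demo(void)
{
	assert(model_kvmalloc_path(512, true) == MODEL_PATH_KMALLOC);
	assert(model_kvmalloc_path(MODEL_PAGE_SIZE, false) == MODEL_PATH_FAIL);
	assert(model_kvmalloc_path(MODEL_PAGE_SIZE + 1, false) == MODEL_PATH_VMALLOC);
}
```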

This patch also changes some existing users and removes helpers which
are specific to them. In some cases this is not possible (e.g.
ext4_kvmalloc, libcfs_kvzalloc) because those seem to be
broken and require GFP_NO{FS,IO} context, which is not vmalloc compatible
in general (note that the page table allocation is GFP_KERNEL). Those
need to be fixed separately.

While we are at it, document the unsupported gfp masks for
__vmalloc{_node}, because there seems to be a lot of confusion out there.
kvmalloc_node will warn about flags which are incompatible with GFP_KERNEL
(i.e. which are not its superset) to catch new abusers. Existing ones
would have to die slowly.
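
The superset check boils down to a single mask comparison. A small userspace
sketch, assuming made-up MODEL_* bit values: the composition of the masks
mirrors how GFP_KERNEL, GFP_NOFS and GFP_NOIO relate to each other, but the
numeric values and the function name are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bits standing in for __GFP_RECLAIM, __GFP_IO and __GFP_FS. */
#define MODEL_RECLAIM		0x1u
#define MODEL_IO		0x2u
#define MODEL_FS		0x4u
#define MODEL_ZERO		0x8u	/* an unrelated modifier */

#define MODEL_GFP_KERNEL	(MODEL_RECLAIM | MODEL_IO | MODEL_FS)
#define MODEL_GFP_NOFS		(MODEL_RECLAIM | MODEL_IO)
#define MODEL_GFP_NOIO		(MODEL_RECLAIM)

/* True when @flags contains every bit of the model GFP_KERNEL mask. */
static bool model_is_kernel_superset(unsigned int flags)
{
	return (flags & MODEL_GFP_KERNEL) == MODEL_GFP_KERNEL;
}

/* Usage: extra modifiers are fine, a missing FS or IO bit is not. */
static void model_superset_demo(void)
{
	assert(model_is_kernel_superset(MODEL_GFP_KERNEL | MODEL_ZERO));
	assert(!model_is_kernel_superset(MODEL_GFP_NOFS));
	assert(!model_is_kernel_superset(MODEL_GFP_NOIO));
}
```

In this model, a caller passing the NOFS or NOIO mask is exactly the case
the warning is meant to catch.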

Changes since v3
- add __GFP_HIGHMEM for the vmalloc fallback
- document gfp_mask in __vmalloc_node
- change ipc_alloc to use the library kvmalloc
- __aa_kvmalloc doesn't rely on GFP_NOIO anymore so we can drop it and
use the library kvmalloc directly
- bpf has grown its own kvmalloc open coded variant so replace it by
the library one

Changes since v2
- s@WARN_ON@WARN_ON_ONCE@ as per Vlastimil
- do not fallback to vmalloc for size = PAGE_SIZE as per Vlastimil

Changes since v1
- define __vmalloc_node_flags for CONFIG_MMU=n

Cc: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca> # ext4 part
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/x86/kvm/lapic.c              |  4 ++--
 arch/x86/kvm/page_track.c         |  4 ++--
 arch/x86/kvm/x86.c                |  4 ++--
 drivers/md/dm-stats.c             |  7 +-----
 fs/ext4/mballoc.c                 |  2 +-
 fs/ext4/super.c                   |  4 ++--
 fs/f2fs/f2fs.h                    | 20 -----------------
 fs/f2fs/file.c                    |  4 ++--
 fs/f2fs/segment.c                 | 14 ++++++------
 fs/seq_file.c                     | 16 +-------------
 include/linux/kvm_host.h          |  2 --
 include/linux/mm.h                | 14 ++++++++++++
 include/linux/vmalloc.h           |  1 +
 ipc/util.c                        |  7 +-----
 kernel/bpf/syscall.c              | 16 +-------------
 mm/nommu.c                        |  5 +++++
 mm/util.c                         | 45 +++++++++++++++++++++++++++++++++++++++
 mm/vmalloc.c                      |  9 +++++++-
 security/apparmor/apparmorfs.c    |  2 +-
 security/apparmor/include/lib.h   | 11 ----------
 security/apparmor/lib.c           | 30 --------------------------
 security/apparmor/match.c         |  2 +-
 security/apparmor/policy_unpack.c |  2 +-
 virt/kvm/kvm_main.c               | 18 +++-------------
 24 files changed, 101 insertions(+), 142 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 33b799fd3a6e..42562348bed2 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -177,8 +177,8 @@ static void recalculate_apic_map(struct kvm *kvm)
 		if (kvm_apic_present(vcpu))
 			max_id = max(max_id, kvm_x2apic_id(vcpu->arch.apic));
 
-	new = kvm_kvzalloc(sizeof(struct kvm_apic_map) +
-	                   sizeof(struct kvm_lapic *) * ((u64)max_id + 1));
+	new = kvzalloc(sizeof(struct kvm_apic_map) +
+	                   sizeof(struct kvm_lapic *) * ((u64)max_id + 1), GFP_KERNEL);
 
 	if (!new)
 		goto out;
diff --git a/arch/x86/kvm/page_track.c b/arch/x86/kvm/page_track.c
index 4a1c13eaa518..d46663e655b0 100644
--- a/arch/x86/kvm/page_track.c
+++ b/arch/x86/kvm/page_track.c
@@ -38,8 +38,8 @@ int kvm_page_track_create_memslot(struct kvm_memory_slot *slot,
 	int  i;
 
 	for (i = 0; i < KVM_PAGE_TRACK_MAX; i++) {
-		slot->arch.gfn_track[i] = kvm_kvzalloc(npages *
-					    sizeof(*slot->arch.gfn_track[i]));
+		slot->arch.gfn_track[i] = kvzalloc(npages *
+					    sizeof(*slot->arch.gfn_track[i]), GFP_KERNEL);
 		if (!slot->arch.gfn_track[i])
 			goto track_free;
 	}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 313f2cecbc57..07b0d17df9ea 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8121,13 +8121,13 @@ int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
 				      slot->base_gfn, level) + 1;
 
 		slot->arch.rmap[i] =
-			kvm_kvzalloc(lpages * sizeof(*slot->arch.rmap[i]));
+			kvzalloc(lpages * sizeof(*slot->arch.rmap[i]), GFP_KERNEL);
 		if (!slot->arch.rmap[i])
 			goto out_free;
 		if (i == 0)
 			continue;
 
-		linfo = kvm_kvzalloc(lpages * sizeof(*linfo));
+		linfo = kvzalloc(lpages * sizeof(*linfo), GFP_KERNEL);
 		if (!linfo)
 			goto out_free;
 
diff --git a/drivers/md/dm-stats.c b/drivers/md/dm-stats.c
index 38b05f23b96c..674f9a1686f7 100644
--- a/drivers/md/dm-stats.c
+++ b/drivers/md/dm-stats.c
@@ -146,12 +146,7 @@ static void *dm_kvzalloc(size_t alloc_size, int node)
 	if (!claim_shared_memory(alloc_size))
 		return NULL;
 
-	if (alloc_size <= KMALLOC_MAX_SIZE) {
-		p = kzalloc_node(alloc_size, GFP_KERNEL | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN, node);
-		if (p)
-			return p;
-	}
-	p = vzalloc_node(alloc_size, node);
+	p = kvzalloc_node(alloc_size, GFP_KERNEL | __GFP_NOMEMALLOC, node);
 	if (p)
 		return p;
 
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index d9fd184b049e..31a761dd76f5 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2381,7 +2381,7 @@ int ext4_mb_alloc_groupinfo(struct super_block *sb, ext4_group_t ngroups)
 		return 0;
 
 	size = roundup_pow_of_two(sizeof(*sbi->s_group_info) * size);
-	new_groupinfo = ext4_kvzalloc(size, GFP_KERNEL);
+	new_groupinfo = kvzalloc(size, GFP_KERNEL);
 	if (!new_groupinfo) {
 		ext4_msg(sb, KERN_ERR, "can't allocate buddy meta group");
 		return -ENOMEM;
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 9d15a6293124..e3f1ff04a85f 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2110,7 +2110,7 @@ int ext4_alloc_flex_bg_array(struct super_block *sb, ext4_group_t ngroup)
 		return 0;
 
 	size = roundup_pow_of_two(size * sizeof(struct flex_groups));
-	new_groups = ext4_kvzalloc(size, GFP_KERNEL);
+	new_groups = kvzalloc(size, GFP_KERNEL);
 	if (!new_groups) {
 		ext4_msg(sb, KERN_ERR, "not enough memory for %d flex groups",
 			 size / (int) sizeof(struct flex_groups));
@@ -3844,7 +3844,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 			goto failed_mount;
 		}
 	}
-	sbi->s_group_desc = ext4_kvmalloc(db_count *
+	sbi->s_group_desc = kvmalloc(db_count *
 					  sizeof(struct buffer_head *),
 					  GFP_KERNEL);
 	if (sbi->s_group_desc == NULL) {
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 4cce13d0a1a4..83dc25025277 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1935,26 +1935,6 @@ static inline void *f2fs_kmalloc(struct f2fs_sb_info *sbi,
 	return kmalloc(size, flags);
 }
 
-static inline void *f2fs_kvmalloc(size_t size, gfp_t flags)
-{
-	void *ret;
-
-	ret = kmalloc(size, flags | __GFP_NOWARN);
-	if (!ret)
-		ret = __vmalloc(size, flags, PAGE_KERNEL);
-	return ret;
-}
-
-static inline void *f2fs_kvzalloc(size_t size, gfp_t flags)
-{
-	void *ret;
-
-	ret = kzalloc(size, flags | __GFP_NOWARN);
-	if (!ret)
-		ret = __vmalloc(size, flags | __GFP_ZERO, PAGE_KERNEL);
-	return ret;
-}
-
 #define get_inode_mode(i) \
 	((is_inode_flag_set(i, FI_ACL_MODE)) ? \
 	 (F2FS_I(i)->i_acl_mode) : ((i)->i_mode))
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 2752bcf98f95..82ca8c038ecf 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -1014,11 +1014,11 @@ static int __exchange_data_block(struct inode *src_inode,
 	while (len) {
 		olen = min((pgoff_t)4 * ADDRS_PER_BLOCK, len);
 
-		src_blkaddr = f2fs_kvzalloc(sizeof(block_t) * olen, GFP_KERNEL);
+		src_blkaddr = kvzalloc(sizeof(block_t) * olen, GFP_KERNEL);
 		if (!src_blkaddr)
 			return -ENOMEM;
 
-		do_replace = f2fs_kvzalloc(sizeof(int) * olen, GFP_KERNEL);
+		do_replace = kvzalloc(sizeof(int) * olen, GFP_KERNEL);
 		if (!do_replace) {
 			kvfree(src_blkaddr);
 			return -ENOMEM;
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index fb57ab9f6aa6..127d875a79f7 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -2351,13 +2351,13 @@ static int build_sit_info(struct f2fs_sb_info *sbi)
 
 	SM_I(sbi)->sit_info = sit_i;
 
-	sit_i->sentries = f2fs_kvzalloc(MAIN_SEGS(sbi) *
+	sit_i->sentries = kvzalloc(MAIN_SEGS(sbi) *
 					sizeof(struct seg_entry), GFP_KERNEL);
 	if (!sit_i->sentries)
 		return -ENOMEM;
 
 	bitmap_size = f2fs_bitmap_size(MAIN_SEGS(sbi));
-	sit_i->dirty_sentries_bitmap = f2fs_kvzalloc(bitmap_size, GFP_KERNEL);
+	sit_i->dirty_sentries_bitmap = kvzalloc(bitmap_size, GFP_KERNEL);
 	if (!sit_i->dirty_sentries_bitmap)
 		return -ENOMEM;
 
@@ -2390,7 +2390,7 @@ static int build_sit_info(struct f2fs_sb_info *sbi)
 		return -ENOMEM;
 
 	if (sbi->segs_per_sec > 1) {
-		sit_i->sec_entries = f2fs_kvzalloc(MAIN_SECS(sbi) *
+		sit_i->sec_entries = kvzalloc(MAIN_SECS(sbi) *
 					sizeof(struct sec_entry), GFP_KERNEL);
 		if (!sit_i->sec_entries)
 			return -ENOMEM;
@@ -2441,12 +2441,12 @@ static int build_free_segmap(struct f2fs_sb_info *sbi)
 	SM_I(sbi)->free_info = free_i;
 
 	bitmap_size = f2fs_bitmap_size(MAIN_SEGS(sbi));
-	free_i->free_segmap = f2fs_kvmalloc(bitmap_size, GFP_KERNEL);
+	free_i->free_segmap = kvmalloc(bitmap_size, GFP_KERNEL);
 	if (!free_i->free_segmap)
 		return -ENOMEM;
 
 	sec_bitmap_size = f2fs_bitmap_size(MAIN_SECS(sbi));
-	free_i->free_secmap = f2fs_kvmalloc(sec_bitmap_size, GFP_KERNEL);
+	free_i->free_secmap = kvmalloc(sec_bitmap_size, GFP_KERNEL);
 	if (!free_i->free_secmap)
 		return -ENOMEM;
 
@@ -2614,7 +2614,7 @@ static int init_victim_secmap(struct f2fs_sb_info *sbi)
 	struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
 	unsigned int bitmap_size = f2fs_bitmap_size(MAIN_SECS(sbi));
 
-	dirty_i->victim_secmap = f2fs_kvzalloc(bitmap_size, GFP_KERNEL);
+	dirty_i->victim_secmap = kvzalloc(bitmap_size, GFP_KERNEL);
 	if (!dirty_i->victim_secmap)
 		return -ENOMEM;
 	return 0;
@@ -2636,7 +2636,7 @@ static int build_dirty_segmap(struct f2fs_sb_info *sbi)
 	bitmap_size = f2fs_bitmap_size(MAIN_SEGS(sbi));
 
 	for (i = 0; i < NR_DIRTY_TYPE; i++) {
-		dirty_i->dirty_segmap[i] = f2fs_kvzalloc(bitmap_size, GFP_KERNEL);
+		dirty_i->dirty_segmap[i] = kvzalloc(bitmap_size, GFP_KERNEL);
 		if (!dirty_i->dirty_segmap[i])
 			return -ENOMEM;
 	}
diff --git a/fs/seq_file.c b/fs/seq_file.c
index ca69fb99e41a..dc7c2be963ed 100644
--- a/fs/seq_file.c
+++ b/fs/seq_file.c
@@ -25,21 +25,7 @@ static void seq_set_overflow(struct seq_file *m)
 
 static void *seq_buf_alloc(unsigned long size)
 {
-	void *buf;
-	gfp_t gfp = GFP_KERNEL;
-
-	/*
-	 * For high order allocations, use __GFP_NORETRY to avoid oom-killing -
-	 * it's better to fall back to vmalloc() than to kill things.  For small
-	 * allocations, just use GFP_KERNEL which will oom kill, thus no need
-	 * for vmalloc fallback.
-	 */
-	if (size > PAGE_SIZE)
-		gfp |= __GFP_NORETRY | __GFP_NOWARN;
-	buf = kmalloc(size, gfp);
-	if (!buf && size > PAGE_SIZE)
-		buf = vmalloc(size);
-	return buf;
+	return kvmalloc(size, GFP_KERNEL);
 }
 
 /**
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 1c5190dab2c1..00e6f93d1ee0 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -768,8 +768,6 @@ void kvm_arch_check_processor_compat(void *rtn);
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
 int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
 
-void *kvm_kvzalloc(unsigned long size);
-
 #ifndef __KVM_HAVE_ARCH_VM_ALLOC
 static inline struct kvm *kvm_arch_alloc_vm(void)
 {
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a6b9ab0945c4..0d9fdc0a2a7b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -498,6 +498,20 @@ static inline int is_vmalloc_or_module_addr(const void *x)
 }
 #endif
 
+extern void *kvmalloc_node(size_t size, gfp_t flags, int node);
+static inline void *kvmalloc(size_t size, gfp_t flags)
+{
+	return kvmalloc_node(size, flags, NUMA_NO_NODE);
+}
+static inline void *kvzalloc_node(size_t size, gfp_t flags, int node)
+{
+	return kvmalloc_node(size, flags | __GFP_ZERO, node);
+}
+static inline void *kvzalloc(size_t size, gfp_t flags)
+{
+	return kvmalloc(size, flags | __GFP_ZERO);
+}
+
 extern void kvfree(const void *addr);
 
 static inline atomic_t *compound_mapcount_ptr(struct page *page)
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index d68edffbf142..46991ad3ddd5 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -80,6 +80,7 @@ extern void *__vmalloc_node_range(unsigned long size, unsigned long align,
 			unsigned long start, unsigned long end, gfp_t gfp_mask,
 			pgprot_t prot, unsigned long vm_flags, int node,
 			const void *caller);
+extern void *__vmalloc_node_flags(unsigned long size, int node, gfp_t flags);
 
 extern void vfree(const void *addr);
 extern void vfree_atomic(const void *addr);
diff --git a/ipc/util.c b/ipc/util.c
index 798cad18dd87..74c2adc62086 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -403,12 +403,7 @@ void ipc_rmid(struct ipc_ids *ids, struct kern_ipc_perm *ipcp)
  */
 void *ipc_alloc(int size)
 {
-	void *out;
-	if (size > PAGE_SIZE)
-		out = vmalloc(size);
-	else
-		out = kmalloc(size, GFP_KERNEL);
-	return out;
+	return kvmalloc(size, GFP_KERNEL);
 }
 
 /**
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 19b6129eab23..8697f43cf93c 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -53,21 +53,7 @@ void bpf_register_map_type(struct bpf_map_type_list *tl)
 
 void *bpf_map_area_alloc(size_t size)
 {
-	/* We definitely need __GFP_NORETRY, so OOM killer doesn't
-	 * trigger under memory pressure as we really just want to
-	 * fail instead.
-	 */
-	const gfp_t flags = __GFP_NOWARN | __GFP_NORETRY | __GFP_ZERO;
-	void *area;
-
-	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
-		area = kmalloc(size, GFP_USER | flags);
-		if (area != NULL)
-			return area;
-	}
-
-	return __vmalloc(size, GFP_KERNEL | __GFP_HIGHMEM | flags,
-			 PAGE_KERNEL);
+	return kvzalloc(size, GFP_USER);
 }
 
 void bpf_map_area_free(void *area)
diff --git a/mm/nommu.c b/mm/nommu.c
index 215c62296028..bee76e6cd4e5 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -236,6 +236,11 @@ void *__vmalloc(unsigned long size, gfp_t gfp_mask, pgprot_t prot)
 }
 EXPORT_SYMBOL(__vmalloc);
 
+void *__vmalloc_node_flags(unsigned long size, int node, gfp_t flags)
+{
+	return __vmalloc(size, flags, PAGE_KERNEL);
+}
+
 void *vmalloc_user(unsigned long size)
 {
 	void *ret;
diff --git a/mm/util.c b/mm/util.c
index 3cb2164f4099..ef72e2554edb 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -324,6 +324,51 @@ unsigned long vm_mmap(struct file *file, unsigned long addr,
 }
 EXPORT_SYMBOL(vm_mmap);
 
+/**
+ * kvmalloc_node - attempt to allocate physically contiguous memory, but upon
+ * failure, fall back to non-contiguous (vmalloc) allocation.
+ * @size: size of the request.
+ * @flags: gfp mask for the allocation - must be compatible (superset) with GFP_KERNEL.
+ * @node: numa node to allocate from
+ *
+ * Uses kmalloc to get the memory but if the allocation fails then falls back
+ * to the vmalloc allocator. Use kvfree for freeing the memory.
+ *
+ * Reclaim modifiers - __GFP_NORETRY, __GFP_REPEAT and __GFP_NOFAIL are not supported
+ *
+ * Any use of gfp flags outside of GFP_KERNEL should be consulted with mm people.
+ */
+void *kvmalloc_node(size_t size, gfp_t flags, int node)
+{
+	gfp_t kmalloc_flags = flags;
+	void *ret;
+
+	/*
+	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
+	 * so the given set of flags has to be compatible.
+	 */
+	WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL);
+
+	/*
+	 * Make sure that larger requests are not too disruptive - no OOM
+	 * killer and no allocation failure warnings as we have a fallback
+	 */
+	if (size > PAGE_SIZE)
+		kmalloc_flags |= __GFP_NORETRY | __GFP_NOWARN;
+
+	ret = kmalloc_node(size, kmalloc_flags, node);
+
+	/*
+	 * It doesn't really make sense to fallback to vmalloc for sub page
+	 * requests
+	 */
+	if (ret || size <= PAGE_SIZE)
+		return ret;
+
+	return __vmalloc_node_flags(size, node, flags | __GFP_HIGHMEM);
+}
+EXPORT_SYMBOL(kvmalloc_node);
+
 void kvfree(const void *addr)
 {
 	if (is_vmalloc_addr(addr))
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index d89034a393f2..6c1aa2c68887 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1741,6 +1741,13 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
  *	Allocate enough pages to cover @size from the page level
  *	allocator with @gfp_mask flags.  Map them into contiguous
  *	kernel virtual space, using a pagetable protection of @prot.
+ *
+ *	Reclaim modifiers in @gfp_mask - __GFP_NORETRY, __GFP_REPEAT
+ *	and __GFP_NOFAIL are not supported
+ *
+ *	Any use of gfp flags outside of GFP_KERNEL should be consulted
+ *	with mm people.
+ *
  */
 static void *__vmalloc_node(unsigned long size, unsigned long align,
 			    gfp_t gfp_mask, pgprot_t prot,
@@ -1757,7 +1764,7 @@ void *__vmalloc(unsigned long size, gfp_t gfp_mask, pgprot_t prot)
 }
 EXPORT_SYMBOL(__vmalloc);
 
-static inline void *__vmalloc_node_flags(unsigned long size,
+void *__vmalloc_node_flags(unsigned long size,
 					int node, gfp_t flags)
 {
 	return __vmalloc_node(size, 1, flags, PAGE_KERNEL,
diff --git a/security/apparmor/apparmorfs.c b/security/apparmor/apparmorfs.c
index 41073f70eb41..be0b49897a67 100644
--- a/security/apparmor/apparmorfs.c
+++ b/security/apparmor/apparmorfs.c
@@ -98,7 +98,7 @@ static struct aa_loaddata *aa_simple_write_to_buffer(const char __user *userbuf,
 		return ERR_PTR(-ESPIPE);
 
 	/* freed by caller to simple_write_to_buffer */
-	data = kvmalloc(sizeof(*data) + alloc_size);
+	data = kvmalloc(sizeof(*data) + alloc_size, GFP_KERNEL);
 	if (data == NULL)
 		return ERR_PTR(-ENOMEM);
 	kref_init(&data->count);
diff --git a/security/apparmor/include/lib.h b/security/apparmor/include/lib.h
index 65ff492a9807..75733baa6702 100644
--- a/security/apparmor/include/lib.h
+++ b/security/apparmor/include/lib.h
@@ -64,17 +64,6 @@ char *aa_split_fqname(char *args, char **ns_name);
 const char *aa_splitn_fqname(const char *fqname, size_t n, const char **ns_name,
 			     size_t *ns_len);
 void aa_info_message(const char *str);
-void *__aa_kvmalloc(size_t size, gfp_t flags);
-
-static inline void *kvmalloc(size_t size)
-{
-	return __aa_kvmalloc(size, 0);
-}
-
-static inline void *kvzalloc(size_t size)
-{
-	return __aa_kvmalloc(size, __GFP_ZERO);
-}
 
 /**
  * aa_strneq - compare null terminated @str to a non null terminated substring
diff --git a/security/apparmor/lib.c b/security/apparmor/lib.c
index 66475bda6f72..1a13494bc7c7 100644
--- a/security/apparmor/lib.c
+++ b/security/apparmor/lib.c
@@ -129,36 +129,6 @@ void aa_info_message(const char *str)
 }
 
 /**
- * __aa_kvmalloc - do allocation preferring kmalloc but falling back to vmalloc
- * @size: how many bytes of memory are required
- * @flags: the type of memory to allocate (see kmalloc).
- *
- * Return: allocated buffer or NULL if failed
- *
- * It is possible that policy being loaded from the user is larger than
- * what can be allocated by kmalloc, in those cases fall back to vmalloc.
- */
-void *__aa_kvmalloc(size_t size, gfp_t flags)
-{
-	void *buffer = NULL;
-
-	if (size == 0)
-		return NULL;
-
-	/* do not attempt kmalloc if we need more than 16 pages at once */
-	if (size <= (16*PAGE_SIZE))
-		buffer = kmalloc(size, flags | GFP_KERNEL | __GFP_NORETRY |
-				 __GFP_NOWARN);
-	if (!buffer) {
-		if (flags & __GFP_ZERO)
-			buffer = vzalloc(size);
-		else
-			buffer = vmalloc(size);
-	}
-	return buffer;
-}
-
-/**
  * aa_policy_init - initialize a policy structure
  * @policy: policy to initialize  (NOT NULL)
  * @prefix: prefix name if any is required.  (MAYBE NULL)
diff --git a/security/apparmor/match.c b/security/apparmor/match.c
index eb0efef746f5..960c913381e2 100644
--- a/security/apparmor/match.c
+++ b/security/apparmor/match.c
@@ -88,7 +88,7 @@ static struct table_header *unpack_table(char *blob, size_t bsize)
 	if (bsize < tsize)
 		goto out;
 
-	table = kvzalloc(tsize);
+	table = kvzalloc(tsize, GFP_KERNEL);
 	if (table) {
 		table->td_id = th.td_id;
 		table->td_flags = th.td_flags;
diff --git a/security/apparmor/policy_unpack.c b/security/apparmor/policy_unpack.c
index 2e37c9c26bbd..f3422a91353c 100644
--- a/security/apparmor/policy_unpack.c
+++ b/security/apparmor/policy_unpack.c
@@ -487,7 +487,7 @@ static bool unpack_rlimits(struct aa_ext *e, struct aa_profile *profile)
 
 static void *kvmemdup(const void *src, size_t len)
 {
-	void *p = kvmalloc(len);
+	void *p = kvmalloc(len, GFP_KERNEL);
 
 	if (p)
 		memcpy(p, src, len);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index dcd1c12940e6..795c8269ef63 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -502,7 +502,7 @@ static struct kvm_memslots *kvm_alloc_memslots(void)
 	int i;
 	struct kvm_memslots *slots;
 
-	slots = kvm_kvzalloc(sizeof(struct kvm_memslots));
+	slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
 	if (!slots)
 		return NULL;
 
@@ -685,18 +685,6 @@ static struct kvm *kvm_create_vm(unsigned long type)
 	return ERR_PTR(r);
 }
 
-/*
- * Avoid using vmalloc for a small buffer.
- * Should not be used when the size is statically known.
- */
-void *kvm_kvzalloc(unsigned long size)
-{
-	if (size > PAGE_SIZE)
-		return vzalloc(size);
-	else
-		return kzalloc(size, GFP_KERNEL);
-}
-
 static void kvm_destroy_devices(struct kvm *kvm)
 {
 	struct kvm_device *dev, *tmp;
@@ -775,7 +763,7 @@ static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
 {
 	unsigned long dirty_bytes = 2 * kvm_dirty_bitmap_bytes(memslot);
 
-	memslot->dirty_bitmap = kvm_kvzalloc(dirty_bytes);
+	memslot->dirty_bitmap = kvzalloc(dirty_bytes, GFP_KERNEL);
 	if (!memslot->dirty_bitmap)
 		return -ENOMEM;
 
@@ -995,7 +983,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
 			goto out_free;
 	}
 
-	slots = kvm_kvzalloc(sizeof(struct kvm_memslots));
+	slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
 	if (!slots)
 		goto out_free;
 	memcpy(slots, __kvm_memslots(kvm, as_id), sizeof(struct kvm_memslots));
-- 
2.11.0

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH 1/6] mm: introduce kv[mz]alloc helpers
  2017-01-26 12:09     ` Michal Hocko
@ 2017-01-30  8:42       ` Vlastimil Babka
  -1 siblings, 0 replies; 180+ messages in thread
From: Vlastimil Babka @ 2017-01-30  8:42 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: David Rientjes, Mel Gorman, Johannes Weiner, Al Viro, linux-mm,
	LKML, Anatoly Stepanov, Paolo Bonzini, Mike Snitzer,
	Michael S. Tsirkin, Theodore Ts'o, Andreas Dilger

On 01/26/2017 01:09 PM, Michal Hocko wrote:
> On Thu 12-01-17 16:37:12, Michal Hocko wrote:
> [...]
>> +void *kvmalloc_node(size_t size, gfp_t flags, int node)
>> +{
>> +	gfp_t kmalloc_flags = flags;
>> +	void *ret;
>> +
>> +	/*
>> +	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
>> +	 * so the given set of flags has to be compatible.
>> +	 */
>> +	WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL);
>> +
>> +	/*
>> +	 * Make sure that larger requests are not too disruptive - no OOM
>> +	 * killer and no allocation failure warnings as we have a fallback
>> +	 */
>> +	if (size > PAGE_SIZE)
>> +		kmalloc_flags |= __GFP_NORETRY | __GFP_NOWARN;
>> +
>> +	ret = kmalloc_node(size, kmalloc_flags, node);
>> +
>> +	/*
>> +	 * It doesn't really make sense to fallback to vmalloc for sub page
>> +	 * requests
>> +	 */
>> +	if (ret || size <= PAGE_SIZE)
>> +		return ret;
>> +
>> +	return __vmalloc_node_flags(size, node, flags);
>> +}
>> +EXPORT_SYMBOL(kvmalloc_node);
>
> While discussing the bpf change I realized that the vmalloc fallback
> doesn't request __GFP_HIGHMEM, so I've updated the patch to do so. All
> the current users do request it, except for f2fs_kv[zm]alloc, which seems
> to have forgotten or not known about the flag. As a next step, I would
> like to check whether we actually have any __vmalloc* user which would
> strictly refuse __GFP_HIGHMEM, because I do not really see any reason for
> that. If there is none, I would simply pull the __GFP_HIGHMEM handling
> into vmalloc itself.
>
> So before I resend the full series again, can I keep acks with the
> following?

OK!

Thanks,
Vlastimil


* Re: [PATCH 0/6 v3] kvmalloc
  2017-01-30  9:49 ` Michal Hocko
@ 2017-02-05 10:23   ` Michal Hocko
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Hocko @ 2017-02-05 10:23 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, David Rientjes, Mel Gorman, Johannes Weiner,
	Al Viro, linux-mm, LKML, Alexei Starovoitov, Andreas Dilger,
	Andreas Dilger, Andrey Konovalov, Anton Vorontsov, Ben Skeggs,
	Boris Ostrovsky, Christian Borntraeger, Colin Cross,
	Daniel Borkmann, Dan Williams, David Sterba, Eric Dumazet,
	Eric Dumazet, Hariprasad S, Heiko Carstens, Herbert Xu,
	Ilya Dryomov, John Hubbard, Kees Cook, Kent Overstreet,
	Marcelo Ricardo Leitner, Martin Schwidefsky, Michael S. Tsirkin,
	Mike Snitzer, Mikulas Patocka, Oleg Drokin, Pablo Neira Ayuso,
	Rafael J. Wysocki, Santosh Raspatur, Tariq Toukan, Tom Herbert,
	Tony Luck, Yan, Zheng, Yishai Hadas

Is there anything more to be done before this can get merged? I would
really like to target this for the next merge window. I already have some
more changes which depend on this.

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH 0/6 v3] kvmalloc
  2017-01-30 16:28                             ` Michal Hocko
@ 2017-01-30 16:45                               ` Daniel Borkmann
  -1 siblings, 0 replies; 180+ messages in thread
From: Daniel Borkmann @ 2017-01-30 16:45 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Alexei Starovoitov, Andrew Morton, Vlastimil Babka, Mel Gorman,
	Johannes Weiner, linux-mm, LKML, netdev, marcelo.leitner

On 01/30/2017 05:28 PM, Michal Hocko wrote:
> On Mon 30-01-17 17:15:08, Daniel Borkmann wrote:
>> On 01/30/2017 08:56 AM, Michal Hocko wrote:
>>> On Fri 27-01-17 21:12:26, Daniel Borkmann wrote:
>>>> On 01/27/2017 11:05 AM, Michal Hocko wrote:
>>>>> On Thu 26-01-17 21:34:04, Daniel Borkmann wrote:
>>> [...]
>>>>>> So to answer your second email with the bpf and netfilter hunks, why
>>>>>> not replacing them with kvmalloc() and __GFP_NORETRY flag and add that
>>>>>> big fat FIXME comment above there, saying explicitly that __GFP_NORETRY
>>>>>> is not harmful though has only /partial/ effect right now and that full
>>>>>> support needs to be implemented in future. That would still be better
>>>>>> than not having it, imo, and the FIXME would make expectations clear
>>>>>> to anyone reading that code.
>>>>>
>>>>> Well, we can do that, I just would like to prevent from this (ab)use
>>>>> if there is no _real_ and _sensible_ usecase for it. Having a real bug
>>>>
>>>> Understandable.
>>>>
>>>>> report or a fallback mechanism you are mentioning above would justify
>>>>> the (ab)use IMHO. But that abuse would be documented properly and have a
>>>>> real reason to exist. That sounds like a better approach to me.
>>>>>
>>>>> But if you absolutely _insist_ I can change that.
>>>>
>>>> Yeah, please do (with a big FIXME comment as mentioned), this originally
>>>> came from a real bug report. Anyway, feel free to add my Acked-by then.
>>>
>>> Thanks! I will repost the whole series today.
>>
>> Looks like I got only Cc'ed on the cover letter of your v3 from today
>> (should have been v4 actually?).
>
> Yes
>
>> Anyway, I looked up the last patch
>> on lkml [1] and it seems you forgot the __GFP_NORETRY we talked about?
>
> I misread your response. I thought you were OK with the FIXME
> explanation.
>
>> At least that was what was discussed above (insisting on __GFP_NORETRY
>> plus FIXME comment) for providing my Acked-by then. Can you still fix
>> that up in a final respin?
>
> I will probably just drop that last patch instead. I am not convinced
> that we should bend the new API over and let people mimic that
> throughout the code. I have just seen too many examples of this pattern
> already.
>
> I would also like to prevent the next rebase, unless there are any issues
> with some patches of course.

Ok, I'm fine with that as well.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 0/6 v3] kvmalloc
  2017-01-30 16:15                           ` Daniel Borkmann
@ 2017-01-30 16:28                             ` Michal Hocko
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Hocko @ 2017-01-30 16:28 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, Andrew Morton, Vlastimil Babka, Mel Gorman,
	Johannes Weiner, linux-mm, LKML, netdev, marcelo.leitner

On Mon 30-01-17 17:15:08, Daniel Borkmann wrote:
> On 01/30/2017 08:56 AM, Michal Hocko wrote:
> > On Fri 27-01-17 21:12:26, Daniel Borkmann wrote:
> > > On 01/27/2017 11:05 AM, Michal Hocko wrote:
> > > > On Thu 26-01-17 21:34:04, Daniel Borkmann wrote:
> > [...]
> > > > > So to answer your second email with the bpf and netfilter hunks, why
> > > > > not replacing them with kvmalloc() and __GFP_NORETRY flag and add that
> > > > > big fat FIXME comment above there, saying explicitly that __GFP_NORETRY
> > > > > is not harmful though has only /partial/ effect right now and that full
> > > > > support needs to be implemented in future. That would still be better
> > > > > > than not having it, imo, and the FIXME would make expectations clear
> > > > > to anyone reading that code.
> > > > 
> > > > Well, we can do that, I just would like to prevent from this (ab)use
> > > > if there is no _real_ and _sensible_ usecase for it. Having a real bug
> > > 
> > > Understandable.
> > > 
> > > > report or a fallback mechanism you are mentioning above would justify
> > > > the (ab)use IMHO. But that abuse would be documented properly and have a
> > > > real reason to exist. That sounds like a better approach to me.
> > > > 
> > > > But if you absolutely _insist_ I can change that.
> > > 
> > > Yeah, please do (with a big FIXME comment as mentioned), this originally
> > > came from a real bug report. Anyway, feel free to add my Acked-by then.
> > 
> > Thanks! I will repost the whole series today.
> 
> Looks like I got only Cc'ed on the cover letter of your v3 from today
> (should have been v4 actually?).

Yes

> Anyway, I looked up the last patch
> on lkml [1] and it seems you forgot the __GFP_NORETRY we talked about?

I misread your response. I thought you were OK with the FIXME
explanation.

> At least that was what was discussed above (insisting on __GFP_NORETRY
> plus FIXME comment) for providing my Acked-by then. Can you still fix
> that up in a final respin?

I will probably just drop that last patch instead. I am not convinced
that we should bend the new API over and let people mimic that
throughout the code. I have just seen too many examples of this pattern
already.

I would also like to prevent the next rebase, unless there are any issues
with some patches of course.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 0/6 v3] kvmalloc
  2017-01-30  7:56                         ` Michal Hocko
@ 2017-01-30 16:15                           ` Daniel Borkmann
  -1 siblings, 0 replies; 180+ messages in thread
From: Daniel Borkmann @ 2017-01-30 16:15 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Alexei Starovoitov, Andrew Morton, Vlastimil Babka, Mel Gorman,
	Johannes Weiner, linux-mm, LKML, netdev, marcelo.leitner

On 01/30/2017 08:56 AM, Michal Hocko wrote:
> On Fri 27-01-17 21:12:26, Daniel Borkmann wrote:
>> On 01/27/2017 11:05 AM, Michal Hocko wrote:
>>> On Thu 26-01-17 21:34:04, Daniel Borkmann wrote:
> [...]
>>>> So to answer your second email with the bpf and netfilter hunks, why
>>>> not replacing them with kvmalloc() and __GFP_NORETRY flag and add that
>>>> big fat FIXME comment above there, saying explicitly that __GFP_NORETRY
>>>> is not harmful though has only /partial/ effect right now and that full
>>>> support needs to be implemented in future. That would still be better
>>>> than not having it, imo, and the FIXME would make expectations clear
>>>> to anyone reading that code.
>>>
>>> Well, we can do that, I just would like to prevent from this (ab)use
>>> if there is no _real_ and _sensible_ usecase for it. Having a real bug
>>
>> Understandable.
>>
>>> report or a fallback mechanism you are mentioning above would justify
>>> the (ab)use IMHO. But that abuse would be documented properly and have a
>>> real reason to exist. That sounds like a better approach to me.
>>>
>>> But if you absolutely _insist_ I can change that.
>>
>> Yeah, please do (with a big FIXME comment as mentioned), this originally
>> came from a real bug report. Anyway, feel free to add my Acked-by then.
>
> Thanks! I will repost the whole series today.

Looks like I got only Cc'ed on the cover letter of your v3 from today
(should have been v4 actually?). Anyway, I looked up the last patch
on lkml [1] and it seems you forgot the __GFP_NORETRY we talked about?
At least that was what was discussed above (insisting on __GFP_NORETRY
plus FIXME comment) for providing my Acked-by then. Can you still fix
that up in a final respin?

Thanks again,
Daniel

   [1] https://lkml.org/lkml/2017/1/30/129

^ permalink raw reply	[flat|nested] 180+ messages in thread

* [PATCH 0/6 v3] kvmalloc
@ 2017-01-30  9:49 ` Michal Hocko
  0 siblings, 0 replies; 180+ messages in thread
From: Michal Hocko @ 2017-01-30  9:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, David Rientjes, Mel Gorman, Johannes Weiner,
	Al Viro, linux-mm, LKML, Alexei Starovoitov, Andreas Dilger,
	Andreas Dilger, Andrey Konovalov, Anton Vorontsov, Ben Skeggs,
	Boris Ostrovsky, Christian Borntraeger, Colin Cross,
	Daniel Borkmann, Dan Williams, David Sterba, Eric Dumazet,
	Eric Dumazet, Hariprasad S, Heiko Carstens, Herbert Xu,
	Ilya Dryomov, John Hubbard, Kees Cook, Kent Overstreet,
	Marcelo Ricardo Leitner, Martin Schwidefsky, Michael S. Tsirkin,
	Michal Hocko, Mike Snitzer, Mikulas Patocka, Oleg Drokin,
	Pablo Neira Ayuso, Rafael J. Wysocki, Santosh Raspatur,
	Tariq Toukan, Tom Herbert, Tony Luck, Yan, Zheng, Yishai Hadas

Hi,
this has been previously posted here [1] and it received quite some
feedback. As a result the number of patches has grown again. We are at
9 patches right now. I have rebased the series on top of the current
next-20170130. There were some changes since the last posting, namely
a7f6c1b63b86 ("AppArmor: Use GFP_KERNEL for __aa_kvmalloc().") which
dropped GFP_NOIO from __aa_kvmalloc and d407bd25a204 ("bpf: don't
trigger OOM killer under pressure with map alloc") which has created a
kvmalloc alternative for bpf code. Both have been changed to use the mm
kvmalloc but it is worth noting this dependency during the merge window.

I hope there are no further obstacles to having this merged into the
mmotm tree so that it can go in during the next merge window.

Original cover:

There are many open coded kmalloc with vmalloc fallback instances in
the tree.  Most of them are not careful enough or simply do not care
about the underlying semantics of the kmalloc/page allocator, which means
that a) some vmalloc fallbacks are basically unreachable because the
kmalloc part will keep retrying until it succeeds and b) the page
allocator can invoke really disruptive steps like the OOM killer to move
forward, which doesn't sound appropriate when we consider that the
vmalloc fallback is available.

As can be seen, implementing kvmalloc requires quite an intimate
knowledge of the page allocator and the memory reclaim internals, which
strongly suggests that a helper should be implemented in the memory
subsystem proper.
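
To make points a) and b) concrete, here is a rough userspace model of
the "try contiguous first, then fall back" pattern. It is only a sketch
and not the code from this series: the MODEL_* flag bits, the 4096-byte
page size, the contig_limit parameter and all function names are
assumptions made up for illustration. They stand in for __GFP_NOWARN,
__GFP_NORETRY, __GFP_REPEAT, PAGE_SIZE and the kmalloc/vmalloc pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical values for this model; not the kernel's gfp_t bits. */
#define MODEL_PAGE_SIZE 4096u
#define MODEL_NOWARN    0x1u	/* stands in for __GFP_NOWARN */
#define MODEL_NORETRY   0x2u	/* stands in for __GFP_NORETRY */
#define MODEL_REPEAT    0x4u	/* stands in for __GFP_REPEAT */

/*
 * Flags for the first (contiguous) attempt. Larger requests are told to
 * fail quietly and early so that the fallback stays reachable, unless
 * the caller explicitly asked for the "try hard" semantic.
 */
unsigned int model_first_try_flags(size_t size, unsigned int flags)
{
	if (size > MODEL_PAGE_SIZE) {
		flags |= MODEL_NOWARN;
		if (!(flags & MODEL_REPEAT))
			flags |= MODEL_NORETRY;
	}
	return flags;
}

/* Stand-in for kmalloc: fails above contig_limit to mimic fragmentation. */
void *model_contig_alloc(size_t size, size_t contig_limit)
{
	return size <= contig_limit ? malloc(size) : NULL;
}

/* Contiguous attempt first, then the vmalloc-like fallback. */
void *model_kvmalloc(size_t size, unsigned int flags, size_t contig_limit,
		     int *used_fallback)
{
	unsigned int first_flags = model_first_try_flags(size, flags);
	void *p;

	assert(used_fallback != NULL);
	*used_fallback = 0;

	/* A real allocator would act on these; the model only computes them. */
	(void)first_flags;

	p = model_contig_alloc(size, contig_limit);

	/* A fallback makes no sense for requests that fit in one page. */
	if (p || size <= MODEL_PAGE_SIZE)
		return p;

	*used_fallback = 1;
	return malloc(size);	/* stand-in for vmalloc */
}
```

In this model a 64kB request with a 16kB contig_limit ends up in the
fallback, while a sub-page request that fails simply returns NULL. The
memory is released with free() here; in the kernel the matching call
would be kvfree().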

Most callers I could find have been converted to use the helper
instead.  This is patch 5. There are some more relying on __GFP_REPEAT
in the networking stack which I have converted as well; Eric Dumazet
was not opposed [2] to converting them.

[1] http://lkml.kernel.org/r/20170112153717.28943-1-mhocko@kernel.org
[2] http://lkml.kernel.org/r/1485273626.16328.301.camel@edumazet-glaptop3.roam.corp.google.com

Michal Hocko (9):
      mm: introduce kv[mz]alloc helpers
      mm: support __GFP_REPEAT in kvmalloc_node for >32kB
      rhashtable: simplify a strange allocation pattern
      ila: simplify a strange allocation pattern
      treewide: use kv[mz]alloc* rather than opencoded variants
      net: use kvmalloc with __GFP_REPEAT rather than open coded variant
      md: use kvmalloc rather than opencoded variant
      bcache: use kvmalloc
      net, bpf: use kvzalloc helper

 arch/s390/kvm/kvm-s390.c                           | 10 +---
 arch/x86/kvm/lapic.c                               |  4 +-
 arch/x86/kvm/page_track.c                          |  4 +-
 arch/x86/kvm/x86.c                                 |  4 +-
 crypto/lzo.c                                       |  4 +-
 drivers/acpi/apei/erst.c                           |  8 +--
 drivers/char/agp/generic.c                         |  8 +--
 drivers/gpu/drm/nouveau/nouveau_gem.c              |  4 +-
 drivers/md/bcache/super.c                          |  8 +--
 drivers/md/bcache/util.h                           | 12 +----
 drivers/md/dm-ioctl.c                              | 13 ++---
 drivers/md/dm-stats.c                              |  7 +--
 drivers/net/ethernet/chelsio/cxgb3/cxgb3_defs.h    |  3 --
 drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c | 29 ++---------
 drivers/net/ethernet/chelsio/cxgb3/l2t.c           |  8 +--
 drivers/net/ethernet/chelsio/cxgb3/l2t.h           |  1 -
 drivers/net/ethernet/chelsio/cxgb4/clip_tbl.c      | 12 ++---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h         |  3 --
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c | 10 ++--
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_ethtool.c |  8 +--
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c    | 31 ++----------
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_u32.c  | 13 +++--
 drivers/net/ethernet/chelsio/cxgb4/l2t.c           |  2 +-
 drivers/net/ethernet/chelsio/cxgb4/sched.c         | 12 ++---
 drivers/net/ethernet/mellanox/mlx4/en_tx.c         |  9 ++--
 drivers/net/ethernet/mellanox/mlx4/mr.c            |  9 ++--
 drivers/nvdimm/dimm_devs.c                         |  5 +-
 .../staging/lustre/lnet/libcfs/linux/linux-mem.c   | 11 +----
 drivers/vhost/net.c                                |  9 ++--
 drivers/vhost/vhost.c                              | 15 ++----
 drivers/vhost/vsock.c                              |  9 ++--
 drivers/xen/evtchn.c                               | 14 +-----
 fs/btrfs/ctree.c                                   |  9 ++--
 fs/btrfs/ioctl.c                                   |  9 ++--
 fs/btrfs/send.c                                    | 27 ++++------
 fs/ceph/file.c                                     |  9 ++--
 fs/ext4/mballoc.c                                  |  2 +-
 fs/ext4/super.c                                    |  4 +-
 fs/f2fs/f2fs.h                                     | 20 --------
 fs/f2fs/file.c                                     |  4 +-
 fs/f2fs/segment.c                                  | 14 +++---
 fs/select.c                                        |  5 +-
 fs/seq_file.c                                      | 16 +-----
 fs/xattr.c                                         | 27 ++++------
 include/linux/kvm_host.h                           |  2 -
 include/linux/mlx5/driver.h                        |  7 +--
 include/linux/mm.h                                 | 22 +++++++++
 include/linux/vmalloc.h                            |  1 +
 ipc/util.c                                         |  7 +--
 kernel/bpf/syscall.c                               | 19 ++------
 lib/iov_iter.c                                     |  5 +-
 lib/rhashtable.c                                   | 13 ++---
 mm/frame_vector.c                                  |  5 +-
 mm/nommu.c                                         |  5 ++
 mm/util.c                                          | 57 ++++++++++++++++++++++
 mm/vmalloc.c                                       |  9 +++-
 net/core/dev.c                                     | 24 ++++-----
 net/ipv4/inet_hashtables.c                         |  6 +--
 net/ipv4/tcp_metrics.c                             |  5 +-
 net/ipv6/ila/ila_xlat.c                            |  8 +--
 net/mpls/af_mpls.c                                 |  5 +-
 net/netfilter/x_tables.c                           | 37 ++++----------
 net/netfilter/xt_recent.c                          |  5 +-
 net/sched/sch_choke.c                              |  5 +-
 net/sched/sch_fq.c                                 | 12 +----
 net/sched/sch_fq_codel.c                           | 26 +++-------
 net/sched/sch_hhf.c                                | 33 ++++---------
 net/sched/sch_netem.c                              |  6 +--
 net/sched/sch_sfq.c                                |  6 +--
 security/apparmor/apparmorfs.c                     |  2 +-
 security/apparmor/include/lib.h                    | 11 -----
 security/apparmor/lib.c                            | 30 ------------
 security/apparmor/match.c                          |  2 +-
 security/apparmor/policy_unpack.c                  |  2 +-
 security/keys/keyctl.c                             | 22 +++------
 virt/kvm/kvm_main.c                                | 18 ++-----
 76 files changed, 279 insertions(+), 583 deletions(-)

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 0/6 v3] kvmalloc
  2017-01-27 20:12                       ` Daniel Borkmann
@ 2017-01-30  7:56                         ` Michal Hocko
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Hocko @ 2017-01-30  7:56 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, Andrew Morton, Vlastimil Babka, Mel Gorman,
	Johannes Weiner, linux-mm, LKML, netdev, marcelo.leitner

On Fri 27-01-17 21:12:26, Daniel Borkmann wrote:
> On 01/27/2017 11:05 AM, Michal Hocko wrote:
> > On Thu 26-01-17 21:34:04, Daniel Borkmann wrote:
[...]
> > > So to answer your second email with the bpf and netfilter hunks, why
> > > not replacing them with kvmalloc() and __GFP_NORETRY flag and add that
> > > big fat FIXME comment above there, saying explicitly that __GFP_NORETRY
> > > is not harmful though has only /partial/ effect right now and that full
> > > support needs to be implemented in future. That would still be better
> > > than not having it, imo, and the FIXME would make expectations clear
> > > to anyone reading that code.
> > 
> > Well, we can do that, I just would like to prevent from this (ab)use
> > if there is no _real_ and _sensible_ usecase for it. Having a real bug
> 
> Understandable.
> 
> > report or a fallback mechanism you are mentioning above would justify
> > the (ab)use IMHO. But that abuse would be documented properly and have a
> > real reason to exist. That sounds like a better approach to me.
> > 
> > But if you absolutely _insist_ I can change that.
> 
> Yeah, please do (with a big FIXME comment as mentioned), this originally
> came from a real bug report. Anyway, feel free to add my Acked-by then.

Thanks! I will repost the whole series today.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 0/6 v3] kvmalloc
  2017-01-27 10:05                     ` Michal Hocko
@ 2017-01-27 20:12                       ` Daniel Borkmann
  -1 siblings, 0 replies; 180+ messages in thread
From: Daniel Borkmann @ 2017-01-27 20:12 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Alexei Starovoitov, Andrew Morton, Vlastimil Babka, Mel Gorman,
	Johannes Weiner, linux-mm, LKML, netdev, marcelo.leitner

On 01/27/2017 11:05 AM, Michal Hocko wrote:
> On Thu 26-01-17 21:34:04, Daniel Borkmann wrote:
>> On 01/26/2017 02:40 PM, Michal Hocko wrote:
> [...]
>>> But realistically, how big is this problem really? Is it really worth
>>> it? You said this is an admin only interface and admin can kill the
>>> machine by OOM and other means already.
>>>
>>> Moreover and I should probably mention it explicitly, your d407bd25a204b
>>> reduced the likelihood of OOM for another reason. kmalloc used GFP_USER
>>> previously and with order > 0 && order <= PAGE_ALLOC_COSTLY_ORDER this
>>> could indeed hit the OOM e.g. due to memory fragmentation. It would be
>>> much harder to hit the OOM killer from vmalloc which doesn't issue
>>> higher order allocation requests. Or have you ever seen the OOM killer
>>> pointing to the vmalloc fallback path?
>>
>> The case I was concerned about was from vmalloc() path, not kmalloc().
>> That was where the stack trace indicating OOM pointed to. As an example,
>> there could be really large allocation requests for maps where the map
>> has pre-allocated memory for its elements. Thus, if we get to the point
>> where we need to kill others due to shortage of mem for satisfying this,
>> I'd much much rather prefer to just not let vmalloc() work really hard
>> and fail early on instead.
>
> I see, but as already mentioned, chances are that by the time you get
> close to the OOM somebody else will hit the OOM before the vmalloc path
> manages to free the allocated memory.
>
>> In my (crafted) test case, I was connected
>> via ssh and it each time reliably killed my connection, which is really
>> suboptimal.
>>
>> F.e., I could also imagine a buggy or miscalculated map definition for
>> a prog that is provisioned to multiple places, which then accidentally
>> triggers this. Or if large on purpose, but we crossed the line, it
>> could be handled more gracefully, f.e. I could imagine an option to
>> falling back to a non-pre-allocated map flavor from the application
>> loading the program. Trade-off for sure, but still allowing it to
>> operate up to a certain extent. Granted, if vmalloc() succeeded without
>> trying hard and we then OOM elsewhere, too bad, but we don't have much
>> control over that one anyway, only about our own request. Reason I
>> asked above was whether having __GFP_NORETRY in would be fatal
>> somewhere down the path, but seems not as you say.
>>
>> So to answer your second email with the bpf and netfilter hunks, why
>> not replacing them with kvmalloc() and __GFP_NORETRY flag and add that
>> big fat FIXME comment above there, saying explicitly that __GFP_NORETRY
>> is not harmful though has only /partial/ effect right now and that full
>> support needs to be implemented in future. That would still be better
>> than not having it, imo, and the FIXME would make expectations clear
>> to anyone reading that code.
>
> Well, we can do that, I just would like to prevent from this (ab)use
> if there is no _real_ and _sensible_ usecase for it. Having a real bug

Understandable.

> report or a fallback mechanism you are mentioning above would justify
> the (ab)use IMHO. But that abuse would be documented properly and have a
> real reason to exist. That sounds like a better approach to me.
>
> But if you absolutely _insist_ I can change that.

Yeah, please do (with a big FIXME comment as mentioned), this originally
came from a real bug report. Anyway, feel free to add my Acked-by then.

Thanks again,
Daniel

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 0/6 v3] kvmalloc
  2017-01-26 20:34                   ` Daniel Borkmann
@ 2017-01-27 10:05                     ` Michal Hocko
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Hocko @ 2017-01-27 10:05 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, Andrew Morton, Vlastimil Babka, Mel Gorman,
	Johannes Weiner, linux-mm, LKML, netdev, marcelo.leitner

On Thu 26-01-17 21:34:04, Daniel Borkmann wrote:
> On 01/26/2017 02:40 PM, Michal Hocko wrote:
[...]
> > But realistically, how big is this problem really? Is it really worth
> > it? You said this is an admin only interface and admin can kill the
> > machine by OOM and other means already.
> > 
> > Moreover and I should probably mention it explicitly, your d407bd25a204b
> > reduced the likelihood of OOM for another reason. kmalloc used GFP_USER
> > previously and with order > 0 && order <= PAGE_ALLOC_COSTLY_ORDER this
> > could indeed hit the OOM e.g. due to memory fragmentation. It would be
> > much harder to hit the OOM killer from vmalloc which doesn't issue
> > higher order allocation requests. Or have you ever seen the OOM killer
> > pointing to the vmalloc fallback path?
> 
> The case I was concerned about was from vmalloc() path, not kmalloc().
> That was where the stack trace indicating OOM pointed to. As an example,
> there could be really large allocation requests for maps where the map
> has pre-allocated memory for its elements. Thus, if we get to the point
> where we need to kill others due to shortage of mem for satisfying this,
> I'd much much rather prefer to just not let vmalloc() work really hard
> and fail early on instead. 

I see, but as already mentioned, chances are that by the time you get
close to the OOM somebody else will hit the OOM before the vmalloc path
manages to free the allocated memory.

> In my (crafted) test case, I was connected
> via ssh and it each time reliably killed my connection, which is really
> suboptimal.
> 
> F.e., I could also imagine a buggy or miscalculated map definition for
> a prog that is provisioned to multiple places, which then accidentally
> triggers this. Or if large on purpose, but we crossed the line, it
> could be handled more gracefully, f.e. I could imagine an option to
> falling back to a non-pre-allocated map flavor from the application
> loading the program. Trade-off for sure, but still allowing it to
> operate up to a certain extent. Granted, if vmalloc() succeeded without
> trying hard and we then OOM elsewhere, too bad, but we don't have much
> control over that one anyway, only about our own request. Reason I
> asked above was whether having __GFP_NORETRY in would be fatal
> somewhere down the path, but seems not as you say.
> 
> So to answer your second email with the bpf and netfilter hunks, why
> not replacing them with kvmalloc() and __GFP_NORETRY flag and add that
> big fat FIXME comment above there, saying explicitly that __GFP_NORETRY
> is not harmful though has only /partial/ effect right now and that full
> support needs to be implemented in future. That would still be better
> than not having it, imo, and the FIXME would make expectations clear
> to anyone reading that code.

Well, we can do that, I just would like to prevent from this (ab)use
if there is no _real_ and _sensible_ usecase for it. Having a real bug
report or a fallback mechanism you are mentioning above would justify
the (ab)use IMHO. But that abuse would be documented properly and have a
real reason to exist. That sounds like a better approach to me.

But if you absolutely _insist_ I can change that.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 0/6 v3] kvmalloc
  2017-01-26 13:40                 ` Michal Hocko
@ 2017-01-26 20:34                   ` Daniel Borkmann
  -1 siblings, 0 replies; 180+ messages in thread
From: Daniel Borkmann @ 2017-01-26 20:34 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Alexei Starovoitov, Andrew Morton, Vlastimil Babka, Mel Gorman,
	Johannes Weiner, linux-mm, LKML, netdev, marcelo.leitner

On 01/26/2017 02:40 PM, Michal Hocko wrote:
> On Thu 26-01-17 14:10:06, Daniel Borkmann wrote:
>> On 01/26/2017 12:58 PM, Michal Hocko wrote:
>>> On Thu 26-01-17 12:33:55, Daniel Borkmann wrote:
>>>> On 01/26/2017 11:08 AM, Michal Hocko wrote:
>>> [...]
>>>>> If you disagree I can drop the bpf part of course...
>>>>
>>>> If we could consolidate these spots with kvmalloc() eventually, I'm
>>>> all for it. But even if __GFP_NORETRY is not covered down to all
>>>> possible paths, it kind of does have an effect already of saying
>>>> 'don't try too hard', so would it be harmful to still keep that for
>>>> now? If it's not, I'd personally prefer to just leave it as is until
>>>> there's some form of support by kvmalloc() and friends.
>>>
>>> Well, you can use kvmalloc(size, GFP_KERNEL|__GFP_NORETRY). It is not
>>> disallowed. It is not _supported_ which means that if it doesn't work as
>>> you expect you are on your own. Which is actually the situation right
>>> now as well. But I still think that this is just not right thing to do.
>>> Even though it might happen to work in some cases it gives a false
>>> impression of a solution. So I would rather go with
>>
>> Hmm. 'On my own' means, we could potentially BUG somewhere down the
>> vmalloc implementation, etc, presumably? So it might in-fact be
>> harmful to pass that, right?
>
> No it would mean that it might eventually hit the behavior which you are
> trying to avoid - in other words it may invoke OOM killer even though
> __GFP_NORETRY means giving up before any system wide disruptive actions
> are taken.

Ok, thanks for clarifying, more on that further below.

>>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>>> index 8697f43cf93c..a6dc4d596f14 100644
>>> --- a/kernel/bpf/syscall.c
>>> +++ b/kernel/bpf/syscall.c
>>> @@ -53,6 +53,11 @@ void bpf_register_map_type(struct bpf_map_type_list *tl)
>>>
>>>    void *bpf_map_area_alloc(size_t size)
>>>    {
>>> +	/*
>>> +	 * FIXME: we would really like to not trigger the OOM killer and rather
>>> +	 * fail instead. This is not supported right now. Please nag MM people
>>> +	 * if these OOM start bothering people.
>>> +	 */
>>
>> Ok, I know this is out of scope for this series, but since i) this
>> is _not_ the _only_ spot right now which has such a construct and ii)
>> I am already kind of nagging a bit ;), my question would be, what
>> would it take to start supporting it?
>
> propagate gfp mask all the way down from vmalloc to all places which
> might allocate down the path and especially page table allocation
> function are PITA because they are really deep. This is a lot of work...
>
> But realistically, how big is this problem really? Is it really worth
> it? You said this is an admin only interface and admin can kill the
> machine by OOM and other means already.
>
> Moreover and I should probably mention it explicitly, your d407bd25a204b
> reduced the likelihood of OOM for another reason. kmalloc used GFP_USER
> previously and with order > 0 && order <= PAGE_ALLOC_COSTLY_ORDER this
> could indeed hit the OOM e.g. due to memory fragmentation. It would be
> much harder to hit the OOM killer from vmalloc which doesn't issue
> higher order allocation requests. Or have you ever seen the OOM killer
> pointing to the vmalloc fallback path?
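
The order range named in the quoted paragraph can be made concrete with a small userspace sketch. It assumes 4 KiB pages and relies on PAGE_ALLOC_COSTLY_ORDER being 3 in the kernel; the model_ and MODEL_ names are hypothetical and exist only for this illustration, they are not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions for illustration only: 4 KiB pages, and a costly-order
 * threshold of 3 (the kernel's PAGE_ALLOC_COSTLY_ORDER). */
#define MODEL_PAGE_SIZE    4096UL
#define MODEL_COSTLY_ORDER 3

/* Smallest order such that (MODEL_PAGE_SIZE << order) >= size. */
static unsigned int model_get_order(size_t size)
{
	unsigned int order = 0;

	assert(size > 0);
	while ((MODEL_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Nonzero when a request falls into the range discussed above:
 * order > 0 && order <= PAGE_ALLOC_COSTLY_ORDER. */
static int model_in_costly_window(size_t size)
{
	unsigned int order = model_get_order(size);

	return order > 0 && order <= MODEL_COSTLY_ORDER;
}
```

Under these assumptions a contiguous request between 4097 bytes and 32 KiB lands in that window, while a vmalloc-style allocation of the same size would be assembled from order-0 pages.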

The case I was concerned about was from vmalloc() path, not kmalloc().
That was where the stack trace indicating OOM pointed to. As an example,
there could be really large allocation requests for maps where the map
has pre-allocated memory for its elements. Thus, if we get to the point
where we need to kill others due to shortage of mem for satisfying this,
I'd much much rather prefer to just not let vmalloc() work really hard
and fail early on instead. In my (crafted) test case, I was connected
via ssh and it each time reliably killed my connection, which is really
suboptimal.

F.e., I could also imagine a buggy or miscalculated map definition for
a prog that is provisioned to multiple places, which then accidentally
triggers this. Or if large on purpose, but we crossed the line, it
could be handled more gracefully, f.e. I could imagine an option to
falling back to a non-pre-allocated map flavor from the application
loading the program. Trade-off for sure, but still allowing it to
operate up to a certain extent. Granted, if vmalloc() succeeded without
trying hard and we then OOM elsewhere, too bad, but we don't have much
control over that one anyway, only about our own request. Reason I
asked above was whether having __GFP_NORETRY in would be fatal
somewhere down the path, but seems not as you say.
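
The graceful-degradation idea above (retry with a non-pre-allocated map when the pre-allocated one cannot be satisfied) might look roughly like the following userspace model. Everything here is hypothetical: model_map_create() is a stand-in that fails early when a request exceeds a byte budget, not a real BPF call, and the budget parameter only imitates an allocation that fails instead of invoking the OOM killer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request description, for illustration only. */
struct model_map_req {
	size_t elem_size;
	size_t max_entries;
	int prealloc;	/* 1: reserve memory for all elements up front */
};

/* Stand-in for map creation. Fails (returns -1) when even a single
 * element does not fit, or when a pre-allocated map would exceed the
 * budget. This models an early failure, not an OOM kill. */
static int model_map_create(const struct model_map_req *req, size_t budget)
{
	assert(req->elem_size > 0);
	if (req->elem_size > budget)
		return -1;
	if (req->prealloc && req->max_entries > budget / req->elem_size)
		return -1;
	return 0;
}

/* Try the pre-allocated flavor first, then fall back.
 * Returns 1 if pre-allocated, 0 if the fallback was used, -1 on failure. */
static int model_load_map(size_t elem_size, size_t max_entries, size_t budget)
{
	struct model_map_req req = { elem_size, max_entries, 1 };

	if (model_map_create(&req, budget) == 0)
		return 1;
	req.prealloc = 0;
	if (model_map_create(&req, budget) == 0)
		return 0;
	return -1;
}
```

The fallback is only possible if the first attempt fails cleanly, which is the reason for asking the allocator not to try too hard.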

So to answer your second email with the bpf and netfilter hunks, why
not replacing them with kvmalloc() and __GFP_NORETRY flag and add that
big fat FIXME comment above there, saying explicitly that __GFP_NORETRY
is not harmful though has only /partial/ effect right now and that full
support needs to be implemented in future. That would still be better
than not having it, imo, and the FIXME would make expectations clear
to anyone reading that code.
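
As a rough illustration of the shape being proposed, here is a userspace model of a two-step allocation: one cheap attempt at a contiguous buffer, then a fallback. It is a sketch under stated assumptions, not the kernel's kvmalloc(): MODEL_NORETRY is a hypothetical stand-in for __GFP_NORETRY, contig_limit imitates a contiguous allocator giving up, and malloc() plays both allocators.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_NORETRY 0x1u	/* hypothetical stand-in for __GFP_NORETRY */

enum model_path { MODEL_FAIL, MODEL_CONTIG, MODEL_FALLBACK };

/* Mock contiguous allocator: a single attempt that refuses anything
 * larger than contig_limit, so the fallback stays reachable. */
static void *model_contig_alloc(size_t size, size_t contig_limit)
{
	return size <= contig_limit ? malloc(size) : NULL;
}

/* Mock fallback allocator.
 * FIXME (mirrors the thread): in this model the flag is accepted but
 * has no effect here, just as the discussion says the real flag has
 * only a partial effect on the vmalloc path. */
static void *model_fallback_alloc(size_t size, unsigned int flags)
{
	(void)flags;
	return malloc(size);
}

/* Two-step allocation; *path reports which step satisfied the request. */
static void *model_kvmalloc(size_t size, unsigned int flags,
			    size_t contig_limit, enum model_path *path)
{
	void *p;

	assert(path != NULL);
	p = model_contig_alloc(size, contig_limit);
	if (p) {
		*path = MODEL_CONTIG;
		return p;
	}
	p = model_fallback_alloc(size, flags);
	*path = p ? MODEL_FALLBACK : MODEL_FAIL;
	return p;
}
```

Memory from either step is released with free() in this model, matching the idea of a single matching release helper for both paths.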

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 0/6 v3] kvmalloc
  2017-01-26 13:40                 ` Michal Hocko
  (?)
@ 2017-01-26 14:13                   ` Michal Hocko
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Hocko @ 2017-01-26 14:13 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, Andrew Morton, Vlastimil Babka, Mel Gorman,
	Johannes Weiner, linux-mm, LKML, netdev, marcelo.leitner

On Thu 26-01-17 14:40:04, Michal Hocko wrote:
> On Thu 26-01-17 14:10:06, Daniel Borkmann wrote:
> > On 01/26/2017 12:58 PM, Michal Hocko wrote:
> > > On Thu 26-01-17 12:33:55, Daniel Borkmann wrote:
> > > > On 01/26/2017 11:08 AM, Michal Hocko wrote:
> > > [...]
> > > > > If you disagree I can drop the bpf part of course...
> > > > 
> > > > If we could consolidate these spots with kvmalloc() eventually, I'm
> > > > all for it. But even if __GFP_NORETRY is not covered down to all
> > > > possible paths, it kind of does have an effect already of saying
> > > > 'don't try too hard', so would it be harmful to still keep that for
> > > > now? If it's not, I'd personally prefer to just leave it as is until
> > > > there's some form of support by kvmalloc() and friends.
> > > 
> > > Well, you can use kvmalloc(size, GFP_KERNEL|__GFP_NORETRY). It is not
> > > disallowed. It is not _supported_ which means that if it doesn't work as
> > > you expect you are on your own. Which is actually the situation right
> > > now as well. But I still think that this is just not right thing to do.
> > > Even though it might happen to work in some cases it gives a false
> > > impression of a solution. So I would rather go with
> > 
> > Hmm. 'On my own' means, we could potentially BUG somewhere down the
> > vmalloc implementation, etc, presumably? So it might in-fact be
> > harmful to pass that, right?
> 
> No it would mean that it might eventually hit the behavior which you are
> trying to avoid - in other words it may invoke OOM killer even though
> __GFP_NORETRY means giving up before any system wide disruptive actions
> are taken.

I will separate both the bpf and netfilter hunks into their own patch with
the clarification. Does the following look better?
---
>From ab6b2d724228e4abcc69c44f5ab1ce91009aa91d Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Thu, 26 Jan 2017 14:59:21 +0100
Subject: [PATCH] net, bpf: use kvzalloc helper

Both bpf_map_area_alloc and xt_alloc_table_info try really hard to
play nicely with large memory requests which can be triggered from
userspace (by an admin). See 5bad87348c70 ("netfilter: x_tables:
avoid warn and OOM killer on vmalloc call") and d407bd25a204 ("bpf:
don't trigger OOM killer under pressure with map alloc"), respectively.

The current allocation pattern strongly resembles the kvmalloc helper
except for one thing: kvmalloc does not use __GFP_NORETRY for the
vmalloc fallback. The main reason why kvmalloc doesn't really support
__GFP_NORETRY is that vmalloc doesn't support this flag properly, and
it is far from straightforward to make it understand the flag because
there are some hard coded GFP_KERNEL allocations deep in the call
chains. This patch simply replaces the open coded variants with
kvzalloc and adds a note to push on MM people to support __GFP_NORETRY
in kvmalloc if this turns out to be really needed, along with an OOM
report pointing at vmalloc.

If there is an immediate need and no full support yet then
	kvmalloc(size, gfp | __GFP_NORETRY)
will work as well as __vmalloc(gfp | __GFP_NORETRY) - in other words it
might trigger the OOM killer in some cases.

Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 kernel/bpf/syscall.c     | 19 +++++--------------
 net/netfilter/x_tables.c | 16 ++++++----------
 2 files changed, 11 insertions(+), 24 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 19b6129eab23..a6dc4d596f14 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -53,21 +53,12 @@ void bpf_register_map_type(struct bpf_map_type_list *tl)
 
 void *bpf_map_area_alloc(size_t size)
 {
-	/* We definitely need __GFP_NORETRY, so OOM killer doesn't
-	 * trigger under memory pressure as we really just want to
-	 * fail instead.
+	/*
+	 * FIXME: we would really like to not trigger the OOM killer and rather
+	 * fail instead. This is not supported right now. Please nag MM people
+	 * if these OOM start bothering people.
 	 */
-	const gfp_t flags = __GFP_NOWARN | __GFP_NORETRY | __GFP_ZERO;
-	void *area;
-
-	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
-		area = kmalloc(size, GFP_USER | flags);
-		if (area != NULL)
-			return area;
-	}
-
-	return __vmalloc(size, GFP_KERNEL | __GFP_HIGHMEM | flags,
-			 PAGE_KERNEL);
+	return kvzalloc(size, GFP_USER);
 }
 
 void bpf_map_area_free(void *area)
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index d529989f5791..ba8ba633da72 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -995,16 +995,12 @@ struct xt_table_info *xt_alloc_table_info(unsigned int size)
 	if ((SMP_ALIGN(size) >> PAGE_SHIFT) + 2 > totalram_pages)
 		return NULL;
 
-	if (sz <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
-		info = kmalloc(sz, GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY);
-	if (!info) {
-		info = __vmalloc(sz, GFP_KERNEL | __GFP_NOWARN |
-				     __GFP_NORETRY | __GFP_HIGHMEM,
-				 PAGE_KERNEL);
-		if (!info)
-			return NULL;
-	}
-	memset(info, 0, sizeof(*info));
+	/*
+	 * FIXME: we would really like to not trigger the OOM killer and rather
+	 * fail instead. This is not supported right now. Please nag MM people
+	 * if these OOM start bothering people.
+	 */
+	info = kvzalloc(sz, GFP_KERNEL);
 	info->size = size;
 	return info;
 }
-- 
2.11.0
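
For readers who want the shape of the pattern without the kernel context:
the open coded variants removed above try a physically contiguous
allocation first and fall back to a virtually contiguous one. The
following is only a rough userspace sketch of that shape. All model_*
names, the model_fragmented switch and the 4096 byte page size are
assumptions made for illustration; this is not the kvmalloc implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for this model */

enum model_source { MODEL_NONE, MODEL_KMALLOC, MODEL_VMALLOC };

struct model_alloc {
	void *ptr;			/* NULL when both attempts failed */
	enum model_source source;	/* which path satisfied the request */
};

/* Hypothetical switch: nonzero makes multi-page contiguous requests fail. */
static int model_fragmented;

/* Stand-in for the contiguous attempt (kmalloc in the kernel). */
static void *model_kmalloc(size_t size)
{
	if (model_fragmented && size > MODEL_PAGE_SIZE)
		return NULL;
	return calloc(1, size);
}

/* Stand-in for the page-by-page fallback (vmalloc in the kernel). */
static void *model_vmalloc(size_t size)
{
	return calloc(1, size);
}

/* Try the contiguous path first, then fall back; memory comes back zeroed. */
static struct model_alloc model_kvzalloc(size_t size)
{
	struct model_alloc a = { NULL, MODEL_NONE };

	assert(size > 0);
	a.ptr = model_kmalloc(size);
	if (a.ptr) {
		a.source = MODEL_KMALLOC;
		return a;
	}
	a.ptr = model_vmalloc(size);
	if (a.ptr)
		a.source = MODEL_VMALLOC;
	return a;
}
```

Either path can still fail, so a caller of such a helper has to check the
returned pointer before touching the memory.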


-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 180+ messages in thread

* Re: [PATCH 0/6 v3] kvmalloc
  2017-01-26 13:10               ` Daniel Borkmann
@ 2017-01-26 13:40                 ` Michal Hocko
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Hocko @ 2017-01-26 13:40 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, Andrew Morton, Vlastimil Babka, Mel Gorman,
	Johannes Weiner, linux-mm, LKML, netdev, marcelo.leitner

On Thu 26-01-17 14:10:06, Daniel Borkmann wrote:
> On 01/26/2017 12:58 PM, Michal Hocko wrote:
> > On Thu 26-01-17 12:33:55, Daniel Borkmann wrote:
> > > On 01/26/2017 11:08 AM, Michal Hocko wrote:
> > [...]
> > > > If you disagree I can drop the bpf part of course...
> > > 
> > > If we could consolidate these spots with kvmalloc() eventually, I'm
> > > all for it. But even if __GFP_NORETRY is not covered down to all
> > > possible paths, it kind of does have an effect already of saying
> > > 'don't try too hard', so would it be harmful to still keep that for
> > > now? If it's not, I'd personally prefer to just leave it as is until
> > > there's some form of support by kvmalloc() and friends.
> > 
> > Well, you can use kvmalloc(size, GFP_KERNEL|__GFP_NORETRY). It is not
> > disallowed. It is not _supported_ which means that if it doesn't work as
> > you expect you are on your own. Which is actually the situation right
> > now as well. But I still think that this is just not right thing to do.
> > Even though it might happen to work in some cases it gives a false
> > impression of a solution. So I would rather go with
> 
> Hmm. 'On my own' means, we could potentially BUG somewhere down the
> vmalloc implementation, etc, presumably? So it might in-fact be
> harmful to pass that, right?

No, it would mean that it might eventually hit the behavior which you are
trying to avoid - in other words it may invoke the OOM killer even though
__GFP_NORETRY means giving up before any system wide disruptive actions
are taken.

> 
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index 8697f43cf93c..a6dc4d596f14 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -53,6 +53,11 @@ void bpf_register_map_type(struct bpf_map_type_list *tl)
> > 
> >   void *bpf_map_area_alloc(size_t size)
> >   {
> > +	/*
> > +	 * FIXME: we would really like to not trigger the OOM killer and rather
> > +	 * fail instead. This is not supported right now. Please nag MM people
> > +	 * if these OOM start bothering people.
> > +	 */
> 
> Ok, I know this is out of scope for this series, but since i) this
> is _not_ the _only_ spot right now which has such a construct and ii)
> I am already kind of nagging a bit ;), my question would be, what
> would it take to start supporting it?

Propagating the gfp mask all the way down from vmalloc to every place
which might allocate along the path. The page table allocation functions
in particular are a PITA because they are really deep. This is a lot of
work...
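
To make the propagation problem concrete, here is a small userspace model,
not kernel code. The flag values and every model_* name are made up for
illustration. It only shows why a single hard coded mask somewhere down the
call chain is enough to lose the caller's __GFP_NORETRY intent.

```c
#include <assert.h>

/* Hypothetical flag bits for this model; not the kernel's real values. */
#define MODEL_GFP_KERNEL	0x1u
#define MODEL_GFP_NORETRY	0x2u

/* Set when any allocation in the chain was allowed to go as far as OOM. */
static int model_oom_possible;

/* Leaf allocation: only a request carrying NORETRY gives up early. */
static void model_alloc_pages(unsigned int gfp)
{
	if (!(gfp & MODEL_GFP_NORETRY))
		model_oom_possible = 1;
}

/* Stand-in for a page table allocation that uses a hard coded mask. */
static void model_alloc_pagetable_hardcoded(void)
{
	model_alloc_pages(MODEL_GFP_KERNEL);
}

/* The same step with the caller's mask passed through. */
static void model_alloc_pagetable(unsigned int gfp)
{
	model_alloc_pages(gfp);
}

/* Returns 1 if some step of the request could still end in the OOM killer. */
static int model_vmalloc(unsigned int gfp, int propagate)
{
	assert(gfp & MODEL_GFP_KERNEL);
	model_oom_possible = 0;
	model_alloc_pages(gfp);		/* data pages honour the mask */
	if (propagate)
		model_alloc_pagetable(gfp);
	else
		model_alloc_pagetable_hardcoded();
	return model_oom_possible;
}
```

With propagate == 0 the NORETRY request still reports that OOM is possible,
which is the "works only partially" situation discussed in this thread.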

But realistically, how big is this problem really? Is it really worth
it? You said this is an admin only interface and admin can kill the
machine by OOM and other means already.

Moreover, and I should probably mention it explicitly, your d407bd25a204b
reduced the likelihood of OOM for another reason. kmalloc used GFP_USER
previously and with order > 0 && order <= PAGE_ALLOC_COSTLY_ORDER this
could indeed hit the OOM killer, e.g. due to memory fragmentation. It would
be much harder to hit the OOM killer from vmalloc, which doesn't issue
higher order allocation requests. Or have you ever seen the OOM killer
pointing to the vmalloc fallback path?
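
As a worked example of the order arithmetic above, here is a minimal
userspace sketch. It assumes 4 KiB pages and PAGE_ALLOC_COSTLY_ORDER == 3
(the mainline value); the model_* helpers are illustrative stand-ins, not
kernel functions.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT	12	/* assumption: 4 KiB pages */
#define MODEL_PAGE_SIZE		((size_t)1 << MODEL_PAGE_SHIFT)
#define MODEL_COSTLY_ORDER	3	/* PAGE_ALLOC_COSTLY_ORDER */

/* Smallest order such that (PAGE_SIZE << order) >= size. */
static unsigned int model_get_order(size_t size)
{
	unsigned int order = 0;

	assert(size > 0);
	while ((MODEL_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* True for the range named above: 0 < order <= PAGE_ALLOC_COSTLY_ORDER. */
static int model_is_noncostly_high_order(size_t size)
{
	unsigned int order = model_get_order(size);

	return order > 0 && order <= MODEL_COSTLY_ORDER;
}
```

Under these assumptions a contiguous request between 4097 and 32768 bytes
lands in that range, while a vmalloc style allocation of the same size is
built from order-0 pages only.
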
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 0/6 v3] kvmalloc
  2017-01-26 11:58             ` Michal Hocko
@ 2017-01-26 13:10               ` Daniel Borkmann
  -1 siblings, 0 replies; 180+ messages in thread
From: Daniel Borkmann @ 2017-01-26 13:10 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Alexei Starovoitov, Andrew Morton, Vlastimil Babka, Mel Gorman,
	Johannes Weiner, linux-mm, LKML, netdev, marcelo.leitner

On 01/26/2017 12:58 PM, Michal Hocko wrote:
> On Thu 26-01-17 12:33:55, Daniel Borkmann wrote:
>> On 01/26/2017 11:08 AM, Michal Hocko wrote:
> [...]
>>> If you disagree I can drop the bpf part of course...
>>
>> If we could consolidate these spots with kvmalloc() eventually, I'm
>> all for it. But even if __GFP_NORETRY is not covered down to all
>> possible paths, it kind of does have an effect already of saying
>> 'don't try too hard', so would it be harmful to still keep that for
>> now? If it's not, I'd personally prefer to just leave it as is until
>> there's some form of support by kvmalloc() and friends.
>
> Well, you can use kvmalloc(size, GFP_KERNEL|__GFP_NORETRY). It is not
> disallowed. It is not _supported_ which means that if it doesn't work as
> you expect you are on your own. Which is actually the situation right
> now as well. But I still think that this is just not right thing to do.
> Even though it might happen to work in some cases it gives a false
> impression of a solution. So I would rather go with

Hmm. 'On my own' means, we could potentially BUG somewhere down the
vmalloc implementation, etc, presumably? So it might in-fact be
harmful to pass that, right?

> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 8697f43cf93c..a6dc4d596f14 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -53,6 +53,11 @@ void bpf_register_map_type(struct bpf_map_type_list *tl)
>
>   void *bpf_map_area_alloc(size_t size)
>   {
> +	/*
> +	 * FIXME: we would really like to not trigger the OOM killer and rather
> +	 * fail instead. This is not supported right now. Please nag MM people
> +	 * if these OOM start bothering people.
> +	 */

Ok, I know this is out of scope for this series, but since i) this
is _not_ the _only_ spot right now which has such a construct and ii)
I am already kind of nagging a bit ;), my question would be, what
would it take to start supporting it?

>   	return kvzalloc(size, GFP_USER);
>   }

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 0/6 v3] kvmalloc
  2017-01-26 12:14             ` Joe Perches
@ 2017-01-26 12:27               ` Michal Hocko
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Hocko @ 2017-01-26 12:27 UTC (permalink / raw)
  To: Joe Perches
  Cc: Daniel Borkmann, Alexei Starovoitov, Andrew Morton,
	Vlastimil Babka, Mel Gorman, Johannes Weiner, linux-mm, LKML,
	netdev, marcelo.leitner

On Thu 26-01-17 04:14:37, Joe Perches wrote:
> On Thu, 2017-01-26 at 11:32 +0100, Michal Hocko wrote:
> > So I have folded the following to the patch 1. It is in line with
> > kvmalloc and hopefully at least tell more than the current code.
> []
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> []
> > @@ -1741,6 +1741,13 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
> >   *	Allocate enough pages to cover @size from the page level
> >   *	allocator with @gfp_mask flags.  Map them into contiguous
> >   *	kernel virtual space, using a pagetable protection of @prot.
> > + *
> > + *	Reclaim modifiers in @gfp_mask - __GFP_NORETRY, __GFP_REPEAT
> > + *	and __GFP_NOFAIL are not supported
> 
> Maybe add a BUILD_BUG or a WARN_ON_ONCE to catch new occurrences?

I would really like to not touch vmalloc in this series.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 0/6 v3] kvmalloc
  2017-01-26 10:32           ` Michal Hocko
@ 2017-01-26 12:14             ` Joe Perches
  -1 siblings, 0 replies; 180+ messages in thread
From: Joe Perches @ 2017-01-26 12:14 UTC (permalink / raw)
  To: Michal Hocko, Daniel Borkmann
  Cc: Alexei Starovoitov, Andrew Morton, Vlastimil Babka, Mel Gorman,
	Johannes Weiner, linux-mm, LKML, netdev, marcelo.leitner

On Thu, 2017-01-26 at 11:32 +0100, Michal Hocko wrote:
> So I have folded the following to the patch 1. It is in line with
> kvmalloc and hopefully at least tell more than the current code.
[]
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
[]
> @@ -1741,6 +1741,13 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
>   *	Allocate enough pages to cover @size from the page level
>   *	allocator with @gfp_mask flags.  Map them into contiguous
>   *	kernel virtual space, using a pagetable protection of @prot.
> + *
> + *	Reclaim modifiers in @gfp_mask - __GFP_NORETRY, __GFP_REPEAT
> + *	and __GFP_NOFAIL are not supported

Maybe add a BUILD_BUG or a WARN_ON_ONCE to catch new occurrences?
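
Roughly, a runtime check of that kind could look like the userspace sketch
below. The flag values and all model_* names are hypothetical, and
fprintf() stands in for a warn-once style kernel macro; this is only an
illustration of the idea, not a proposed patch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical bits for the three reclaim modifiers named in the comment. */
#define MODEL_GFP_NORETRY	0x1u
#define MODEL_GFP_REPEAT	0x2u
#define MODEL_GFP_NOFAIL	0x4u
#define MODEL_GFP_UNSUPPORTED \
	(MODEL_GFP_NORETRY | MODEL_GFP_REPEAT | MODEL_GFP_NOFAIL)

/* Number of warnings printed so far; never exceeds one. */
static int model_warn_count;

/*
 * Returns nonzero when the mask carries an unsupported reclaim modifier.
 * The warning itself is printed only for the first offending caller.
 */
static int model_check_gfp(unsigned int gfp)
{
	unsigned int bad = gfp & MODEL_GFP_UNSUPPORTED;

	if (bad && model_warn_count == 0) {
		model_warn_count++;
		fprintf(stderr, "unsupported reclaim modifiers: 0x%x\n", bad);
	}
	assert(model_warn_count <= 1);
	return bad != 0;
}
```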

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 0/6 v3] kvmalloc
  2017-01-26 11:33           ` Daniel Borkmann
@ 2017-01-26 11:58             ` Michal Hocko
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Hocko @ 2017-01-26 11:58 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, Andrew Morton, Vlastimil Babka, Mel Gorman,
	Johannes Weiner, linux-mm, LKML, netdev, marcelo.leitner

On Thu 26-01-17 12:33:55, Daniel Borkmann wrote:
> On 01/26/2017 11:08 AM, Michal Hocko wrote:
[...]
> > If you disagree I can drop the bpf part of course...
> 
> If we could consolidate these spots with kvmalloc() eventually, I'm
> all for it. But even if __GFP_NORETRY is not covered down to all
> possible paths, it kind of does have an effect already of saying
> 'don't try too hard', so would it be harmful to still keep that for
> now? If it's not, I'd personally prefer to just leave it as is until
> there's some form of support by kvmalloc() and friends.

Well, you can use kvmalloc(size, GFP_KERNEL|__GFP_NORETRY). It is not
disallowed. It is not _supported_ which means that if it doesn't work as
you expect you are on your own. Which is actually the situation right
now as well. But I still think that this is just not the right thing to do.
Even though it might happen to work in some cases it gives a false
impression of a solution. So I would rather go with
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 8697f43cf93c..a6dc4d596f14 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -53,6 +53,11 @@ void bpf_register_map_type(struct bpf_map_type_list *tl)
 
 void *bpf_map_area_alloc(size_t size)
 {
+	/*
+	 * FIXME: we would really like to not trigger the OOM killer and rather
+	 * fail instead. This is not supported right now. Please nag MM people
+	 * if these OOM start bothering people.
+	 */
 	return kvzalloc(size, GFP_USER);
 }
 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 180+ messages in thread

* Re: [PATCH 0/6 v3] kvmalloc
  2017-01-26 11:04             ` Daniel Borkmann
@ 2017-01-26 11:49               ` Michal Hocko
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Hocko @ 2017-01-26 11:49 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, Andrew Morton, Vlastimil Babka, Mel Gorman,
	Johannes Weiner, linux-mm, LKML, netdev, marcelo.leitner

On Thu 26-01-17 12:04:13, Daniel Borkmann wrote:
> On 01/26/2017 11:32 AM, Michal Hocko wrote:
> > On Thu 26-01-17 11:08:02, Michal Hocko wrote:
> > > On Thu 26-01-17 10:36:49, Daniel Borkmann wrote:
> > > > On 01/26/2017 08:43 AM, Michal Hocko wrote:
> > > > > On Wed 25-01-17 21:16:42, Daniel Borkmann wrote:
> > > [...]
> > > > > > I assume that kvzalloc() is still the same from [1], right? If so, then
> > > > > > it would unfortunately (partially) reintroduce the issue that was fixed.
> > > > > > If you look above at flags, they're also passed to __vmalloc() to not
> > > > > > trigger OOM in these situations I've experienced.
> > > > > 
> > > > > Pushing __GFP_NORETRY to __vmalloc doesn't have the effect you might
> > > > > > think it would. It can still trigger the OOM killer because the flags
> > > > > > are not propagated all the way down to all allocation requests (e.g.
> > > > > page tables). This is the same reason why GFP_NOFS is not supported in
> > > > > vmalloc.
> > > > 
> > > > Ok, good to know, is that somewhere clearly documented (like for the
> > > > case with kmalloc())?
> > > 
> > > I am afraid that we really suck on this front. I will add something.
> > 
> > > So I have folded the following into patch 1. It is in line with
> > > kvmalloc and hopefully at least tells more than the current code.
> > ---
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index d89034a393f2..6c1aa2c68887 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -1741,6 +1741,13 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
> >    *	Allocate enough pages to cover @size from the page level
> >    *	allocator with @gfp_mask flags.  Map them into contiguous
> >    *	kernel virtual space, using a pagetable protection of @prot.
> > + *
> > + *	Reclaim modifiers in @gfp_mask - __GFP_NORETRY, __GFP_REPEAT
> > + *	and __GFP_NOFAIL are not supported
> 
> We could probably also mention that __GFP_ZERO in @gfp_mask is
> supported, though.

There are others which would be supported, so I would rather stay with
an explicit list of the unsupported ones.

> 
> > + *	Any use of gfp flags outside of GFP_KERNEL should be consulted
> > + *	with mm people.
> 
> Just a question: should that read 'GFP_KERNEL | __GFP_HIGHMEM' as
> that is what vmalloc() resp. vzalloc() and others pass as flags?

Yes, even though I think that specifying __GFP_HIGHMEM shouldn't really
be necessary. Are there any users who would really insist on vmalloc
pages in lowmem? Anyway, this made me recheck the kvmalloc_node
implementation and I am not adding this flag there, which would mean a
regression from the current state. Will fix it up.
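
As a rough illustration of the kdoc text and the __GFP_HIGHMEM point
above, here is a minimal userspace sketch in C. The flag bit values and
the helper names vmalloc_unsupported_modifiers() and
vmalloc_fallback_flags() are hypothetical and chosen only for this
example; the real flag definitions live in include/linux/gfp.h, and the
real kvmalloc_node may well look different.

```c
#include <assert.h>

/* Hypothetical bit values, for illustration only. */
typedef unsigned int gfp_t;
#define GFP_KERNEL	0x01u
#define __GFP_HIGHMEM	0x02u
#define __GFP_NORETRY	0x04u
#define __GFP_REPEAT	0x08u
#define __GFP_NOFAIL	0x10u
#define __GFP_ZERO	0x20u

/* Reclaim modifiers the kdoc comment lists as not supported by vmalloc. */
#define VMALLOC_UNSUPPORTED (__GFP_NORETRY | __GFP_REPEAT | __GFP_NOFAIL)

/* Return the subset of @gfp_mask that vmalloc cannot honour (0 if none). */
static gfp_t vmalloc_unsupported_modifiers(gfp_t gfp_mask)
{
	return gfp_mask & VMALLOC_UNSUPPORTED;
}

/*
 * Flags a kvmalloc-style helper might hand to its vmalloc fallback:
 * the caller's flags plus __GFP_HIGHMEM, matching what vmalloc() and
 * vzalloc() pass according to the discussion above.
 */
static gfp_t vmalloc_fallback_flags(gfp_t gfp_mask)
{
	return gfp_mask | __GFP_HIGHMEM;
}

/* Small usage example exercising both helpers. */
static void example_usage(void)
{
	/* __GFP_ZERO is fine; nothing is flagged. */
	assert(vmalloc_unsupported_modifiers(GFP_KERNEL | __GFP_ZERO) == 0);
	/* __GFP_NORETRY is reported back as unsupported. */
	assert(vmalloc_unsupported_modifiers(GFP_KERNEL | __GFP_NORETRY)
	       == __GFP_NORETRY);
	/* The fallback keeps the caller's flags and adds __GFP_HIGHMEM. */
	assert(vmalloc_fallback_flags(GFP_KERNEL)
	       == (GFP_KERNEL | __GFP_HIGHMEM));
}
```

A caller could use the first helper to decide whether to warn once about
an unsupported modifier; whether such a warning is wanted for vmalloc is
exactly what is being debated in this thread.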
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 0/6 v3] kvmalloc
  2017-01-26 10:08         ` Michal Hocko
@ 2017-01-26 11:33           ` Daniel Borkmann
  -1 siblings, 0 replies; 180+ messages in thread
From: Daniel Borkmann @ 2017-01-26 11:33 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Alexei Starovoitov, Andrew Morton, Vlastimil Babka, Mel Gorman,
	Johannes Weiner, linux-mm, LKML, netdev, marcelo.leitner

On 01/26/2017 11:08 AM, Michal Hocko wrote:
> On Thu 26-01-17 10:36:49, Daniel Borkmann wrote:
>> On 01/26/2017 08:43 AM, Michal Hocko wrote:
>>> On Wed 25-01-17 21:16:42, Daniel Borkmann wrote:
> [...]
>>>> I assume that kvzalloc() is still the same from [1], right? If so, then
>>>> it would unfortunately (partially) reintroduce the issue that was fixed.
>>>> If you look above at flags, they're also passed to __vmalloc() to not
>>>> trigger OOM in these situations I've experienced.
>>>
>>> Pushing __GFP_NORETRY to __vmalloc doesn't have the effect you might
>>> think it would. It can still trigger the OOM killer because the flags
>>> are not propagated all the way down to all allocation requests (e.g.
>>> page tables). This is the same reason why GFP_NOFS is not supported in
>>> vmalloc.
>>
>> Ok, good to know, is that somewhere clearly documented (like for the
>> case with kmalloc())?
>
> I am afraid that we really suck on this front. I will add something.

Thanks for doing that, much appreciated!

>> If not, could we do that for non-mm folks, or
>> at least add a similar WARN_ON_ONCE() as you did for kvmalloc() to make
>> it obvious to users that a given flag combination is not supported all
>> the way down?
>
> I am not sure that triggering a warning that somebody has used
> __GFP_NOWARN is very helpful ;). I also do not think that covering all the
> supported flags is really feasible. Most of them will not have bad side
> effects. I have added the warning because this API is new and I wanted
> to catch new abusers. Old ones would have to die slowly.

Okay, makes sense then. Just the kdoc comment from your other
mail should help fine already.

>>>> This is effectively the
>>>> same requirement as in other networking areas f.e. that 5bad87348c70
>>>> ("netfilter: x_tables: avoid warn and OOM killer on vmalloc call") has.
>>>> In your comment in kvzalloc() you eventually say that some of the above
>>>> modifiers are not supported. So there would be two options, i) just leave
>>>> out the kvzalloc() chunk for BPF area to avoid the merge conflict and tackle
>>>> it later (along with similar code from 5bad87348c70), or ii) implement
>>>> support for these modifiers as well to your original set. I guess it's not
>>>> too urgent, so we could also proceed with i) if that is easier for you to
>>>> proceed (I don't mind either way).
>>>
>>> Could you clarify why the oom killer in vmalloc matters actually?
>>
>> For both mentioned commits, (privileged) user space can potentially
>> create large allocation requests, where we thus switch to vmalloc()
>> flavor eventually and then OOM starts killing processes to try to
>> satisfy the allocation request. This is bad, because we want the
>> request to just fail instead as it's non-critical and f.e. not kill
>> ssh connection et al. Failing is totally fine in this case, whereas
>> triggering OOM is not.
>
> I see your intention but does it really make any real difference?
> Consider you would back off right before you would have OOMed. Any
> parallel request would just hit the OOM for you. You are (almost) never
> doing an allocation in an isolation.
>
>> In my testing, __GFP_NORETRY did satisfy this
>> just fine, but as you say it seems it's not enough.
>
> Yeah, ptes have been most probably populated already.
>
>> Given there are
>> multiple places like these in the kernel, could we instead add an
>> option such as __GFP_NOOOM, or just make __GFP_NORETRY supported?
>
> As said above I do not really think that suppressing the OOM killer
> makes any difference because it might be just somebody else doing that
> for you. Also the OOM killer is the MM internal implementation "detail"
> users shouldn't really care. I agree that callers should have a way to
> say they do not want to try really hard and that is not that simple
> for vmalloc unfortunately. The main problem here is that gfp mask
> propagation is not that easy to fix without a lot of code churn as some
> of those hardcoded allocation requests are deep in call chains.

I see, that's unfortunate. I understand that there are requests
in parallel and that we might end up with OOM eventually if we're
unlucky, but having some way to tell vmalloc to just not try as
hard as usual would be nice.
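
The propagation problem described in the quoted text can be modelled in
userspace as a rough sketch. Everything here is an assumption made for
illustration: the flag bit values, struct alloc_trace, and the
model_*() functions are hypothetical stand-ins, not kernel code. The
only point is that a nested allocation which takes no gfp argument
never sees the caller's __GFP_NORETRY.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values, for illustration only. */
typedef unsigned int gfp_t;
#define GFP_KERNEL	0x01u
#define __GFP_NORETRY	0x04u

/* Records the flags each modelled allocation site actually used. */
struct alloc_trace {
	gfp_t data_pages;	/* flags used for the data pages */
	gfp_t page_tables;	/* flags used for the page-table pages */
};

/*
 * Model of a page-table allocation deep in the call chain: it has no
 * gfp parameter, so it falls back to a hardcoded GFP_KERNEL.
 */
static gfp_t model_pte_alloc(void)
{
	return GFP_KERNEL;
}

/* Model of vmalloc: data pages honour @gfp_mask, page tables do not. */
static struct alloc_trace model_vmalloc(gfp_t gfp_mask)
{
	struct alloc_trace t;

	t.data_pages = gfp_mask;
	t.page_tables = model_pte_alloc();
	return t;
}

/* An allocation without __GFP_NORETRY may retry hard and reach the OOM killer. */
static bool may_retry_hard(gfp_t gfp)
{
	return !(gfp & __GFP_NORETRY);
}
```

Calling model_vmalloc(GFP_KERNEL | __GFP_NORETRY) in this model yields
data-page flags that still carry __GFP_NORETRY, while the page-table
flags do not, so may_retry_hard() is true for the latter.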

> I know this sucks and it would be great to support __GFP_NORETRY to
> [k]vmalloc and maybe we will get there eventually. But for the meantime
> I really think that using kvmalloc wherever possible is much better than
> open coded variants with expectations which do not hold sometimes.

I totally agree with you that having kvmalloc() as helper is awesome
and probably long overdue as well. :)

> If you disagree I can drop the bpf part of course...

If we could consolidate these spots with kvmalloc() eventually, I'm
all for it. But even if __GFP_NORETRY is not covered down to all
possible paths, it kind of does have an effect already of saying
'don't try too hard', so would it be harmful to still keep that for
now? If it's not, I'd personally prefer to just leave it as is until
there's some form of support by kvmalloc() and friends.

Thanks for your input, Michal!

Cheers,
Daniel

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 0/6 v3] kvmalloc
  2017-01-26 10:32           ` Michal Hocko
@ 2017-01-26 11:04             ` Daniel Borkmann
  -1 siblings, 0 replies; 180+ messages in thread
From: Daniel Borkmann @ 2017-01-26 11:04 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Alexei Starovoitov, Andrew Morton, Vlastimil Babka, Mel Gorman,
	Johannes Weiner, linux-mm, LKML, netdev, marcelo.leitner

On 01/26/2017 11:32 AM, Michal Hocko wrote:
> On Thu 26-01-17 11:08:02, Michal Hocko wrote:
>> On Thu 26-01-17 10:36:49, Daniel Borkmann wrote:
>>> On 01/26/2017 08:43 AM, Michal Hocko wrote:
>>>> On Wed 25-01-17 21:16:42, Daniel Borkmann wrote:
>> [...]
>>>>> I assume that kvzalloc() is still the same from [1], right? If so, then
>>>>> it would unfortunately (partially) reintroduce the issue that was fixed.
>>>>> If you look above at flags, they're also passed to __vmalloc() to not
>>>>> trigger OOM in these situations I've experienced.
>>>>
>>>> Pushing __GFP_NORETRY to __vmalloc doesn't have the effect you might
>>>> think it would. It can still trigger the OOM killer because the flags
>>>> are not propagated all the way down to all allocation requests (e.g.
>>>> page tables). This is the same reason why GFP_NOFS is not supported in
>>>> vmalloc.
>>>
>>> Ok, good to know, is that somewhere clearly documented (like for the
>>> case with kmalloc())?
>>
>> I am afraid that we really suck on this front. I will add something.
>
> So I have folded the following into patch 1. It is in line with
> kvmalloc and hopefully at least tells more than the current code.
> ---
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index d89034a393f2..6c1aa2c68887 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1741,6 +1741,13 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
>    *	Allocate enough pages to cover @size from the page level
>    *	allocator with @gfp_mask flags.  Map them into contiguous
>    *	kernel virtual space, using a pagetable protection of @prot.
> + *
> + *	Reclaim modifiers in @gfp_mask - __GFP_NORETRY, __GFP_REPEAT
> + *	and __GFP_NOFAIL are not supported

We could probably also mention that __GFP_ZERO in @gfp_mask is
supported, though.

> + *	Any use of gfp flags outside of GFP_KERNEL should be consulted
> + *	with mm people.

Just a question: should that read 'GFP_KERNEL | __GFP_HIGHMEM' as
that is what vmalloc() resp. vzalloc() and others pass as flags?

> + *
>    */

Sounds good otherwise, thanks Michal!

>   static void *__vmalloc_node(unsigned long size, unsigned long align,
>   			    gfp_t gfp_mask, pgprot_t prot,

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 0/6 v3] kvmalloc
  2017-01-26 10:08         ` Michal Hocko
@ 2017-01-26 10:32           ` Michal Hocko
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Hocko @ 2017-01-26 10:32 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, Andrew Morton, Vlastimil Babka, Mel Gorman,
	Johannes Weiner, linux-mm, LKML, netdev, marcelo.leitner

On Thu 26-01-17 11:08:02, Michal Hocko wrote:
> On Thu 26-01-17 10:36:49, Daniel Borkmann wrote:
> > On 01/26/2017 08:43 AM, Michal Hocko wrote:
> > > On Wed 25-01-17 21:16:42, Daniel Borkmann wrote:
> [...]
> > > > I assume that kvzalloc() is still the same from [1], right? If so, then
> > > > it would unfortunately (partially) reintroduce the issue that was fixed.
> > > > If you look above at flags, they're also passed to __vmalloc() to not
> > > > trigger OOM in these situations I've experienced.
> > > 
> > > Pushing __GFP_NORETRY to __vmalloc doesn't have the effect you might
> > > think it would. It can still trigger the OOM killer because the flags
> > > are not propagated all the way down to all allocation requests (e.g.
> > > page tables). This is the same reason why GFP_NOFS is not supported in
> > > vmalloc.
> > 
> > Ok, good to know, is that somewhere clearly documented (like for the
> > case with kmalloc())?
> 
> I am afraid that we really suck on this front. I will add something.

So I have folded the following into patch 1. It is in line with
kvmalloc and hopefully at least tells more than the current code.
---
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index d89034a393f2..6c1aa2c68887 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1741,6 +1741,13 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
  *	Allocate enough pages to cover @size from the page level
  *	allocator with @gfp_mask flags.  Map them into contiguous
  *	kernel virtual space, using a pagetable protection of @prot.
+ *
+ *	Reclaim modifiers in @gfp_mask - __GFP_NORETRY, __GFP_REPEAT
+ *	and __GFP_NOFAIL are not supported
+ *
+ *	Any use of gfp flags outside of GFP_KERNEL should be consulted
+ *	with mm people.
+ *
  */
 static void *__vmalloc_node(unsigned long size, unsigned long align,
 			    gfp_t gfp_mask, pgprot_t prot,
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 180+ messages in thread

* Re: [PATCH 0/6 v3] kvmalloc
  2017-01-26  9:36       ` Daniel Borkmann
@ 2017-01-26 10:08         ` Michal Hocko
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Hocko @ 2017-01-26 10:08 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, Andrew Morton, Vlastimil Babka, Mel Gorman,
	Johannes Weiner, linux-mm, LKML, netdev, marcelo.leitner

On Thu 26-01-17 10:36:49, Daniel Borkmann wrote:
> On 01/26/2017 08:43 AM, Michal Hocko wrote:
> > On Wed 25-01-17 21:16:42, Daniel Borkmann wrote:
[...]
> > > I assume that kvzalloc() is still the same from [1], right? If so, then
> > > it would unfortunately (partially) reintroduce the issue that was fixed.
> > > If you look above at flags, they're also passed to __vmalloc() to not
> > > trigger OOM in these situations I've experienced.
> > 
> > Pushing __GFP_NORETRY to __vmalloc doesn't have the effect you might
> > think it would. It can still trigger the OOM killer because the flags
> > are not propagated all the way down to all allocation requests (e.g.
> > page tables). This is the same reason why GFP_NOFS is not supported in
> > vmalloc.
> 
> Ok, good to know, is that somewhere clearly documented (like for the
> case with kmalloc())?

I am afraid that we really suck on this front. I will add something.

> If not, could we do that for non-mm folks, or
> at least add a similar WARN_ON_ONCE() as you did for kvmalloc() to make
> it obvious to users that a given flag combination is not supported all
> the way down?

I am not sure that triggering a warning that somebody has used
__GFP_NOWARN is very helpful ;). I also do not think that covering all the
supported flags is really feasible. Most of them will not have bad side
effects. I have added the warning because this API is new and I wanted
to catch new abusers. Old ones would have to die slowly.

> > > This is effectively the
> > > same requirement as in other networking areas f.e. that 5bad87348c70
> > > ("netfilter: x_tables: avoid warn and OOM killer on vmalloc call") has.
> > > In your comment in kvzalloc() you eventually say that some of the above
> > > modifiers are not supported. So there would be two options, i) just leave
> > > out the kvzalloc() chunk for BPF area to avoid the merge conflict and tackle
> > > it later (along with similar code from 5bad87348c70), or ii) implement
> > > support for these modifiers as well to your original set. I guess it's not
> > > too urgent, so we could also proceed with i) if that is easier for you to
> > > proceed (I don't mind either way).
> > 
> > Could you clarify why the oom killer in vmalloc matters actually?
> 
> For both mentioned commits, (privileged) user space can potentially
> create large allocation requests, where we thus switch to vmalloc()
> flavor eventually and then OOM starts killing processes to try to
> satisfy the allocation request. This is bad, because we want the
> request to just fail instead as it's non-critical and f.e. not kill
> ssh connection et al. Failing is totally fine in this case, whereas
> triggering OOM is not.

I see your intention but does it really make any real difference?
Consider you would back off right before you would have OOMed. Any
parallel request would just hit the OOM for you. You are (almost) never
doing an allocation in isolation.

> In my testing, __GFP_NORETRY did satisfy this
> just fine, but as you say it seems it's not enough.

Yeah, ptes have most probably been populated already.

> Given there are
> multiple places like these in the kernel, could we instead add an
> option such as __GFP_NOOOM, or just make __GFP_NORETRY supported?

As said above I do not really think that suppressing the OOM killer
makes any difference because it might be just somebody else doing that
for you. Also the OOM killer is an MM-internal implementation "detail"
that users shouldn't really care about. I agree that callers should have a
way to say they do not want to try really hard, and that is not that simple
for vmalloc, unfortunately. The main problem here is that gfp mask
propagation is not that easy to fix without a lot of code churn as some
of those hardcoded allocation requests are deep in call chains.
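
To illustrate the propagation problem described above, here is a minimal
user-space sketch. It is a toy model, not kernel code: every model_* and
MODEL_* name is hypothetical, the flag values are made up, and "may OOM" is
reduced to "some allocation in the chain ran without the NORETRY bit".

```c
#include <stdbool.h>

/* Hypothetical stand-ins for gfp bits; the values are NOT the kernel's. */
#define MODEL_GFP_KERNEL  0x1u
#define MODEL_GFP_NORETRY 0x2u

/* Set when any allocation in the chain was allowed to reach the OOM killer. */
static bool model_oom_possible;

/* Lowest level: only a request without NORETRY may end up invoking OOM. */
static void model_alloc_pages(unsigned int gfp)
{
	if (!(gfp & MODEL_GFP_NORETRY))
		model_oom_possible = true;
}

/* Page-table allocation deep in the call chain: the mask is hardcoded. */
static void model_alloc_page_table(void)
{
	model_alloc_pages(MODEL_GFP_KERNEL);
}

/*
 * vmalloc-like entry point. The caller's mask reaches the data pages,
 * but not the page-table allocation, which only happens when the
 * mapping needs a new page table.
 */
static bool model_vmalloc_may_oom(unsigned int gfp, bool need_new_page_table)
{
	model_oom_possible = false;
	model_alloc_pages(gfp);
	if (need_new_page_table)
		model_alloc_page_table();
	return model_oom_possible;
}
```

In this model a NORETRY request looks safe as long as no new page table is
needed, which would be consistent with a test in which the page tables were
already populated. As soon as one has to be allocated, the hardcoded mask
wins.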

I know this sucks and it would be great to support __GFP_NORETRY in
[k]vmalloc and maybe we will get there eventually. But for the time being
I really think that using kvmalloc wherever possible is much better than
open-coded variants with expectations which sometimes do not hold.

If you disagree I can drop the bpf part of course...
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 180+ messages in thread

* RE: [PATCH 0/6 v3] kvmalloc
  2017-01-26  9:36       ` Daniel Borkmann
@ 2017-01-26  9:48         ` David Laight
  -1 siblings, 0 replies; 180+ messages in thread
From: David Laight @ 2017-01-26  9:48 UTC (permalink / raw)
  To: 'Daniel Borkmann', Michal Hocko
  Cc: Alexei Starovoitov, Andrew Morton, Vlastimil Babka, Mel Gorman,
	Johannes Weiner, linux-mm, LKML, netdev, marcelo.leitner

From: Daniel Borkmann
> Sent: 26 January 2017 09:37
...
> >> I assume that kvzalloc() is still the same from [1], right? If so, then
> >> it would unfortunately (partially) reintroduce the issue that was fixed.
> >> If you look above at flags, they're also passed to __vmalloc() to not
> >> trigger OOM in these situations I've experienced.
> >
> > Pushing __GFP_NORETRY to __vmalloc doesn't have the effect you might
> > think it would. It can still trigger the OOM killer because the flags
> > are not propagated all the way down to all allocation requests (e.g.
> > page tables). This is the same reason why GFP_NOFS is not supported in
> > vmalloc.
> 
> Ok, good to know, is that somewhere clearly documented (like for the
> case with kmalloc())? If not, could we do that for non-mm folks, or
> at least add a similar WARN_ON_ONCE() as you did for kvmalloc() to make
> it obvious to users that a given flag combination is not supported all
> the way down?

ISTM that requests for the relatively small memory blocks needed for page
tables aren't really likely to invoke the OOM killer when it isn't already
being invoked by other actions. So that isn't really a problem.

More of a problem is that requests that you really don't mind failing
can use the last 'reasonably available' memory.
This will cause the next allocation to fail when it would be better for
the earlier one to fail instead.

	David

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 0/6 v3] kvmalloc
  2017-01-26  7:43     ` Michal Hocko
@ 2017-01-26  9:36       ` Daniel Borkmann
  -1 siblings, 0 replies; 180+ messages in thread
From: Daniel Borkmann @ 2017-01-26  9:36 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Alexei Starovoitov, Andrew Morton, Vlastimil Babka, Mel Gorman,
	Johannes Weiner, linux-mm, LKML, netdev, marcelo.leitner

On 01/26/2017 08:43 AM, Michal Hocko wrote:
> On Wed 25-01-17 21:16:42, Daniel Borkmann wrote:
>> On 01/25/2017 07:14 PM, Alexei Starovoitov wrote:
>>> On Wed, Jan 25, 2017 at 5:21 AM, Michal Hocko <mhocko@kernel.org> wrote:
>>>> On Wed 25-01-17 14:10:06, Michal Hocko wrote:
>>>>> On Tue 24-01-17 11:17:21, Alexei Starovoitov wrote:
>> [...]
>>>>>>> Are there any more comments? I would really appreciate to hear from
>>>>>>> networking folks before I resubmit the series.
>>>>>>
>>>>>> while this patchset was baking the bpf side switched to use bpf_map_area_alloc()
>>>>>> which fixes the issue with missing __GFP_NORETRY that we had to fix quickly.
>>>>>> See commit d407bd25a204 ("bpf: don't trigger OOM killer under pressure with map alloc")
>>>>>> it covers all kmalloc/vmalloc pairs instead of just one place as in this set.
>>>>>> So please rebase and switch bpf_map_area_alloc() to use kvmalloc().
>>>>>
>>>>> OK, will do. Thanks for the heads up.
>>>>
>>>> Just for the record, I will fold the following into the patch 1
>>>> ---
>>>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>>>> index 19b6129eab23..8697f43cf93c 100644
>>>> --- a/kernel/bpf/syscall.c
>>>> +++ b/kernel/bpf/syscall.c
>>>> @@ -53,21 +53,7 @@ void bpf_register_map_type(struct bpf_map_type_list *tl)
>>>>
>>>>    void *bpf_map_area_alloc(size_t size)
>>>>    {
>>>> -       /* We definitely need __GFP_NORETRY, so OOM killer doesn't
>>>> -        * trigger under memory pressure as we really just want to
>>>> -        * fail instead.
>>>> -        */
>>>> -       const gfp_t flags = __GFP_NOWARN | __GFP_NORETRY | __GFP_ZERO;
>>>> -       void *area;
>>>> -
>>>> -       if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
>>>> -               area = kmalloc(size, GFP_USER | flags);
>>>> -               if (area != NULL)
>>>> -                       return area;
>>>> -       }
>>>> -
>>>> -       return __vmalloc(size, GFP_KERNEL | __GFP_HIGHMEM | flags,
>>>> -                        PAGE_KERNEL);
>>>> +       return kvzalloc(size, GFP_USER);
>>>>    }
>>>>
>>>>    void bpf_map_area_free(void *area)
>>>
>>> Looks fine by me.
>>> Daniel, thoughts?
>>
>> I assume that kvzalloc() is still the same from [1], right? If so, then
>> it would unfortunately (partially) reintroduce the issue that was fixed.
>> If you look above at flags, they're also passed to __vmalloc() to not
>> trigger OOM in these situations I've experienced.
>
> Pushing __GFP_NORETRY to __vmalloc doesn't have the effect you might
> think it would. It can still trigger the OOM killer because the flags
> are not propagated all the way down to all allocation requests (e.g.
> page tables). This is the same reason why GFP_NOFS is not supported in
> vmalloc.

Ok, good to know, is that somewhere clearly documented (like for the
case with kmalloc())? If not, could we do that for non-mm folks, or
at least add a similar WARN_ON_ONCE() as you did for kvmalloc() to make
it obvious to users that a given flag combination is not supported all
the way down?
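
As a rough illustration of the kind of check asked for here, the following
user-space sketch mimics a warn-once test on reclaim modifiers. It is only a
model under stated assumptions: the MODEL_* flag values and the model_* names
are hypothetical, and the function merely imitates the behaviour of the
kernel's WARN_ON_ONCE() (report the first hit only, but return the condition
every time).

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the reclaim modifiers; values are made up. */
#define MODEL_GFP_NORETRY 0x1u
#define MODEL_GFP_REPEAT  0x2u
#define MODEL_GFP_NOFAIL  0x4u
#define MODEL_RECLAIM_MODIFIERS \
	(MODEL_GFP_NORETRY | MODEL_GFP_REPEAT | MODEL_GFP_NOFAIL)

/* Number of warnings actually printed, kept only so callers can inspect it. */
static int model_warnings_printed;

/*
 * Returns true whenever an unsupported modifier is present, but prints
 * the diagnostic only for the first offending call.
 */
static bool model_warn_on_unsupported(unsigned int gfp)
{
	static bool warned;
	bool unsupported = (gfp & MODEL_RECLAIM_MODIFIERS) != 0;

	if (unsupported && !warned) {
		warned = true;
		model_warnings_printed++;
		fprintf(stderr, "unsupported reclaim modifier in mask %#x\n", gfp);
	}
	return unsupported;
}
```

A caller could drop the offending bits or fail the request when the function
returns true; which of the two is appropriate is a policy question this
sketch does not answer.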

>> This is effectively the
>> same requirement as in other networking areas f.e. that 5bad87348c70
>> ("netfilter: x_tables: avoid warn and OOM killer on vmalloc call") has.
>> In your comment in kvzalloc() you eventually say that some of the above
>> modifiers are not supported. So there would be two options, i) just leave
>> out the kvzalloc() chunk for BPF area to avoid the merge conflict and tackle
>> it later (along with similar code from 5bad87348c70), or ii) implement
>> support for these modifiers as well to your original set. I guess it's not
>> too urgent, so we could also proceed with i) if that is easier for you to
>> proceed (I don't mind either way).
>
> Could you clarify why the oom killer in vmalloc matters actually?

For both mentioned commits, (privileged) user space can potentially
create large allocation requests, where we thus switch to vmalloc()
flavor eventually and then OOM starts killing processes to try to
satisfy the allocation request. This is bad, because we want the
request to just fail instead as it's non-critical and f.e. not kill
ssh connection et al. Failing is totally fine in this case, whereas
triggering OOM is not. In my testing, __GFP_NORETRY did satisfy this
just fine, but as you say it seems it's not enough. Given there are
multiple places like these in the kernel, could we instead add an
option such as __GFP_NOOOM, or just make __GFP_NORETRY supported?

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 0/6 v3] kvmalloc
  2017-01-25 20:16   ` Daniel Borkmann
@ 2017-01-26  7:43     ` Michal Hocko
  -1 siblings, 0 replies; 180+ messages in thread
From: Michal Hocko @ 2017-01-26  7:43 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, Andrew Morton, Vlastimil Babka, Mel Gorman,
	Johannes Weiner, linux-mm, LKML, netdev

On Wed 25-01-17 21:16:42, Daniel Borkmann wrote:
> On 01/25/2017 07:14 PM, Alexei Starovoitov wrote:
> > On Wed, Jan 25, 2017 at 5:21 AM, Michal Hocko <mhocko@kernel.org> wrote:
> > > On Wed 25-01-17 14:10:06, Michal Hocko wrote:
> > > > On Tue 24-01-17 11:17:21, Alexei Starovoitov wrote:
> [...]
> > > > > > Are there any more comments? I would really appreciate to hear from
> > > > > > networking folks before I resubmit the series.
> > > > > 
> > > > > while this patchset was baking the bpf side switched to use bpf_map_area_alloc()
> > > > > which fixes the issue with missing __GFP_NORETRY that we had to fix quickly.
> > > > > See commit d407bd25a204 ("bpf: don't trigger OOM killer under pressure with map alloc")
> > > > > it covers all kmalloc/vmalloc pairs instead of just one place as in this set.
> > > > > So please rebase and switch bpf_map_area_alloc() to use kvmalloc().
> > > > 
> > > > OK, will do. Thanks for the heads up.
> > > 
> > > Just for the record, I will fold the following into the patch 1
> > > ---
> > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > > index 19b6129eab23..8697f43cf93c 100644
> > > --- a/kernel/bpf/syscall.c
> > > +++ b/kernel/bpf/syscall.c
> > > @@ -53,21 +53,7 @@ void bpf_register_map_type(struct bpf_map_type_list *tl)
> > > 
> > >   void *bpf_map_area_alloc(size_t size)
> > >   {
> > > -       /* We definitely need __GFP_NORETRY, so OOM killer doesn't
> > > -        * trigger under memory pressure as we really just want to
> > > -        * fail instead.
> > > -        */
> > > -       const gfp_t flags = __GFP_NOWARN | __GFP_NORETRY | __GFP_ZERO;
> > > -       void *area;
> > > -
> > > -       if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
> > > -               area = kmalloc(size, GFP_USER | flags);
> > > -               if (area != NULL)
> > > -                       return area;
> > > -       }
> > > -
> > > -       return __vmalloc(size, GFP_KERNEL | __GFP_HIGHMEM | flags,
> > > -                        PAGE_KERNEL);
> > > +       return kvzalloc(size, GFP_USER);
> > >   }
> > > 
> > >   void bpf_map_area_free(void *area)
> > 
> > Looks fine by me.
> > Daniel, thoughts?
> 
> I assume that kvzalloc() is still the same from [1], right? If so, then
> it would unfortunately (partially) reintroduce the issue that was fixed.
> If you look above at flags, they're also passed to __vmalloc() to not
> trigger OOM in these situations I've experienced.

Pushing __GFP_NORETRY to __vmalloc doesn't have the effect you might
think it would. It can still trigger the OOM killer because the flags
are not propagated all the way down to all allocation requests (e.g.
page tables). This is the same reason why GFP_NOFS is not supported in
vmalloc.

> This is effectively the
> same requirement as in other networking areas f.e. that 5bad87348c70
> ("netfilter: x_tables: avoid warn and OOM killer on vmalloc call") has.
> In your comment in kvzalloc() you eventually say that some of the above
> modifiers are not supported. So there would be two options, i) just leave
> out the kvzalloc() chunk for BPF area to avoid the merge conflict and tackle
> it later (along with similar code from 5bad87348c70), or ii) implement
> support for these modifiers as well to your original set. I guess it's not
> too urgent, so we could also proceed with i) if that is easier for you to
> proceed (I don't mind either way).

Could you clarify why the oom killer in vmalloc matters actually?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 0/6 v3] kvmalloc
  2017-01-25 18:14 ` Alexei Starovoitov
@ 2017-01-25 20:16   ` Daniel Borkmann
  -1 siblings, 0 replies; 180+ messages in thread
From: Daniel Borkmann @ 2017-01-25 20:16 UTC (permalink / raw)
  To: Alexei Starovoitov, Michal Hocko
  Cc: Andrew Morton, Vlastimil Babka, Mel Gorman, Johannes Weiner,
	linux-mm, LKML, netdev

On 01/25/2017 07:14 PM, Alexei Starovoitov wrote:
> On Wed, Jan 25, 2017 at 5:21 AM, Michal Hocko <mhocko@kernel.org> wrote:
>> On Wed 25-01-17 14:10:06, Michal Hocko wrote:
>>> On Tue 24-01-17 11:17:21, Alexei Starovoitov wrote:
[...]
>>>>> Are there any more comments? I would really appreciate to hear from
>>>>> networking folks before I resubmit the series.
>>>>
>>>> while this patchset was baking the bpf side switched to use bpf_map_area_alloc()
>>>> which fixes the issue with missing __GFP_NORETRY that we had to fix quickly.
>>>> See commit d407bd25a204 ("bpf: don't trigger OOM killer under pressure with map alloc")
>>>> it covers all kmalloc/vmalloc pairs instead of just one place as in this set.
>>>> So please rebase and switch bpf_map_area_alloc() to use kvmalloc().
>>>
>>> OK, will do. Thanks for the heads up.
>>
>> Just for the record, I will fold the following into the patch 1
>> ---
>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index 19b6129eab23..8697f43cf93c 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -53,21 +53,7 @@ void bpf_register_map_type(struct bpf_map_type_list *tl)
>>
>>   void *bpf_map_area_alloc(size_t size)
>>   {
>> -       /* We definitely need __GFP_NORETRY, so OOM killer doesn't
>> -        * trigger under memory pressure as we really just want to
>> -        * fail instead.
>> -        */
>> -       const gfp_t flags = __GFP_NOWARN | __GFP_NORETRY | __GFP_ZERO;
>> -       void *area;
>> -
>> -       if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
>> -               area = kmalloc(size, GFP_USER | flags);
>> -               if (area != NULL)
>> -                       return area;
>> -       }
>> -
>> -       return __vmalloc(size, GFP_KERNEL | __GFP_HIGHMEM | flags,
>> -                        PAGE_KERNEL);
>> +       return kvzalloc(size, GFP_USER);
>>   }
>>
>>   void bpf_map_area_free(void *area)
>
> Looks fine by me.
> Daniel, thoughts?

I assume that kvzalloc() is still the same as in [1], right? If so, then
it would unfortunately (partially) reintroduce the issue that was fixed.
If you look at the flags above, they're also passed to __vmalloc() so that
it does not trigger the OOM killer in the situations I've experienced. This
is effectively the same requirement as in other networking areas, e.g. the
one addressed by 5bad87348c70 ("netfilter: x_tables: avoid warn and OOM
killer on vmalloc call").
In your comment in kvzalloc() you say that some of the above modifiers
are not supported. So there would be two options: i) just leave out the
kvzalloc() chunk for the BPF area to avoid the merge conflict and tackle
it later (along with similar code from 5bad87348c70), or ii) implement
support for these modifiers as well in your original set. I guess it's not
too urgent, so we could also proceed with i) if that is easier for you
(I don't mind either way).
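
To make the concern concrete, here is a minimal userspace sketch of which
modifiers reach each allocation stage in the two variants. It is only a
model: the MODEL_* flag values, struct alloc_trace and all function names
are hypothetical, and trace_kvzalloc() encodes my assumption about the
helper in [1] (modifiers added for the kmalloc attempt only, caller flags
passed unchanged to the vmalloc fallback), not its actual code. The single
MODEL_GFP_BASE bit stands in for both GFP_USER and GFP_KERNEL | __GFP_HIGHMEM.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for this model; NOT the kernel's real values. */
typedef unsigned int model_gfp_t;
#define MODEL_GFP_BASE    0x01u /* stands in for GFP_USER / GFP_KERNEL */
#define MODEL_GFP_NORETRY 0x02u /* stands in for __GFP_NORETRY */
#define MODEL_GFP_NOWARN  0x04u /* stands in for __GFP_NOWARN */
#define MODEL_GFP_ZERO    0x08u /* stands in for __GFP_ZERO */

#define MODEL_PAGE_SIZE 4096u

/* Records the flags each allocator stage would be called with. */
struct alloc_trace {
	model_gfp_t kmalloc_flags;
	model_gfp_t vmalloc_flags;
};

/*
 * Models the open-coded bpf_map_area_alloc() removed by the diff above:
 * the same modifiers are passed to kmalloc() and to __vmalloc().
 */
static struct alloc_trace trace_open_coded(void)
{
	const model_gfp_t flags =
		MODEL_GFP_NOWARN | MODEL_GFP_NORETRY | MODEL_GFP_ZERO;
	struct alloc_trace t = {
		.kmalloc_flags = MODEL_GFP_BASE | flags,
		.vmalloc_flags = MODEL_GFP_BASE | flags,
	};

	return t;
}

/*
 * Models kvzalloc(size, GFP_USER) under the assumption stated in the
 * lead-in: NORETRY/NOWARN are added only for the kmalloc attempt on
 * requests larger than a page; the fallback sees the caller's flags.
 */
static struct alloc_trace trace_kvzalloc(size_t size)
{
	const model_gfp_t flags = MODEL_GFP_BASE | MODEL_GFP_ZERO;
	struct alloc_trace t = {
		.kmalloc_flags = flags,
		.vmalloc_flags = flags,
	};

	if (size > MODEL_PAGE_SIZE)
		t.kmalloc_flags |= MODEL_GFP_NORETRY | MODEL_GFP_NOWARN;

	return t;
}

/* True when the vmalloc stage was not asked to fail early (no NORETRY). */
static bool vmalloc_stage_unrestricted(const struct alloc_trace *t)
{
	return !(t->vmalloc_flags & MODEL_GFP_NORETRY);
}
```

In this model, vmalloc_stage_unrestricted() is false for trace_open_coded()
and true for trace_kvzalloc() with a multi-page size, which is the
difference I am worried about. Whether vmalloc fully honours the modifier
is a separate question that the model does not try to answer.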

Thanks a lot,
Daniel

   [1] https://lkml.org/lkml/2017/1/12/442

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 0/6 v3] kvmalloc
@ 2017-01-25 18:14 ` Alexei Starovoitov
  0 siblings, 0 replies; 180+ messages in thread
From: Alexei Starovoitov @ 2017-01-25 18:14 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Vlastimil Babka, Mel Gorman, Johannes Weiner,
	linux-mm, LKML, Daniel Borkmann, netdev

On Wed, Jan 25, 2017 at 5:21 AM, Michal Hocko <mhocko@kernel.org> wrote:
> On Wed 25-01-17 14:10:06, Michal Hocko wrote:
>> On Tue 24-01-17 11:17:21, Alexei Starovoitov wrote:
>> > On Tue, Jan 24, 2017 at 04:17:52PM +0100, Michal Hocko wrote:
>> > > On Thu 12-01-17 16:37:11, Michal Hocko wrote:
>> > > > Hi,
>> > > > this has been previously posted as a single patch [1] but later on more
>> > > > built on top. It turned out that there are users who would like to have
>> > > > __GFP_REPEAT semantic. This is currently implemented for costly >64kB
>> > > > requests. Doing the same for smaller requests would require to redefine
>> > > > __GFP_REPEAT semantic in the page allocator which is out of scope of
>> > > > this series.
>> > > >
>> > > > There are many open coded kmalloc with vmalloc fallback instances in
>> > > > the tree.  Most of them are not careful enough or simply do not care
>> > > > about the underlying semantic of the kmalloc/page allocator which means
>> > > > that a) some vmalloc fallbacks are basically unreachable because the
>> > > > kmalloc part will keep retrying until it succeeds b) the page allocator
>> > > > can invoke really disruptive steps like the OOM killer to move forward
>> > > > which doesn't sound appropriate when we consider that the vmalloc
>> > > > fallback is available.
>> > > >
>> > > > As it can be seen implementing kvmalloc requires quite an intimate
>> > > > knowledge of the page allocator and the memory reclaim internals which
>> > > > strongly suggests that a helper should be implemented in the memory
>> > > > subsystem proper.
>> > > >
>> > > > Most callers I could find have been converted to use the helper instead.
>> > > > This is patch 5. There are some more relying on __GFP_REPEAT in the
>> > > > networking stack which I have converted as well but considering we do
>> > > > not have a support for __GFP_REPEAT for requests smaller than 64kB I
>> > > > have marked it RFC.
>> > >
>> > > Are there any more comments? I would really appreciate to hear from
>> > > networking folks before I resubmit the series.
>> >
>> > while this patchset was baking the bpf side switched to use bpf_map_area_alloc()
>> > which fixes the issue with missing __GFP_NORETRY that we had to fix quickly.
>> > See commit d407bd25a204 ("bpf: don't trigger OOM killer under pressure with map alloc")
>> > it covers all kmalloc/vmalloc pairs instead of just one place as in this set.
>> > So please rebase and switch bpf_map_area_alloc() to use kvmalloc().
>>
>> OK, will do. Thanks for the heads up.
>
> Just for the record, I will fold the following into the patch 1
> ---
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 19b6129eab23..8697f43cf93c 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -53,21 +53,7 @@ void bpf_register_map_type(struct bpf_map_type_list *tl)
>
>  void *bpf_map_area_alloc(size_t size)
>  {
> -       /* We definitely need __GFP_NORETRY, so OOM killer doesn't
> -        * trigger under memory pressure as we really just want to
> -        * fail instead.
> -        */
> -       const gfp_t flags = __GFP_NOWARN | __GFP_NORETRY | __GFP_ZERO;
> -       void *area;
> -
> -       if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
> -               area = kmalloc(size, GFP_USER | flags);
> -               if (area != NULL)
> -                       return area;
> -       }
> -
> -       return __vmalloc(size, GFP_KERNEL | __GFP_HIGHMEM | flags,
> -                        PAGE_KERNEL);
> +       return kvzalloc(size, GFP_USER);
>  }
>
>  void bpf_map_area_free(void *area)

Looks fine by me.
Daniel, thoughts?

^ permalink raw reply	[flat|nested] 180+ messages in thread

end of thread, other threads:[~2017-02-05 10:23 UTC | newest]

Thread overview: 180+ messages
2017-01-12 15:37 [PATCH 0/6 v3] kvmalloc Michal Hocko
2017-01-12 15:37 ` Michal Hocko
2017-01-12 15:37 ` [PATCH 1/6] mm: introduce kv[mz]alloc helpers Michal Hocko
2017-01-12 15:37   ` Michal Hocko
2017-01-16  4:34   ` John Hubbard
2017-01-16  4:34     ` John Hubbard
2017-01-16  8:47     ` Michal Hocko
2017-01-16  8:47       ` Michal Hocko
2017-01-16 19:09       ` John Hubbard
2017-01-16 19:09         ` John Hubbard
2017-01-16 19:40         ` Michal Hocko
2017-01-16 19:40           ` Michal Hocko
2017-01-16 21:15           ` John Hubbard
2017-01-16 21:15             ` John Hubbard
2017-01-16 21:48             ` Michal Hocko
2017-01-16 21:48               ` Michal Hocko
2017-01-16 21:57               ` John Hubbard
2017-01-16 21:57                 ` John Hubbard
2017-01-17  7:51                 ` Michal Hocko
2017-01-17  7:51                   ` Michal Hocko
2017-01-18  5:59                   ` John Hubbard
2017-01-18  5:59                     ` John Hubbard
2017-01-18  8:21                     ` Michal Hocko
2017-01-18  8:21                       ` Michal Hocko
2017-01-19  8:37                       ` John Hubbard
2017-01-19  8:37                         ` John Hubbard
2017-01-19  8:45                         ` Michal Hocko
2017-01-19  8:45                           ` Michal Hocko
2017-01-19  9:09                           ` John Hubbard
2017-01-19  9:09                             ` John Hubbard
2017-01-19  9:56                             ` Michal Hocko
2017-01-19  9:56                               ` Michal Hocko
2017-01-19 21:28                               ` John Hubbard
2017-01-19 21:28                                 ` John Hubbard
2017-01-26 12:09   ` Michal Hocko
2017-01-26 12:09     ` Michal Hocko
2017-01-30  8:42     ` Vlastimil Babka
2017-01-30  8:42       ` Vlastimil Babka
2017-01-12 15:37 ` [PATCH 2/6] mm: support __GFP_REPEAT in kvmalloc_node for >=64kB Michal Hocko
2017-01-12 15:37   ` Michal Hocko
2017-01-12 16:12   ` Michael S. Tsirkin
2017-01-12 16:12     ` Michael S. Tsirkin
2017-01-14  2:42   ` Tetsuo Handa
2017-01-14  2:42     ` Tetsuo Handa
2017-01-14  8:45     ` Michal Hocko
2017-01-14  8:45       ` Michal Hocko
2017-01-24 15:40   ` Michael S. Tsirkin
2017-01-24 15:40     ` Michael S. Tsirkin
2017-01-12 15:37 ` [PATCH 3/6] rhashtable: simplify a strange allocation pattern Michal Hocko
2017-01-12 15:37   ` Michal Hocko
2017-01-12 15:37 ` [PATCH 4/6] ila: " Michal Hocko
2017-01-12 15:37   ` Michal Hocko
2017-01-12 15:37 ` [PATCH 5/6] treewide: use kv[mz]alloc* rather than opencoded variants Michal Hocko
2017-01-12 15:37   ` Michal Hocko
2017-01-12 15:37   ` Michal Hocko
2017-01-12 15:57   ` David Sterba
2017-01-12 15:57     ` David Sterba
2017-01-12 15:57     ` David Sterba
2017-01-12 16:05   ` Christian Borntraeger
2017-01-12 16:05     ` Christian Borntraeger
2017-01-12 16:05     ` Christian Borntraeger
2017-01-12 16:54   ` Ilya Dryomov
2017-01-12 16:54     ` Ilya Dryomov
2017-01-12 16:54     ` Ilya Dryomov
2017-01-12 17:18     ` Michal Hocko
2017-01-12 17:18       ` Michal Hocko
2017-01-12 17:18       ` Michal Hocko
2017-01-12 17:00   ` Dan Williams
2017-01-12 17:00     ` Dan Williams
2017-01-12 17:00     ` Dan Williams
2017-01-12 17:26   ` Kees Cook
2017-01-12 17:26     ` Kees Cook
2017-01-12 17:26     ` Kees Cook
2017-01-12 17:37     ` Michal Hocko
2017-01-12 17:37       ` Michal Hocko
2017-01-12 17:37       ` Michal Hocko
2017-01-20 13:41       ` Vlastimil Babka
2017-01-20 13:41         ` Vlastimil Babka
2017-01-20 13:41         ` Vlastimil Babka
2017-01-24 15:00         ` Michal Hocko
2017-01-24 15:00           ` Michal Hocko
2017-01-24 15:00           ` Michal Hocko
2017-01-25 11:15           ` Vlastimil Babka
2017-01-25 11:15             ` Vlastimil Babka
2017-01-25 11:15             ` Vlastimil Babka
2017-01-25 13:09             ` Michal Hocko
2017-01-25 13:09               ` Michal Hocko
2017-01-25 13:09               ` Michal Hocko
2017-01-25 13:40               ` Ilya Dryomov
2017-01-25 13:40                 ` Ilya Dryomov
2017-01-25 13:40                 ` Ilya Dryomov
2017-01-12 17:29   ` Michal Hocko
2017-01-12 17:29     ` Michal Hocko
2017-01-12 17:29     ` Michal Hocko
2017-01-14  3:01     ` Tetsuo Handa
2017-01-14  3:01       ` Tetsuo Handa
2017-01-14  8:49       ` Michal Hocko
2017-01-14  8:49         ` Michal Hocko
2017-01-12 20:14   ` Boris Ostrovsky
2017-01-12 20:14     ` Boris Ostrovsky
2017-01-12 20:14     ` Boris Ostrovsky
2017-01-13  1:11   ` Dilger, Andreas
2017-01-13  1:11     ` Dilger, Andreas
2017-01-13  1:11     ` Dilger, Andreas
2017-01-14 10:56   ` Leon Romanovsky
2017-01-14 10:56     ` Leon Romanovsky
2017-01-16  7:33     ` Michal Hocko
2017-01-16  7:33       ` Michal Hocko
2017-01-16  7:33       ` Michal Hocko
2017-01-16  8:28       ` Leon Romanovsky
2017-01-16  8:28         ` Leon Romanovsky
2017-01-16  8:18   ` Tariq Toukan
2017-01-16  8:18     ` Tariq Toukan
2017-01-16  8:18     ` Tariq Toukan
2017-01-12 15:37 ` [RFC PATCH 6/6] net: use kvmalloc with __GFP_REPEAT rather than open coded variant Michal Hocko
2017-01-12 15:37   ` Michal Hocko
2017-01-12 15:37   ` Michal Hocko
2017-01-24 15:17 ` [PATCH 0/6 v3] kvmalloc Michal Hocko
2017-01-24 15:17   ` Michal Hocko
2017-01-24 16:00   ` Eric Dumazet
2017-01-24 16:00     ` Eric Dumazet
2017-01-25 13:10     ` Michal Hocko
2017-01-25 13:10       ` Michal Hocko
2017-01-24 19:17   ` Alexei Starovoitov
2017-01-24 19:17     ` Alexei Starovoitov
2017-01-25 13:10     ` Michal Hocko
2017-01-25 13:10       ` Michal Hocko
2017-01-25 13:21       ` Michal Hocko
2017-01-25 13:21         ` Michal Hocko
2017-01-25 18:14 Alexei Starovoitov
2017-01-25 18:14 ` Alexei Starovoitov
2017-01-25 20:16 ` Daniel Borkmann
2017-01-25 20:16   ` Daniel Borkmann
2017-01-26  7:43   ` Michal Hocko
2017-01-26  7:43     ` Michal Hocko
2017-01-26  9:36     ` Daniel Borkmann
2017-01-26  9:36       ` Daniel Borkmann
2017-01-26  9:48       ` David Laight
2017-01-26  9:48         ` David Laight
2017-01-26 10:08       ` Michal Hocko
2017-01-26 10:08         ` Michal Hocko
2017-01-26 10:32         ` Michal Hocko
2017-01-26 10:32           ` Michal Hocko
2017-01-26 11:04           ` Daniel Borkmann
2017-01-26 11:04             ` Daniel Borkmann
2017-01-26 11:49             ` Michal Hocko
2017-01-26 11:49               ` Michal Hocko
2017-01-26 12:14           ` Joe Perches
2017-01-26 12:14             ` Joe Perches
2017-01-26 12:27             ` Michal Hocko
2017-01-26 12:27               ` Michal Hocko
2017-01-26 11:33         ` Daniel Borkmann
2017-01-26 11:33           ` Daniel Borkmann
2017-01-26 11:58           ` Michal Hocko
2017-01-26 11:58             ` Michal Hocko
2017-01-26 13:10             ` Daniel Borkmann
2017-01-26 13:10               ` Daniel Borkmann
2017-01-26 13:40               ` Michal Hocko
2017-01-26 13:40                 ` Michal Hocko
2017-01-26 14:13                 ` Michal Hocko
2017-01-26 14:13                   ` Michal Hocko
2017-01-26 14:13                   ` Michal Hocko
2017-01-26 20:34                 ` Daniel Borkmann
2017-01-26 20:34                   ` Daniel Borkmann
2017-01-27 10:05                   ` Michal Hocko
2017-01-27 10:05                     ` Michal Hocko
2017-01-27 20:12                     ` Daniel Borkmann
2017-01-27 20:12                       ` Daniel Borkmann
2017-01-30  7:56                       ` Michal Hocko
2017-01-30  7:56                         ` Michal Hocko
2017-01-30 16:15                         ` Daniel Borkmann
2017-01-30 16:15                           ` Daniel Borkmann
2017-01-30 16:28                           ` Michal Hocko
2017-01-30 16:28                             ` Michal Hocko
2017-01-30 16:45                             ` Daniel Borkmann
2017-01-30 16:45                               ` Daniel Borkmann
2017-01-30  9:49 Michal Hocko
2017-01-30  9:49 ` Michal Hocko
2017-02-05 10:23 ` Michal Hocko
2017-02-05 10:23   ` Michal Hocko