linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm: introduce kv[mz]alloc helpers
@ 2017-01-02 13:37 Michal Hocko
  2017-01-02 15:55 ` Joe Perches
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Michal Hocko @ 2017-01-02 13:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, David Rientjes, Mel Gorman, Johannes Weiner,
	Al Viro, linux-mm, LKML, kvm, linux-f2fs-devel,
	linux-security-module, linux-ext4, Joe Perches, Michal Hocko,
	Anatoly Stepanov, Paolo Bonzini, Mike Snitzer,
	Michael S. Tsirkin, Theodore Ts'o, Andreas Dilger

From: Michal Hocko <mhocko@suse.com>

Using kmalloc with the vmalloc fallback for larger allocations is a
common pattern in the kernel code. Yet we do not have any common helper
for that and so users have invented their own helpers. Some of them are
really creative when doing so. Let's just add kv[mz]alloc and make sure
it is implemented properly. This implementation makes sure to not make
a large memory pressure for > PAGE_SZE requests (__GFP_NORETRY) and also
to not warn about allocation failures. This also rules out the OOM
killer as the vmalloc is a more approapriate fallback than a disruptive
user visible action.

This patch also changes some existing users and removes helpers which
are specific for them. In some cases this is not possible (e.g.
ext4_kvmalloc, libcfs_kvzalloc, __aa_kvmalloc) because those seems to be
broken and require GFP_NO{FS,IO} context which is not vmalloc compatible
in general (note that the page table allocation is GFP_KERNEL). Those
need to be fixed separately.

apparmor has already claimed kv[mz]alloc so remove those and use
__aa_kvmalloc instead to prevent from the naming clashes.

Changes since v1
- define __vmalloc_node_flags for CONFIG_MMU=n

Cc: Anatoly Stepanov <astepanov@cloudlinux.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca> # ext4 part
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
Hi,
I've sent this as an RFC [1] and there were no major objections. There
is a follow up work to be done, namely check the current users of
GPF_NOFS vmalloc fallback users but that needs to be done separately.  I
strongly suggest not pulling any nofs logic into vmalloc directly but
that is a separate topic that is not directly related to this patch. The
chosen approach is triggering a warning when GFP_KERNEL incompatible flags
are used for kv[zm]alloc* function for easier debugging.

Can we have this merged?

[1] http://lkml.kernel.org/r/20161208103300.23217-1-mhocko@kernel.org

 arch/x86/kvm/lapic.c                 |  4 ++--
 arch/x86/kvm/page_track.c            |  4 ++--
 arch/x86/kvm/x86.c                   |  4 ++--
 drivers/md/dm-stats.c                |  7 +------
 drivers/vhost/vhost.c                | 15 +++-----------
 fs/ext4/mballoc.c                    |  2 +-
 fs/ext4/super.c                      |  4 ++--
 fs/f2fs/f2fs.h                       | 20 ------------------
 fs/f2fs/file.c                       |  4 ++--
 fs/f2fs/segment.c                    | 14 ++++++-------
 fs/seq_file.c                        | 16 +--------------
 include/linux/kvm_host.h             |  2 --
 include/linux/mm.h                   | 14 +++++++++++++
 include/linux/vmalloc.h              |  1 +
 mm/nommu.c                           |  5 +++++
 mm/util.c                            | 40 ++++++++++++++++++++++++++++++++++++
 mm/vmalloc.c                         |  2 +-
 security/apparmor/apparmorfs.c       |  2 +-
 security/apparmor/include/apparmor.h | 10 ---------
 security/apparmor/match.c            |  2 +-
 virt/kvm/kvm_main.c                  | 18 +++-------------
 21 files changed, 89 insertions(+), 101 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 34a66b2d47e6..285a84288dea 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -167,8 +167,8 @@ static void recalculate_apic_map(struct kvm *kvm)
 		if (kvm_apic_present(vcpu))
 			max_id = max(max_id, kvm_apic_id(vcpu->arch.apic));
 
-	new = kvm_kvzalloc(sizeof(struct kvm_apic_map) +
-	                   sizeof(struct kvm_lapic *) * ((u64)max_id + 1));
+	new = kvzalloc(sizeof(struct kvm_apic_map) +
+	                   sizeof(struct kvm_lapic *) * ((u64)max_id + 1), GFP_KERNEL);
 
 	if (!new)
 		goto out;
diff --git a/arch/x86/kvm/page_track.c b/arch/x86/kvm/page_track.c
index 4a1c13eaa518..d46663e655b0 100644
--- a/arch/x86/kvm/page_track.c
+++ b/arch/x86/kvm/page_track.c
@@ -38,8 +38,8 @@ int kvm_page_track_create_memslot(struct kvm_memory_slot *slot,
 	int  i;
 
 	for (i = 0; i < KVM_PAGE_TRACK_MAX; i++) {
-		slot->arch.gfn_track[i] = kvm_kvzalloc(npages *
-					    sizeof(*slot->arch.gfn_track[i]));
+		slot->arch.gfn_track[i] = kvzalloc(npages *
+					    sizeof(*slot->arch.gfn_track[i]), GFP_KERNEL);
 		if (!slot->arch.gfn_track[i])
 			goto track_free;
 	}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1f0d2383f5ee..ab4d4b8f00d6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8064,13 +8064,13 @@ int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
 				      slot->base_gfn, level) + 1;
 
 		slot->arch.rmap[i] =
-			kvm_kvzalloc(lpages * sizeof(*slot->arch.rmap[i]));
+			kvzalloc(lpages * sizeof(*slot->arch.rmap[i]), GFP_KERNEL);
 		if (!slot->arch.rmap[i])
 			goto out_free;
 		if (i == 0)
 			continue;
 
-		linfo = kvm_kvzalloc(lpages * sizeof(*linfo));
+		linfo = kvzalloc(lpages * sizeof(*linfo), GFP_KERNEL);
 		if (!linfo)
 			goto out_free;
 
diff --git a/drivers/md/dm-stats.c b/drivers/md/dm-stats.c
index 38b05f23b96c..674f9a1686f7 100644
--- a/drivers/md/dm-stats.c
+++ b/drivers/md/dm-stats.c
@@ -146,12 +146,7 @@ static void *dm_kvzalloc(size_t alloc_size, int node)
 	if (!claim_shared_memory(alloc_size))
 		return NULL;
 
-	if (alloc_size <= KMALLOC_MAX_SIZE) {
-		p = kzalloc_node(alloc_size, GFP_KERNEL | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN, node);
-		if (p)
-			return p;
-	}
-	p = vzalloc_node(alloc_size, node);
+	p = kvzalloc_node(alloc_size, GFP_KERNEL | __GFP_NOMEMALLOC, node);
 	if (p)
 		return p;
 
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 28e2aebefd8f..08aa087840f9 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -514,18 +514,9 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
 }
 EXPORT_SYMBOL_GPL(vhost_dev_set_owner);
 
-static void *vhost_kvzalloc(unsigned long size)
-{
-	void *n = kzalloc(size, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT);
-
-	if (!n)
-		n = vzalloc(size);
-	return n;
-}
-
 struct vhost_umem *vhost_dev_reset_owner_prepare(void)
 {
-	return vhost_kvzalloc(sizeof(struct vhost_umem));
+	return kvzalloc(sizeof(struct vhost_umem), GFP_KERNEL);
 }
 EXPORT_SYMBOL_GPL(vhost_dev_reset_owner_prepare);
 
@@ -1189,7 +1180,7 @@ EXPORT_SYMBOL_GPL(vhost_vq_access_ok);
 
 static struct vhost_umem *vhost_umem_alloc(void)
 {
-	struct vhost_umem *umem = vhost_kvzalloc(sizeof(*umem));
+	struct vhost_umem *umem = kvzalloc(sizeof(*umem), GFP_KERNEL);
 
 	if (!umem)
 		return NULL;
@@ -1215,7 +1206,7 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
 		return -EOPNOTSUPP;
 	if (mem.nregions > max_mem_regions)
 		return -E2BIG;
-	newmem = vhost_kvzalloc(size + mem.nregions * sizeof(*m->regions));
+	newmem = kvzalloc(size + mem.nregions * sizeof(*m->regions), GFP_KERNEL);
 	if (!newmem)
 		return -ENOMEM;
 
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 7ae43c59bc79..bb49409172d9 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2381,7 +2381,7 @@ int ext4_mb_alloc_groupinfo(struct super_block *sb, ext4_group_t ngroups)
 		return 0;
 
 	size = roundup_pow_of_two(sizeof(*sbi->s_group_info) * size);
-	new_groupinfo = ext4_kvzalloc(size, GFP_KERNEL);
+	new_groupinfo = kvzalloc(size, GFP_KERNEL);
 	if (!new_groupinfo) {
 		ext4_msg(sb, KERN_ERR, "can't allocate buddy meta group");
 		return -ENOMEM;
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 63a6b6332682..1a81c77f0ea4 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2116,7 +2116,7 @@ int ext4_alloc_flex_bg_array(struct super_block *sb, ext4_group_t ngroup)
 		return 0;
 
 	size = roundup_pow_of_two(size * sizeof(struct flex_groups));
-	new_groups = ext4_kvzalloc(size, GFP_KERNEL);
+	new_groups = kvzalloc(size, GFP_KERNEL);
 	if (!new_groups) {
 		ext4_msg(sb, KERN_ERR, "not enough memory for %d flex groups",
 			 size / (int) sizeof(struct flex_groups));
@@ -3850,7 +3850,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 			goto failed_mount;
 		}
 	}
-	sbi->s_group_desc = ext4_kvmalloc(db_count *
+	sbi->s_group_desc = kvmalloc(db_count *
 					  sizeof(struct buffer_head *),
 					  GFP_KERNEL);
 	if (sbi->s_group_desc == NULL) {
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 2da8c3aa0ce5..4130df0a8e64 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1929,26 +1929,6 @@ static inline void *f2fs_kmalloc(struct f2fs_sb_info *sbi,
 	return kmalloc(size, flags);
 }
 
-static inline void *f2fs_kvmalloc(size_t size, gfp_t flags)
-{
-	void *ret;
-
-	ret = kmalloc(size, flags | __GFP_NOWARN);
-	if (!ret)
-		ret = __vmalloc(size, flags, PAGE_KERNEL);
-	return ret;
-}
-
-static inline void *f2fs_kvzalloc(size_t size, gfp_t flags)
-{
-	void *ret;
-
-	ret = kzalloc(size, flags | __GFP_NOWARN);
-	if (!ret)
-		ret = __vmalloc(size, flags | __GFP_ZERO, PAGE_KERNEL);
-	return ret;
-}
-
 #define get_inode_mode(i) \
 	((is_inode_flag_set(i, FI_ACL_MODE)) ? \
 	 (F2FS_I(i)->i_acl_mode) : ((i)->i_mode))
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 49f10dce817d..fb2e0c156135 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -1013,11 +1013,11 @@ static int __exchange_data_block(struct inode *src_inode,
 	while (len) {
 		olen = min((pgoff_t)4 * ADDRS_PER_BLOCK, len);
 
-		src_blkaddr = f2fs_kvzalloc(sizeof(block_t) * olen, GFP_KERNEL);
+		src_blkaddr = kvzalloc(sizeof(block_t) * olen, GFP_KERNEL);
 		if (!src_blkaddr)
 			return -ENOMEM;
 
-		do_replace = f2fs_kvzalloc(sizeof(int) * olen, GFP_KERNEL);
+		do_replace = kvzalloc(sizeof(int) * olen, GFP_KERNEL);
 		if (!do_replace) {
 			kvfree(src_blkaddr);
 			return -ENOMEM;
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 0738f48293cc..c50c883bfc1a 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -2286,13 +2286,13 @@ static int build_sit_info(struct f2fs_sb_info *sbi)
 
 	SM_I(sbi)->sit_info = sit_i;
 
-	sit_i->sentries = f2fs_kvzalloc(MAIN_SEGS(sbi) *
+	sit_i->sentries = kvzalloc(MAIN_SEGS(sbi) *
 					sizeof(struct seg_entry), GFP_KERNEL);
 	if (!sit_i->sentries)
 		return -ENOMEM;
 
 	bitmap_size = f2fs_bitmap_size(MAIN_SEGS(sbi));
-	sit_i->dirty_sentries_bitmap = f2fs_kvzalloc(bitmap_size, GFP_KERNEL);
+	sit_i->dirty_sentries_bitmap = kvzalloc(bitmap_size, GFP_KERNEL);
 	if (!sit_i->dirty_sentries_bitmap)
 		return -ENOMEM;
 
@@ -2318,7 +2318,7 @@ static int build_sit_info(struct f2fs_sb_info *sbi)
 		return -ENOMEM;
 
 	if (sbi->segs_per_sec > 1) {
-		sit_i->sec_entries = f2fs_kvzalloc(MAIN_SECS(sbi) *
+		sit_i->sec_entries = kvzalloc(MAIN_SECS(sbi) *
 					sizeof(struct sec_entry), GFP_KERNEL);
 		if (!sit_i->sec_entries)
 			return -ENOMEM;
@@ -2364,12 +2364,12 @@ static int build_free_segmap(struct f2fs_sb_info *sbi)
 	SM_I(sbi)->free_info = free_i;
 
 	bitmap_size = f2fs_bitmap_size(MAIN_SEGS(sbi));
-	free_i->free_segmap = f2fs_kvmalloc(bitmap_size, GFP_KERNEL);
+	free_i->free_segmap = kvmalloc(bitmap_size, GFP_KERNEL);
 	if (!free_i->free_segmap)
 		return -ENOMEM;
 
 	sec_bitmap_size = f2fs_bitmap_size(MAIN_SECS(sbi));
-	free_i->free_secmap = f2fs_kvmalloc(sec_bitmap_size, GFP_KERNEL);
+	free_i->free_secmap = kvmalloc(sec_bitmap_size, GFP_KERNEL);
 	if (!free_i->free_secmap)
 		return -ENOMEM;
 
@@ -2537,7 +2537,7 @@ static int init_victim_secmap(struct f2fs_sb_info *sbi)
 	struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
 	unsigned int bitmap_size = f2fs_bitmap_size(MAIN_SECS(sbi));
 
-	dirty_i->victim_secmap = f2fs_kvzalloc(bitmap_size, GFP_KERNEL);
+	dirty_i->victim_secmap = kvzalloc(bitmap_size, GFP_KERNEL);
 	if (!dirty_i->victim_secmap)
 		return -ENOMEM;
 	return 0;
@@ -2559,7 +2559,7 @@ static int build_dirty_segmap(struct f2fs_sb_info *sbi)
 	bitmap_size = f2fs_bitmap_size(MAIN_SEGS(sbi));
 
 	for (i = 0; i < NR_DIRTY_TYPE; i++) {
-		dirty_i->dirty_segmap[i] = f2fs_kvzalloc(bitmap_size, GFP_KERNEL);
+		dirty_i->dirty_segmap[i] = kvzalloc(bitmap_size, GFP_KERNEL);
 		if (!dirty_i->dirty_segmap[i])
 			return -ENOMEM;
 	}
diff --git a/fs/seq_file.c b/fs/seq_file.c
index 368bfb92b115..023d92dfffa9 100644
--- a/fs/seq_file.c
+++ b/fs/seq_file.c
@@ -25,21 +25,7 @@ static void seq_set_overflow(struct seq_file *m)
 
 static void *seq_buf_alloc(unsigned long size)
 {
-	void *buf;
-	gfp_t gfp = GFP_KERNEL;
-
-	/*
-	 * For high order allocations, use __GFP_NORETRY to avoid oom-killing -
-	 * it's better to fall back to vmalloc() than to kill things.  For small
-	 * allocations, just use GFP_KERNEL which will oom kill, thus no need
-	 * for vmalloc fallback.
-	 */
-	if (size > PAGE_SIZE)
-		gfp |= __GFP_NORETRY | __GFP_NOWARN;
-	buf = kmalloc(size, gfp);
-	if (!buf && size > PAGE_SIZE)
-		buf = vmalloc(size);
-	return buf;
+	return kvmalloc(size, GFP_KERNEL);
 }
 
 /**
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 1c5190dab2c1..00e6f93d1ee0 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -768,8 +768,6 @@ void kvm_arch_check_processor_compat(void *rtn);
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
 int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
 
-void *kvm_kvzalloc(unsigned long size);
-
 #ifndef __KVM_HAVE_ARCH_VM_ALLOC
 static inline struct kvm *kvm_arch_alloc_vm(void)
 {
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 4424784ac374..38b3e2d62f0c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -484,6 +484,20 @@ static inline int is_vmalloc_or_module_addr(const void *x)
 }
 #endif
 
+extern void *kvmalloc_node(size_t size, gfp_t flags, int node);
+static inline void *kvmalloc(size_t size, gfp_t flags)
+{
+	return kvmalloc_node(size, flags, NUMA_NO_NODE);
+}
+static inline void *kvzalloc_node(size_t size, gfp_t flags, int node)
+{
+	return kvmalloc_node(size, flags | __GFP_ZERO, node);
+}
+static inline void *kvzalloc(size_t size, gfp_t flags)
+{
+	return kvmalloc(size, flags | __GFP_ZERO);
+}
+
 extern void kvfree(const void *addr);
 
 static inline atomic_t *compound_mapcount_ptr(struct page *page)
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index d68edffbf142..46991ad3ddd5 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -80,6 +80,7 @@ extern void *__vmalloc_node_range(unsigned long size, unsigned long align,
 			unsigned long start, unsigned long end, gfp_t gfp_mask,
 			pgprot_t prot, unsigned long vm_flags, int node,
 			const void *caller);
+extern void *__vmalloc_node_flags(unsigned long size, int node, gfp_t flags);
 
 extern void vfree(const void *addr);
 extern void vfree_atomic(const void *addr);
diff --git a/mm/nommu.c b/mm/nommu.c
index 210d7ec2843c..99e9f07e583b 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -236,6 +236,11 @@ void *__vmalloc(unsigned long size, gfp_t gfp_mask, pgprot_t prot)
 }
 EXPORT_SYMBOL(__vmalloc);
 
+void *__vmalloc_node_flags(unsigned long size, int node, gfp_t flags)
+{
+	return __vmalloc(size, flags, PAGE_KERNEL);
+}
+
 void *vmalloc_user(unsigned long size)
 {
 	void *ret;
diff --git a/mm/util.c b/mm/util.c
index 449272271840..b9d7cc43d9d1 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -346,6 +346,46 @@ unsigned long vm_mmap(struct file *file, unsigned long addr,
 }
 EXPORT_SYMBOL(vm_mmap);
 
+/**
+ * kvmalloc_node - allocate contiguous memory from SLAB with vmalloc fallback
+ * @size: size of the request.
+ * @flags: gfp mask for the allocation - must be compatible with GFP_KERNEL.
+ * @node: numa node to allocate from
+ *
+ * Uses kmalloc to get the memory but if the allocation fails then falls back
+ * to the vmalloc allocator. Use kvfree for freeing the memory.
+ */
+void *kvmalloc_node(size_t size, gfp_t flags, int node)
+{
+	gfp_t kmalloc_flags = flags;
+	void *ret;
+
+	/*
+	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
+	 * so the given set of flags has to be compatible.
+	 */
+	WARN_ON((flags & GFP_KERNEL) != GFP_KERNEL);
+
+	/*
+	 * Make sure that larger requests are not too disruptive - no OOM
+	 * killer and no allocation failure warnings as we have a fallback
+	 */
+	if (size > PAGE_SIZE)
+		kmalloc_flags |= __GFP_NORETRY | __GFP_NOWARN;
+
+	ret = kmalloc_node(size, kmalloc_flags, node);
+
+	/*
+	 * It doesn't really make sense to fallback to vmalloc for sub page
+	 * requests
+	 */
+	if (ret || size < PAGE_SIZE)
+		return ret;
+
+	return __vmalloc_node_flags(size, node, flags);
+}
+EXPORT_SYMBOL(kvmalloc_node);
+
 void kvfree(const void *addr)
 {
 	if (is_vmalloc_addr(addr))
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index a5584384eabc..5e2004aceec9 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1757,7 +1757,7 @@ void *__vmalloc(unsigned long size, gfp_t gfp_mask, pgprot_t prot)
 }
 EXPORT_SYMBOL(__vmalloc);
 
-static inline void *__vmalloc_node_flags(unsigned long size,
+void *__vmalloc_node_flags(unsigned long size,
 					int node, gfp_t flags)
 {
 	return __vmalloc_node(size, 1, flags, PAGE_KERNEL,
diff --git a/security/apparmor/apparmorfs.c b/security/apparmor/apparmorfs.c
index 5923d5665209..83789a03379f 100644
--- a/security/apparmor/apparmorfs.c
+++ b/security/apparmor/apparmorfs.c
@@ -100,7 +100,7 @@ static char *aa_simple_write_to_buffer(int op, const char __user *userbuf,
 		return ERR_PTR(-EACCES);
 
 	/* freed by caller to simple_write_to_buffer */
-	data = kvmalloc(alloc_size);
+	data = __aa_kvmalloc(alloc_size, 0);
 	if (data == NULL)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/security/apparmor/include/apparmor.h b/security/apparmor/include/apparmor.h
index 5d721e990876..c88fb0ebc756 100644
--- a/security/apparmor/include/apparmor.h
+++ b/security/apparmor/include/apparmor.h
@@ -68,16 +68,6 @@ char *aa_split_fqname(char *args, char **ns_name);
 void aa_info_message(const char *str);
 void *__aa_kvmalloc(size_t size, gfp_t flags);
 
-static inline void *kvmalloc(size_t size)
-{
-	return __aa_kvmalloc(size, 0);
-}
-
-static inline void *kvzalloc(size_t size)
-{
-	return __aa_kvmalloc(size, __GFP_ZERO);
-}
-
 /* returns 0 if kref not incremented */
 static inline int kref_get_not0(struct kref *kref)
 {
diff --git a/security/apparmor/match.c b/security/apparmor/match.c
index 3f900fcca8fb..55f6ae0067a3 100644
--- a/security/apparmor/match.c
+++ b/security/apparmor/match.c
@@ -61,7 +61,7 @@ static struct table_header *unpack_table(char *blob, size_t bsize)
 	if (bsize < tsize)
 		goto out;
 
-	table = kvzalloc(tsize);
+	table = __aa_kvmalloc(tsize, __GFP_ZERO);
 	if (table) {
 		table->td_id = th.td_id;
 		table->td_flags = th.td_flags;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index de102cae7125..c2646d3d6855 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -502,7 +502,7 @@ static struct kvm_memslots *kvm_alloc_memslots(void)
 	int i;
 	struct kvm_memslots *slots;
 
-	slots = kvm_kvzalloc(sizeof(struct kvm_memslots));
+	slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
 	if (!slots)
 		return NULL;
 
@@ -685,18 +685,6 @@ static struct kvm *kvm_create_vm(unsigned long type)
 	return ERR_PTR(r);
 }
 
-/*
- * Avoid using vmalloc for a small buffer.
- * Should not be used when the size is statically known.
- */
-void *kvm_kvzalloc(unsigned long size)
-{
-	if (size > PAGE_SIZE)
-		return vzalloc(size);
-	else
-		return kzalloc(size, GFP_KERNEL);
-}
-
 static void kvm_destroy_devices(struct kvm *kvm)
 {
 	struct kvm_device *dev, *tmp;
@@ -775,7 +763,7 @@ static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
 {
 	unsigned long dirty_bytes = 2 * kvm_dirty_bitmap_bytes(memslot);
 
-	memslot->dirty_bitmap = kvm_kvzalloc(dirty_bytes);
+	memslot->dirty_bitmap = kvzalloc(dirty_bytes, GFP_KERNEL);
 	if (!memslot->dirty_bitmap)
 		return -ENOMEM;
 
@@ -995,7 +983,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
 			goto out_free;
 	}
 
-	slots = kvm_kvzalloc(sizeof(struct kvm_memslots));
+	slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
 	if (!slots)
 		goto out_free;
 	memcpy(slots, __kvm_memslots(kvm, as_id), sizeof(struct kvm_memslots));
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm: introduce kv[mz]alloc helpers
  2017-01-02 13:37 [PATCH] mm: introduce kv[mz]alloc helpers Michal Hocko
@ 2017-01-02 15:55 ` Joe Perches
  2017-01-02 16:02   ` Michal Hocko
  2017-01-03 10:23 ` Vlastimil Babka
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Joe Perches @ 2017-01-02 15:55 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: Vlastimil Babka, David Rientjes, Mel Gorman, Johannes Weiner,
	Al Viro, linux-mm, LKML, kvm, linux-f2fs-devel,
	linux-security-module, linux-ext4, Michal Hocko,
	Anatoly Stepanov, Paolo Bonzini, Mike Snitzer,
	Michael S. Tsirkin, Theodore Ts'o, Andreas Dilger

On Mon, 2017-01-02 at 14:37 +0100, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> Using kmalloc with the vmalloc fallback for larger allocations is a
> common pattern in the kernel code. Yet we do not have any common helper
> for that and so users have invented their own helpers. Some of them are
> really creative when doing so. Let's just add kv[mz]alloc and make sure
> it is implemented properly. This implementation makes sure to not make
> a large memory pressure for > PAGE_SZE requests (__GFP_NORETRY) and also
> to not warn about allocation failures. This also rules out the OOM
> killer as the vmalloc is a more approapriate fallback than a disruptive
> user visible action.
> 
> This patch also changes some existing users and removes helpers which
> are specific for them. In some cases this is not possible (e.g.
> ext4_kvmalloc, libcfs_kvzalloc, __aa_kvmalloc) because those seems to be
> broken and require GFP_NO{FS,IO} context which is not vmalloc compatible
> in general (note that the page table allocation is GFP_KERNEL). Those
> need to be fixed separately.
> 
> apparmor has already claimed kv[mz]alloc so remove those and use
> __aa_kvmalloc instead to prevent from the naming clashes.

I have no real objection but perhaps this would
be better done as 3 or more patches

o rename apparmor uses
o introduce generic
o conversions to generic

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm: introduce kv[mz]alloc helpers
  2017-01-02 15:55 ` Joe Perches
@ 2017-01-02 16:02   ` Michal Hocko
  0 siblings, 0 replies; 14+ messages in thread
From: Michal Hocko @ 2017-01-02 16:02 UTC (permalink / raw)
  To: Joe Perches
  Cc: Andrew Morton, Vlastimil Babka, David Rientjes, Mel Gorman,
	Johannes Weiner, Al Viro, linux-mm, LKML, kvm, linux-f2fs-devel,
	linux-security-module, linux-ext4, Anatoly Stepanov,
	Paolo Bonzini, Mike Snitzer, Michael S. Tsirkin,
	Theodore Ts'o, Andreas Dilger

On Mon 02-01-17 07:55:22, Joe Perches wrote:
> On Mon, 2017-01-02 at 14:37 +0100, Michal Hocko wrote:
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > Using kmalloc with the vmalloc fallback for larger allocations is a
> > common pattern in the kernel code. Yet we do not have any common helper
> > for that and so users have invented their own helpers. Some of them are
> > really creative when doing so. Let's just add kv[mz]alloc and make sure
> > it is implemented properly. This implementation makes sure to not make
> > a large memory pressure for > PAGE_SZE requests (__GFP_NORETRY) and also
> > to not warn about allocation failures. This also rules out the OOM
> > killer as the vmalloc is a more approapriate fallback than a disruptive
> > user visible action.
> > 
> > This patch also changes some existing users and removes helpers which
> > are specific for them. In some cases this is not possible (e.g.
> > ext4_kvmalloc, libcfs_kvzalloc, __aa_kvmalloc) because those seems to be
> > broken and require GFP_NO{FS,IO} context which is not vmalloc compatible
> > in general (note that the page table allocation is GFP_KERNEL). Those
> > need to be fixed separately.
> > 
> > apparmor has already claimed kv[mz]alloc so remove those and use
> > __aa_kvmalloc instead to prevent from the naming clashes.
> 
> I have no real objection but perhaps this would
> be better done as 3 or more patches
> 
> o rename apparmor uses
> o introduce generic
> o conversions to generic

Whatever maintainers of the respective code prefer. I usually prefer to
have newly introduced functions along with their users so that it is
clear how they are used. The patch doesn't seem to be too large
either...
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm: introduce kv[mz]alloc helpers
  2017-01-02 13:37 [PATCH] mm: introduce kv[mz]alloc helpers Michal Hocko
  2017-01-02 15:55 ` Joe Perches
@ 2017-01-03 10:23 ` Vlastimil Babka
  2017-01-03 10:33   ` Michal Hocko
  2017-01-04 14:20 ` Michal Hocko
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Vlastimil Babka @ 2017-01-03 10:23 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: David Rientjes, Mel Gorman, Johannes Weiner, Al Viro, linux-mm,
	LKML, kvm, linux-f2fs-devel, linux-security-module, linux-ext4,
	Joe Perches, Michal Hocko, Anatoly Stepanov, Paolo Bonzini,
	Mike Snitzer, Michael S. Tsirkin, Theodore Ts'o,
	Andreas Dilger

On 01/02/2017 02:37 PM, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
>
> Using kmalloc with the vmalloc fallback for larger allocations is a
> common pattern in the kernel code. Yet we do not have any common helper
> for that and so users have invented their own helpers. Some of them are
> really creative when doing so. Let's just add kv[mz]alloc and make sure
> it is implemented properly. This implementation makes sure to not make
> a large memory pressure for > PAGE_SZE requests (__GFP_NORETRY) and also
> to not warn about allocation failures. This also rules out the OOM
> killer as the vmalloc is a more approapriate fallback than a disruptive
> user visible action.
>
> This patch also changes some existing users and removes helpers which
> are specific for them. In some cases this is not possible (e.g.
> ext4_kvmalloc, libcfs_kvzalloc, __aa_kvmalloc) because those seems to be
> broken and require GFP_NO{FS,IO} context which is not vmalloc compatible
> in general (note that the page table allocation is GFP_KERNEL). Those
> need to be fixed separately.
>
> apparmor has already claimed kv[mz]alloc so remove those and use
> __aa_kvmalloc instead to prevent from the naming clashes.
>
> Changes since v1
> - define __vmalloc_node_flags for CONFIG_MMU=n
>
> Cc: Anatoly Stepanov <astepanov@cloudlinux.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Mike Snitzer <snitzer@redhat.com>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: "Theodore Ts'o" <tytso@mit.edu>
> Reviewed-by: Andreas Dilger <adilger@dilger.ca> # ext4 part
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>
(but with a small fix and suggestion below)

> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -346,6 +346,46 @@ unsigned long vm_mmap(struct file *file, unsigned long addr,
>  }
>  EXPORT_SYMBOL(vm_mmap);
>
> +/**
> + * kvmalloc_node - allocate contiguous memory from SLAB with vmalloc fallback
> + * @size: size of the request.
> + * @flags: gfp mask for the allocation - must be compatible with GFP_KERNEL.
> + * @node: numa node to allocate from
> + *
> + * Uses kmalloc to get the memory but if the allocation fails then falls back
> + * to the vmalloc allocator. Use kvfree for freeing the memory.
> + */
> +void *kvmalloc_node(size_t size, gfp_t flags, int node)
> +{
> +	gfp_t kmalloc_flags = flags;
> +	void *ret;
> +
> +	/*
> +	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
> +	 * so the given set of flags has to be compatible.
> +	 */
> +	WARN_ON((flags & GFP_KERNEL) != GFP_KERNEL);

Wouldn't a _ONCE be sufficient? It's unlikely that multiple wrong call sites 
appear out of the blue, but we don't want to flood the log from a single 
frequently called site. No strong feelings though.

> +
> +	/*
> +	 * Make sure that larger requests are not too disruptive - no OOM
> +	 * killer and no allocation failure warnings as we have a fallback
> +	 */
> +	if (size > PAGE_SIZE)
> +		kmalloc_flags |= __GFP_NORETRY | __GFP_NOWARN;
> +
> +	ret = kmalloc_node(size, kmalloc_flags, node);
> +
> +	/*
> +	 * It doesn't really make sense to fallback to vmalloc for sub page
> +	 * requests
> +	 */
> +	if (ret || size < PAGE_SIZE)

This should be size <= PAGE_SIZE.

> +		return ret;
> +
> +	return __vmalloc_node_flags(size, node, flags);
> +}
> +EXPORT_SYMBOL(kvmalloc_node);
> +
>  void kvfree(const void *addr)
>  {
>  	if (is_vmalloc_addr(addr))

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm: introduce kv[mz]alloc helpers
  2017-01-03 10:23 ` Vlastimil Babka
@ 2017-01-03 10:33   ` Michal Hocko
  0 siblings, 0 replies; 14+ messages in thread
From: Michal Hocko @ 2017-01-03 10:33 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, David Rientjes, Mel Gorman, Johannes Weiner,
	Al Viro, linux-mm, LKML, kvm, linux-f2fs-devel,
	linux-security-module, linux-ext4, Joe Perches, Anatoly Stepanov,
	Paolo Bonzini, Mike Snitzer, Michael S. Tsirkin,
	Theodore Ts'o, Andreas Dilger

On Tue 03-01-17 11:23:04, Vlastimil Babka wrote:
> On 01/02/2017 02:37 PM, Michal Hocko wrote:
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > Using kmalloc with the vmalloc fallback for larger allocations is a
> > common pattern in the kernel code. Yet we do not have any common helper
> > for that and so users have invented their own helpers. Some of them are
> > really creative when doing so. Let's just add kv[mz]alloc and make sure
> > it is implemented properly. This implementation makes sure to not make
> > a large memory pressure for > PAGE_SZE requests (__GFP_NORETRY) and also
> > to not warn about allocation failures. This also rules out the OOM
> > killer as the vmalloc is a more approapriate fallback than a disruptive
> > user visible action.
> > 
> > This patch also changes some existing users and removes helpers which
> > are specific for them. In some cases this is not possible (e.g.
> > ext4_kvmalloc, libcfs_kvzalloc, __aa_kvmalloc) because those seems to be
> > broken and require GFP_NO{FS,IO} context which is not vmalloc compatible
> > in general (note that the page table allocation is GFP_KERNEL). Those
> > need to be fixed separately.
> > 
> > apparmor has already claimed kv[mz]alloc so remove those and use
> > __aa_kvmalloc instead to prevent from the naming clashes.
> > 
> > Changes since v1
> > - define __vmalloc_node_flags for CONFIG_MMU=n
> > 
> > Cc: Anatoly Stepanov <astepanov@cloudlinux.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Mike Snitzer <snitzer@redhat.com>
> > Cc: "Michael S. Tsirkin" <mst@redhat.com>
> > Cc: "Theodore Ts'o" <tytso@mit.edu>
> > Reviewed-by: Andreas Dilger <adilger@dilger.ca> # ext4 part
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> (but with a small fix and suggestion below)

Thanks!

> 
> > --- a/mm/util.c
> > +++ b/mm/util.c
> > @@ -346,6 +346,46 @@ unsigned long vm_mmap(struct file *file, unsigned long addr,
> >  }
> >  EXPORT_SYMBOL(vm_mmap);
> > 
> > +/**
> > + * kvmalloc_node - allocate contiguous memory from SLAB with vmalloc fallback
> > + * @size: size of the request.
> > + * @flags: gfp mask for the allocation - must be compatible with GFP_KERNEL.
> > + * @node: numa node to allocate from
> > + *
> > + * Uses kmalloc to get the memory but if the allocation fails then falls back
> > + * to the vmalloc allocator. Use kvfree for freeing the memory.
> > + */
> > +void *kvmalloc_node(size_t size, gfp_t flags, int node)
> > +{
> > +	gfp_t kmalloc_flags = flags;
> > +	void *ret;
> > +
> > +	/*
> > +	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
> > +	 * so the given set of flags has to be compatible.
> > +	 */
> > +	WARN_ON((flags & GFP_KERNEL) != GFP_KERNEL);
> 
> Wouldn't a _ONCE be sufficient? It's unlikely that multiple wrong call sites
> appear out of the blue, but we don't want to flood the log from a single
> frequently called site. No strong feelings though.

Fair enough, I will make it WARN_ON_ONCE. I wish WARN_ON_ONCE would be
more clever, though. We can lose information about different call sites.
I was thinking about how to deal with it and I stackdepot sounds like it
could help here. But this is off-topic...

> > +
> > +	/*
> > +	 * Make sure that larger requests are not too disruptive - no OOM
> > +	 * killer and no allocation failure warnings as we have a fallback
> > +	 */
> > +	if (size > PAGE_SIZE)
> > +		kmalloc_flags |= __GFP_NORETRY | __GFP_NOWARN;
> > +
> > +	ret = kmalloc_node(size, kmalloc_flags, node);
> > +
> > +	/*
> > +	 * It doesn't really make sense to fallback to vmalloc for sub page
> > +	 * requests
> > +	 */
> > +	if (ret || size < PAGE_SIZE)
> 
> This should be size <= PAGE_SIZE.

You are right of course!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm: introduce kv[mz]alloc helpers
  2017-01-02 13:37 [PATCH] mm: introduce kv[mz]alloc helpers Michal Hocko
  2017-01-02 15:55 ` Joe Perches
  2017-01-03 10:23 ` Vlastimil Babka
@ 2017-01-04 14:20 ` Michal Hocko
  2017-01-06 14:36   ` Vlastimil Babka
  2017-01-04 18:12 ` [PATCH] mm: support __GFP_REPEAT in kvmalloc_node Michal Hocko
  2017-01-06 13:29 ` [PATCH] mm: introduce kv[mz]alloc helpers Vlastimil Babka
  4 siblings, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2017-01-04 14:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, David Rientjes, Mel Gorman, Johannes Weiner,
	Al Viro, linux-mm, LKML, kvm, linux-f2fs-devel,
	linux-security-module, linux-ext4, Joe Perches, Anatoly Stepanov,
	Paolo Bonzini, Mike Snitzer, Michael S. Tsirkin,
	Theodore Ts'o, Andreas Dilger

OK, so I've checked the open coded implementations and converted most of
them. There are few which are either confused and need some special
handling or need double checking.

I can fold this into the original patch or keep it as a separate patch.
Whatever works better for others.
---
>From 72a8493c4f692fca8980b59e0a79d09041164203 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Wed, 4 Jan 2017 13:30:32 +0100
Subject: [PATCH] treewide: use kv[mz]alloc* rather than opencoded variants

There are many code paths opencoding kvmalloc. Let's use the helper
instead.

This shouldn't introduce any functional change.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/s390/kvm/kvm-s390.c                   | 10 ++--------
 crypto/lzo.c                               |  4 +---
 drivers/acpi/apei/erst.c                   |  8 ++------
 drivers/char/agp/generic.c                 |  8 +-------
 drivers/gpu/drm/nouveau/nouveau_gem.c      |  4 +---
 drivers/md/bcache/util.h                   | 10 ++--------
 drivers/net/ethernet/mellanox/mlx4/en_tx.c |  9 +++------
 drivers/nvdimm/dimm_devs.c                 |  5 +----
 drivers/xen/evtchn.c                       | 14 +-------------
 fs/btrfs/ctree.c                           |  9 +++------
 fs/btrfs/ioctl.c                           |  9 +++------
 fs/btrfs/send.c                            | 27 +++++++++------------------
 fs/ceph/file.c                             |  9 +++------
 fs/select.c                                |  5 +----
 fs/xattr.c                                 | 27 +++++++++------------------
 lib/iov_iter.c                             |  5 +----
 mm/frame_vector.c                          |  5 +----
 net/netfilter/x_tables.c                   | 11 +----------
 net/sched/sch_fq.c                         | 12 +-----------
 net/sched/sch_netem.c                      |  6 +-----
 net/sched/sch_sfq.c                        |  6 +-----
 security/keys/keyctl.c                     | 22 ++++++----------------
 22 files changed, 54 insertions(+), 171 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index bec71e902be3..ca590def234b 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -1126,10 +1126,7 @@ static long kvm_s390_get_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
 	if (args->count < 1 || args->count > KVM_S390_SKEYS_MAX)
 		return -EINVAL;
 
-	keys = kmalloc_array(args->count, sizeof(uint8_t),
-			     GFP_KERNEL | __GFP_NOWARN);
-	if (!keys)
-		keys = vmalloc(sizeof(uint8_t) * args->count);
+	keys = kvmalloc(args->count * sizeof(uint8_t), GFP_KERNEL);
 	if (!keys)
 		return -ENOMEM;
 
@@ -1171,10 +1168,7 @@ static long kvm_s390_set_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
 	if (args->count < 1 || args->count > KVM_S390_SKEYS_MAX)
 		return -EINVAL;
 
-	keys = kmalloc_array(args->count, sizeof(uint8_t),
-			     GFP_KERNEL | __GFP_NOWARN);
-	if (!keys)
-		keys = vmalloc(sizeof(uint8_t) * args->count);
+	keys = kvmalloc(sizeof(uint8_t) * args->count, GFP_KERNEL);
 	if (!keys)
 		return -ENOMEM;
 
diff --git a/crypto/lzo.c b/crypto/lzo.c
index 168df784da84..218567d717d6 100644
--- a/crypto/lzo.c
+++ b/crypto/lzo.c
@@ -32,9 +32,7 @@ static void *lzo_alloc_ctx(struct crypto_scomp *tfm)
 {
 	void *ctx;
 
-	ctx = kmalloc(LZO1X_MEM_COMPRESS, GFP_KERNEL | __GFP_NOWARN);
-	if (!ctx)
-		ctx = vmalloc(LZO1X_MEM_COMPRESS);
+	ctx = kvmalloc(LZO1X_MEM_COMPRESS, GFP_KERNEL);
 	if (!ctx)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/drivers/acpi/apei/erst.c b/drivers/acpi/apei/erst.c
index ec4f507b524f..a2898df61744 100644
--- a/drivers/acpi/apei/erst.c
+++ b/drivers/acpi/apei/erst.c
@@ -513,7 +513,7 @@ static int __erst_record_id_cache_add_one(void)
 	if (i < erst_record_id_cache.len)
 		goto retry;
 	if (erst_record_id_cache.len >= erst_record_id_cache.size) {
-		int new_size, alloc_size;
+		int new_size;
 		u64 *new_entries;
 
 		new_size = erst_record_id_cache.size * 2;
@@ -524,11 +524,7 @@ static int __erst_record_id_cache_add_one(void)
 				pr_warn(FW_WARN "too many record IDs!\n");
 			return 0;
 		}
-		alloc_size = new_size * sizeof(entries[0]);
-		if (alloc_size < PAGE_SIZE)
-			new_entries = kmalloc(alloc_size, GFP_KERNEL);
-		else
-			new_entries = vmalloc(alloc_size);
+		new_entries = kvmalloc(new_size * sizeof(entries[0]), GFP_KERNEL);
 		if (!new_entries)
 			return -ENOMEM;
 		memcpy(new_entries, entries,
diff --git a/drivers/char/agp/generic.c b/drivers/char/agp/generic.c
index f002fa5d1887..bdf418cac8ef 100644
--- a/drivers/char/agp/generic.c
+++ b/drivers/char/agp/generic.c
@@ -88,13 +88,7 @@ static int agp_get_key(void)
 
 void agp_alloc_page_array(size_t size, struct agp_memory *mem)
 {
-	mem->pages = NULL;
-
-	if (size <= 2*PAGE_SIZE)
-		mem->pages = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
-	if (mem->pages == NULL) {
-		mem->pages = vmalloc(size);
-	}
+	mem->pages = kvmalloc(size, GFP_KERNEL);
 }
 EXPORT_SYMBOL(agp_alloc_page_array);
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index 201b52b750dd..77dd73ff126f 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -568,9 +568,7 @@ u_memcpya(uint64_t user, unsigned nmemb, unsigned size)
 
 	size *= nmemb;
 
-	mem = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
-	if (!mem)
-		mem = vmalloc(size);
+	mem = kvmalloc(size, GFP_KERNEL);
 	if (!mem)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/drivers/md/bcache/util.h b/drivers/md/bcache/util.h
index cf2cbc211d83..9dc0f0ff0321 100644
--- a/drivers/md/bcache/util.h
+++ b/drivers/md/bcache/util.h
@@ -44,10 +44,7 @@ struct closure;
 	(heap)->size = (_size);						\
 	_bytes = (heap)->size * sizeof(*(heap)->data);			\
 	(heap)->data = NULL;						\
-	if (_bytes < KMALLOC_MAX_SIZE)					\
-		(heap)->data = kmalloc(_bytes, (gfp));			\
-	if ((!(heap)->data) && ((gfp) & GFP_KERNEL))			\
-		(heap)->data = vmalloc(_bytes);				\
+	(heap)->data = kvmalloc(_bytes, (gfp) & GFP_KERNEL);		\
 	(heap)->data;							\
 })
 
@@ -138,10 +135,7 @@ do {									\
 	(fifo)->front = (fifo)->back = 0;				\
 	(fifo)->data = NULL;						\
 									\
-	if (_bytes < KMALLOC_MAX_SIZE)					\
-		(fifo)->data = kmalloc(_bytes, (gfp));			\
-	if ((!(fifo)->data) && ((gfp) & GFP_KERNEL))			\
-		(fifo)->data = vmalloc(_bytes);				\
+	(fifo)->data = kvmalloc(_bytes, (gfp) & GFP_KERNEL);		\
 	(fifo)->data;							\
 })
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 5886ad78058f..a5c1b815145e 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -70,13 +70,10 @@ int mlx4_en_create_tx_ring(struct mlx4_en_priv *priv,
 	ring->full_size = ring->size - HEADROOM - MAX_DESC_TXBBS;
 
 	tmp = size * sizeof(struct mlx4_en_tx_info);
-	ring->tx_info = kmalloc_node(tmp, GFP_KERNEL | __GFP_NOWARN, node);
+	ring->tx_info = kvmalloc_node(tmp, GFP_KERNEL, node);
 	if (!ring->tx_info) {
-		ring->tx_info = vmalloc(tmp);
-		if (!ring->tx_info) {
-			err = -ENOMEM;
-			goto err_ring;
-		}
+		err = -ENOMEM;
+		goto err_ring;
 	}
 
 	en_dbg(DRV, priv, "Allocated tx_info ring at addr:%p size:%d\n",
diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
index d614493ad5ac..a4e2af846d28 100644
--- a/drivers/nvdimm/dimm_devs.c
+++ b/drivers/nvdimm/dimm_devs.c
@@ -102,10 +102,7 @@ int nvdimm_init_config_data(struct nvdimm_drvdata *ndd)
 		return -ENXIO;
 	}
 
-	ndd->data = kmalloc(ndd->nsarea.config_size, GFP_KERNEL);
-	if (!ndd->data)
-		ndd->data = vmalloc(ndd->nsarea.config_size);
-
+	ndd->data = kvmalloc(ndd->nsarea.config_size, GFP_KERNEL);
 	if (!ndd->data)
 		return -ENOMEM;
 
diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
index e8c7f09d01be..9c97d0a24819 100644
--- a/drivers/xen/evtchn.c
+++ b/drivers/xen/evtchn.c
@@ -87,18 +87,6 @@ struct user_evtchn {
 	bool enabled;
 };
 
-static evtchn_port_t *evtchn_alloc_ring(unsigned int size)
-{
-	evtchn_port_t *ring;
-	size_t s = size * sizeof(*ring);
-
-	ring = kmalloc(s, GFP_KERNEL);
-	if (!ring)
-		ring = vmalloc(s);
-
-	return ring;
-}
-
 static void evtchn_free_ring(evtchn_port_t *ring)
 {
 	kvfree(ring);
@@ -334,7 +322,7 @@ static int evtchn_resize_ring(struct per_user_data *u)
 	else
 		new_size = 2 * u->ring_size;
 
-	new_ring = evtchn_alloc_ring(new_size);
+	new_ring = kvmalloc(new_size * sizeof(*new_ring), GFP_KERNEL);
 	if (!new_ring)
 		return -ENOMEM;
 
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 25286a5912fc..bebf4691a430 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -5369,13 +5369,10 @@ int btrfs_compare_trees(struct btrfs_root *left_root,
 		goto out;
 	}
 
-	tmp_buf = kmalloc(left_root->nodesize, GFP_KERNEL | __GFP_NOWARN);
+	tmp_buf = kvmalloc(left_root->nodesize, GFP_KERNEL);
 	if (!tmp_buf) {
-		tmp_buf = vmalloc(left_root->nodesize);
-		if (!tmp_buf) {
-			ret = -ENOMEM;
-			goto out;
-		}
+		ret = -ENOMEM;
+		goto out;
 	}
 
 	left_path->search_commit_root = 1;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 92b09a51742c..86e681db3cab 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3541,12 +3541,9 @@ static int btrfs_clone(struct inode *src, struct inode *inode,
 	u64 last_dest_end = destoff;
 
 	ret = -ENOMEM;
-	buf = kmalloc(root->nodesize, GFP_KERNEL | __GFP_NOWARN);
-	if (!buf) {
-		buf = vmalloc(root->nodesize);
-		if (!buf)
-			return ret;
-	}
+	buf = kvmalloc(root->nodesize, GFP_KERNEL);
+	if (!buf)
+		return ret;
 
 	path = btrfs_alloc_path();
 	if (!path) {
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 71261b459863..64f09f32b0d5 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -6243,22 +6243,16 @@ long btrfs_ioctl_send(struct file *mnt_file, void __user *arg_)
 	sctx->clone_roots_cnt = arg->clone_sources_count;
 
 	sctx->send_max_size = BTRFS_SEND_BUF_SIZE;
-	sctx->send_buf = kmalloc(sctx->send_max_size, GFP_KERNEL | __GFP_NOWARN);
+	sctx->send_buf = kvmalloc(sctx->send_max_size, GFP_KERNEL);
 	if (!sctx->send_buf) {
-		sctx->send_buf = vmalloc(sctx->send_max_size);
-		if (!sctx->send_buf) {
-			ret = -ENOMEM;
-			goto out;
-		}
+		ret = -ENOMEM;
+		goto out;
 	}
 
-	sctx->read_buf = kmalloc(BTRFS_SEND_READ_SIZE, GFP_KERNEL | __GFP_NOWARN);
+	sctx->read_buf = kvmalloc(BTRFS_SEND_READ_SIZE, GFP_KERNEL);
 	if (!sctx->read_buf) {
-		sctx->read_buf = vmalloc(BTRFS_SEND_READ_SIZE);
-		if (!sctx->read_buf) {
-			ret = -ENOMEM;
-			goto out;
-		}
+		ret = -ENOMEM;
+		goto out;
 	}
 
 	sctx->pending_dir_moves = RB_ROOT;
@@ -6279,13 +6273,10 @@ long btrfs_ioctl_send(struct file *mnt_file, void __user *arg_)
 	alloc_size = arg->clone_sources_count * sizeof(*arg->clone_sources);
 
 	if (arg->clone_sources_count) {
-		clone_sources_tmp = kmalloc(alloc_size, GFP_KERNEL | __GFP_NOWARN);
+		clone_sources_tmp = kvmalloc(alloc_size, GFP_KERNEL);
 		if (!clone_sources_tmp) {
-			clone_sources_tmp = vmalloc(alloc_size);
-			if (!clone_sources_tmp) {
-				ret = -ENOMEM;
-				goto out;
-			}
+			ret = -ENOMEM;
+			goto out;
 		}
 
 		ret = copy_from_user(clone_sources_tmp, arg->clone_sources,
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 921596eec664..74106e5f912a 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -74,12 +74,9 @@ dio_get_pages_alloc(const struct iov_iter *it, size_t nbytes,
 	align = (unsigned long)(it->iov->iov_base + it->iov_offset) &
 		(PAGE_SIZE - 1);
 	npages = calc_pages_for(align, nbytes);
-	pages = kmalloc(sizeof(*pages) * npages, GFP_KERNEL);
-	if (!pages) {
-		pages = vmalloc(sizeof(*pages) * npages);
-		if (!pages)
-			return ERR_PTR(-ENOMEM);
-	}
+	pages = kvmalloc(sizeof(*pages) * npages, GFP_KERNEL);
+	if (!pages)
+		return ERR_PTR(-ENOMEM);
 
 	for (idx = 0; idx < npages; ) {
 		size_t start;
diff --git a/fs/select.c b/fs/select.c
index 3d4f85defeab..175dd1b9b324 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -586,10 +586,7 @@ int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp,
 			goto out_nofds;
 
 		alloc_size = 6 * size;
-		bits = kmalloc(alloc_size, GFP_KERNEL|__GFP_NOWARN);
-		if (!bits && alloc_size > PAGE_SIZE)
-			bits = vmalloc(alloc_size);
-
+		bits = kvmalloc(alloc_size, GFP_KERNEL);
 		if (!bits)
 			goto out_nofds;
 	}
diff --git a/fs/xattr.c b/fs/xattr.c
index 2d13b4e62fae..f3fb03a5b9b7 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -431,12 +431,9 @@ setxattr(struct dentry *d, const char __user *name, const void __user *value,
 	if (size) {
 		if (size > XATTR_SIZE_MAX)
 			return -E2BIG;
-		kvalue = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
-		if (!kvalue) {
-			kvalue = vmalloc(size);
-			if (!kvalue)
-				return -ENOMEM;
-		}
+		kvalue = kvmalloc(size, GFP_KERNEL);
+		if (!kvalue)
+			return -ENOMEM;
 		if (copy_from_user(kvalue, value, size)) {
 			error = -EFAULT;
 			goto out;
@@ -528,12 +525,9 @@ getxattr(struct dentry *d, const char __user *name, void __user *value,
 	if (size) {
 		if (size > XATTR_SIZE_MAX)
 			size = XATTR_SIZE_MAX;
-		kvalue = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
-		if (!kvalue) {
-			kvalue = vmalloc(size);
-			if (!kvalue)
-				return -ENOMEM;
-		}
+		kvalue = kvmalloc(size, GFP_KERNEL);
+		if (!kvalue)
+			return -ENOMEM;
 	}
 
 	error = vfs_getxattr(d, kname, kvalue, size);
@@ -611,12 +605,9 @@ listxattr(struct dentry *d, char __user *list, size_t size)
 	if (size) {
 		if (size > XATTR_LIST_MAX)
 			size = XATTR_LIST_MAX;
-		klist = kmalloc(size, __GFP_NOWARN | GFP_KERNEL);
-		if (!klist) {
-			klist = vmalloc(size);
-			if (!klist)
-				return -ENOMEM;
-		}
+		klist = kvmalloc(size, GFP_KERNEL);
+		if (!klist)
+			return -ENOMEM;
 	}
 
 	error = vfs_listxattr(d, klist, size);
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 228892dabba6..710d9783a549 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -960,10 +960,7 @@ EXPORT_SYMBOL(iov_iter_get_pages);
 
 static struct page **get_pages_array(size_t n)
 {
-	struct page **p = kmalloc(n * sizeof(struct page *), GFP_KERNEL);
-	if (!p)
-		p = vmalloc(n * sizeof(struct page *));
-	return p;
+	return kvmalloc(n * sizeof(struct page *), GFP_KERNEL);
 }
 
 static ssize_t pipe_get_pages_alloc(struct iov_iter *i,
diff --git a/mm/frame_vector.c b/mm/frame_vector.c
index db77dcb38afd..72ebec18629c 100644
--- a/mm/frame_vector.c
+++ b/mm/frame_vector.c
@@ -200,10 +200,7 @@ struct frame_vector *frame_vector_create(unsigned int nr_frames)
 	 * Avoid higher order allocations, use vmalloc instead. It should
 	 * be rare anyway.
 	 */
-	if (size <= PAGE_SIZE)
-		vec = kmalloc(size, GFP_KERNEL);
-	else
-		vec = vmalloc(size);
+	vec = kvmalloc(size, GFP_KERNEL);
 	if (!vec)
 		return NULL;
 	vec->nr_allocated = nr_frames;
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 2ff499680cc6..0a5cc1237afe 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -712,17 +712,8 @@ EXPORT_SYMBOL(xt_check_entry_offsets);
  */
 unsigned int *xt_alloc_entry_offsets(unsigned int size)
 {
-	unsigned int *off;
+	return kvmalloc(size * sizeof(unsigned int), GFP_KERNEL);;
 
-	off = kcalloc(size, sizeof(unsigned int), GFP_KERNEL | __GFP_NOWARN);
-
-	if (off)
-		return off;
-
-	if (size < (SIZE_MAX / sizeof(unsigned int)))
-		off = vmalloc(size * sizeof(unsigned int));
-
-	return off;
 }
 EXPORT_SYMBOL(xt_alloc_entry_offsets);
 
diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index 86309a3156a5..5678eff40f61 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -624,16 +624,6 @@ static void fq_rehash(struct fq_sched_data *q,
 	q->stat_gc_flows += fcnt;
 }
 
-static void *fq_alloc_node(size_t sz, int node)
-{
-	void *ptr;
-
-	ptr = kmalloc_node(sz, GFP_KERNEL | __GFP_REPEAT | __GFP_NOWARN, node);
-	if (!ptr)
-		ptr = vmalloc_node(sz, node);
-	return ptr;
-}
-
 static void fq_free(void *addr)
 {
 	kvfree(addr);
@@ -650,7 +640,7 @@ static int fq_resize(struct Qdisc *sch, u32 log)
 		return 0;
 
 	/* If XPS was setup, we can allocate memory on right NUMA node */
-	array = fq_alloc_node(sizeof(struct rb_root) << log,
+	array = kvmalloc_node(sizeof(struct rb_root) << log, GFP_KERNEL,
 			      netdev_queue_numa_node_read(sch->dev_queue));
 	if (!array)
 		return -ENOMEM;
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 9f7b380cf0a3..ded4d4238ea2 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -692,15 +692,11 @@ static int get_dist_table(struct Qdisc *sch, const struct nlattr *attr)
 	spinlock_t *root_lock;
 	struct disttable *d;
 	int i;
-	size_t s;
 
 	if (n > NETEM_DIST_MAX)
 		return -EINVAL;
 
-	s = sizeof(struct disttable) + n * sizeof(s16);
-	d = kmalloc(s, GFP_KERNEL | __GFP_NOWARN);
-	if (!d)
-		d = vmalloc(s);
+	d = kvmalloc(sizeof(struct disttable) + n * sizeof(s16), GFP_KERNEL);
 	if (!d)
 		return -ENOMEM;
 
diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 7f195ed4d568..5d70cd6a032d 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -684,11 +684,7 @@ static int sfq_change(struct Qdisc *sch, struct nlattr *opt)
 
 static void *sfq_alloc(size_t sz)
 {
-	void *ptr = kmalloc(sz, GFP_KERNEL | __GFP_NOWARN);
-
-	if (!ptr)
-		ptr = vmalloc(sz);
-	return ptr;
+	return  kvmalloc(sz, GFP_KERNEL);
 }
 
 static void sfq_free(void *addr)
diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c
index e215e6d0c47c..186c975dba0c 100644
--- a/security/keys/keyctl.c
+++ b/security/keys/keyctl.c
@@ -99,14 +99,9 @@ SYSCALL_DEFINE5(add_key, const char __user *, _type,
 
 	if (_payload) {
 		ret = -ENOMEM;
-		payload = kmalloc(plen, GFP_KERNEL | __GFP_NOWARN);
-		if (!payload) {
-			if (plen <= PAGE_SIZE)
-				goto error2;
-			payload = vmalloc(plen);
-			if (!payload)
-				goto error2;
-		}
+		payload = kvmalloc(plen, GFP_KERNEL);
+		if (!payload)
+			goto error2;
 
 		ret = -EFAULT;
 		if (copy_from_user(payload, _payload, plen) != 0)
@@ -1064,14 +1059,9 @@ long keyctl_instantiate_key_common(key_serial_t id,
 
 	if (from) {
 		ret = -ENOMEM;
-		payload = kmalloc(plen, GFP_KERNEL);
-		if (!payload) {
-			if (plen <= PAGE_SIZE)
-				goto error;
-			payload = vmalloc(plen);
-			if (!payload)
-				goto error;
-		}
+		payload = kvmalloc(plen, GFP_KERNEL);
+		if (!payload)
+			goto error;
 
 		ret = -EFAULT;
 		if (!copy_from_iter_full(payload, plen, from))
-- 
2.11.0

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH] mm: support __GFP_REPEAT in kvmalloc_node
  2017-01-02 13:37 [PATCH] mm: introduce kv[mz]alloc helpers Michal Hocko
                   ` (2 preceding siblings ...)
  2017-01-04 14:20 ` Michal Hocko
@ 2017-01-04 18:12 ` Michal Hocko
  2017-01-06 12:09   ` Vlastimil Babka
  2017-01-06 13:29 ` [PATCH] mm: introduce kv[mz]alloc helpers Vlastimil Babka
  4 siblings, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2017-01-04 18:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, David Rientjes, Mel Gorman, Johannes Weiner,
	Al Viro, linux-mm, LKML, kvm, linux-f2fs-devel,
	linux-security-module, linux-ext4, Joe Perches, Anatoly Stepanov,
	Paolo Bonzini, Mike Snitzer, Michael S. Tsirkin,
	Theodore Ts'o, Andreas Dilger

While checking opencoded users I've encountered that vhost code would
really like to use kvmalloc with __GFP_REPEAT [1] so the following patch
adds support for __GFP_REPEAT and converts both vhost users.

So currently I am sitting on 3 patches. I will wait for more feedback -
especially about potential split ups or cleanups few more days and then
repost the whole series.

[1] http://lkml.kernel.org/r/20170104150800.GO25453@dhcp22.suse.cz
---
>From 0b92e4d2e040524b878d4e7b9ee88fbad5284b33 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Wed, 4 Jan 2017 18:01:39 +0100
Subject: [PATCH] mm: support __GFP_REPEAT in kvmalloc_node

vhost code uses __GFP_REPEAT when allocating vhost_virtqueue resp.
vhost_vsock because it would really like to prefer kmalloc to the
vmalloc fallback - see 23cc5a991c7a ("vhost-net: extend device
allocation to vmalloc") for more context. Michael Tsirkin has also
noted:
"
__GFP_REPEAT overhead is during allocation time.  Using vmalloc means all
accesses are slowed down.  Allocation is not on data path, accesses are.
"

Let's teach kvmalloc_node to handle __GFP_REPEAT properly. There are two
things to be careful about. First we should prevent from the OOM killer
and so have to involve __GFP_NORETRY by default and secondly override
__GFP_REPEAT for !costly order requests as the __GFP_REPEAT is ignored
for !costly orders.

This patch shouldn't introduce any functional change.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 drivers/vhost/net.c   | 9 +++------
 drivers/vhost/vsock.c | 9 +++------
 mm/util.c             | 9 +++++++--
 3 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 5dc34653274a..105cd04c7414 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -797,12 +797,9 @@ static int vhost_net_open(struct inode *inode, struct file *f)
 	struct vhost_virtqueue **vqs;
 	int i;
 
-	n = kmalloc(sizeof *n, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT);
-	if (!n) {
-		n = vmalloc(sizeof *n);
-		if (!n)
-			return -ENOMEM;
-	}
+	n = kvmalloc(sizeof *n, GFP_KERNEL | __GFP_REPEAT);
+	if (!n)
+		return -ENOMEM;
 	vqs = kmalloc(VHOST_NET_VQ_MAX * sizeof(*vqs), GFP_KERNEL);
 	if (!vqs) {
 		kvfree(n);
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index bbbf588540ed..7e0159867553 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -455,12 +455,9 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
 	/* This struct is large and allocation could fail, fall back to vmalloc
 	 * if there is no other way.
 	 */
-	vsock = kzalloc(sizeof(*vsock), GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT);
-	if (!vsock) {
-		vsock = vmalloc(sizeof(*vsock));
-		if (!vsock)
-			return -ENOMEM;
-	}
+	vsock = kvmalloc(sizeof(*vsock), GFP_KERNEL | __GFP_REPEAT);
+	if (!vsock)
+		return -ENOMEM;
 
 	vqs = kmalloc_array(ARRAY_SIZE(vsock->vqs), sizeof(*vqs), GFP_KERNEL);
 	if (!vqs) {
diff --git a/mm/util.c b/mm/util.c
index 8e4ea6cbe379..a2bfb85e60e5 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -348,8 +348,13 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
 	 * Make sure that larger requests are not too disruptive - no OOM
 	 * killer and no allocation failure warnings as we have a fallback
 	 */
-	if (size > PAGE_SIZE)
-		kmalloc_flags |= __GFP_NORETRY | __GFP_NOWARN;
+	if (size > PAGE_SIZE) {
+		kmalloc_flags |= __GFP_NOWARN;
+
+		if (!(kmalloc_flags & __GFP_REPEAT) ||
+				(size <= PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
+			kmalloc_flags |= __GFP_NORETRY;
+	}
 
 	ret = kmalloc_node(size, kmalloc_flags, node);
 
-- 
2.11.0

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm: support __GFP_REPEAT in kvmalloc_node
  2017-01-04 18:12 ` [PATCH] mm: support __GFP_REPEAT in kvmalloc_node Michal Hocko
@ 2017-01-06 12:09   ` Vlastimil Babka
  2017-01-06 12:31     ` Michal Hocko
  2017-01-09  8:50     ` Michal Hocko
  0 siblings, 2 replies; 14+ messages in thread
From: Vlastimil Babka @ 2017-01-06 12:09 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: David Rientjes, Mel Gorman, Johannes Weiner, Al Viro, linux-mm,
	LKML, kvm, linux-f2fs-devel, linux-security-module, linux-ext4,
	Joe Perches, Anatoly Stepanov, Paolo Bonzini, Mike Snitzer,
	Michael S. Tsirkin, Theodore Ts'o, Andreas Dilger

On 01/04/2017 07:12 PM, Michal Hocko wrote:
> While checking opencoded users I've encountered that vhost code would
> really like to use kvmalloc with __GFP_REPEAT [1] so the following patch
> adds support for __GFP_REPEAT and converts both vhost users.
> 
> So currently I am sitting on 3 patches. I will wait for more feedback -
> especially about potential split ups or cleanups few more days and then
> repost the whole series.
> 
> [1] http://lkml.kernel.org/r/20170104150800.GO25453@dhcp22.suse.cz
> ---
> From 0b92e4d2e040524b878d4e7b9ee88fbad5284b33 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Wed, 4 Jan 2017 18:01:39 +0100
> Subject: [PATCH] mm: support __GFP_REPEAT in kvmalloc_node
> 
> vhost code uses __GFP_REPEAT when allocating vhost_virtqueue resp.
> vhost_vsock because it would really like to prefer kmalloc to the
> vmalloc fallback - see 23cc5a991c7a ("vhost-net: extend device
> allocation to vmalloc") for more context. Michael Tsirkin has also
> noted:
> "
> __GFP_REPEAT overhead is during allocation time.  Using vmalloc means all
> accesses are slowed down.  Allocation is not on data path, accesses are.
> "
> 
> Let's teach kvmalloc_node to handle __GFP_REPEAT properly. There are two
> things to be careful about. First we should prevent from the OOM killer
> and so have to involve __GFP_NORETRY by default and secondly override
> __GFP_REPEAT for !costly order requests as the __GFP_REPEAT is ignored
> for !costly orders.
> 
> This patch shouldn't introduce any functional change.

Which is because the converted usages are always used for costly order,
right.

> 
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  drivers/vhost/net.c   | 9 +++------
>  drivers/vhost/vsock.c | 9 +++------
>  mm/util.c             | 9 +++++++--
>  3 files changed, 13 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 5dc34653274a..105cd04c7414 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -797,12 +797,9 @@ static int vhost_net_open(struct inode *inode, struct file *f)
>  	struct vhost_virtqueue **vqs;
>  	int i;
>  
> -	n = kmalloc(sizeof *n, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT);
> -	if (!n) {
> -		n = vmalloc(sizeof *n);
> -		if (!n)
> -			return -ENOMEM;
> -	}
> +	n = kvmalloc(sizeof *n, GFP_KERNEL | __GFP_REPEAT);
> +	if (!n)
> +		return -ENOMEM;
>  	vqs = kmalloc(VHOST_NET_VQ_MAX * sizeof(*vqs), GFP_KERNEL);
>  	if (!vqs) {
>  		kvfree(n);
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index bbbf588540ed..7e0159867553 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -455,12 +455,9 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
>  	/* This struct is large and allocation could fail, fall back to vmalloc
>  	 * if there is no other way.
>  	 */
> -	vsock = kzalloc(sizeof(*vsock), GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT);
> -	if (!vsock) {
> -		vsock = vmalloc(sizeof(*vsock));
> -		if (!vsock)
> -			return -ENOMEM;
> -	}
> +	vsock = kvmalloc(sizeof(*vsock), GFP_KERNEL | __GFP_REPEAT);
> +	if (!vsock)
> +		return -ENOMEM;
>  
>  	vqs = kmalloc_array(ARRAY_SIZE(vsock->vqs), sizeof(*vqs), GFP_KERNEL);
>  	if (!vqs) {
> diff --git a/mm/util.c b/mm/util.c
> index 8e4ea6cbe379..a2bfb85e60e5 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -348,8 +348,13 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
>  	 * Make sure that larger requests are not too disruptive - no OOM
>  	 * killer and no allocation failure warnings as we have a fallback
>  	 */
> -	if (size > PAGE_SIZE)
> -		kmalloc_flags |= __GFP_NORETRY | __GFP_NOWARN;
> +	if (size > PAGE_SIZE) {
> +		kmalloc_flags |= __GFP_NOWARN;
> +
> +		if (!(kmalloc_flags & __GFP_REPEAT) ||
> +				(size <= PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
> +			kmalloc_flags |= __GFP_NORETRY;

I think this would be more understandable for me if it was written in
the opposite way, i.e. "if we have costly __GFP_REPEAT allocation, don't
use __GFP_NORETRY", but nevermind, seems correct to me wrt current
handling of both flags in the page allocator. And it serves as a good
argument to have this wrapper in mm/ as we are hopefully more likely to
keep it working as intended with future changes, than all the opencoded
variants.

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> +	}
>  
>  	ret = kmalloc_node(size, kmalloc_flags, node);
>  
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm: support __GFP_REPEAT in kvmalloc_node
  2017-01-06 12:09   ` Vlastimil Babka
@ 2017-01-06 12:31     ` Michal Hocko
  2017-01-09  8:50     ` Michal Hocko
  1 sibling, 0 replies; 14+ messages in thread
From: Michal Hocko @ 2017-01-06 12:31 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, David Rientjes, Mel Gorman, Johannes Weiner,
	Al Viro, linux-mm, LKML, kvm, linux-f2fs-devel,
	linux-security-module, linux-ext4, Joe Perches, Anatoly Stepanov,
	Paolo Bonzini, Mike Snitzer, Michael S. Tsirkin,
	Theodore Ts'o, Andreas Dilger

On Fri 06-01-17 13:09:36, Vlastimil Babka wrote:
> On 01/04/2017 07:12 PM, Michal Hocko wrote:
[...]
> > diff --git a/mm/util.c b/mm/util.c
> > index 8e4ea6cbe379..a2bfb85e60e5 100644
> > --- a/mm/util.c
> > +++ b/mm/util.c
> > @@ -348,8 +348,13 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
> >  	 * Make sure that larger requests are not too disruptive - no OOM
> >  	 * killer and no allocation failure warnings as we have a fallback
> >  	 */
> > -	if (size > PAGE_SIZE)
> > -		kmalloc_flags |= __GFP_NORETRY | __GFP_NOWARN;
> > +	if (size > PAGE_SIZE) {
> > +		kmalloc_flags |= __GFP_NOWARN;
> > +
> > +		if (!(kmalloc_flags & __GFP_REPEAT) ||
> > +				(size <= PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
> > +			kmalloc_flags |= __GFP_NORETRY;
> 
> I think this would be more understandable for me if it was written in
> the opposite way, i.e. "if we have costly __GFP_REPEAT allocation, don't
> use __GFP_NORETRY",

Dunno, doesn't look much simpler to me
		kmalloc_flags |= __GFP_NORETRY;
		if ((kmalloc_flags & __GFP_REPEAT) &&
				(size > PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
			kmalloc_flags &= ~__GFP_NORETRY;
		}

> but nevermind, seems correct to me wrt current
> handling of both flags in the page allocator. And it serves as a good
> argument to have this wrapper in mm/ as we are hopefully more likely to
> keep it working as intended with future changes, than all the opencoded
> variants.
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>

Thanks!

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm: introduce kv[mz]alloc helpers
  2017-01-02 13:37 [PATCH] mm: introduce kv[mz]alloc helpers Michal Hocko
                   ` (3 preceding siblings ...)
  2017-01-04 18:12 ` [PATCH] mm: support __GFP_REPEAT in kvmalloc_node Michal Hocko
@ 2017-01-06 13:29 ` Vlastimil Babka
  2017-01-06 13:34   ` Michal Hocko
  4 siblings, 1 reply; 14+ messages in thread
From: Vlastimil Babka @ 2017-01-06 13:29 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: David Rientjes, Mel Gorman, Johannes Weiner, Al Viro, linux-mm,
	LKML, kvm, linux-f2fs-devel, linux-security-module, linux-ext4,
	Joe Perches, Michal Hocko, Anatoly Stepanov, Paolo Bonzini,
	Mike Snitzer, Michael S. Tsirkin, Theodore Ts'o,
	Andreas Dilger

On 01/02/2017 02:37 PM, Michal Hocko wrote:
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -514,18 +514,9 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
>  }
>  EXPORT_SYMBOL_GPL(vhost_dev_set_owner);
>  
> -static void *vhost_kvzalloc(unsigned long size)
> -{
> -	void *n = kzalloc(size, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT);

Hi, I just noticed that this had __GFP_REPEAT, so you'll probably want
to move these hunks to patch 3 with the rest of vhost.

> -
> -	if (!n)
> -		n = vzalloc(size);
> -	return n;
> -}
> -
>  struct vhost_umem *vhost_dev_reset_owner_prepare(void)
>  {
> -	return vhost_kvzalloc(sizeof(struct vhost_umem));
> +	return kvzalloc(sizeof(struct vhost_umem), GFP_KERNEL);
>  }
>  EXPORT_SYMBOL_GPL(vhost_dev_reset_owner_prepare);
>  
> @@ -1189,7 +1180,7 @@ EXPORT_SYMBOL_GPL(vhost_vq_access_ok);
>  
>  static struct vhost_umem *vhost_umem_alloc(void)
>  {
> -	struct vhost_umem *umem = vhost_kvzalloc(sizeof(*umem));
> +	struct vhost_umem *umem = kvzalloc(sizeof(*umem), GFP_KERNEL);
>  
>  	if (!umem)
>  		return NULL;
> @@ -1215,7 +1206,7 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
>  		return -EOPNOTSUPP;
>  	if (mem.nregions > max_mem_regions)
>  		return -E2BIG;
> -	newmem = vhost_kvzalloc(size + mem.nregions * sizeof(*m->regions));
> +	newmem = kvzalloc(size + mem.nregions * sizeof(*m->regions), GFP_KERNEL);
>  	if (!newmem)
>  		return -ENOMEM;
>  
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 7ae43c59bc79..bb49409172d9 100644

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm: introduce kv[mz]alloc helpers
  2017-01-06 13:29 ` [PATCH] mm: introduce kv[mz]alloc helpers Vlastimil Babka
@ 2017-01-06 13:34   ` Michal Hocko
  0 siblings, 0 replies; 14+ messages in thread
From: Michal Hocko @ 2017-01-06 13:34 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, David Rientjes, Mel Gorman, Johannes Weiner,
	Al Viro, linux-mm, LKML, kvm, linux-f2fs-devel,
	linux-security-module, linux-ext4, Joe Perches, Anatoly Stepanov,
	Paolo Bonzini, Mike Snitzer, Michael S. Tsirkin,
	Theodore Ts'o, Andreas Dilger

On Fri 06-01-17 14:29:33, Vlastimil Babka wrote:
> On 01/02/2017 02:37 PM, Michal Hocko wrote:
> > --- a/drivers/vhost/vhost.c
> > +++ b/drivers/vhost/vhost.c
> > @@ -514,18 +514,9 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
> >  }
> >  EXPORT_SYMBOL_GPL(vhost_dev_set_owner);
> >  
> > -static void *vhost_kvzalloc(unsigned long size)
> > -{
> > -	void *n = kzalloc(size, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT);
> 
> Hi, I just noticed that this had __GFP_REPEAT, so you'll probably want
> to move these hunks to patch 3 with the rest of vhost.

Well, spotted. I will do that. Thanks!

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm: introduce kv[mz]alloc helpers
  2017-01-04 14:20 ` Michal Hocko
@ 2017-01-06 14:36   ` Vlastimil Babka
  2017-01-06 15:10     ` Michal Hocko
  0 siblings, 1 reply; 14+ messages in thread
From: Vlastimil Babka @ 2017-01-06 14:36 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: David Rientjes, Mel Gorman, Johannes Weiner, Al Viro, linux-mm,
	LKML, kvm, linux-f2fs-devel, linux-security-module, linux-ext4,
	Joe Perches, Anatoly Stepanov, Paolo Bonzini, Mike Snitzer,
	Michael S. Tsirkin, Theodore Ts'o, Andreas Dilger

On 01/04/2017 03:20 PM, Michal Hocko wrote:
> diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
> index 2ff499680cc6..0a5cc1237afe 100644
> --- a/net/netfilter/x_tables.c
> +++ b/net/netfilter/x_tables.c
> @@ -712,17 +712,8 @@ EXPORT_SYMBOL(xt_check_entry_offsets);
>   */
>  unsigned int *xt_alloc_entry_offsets(unsigned int size)
>  {
> -	unsigned int *off;
> +	return kvmalloc(size * sizeof(unsigned int), GFP_KERNEL);;
>  
> -	off = kcalloc(size, sizeof(unsigned int), GFP_KERNEL | __GFP_NOWARN);
> -
> -	if (off)
> -		return off;
> -
> -	if (size < (SIZE_MAX / sizeof(unsigned int)))
> -		off = vmalloc(size * sizeof(unsigned int));
> -
> -	return off;

This one seems to have tried hard to avoid the multiplication overflow
by using kcalloc() and doing the size check before vmalloc(), so I
wonder if it's safe to just remove the checks completely?

>  }
>  EXPORT_SYMBOL(xt_alloc_entry_offsets);
>  
> diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
> index 86309a3156a5..5678eff40f61 100644
> --- a/net/sched/sch_fq.c
> +++ b/net/sched/sch_fq.c
> @@ -624,16 +624,6 @@ static void fq_rehash(struct fq_sched_data *q,
>  	q->stat_gc_flows += fcnt;
>  }
>  
> -static void *fq_alloc_node(size_t sz, int node)
> -{
> -	void *ptr;
> -
> -	ptr = kmalloc_node(sz, GFP_KERNEL | __GFP_REPEAT | __GFP_NOWARN, node);

Another patch 3 material?

> -	if (!ptr)
> -		ptr = vmalloc_node(sz, node);
> -	return ptr;
> -}
> -
>  static void fq_free(void *addr)
>  {
>  	kvfree(addr);
> @@ -650,7 +640,7 @@ static int fq_resize(struct Qdisc *sch, u32 log)
>  		return 0;
>  
>  	/* If XPS was setup, we can allocate memory on right NUMA node */
> -	array = fq_alloc_node(sizeof(struct rb_root) << log,
> +	array = kvmalloc_node(sizeof(struct rb_root) << log, GFP_KERNEL,
>  			      netdev_queue_numa_node_read(sch->dev_queue));
>  	if (!array)
>  		return -ENOMEM;

With that fixed,

Acked-by: Vlastimil Babka <vbabka@suse.cz>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm: introduce kv[mz]alloc helpers
  2017-01-06 14:36   ` Vlastimil Babka
@ 2017-01-06 15:10     ` Michal Hocko
  0 siblings, 0 replies; 14+ messages in thread
From: Michal Hocko @ 2017-01-06 15:10 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, David Rientjes, Mel Gorman, Johannes Weiner,
	Al Viro, linux-mm, LKML, kvm, linux-f2fs-devel,
	linux-security-module, linux-ext4, Joe Perches, Anatoly Stepanov,
	Paolo Bonzini, Mike Snitzer, Michael S. Tsirkin,
	Theodore Ts'o, Andreas Dilger

On Fri 06-01-17 15:36:04, Vlastimil Babka wrote:
> On 01/04/2017 03:20 PM, Michal Hocko wrote:
> > diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
> > index 2ff499680cc6..0a5cc1237afe 100644
> > --- a/net/netfilter/x_tables.c
> > +++ b/net/netfilter/x_tables.c
> > @@ -712,17 +712,8 @@ EXPORT_SYMBOL(xt_check_entry_offsets);
> >   */
> >  unsigned int *xt_alloc_entry_offsets(unsigned int size)
> >  {
> > -	unsigned int *off;
> > +	return kvmalloc(size * sizeof(unsigned int), GFP_KERNEL);;
> >  
> > -	off = kcalloc(size, sizeof(unsigned int), GFP_KERNEL | __GFP_NOWARN);
> > -
> > -	if (off)
> > -		return off;
> > -
> > -	if (size < (SIZE_MAX / sizeof(unsigned int)))
> > -		off = vmalloc(size * sizeof(unsigned int));
> > -
> > -	return off;
> 
> This one seems to have tried hard to avoid the multiplication overflow
> by using kcalloc() and doing the size check before vmalloc(), so I
> wonder if it's safe to just remove the checks completely?

Tetsuo has already pointed that out and it is fixed in my loacal
version.
 
> >  }
> >  EXPORT_SYMBOL(xt_alloc_entry_offsets);
> >  
> > diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
> > index 86309a3156a5..5678eff40f61 100644
> > --- a/net/sched/sch_fq.c
> > +++ b/net/sched/sch_fq.c
> > @@ -624,16 +624,6 @@ static void fq_rehash(struct fq_sched_data *q,
> >  	q->stat_gc_flows += fcnt;
> >  }
> >  
> > -static void *fq_alloc_node(size_t sz, int node)
> > -{
> > -	void *ptr;
> > -
> > -	ptr = kmalloc_node(sz, GFP_KERNEL | __GFP_REPEAT | __GFP_NOWARN, node);
> 
> Another patch 3 material?

I will just drop this part for now. c3bd85495aef6 doesn't say a word about why
it needs __GFP_REPEAT. I will ask Eric.
 
> > -	if (!ptr)
> > -		ptr = vmalloc_node(sz, node);
> > -	return ptr;
> > -}
> > -
> >  static void fq_free(void *addr)
> >  {
> >  	kvfree(addr);
> > @@ -650,7 +640,7 @@ static int fq_resize(struct Qdisc *sch, u32 log)
> >  		return 0;
> >  
> >  	/* If XPS was setup, we can allocate memory on right NUMA node */
> > -	array = fq_alloc_node(sizeof(struct rb_root) << log,
> > +	array = kvmalloc_node(sizeof(struct rb_root) << log, GFP_KERNEL,
> >  			      netdev_queue_numa_node_read(sch->dev_queue));
> >  	if (!array)
> >  		return -ENOMEM;
> 
> With that fixed,
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>

Thanks!

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm: support __GFP_REPEAT in kvmalloc_node
  2017-01-06 12:09   ` Vlastimil Babka
  2017-01-06 12:31     ` Michal Hocko
@ 2017-01-09  8:50     ` Michal Hocko
  1 sibling, 0 replies; 14+ messages in thread
From: Michal Hocko @ 2017-01-09  8:50 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, David Rientjes, Mel Gorman, Johannes Weiner,
	Al Viro, linux-mm, LKML, kvm, linux-f2fs-devel,
	linux-security-module, linux-ext4, Joe Perches, Anatoly Stepanov,
	Paolo Bonzini, Mike Snitzer, Michael S. Tsirkin,
	Theodore Ts'o, Andreas Dilger

On Fri 06-01-17 13:09:36, Vlastimil Babka wrote:
> On 01/04/2017 07:12 PM, Michal Hocko wrote:
> > While checking opencoded users I've encountered that vhost code would
> > really like to use kvmalloc with __GFP_REPEAT [1] so the following patch
> > adds support for __GFP_REPEAT and converts both vhost users.
> > 
> > So currently I am sitting on 3 patches. I will wait for more feedback -
> > especially about potential split ups or cleanups few more days and then
> > repost the whole series.
> > 
> > [1] http://lkml.kernel.org/r/20170104150800.GO25453@dhcp22.suse.cz
> > ---
> > From 0b92e4d2e040524b878d4e7b9ee88fbad5284b33 Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@suse.com>
> > Date: Wed, 4 Jan 2017 18:01:39 +0100
> > Subject: [PATCH] mm: support __GFP_REPEAT in kvmalloc_node
> > 
> > vhost code uses __GFP_REPEAT when allocating vhost_virtqueue resp.
> > vhost_vsock because it would really like to prefer kmalloc to the
> > vmalloc fallback - see 23cc5a991c7a ("vhost-net: extend device
> > allocation to vmalloc") for more context. Michael Tsirkin has also
> > noted:
> > "
> > __GFP_REPEAT overhead is during allocation time.  Using vmalloc means all
> > accesses are slowed down.  Allocation is not on data path, accesses are.
> > "
> > 
> > Let's teach kvmalloc_node to handle __GFP_REPEAT properly. There are two
> > things to be careful about. First we should prevent from the OOM killer
> > and so have to involve __GFP_NORETRY by default and secondly override
> > __GFP_REPEAT for !costly order requests as the __GFP_REPEAT is ignored
> > for !costly orders.
> > 
> > This patch shouldn't introduce any functional change.
> 
> Which is because the converted usages are always used for costly order,
> right.

I have overlooked this remark previously. You are right. And I've
updated the documentation and also the inline comment to be more
explicit about this. We do not have a good way to support __GFP_REPEAT
for !costly orders currently unfortunatelly. Maybe I should revive my
__GFP_RETRY_MAYFAIL patch, this would be another user (outside of xfs
which already wants something like that for KM_MAYFAIL.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2017-01-09  8:51 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-02 13:37 [PATCH] mm: introduce kv[mz]alloc helpers Michal Hocko
2017-01-02 15:55 ` Joe Perches
2017-01-02 16:02   ` Michal Hocko
2017-01-03 10:23 ` Vlastimil Babka
2017-01-03 10:33   ` Michal Hocko
2017-01-04 14:20 ` Michal Hocko
2017-01-06 14:36   ` Vlastimil Babka
2017-01-06 15:10     ` Michal Hocko
2017-01-04 18:12 ` [PATCH] mm: support __GFP_REPEAT in kvmalloc_node Michal Hocko
2017-01-06 12:09   ` Vlastimil Babka
2017-01-06 12:31     ` Michal Hocko
2017-01-09  8:50     ` Michal Hocko
2017-01-06 13:29 ` [PATCH] mm: introduce kv[mz]alloc helpers Vlastimil Babka
2017-01-06 13:34   ` Michal Hocko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).