From: Roman Gushchin <guro@fb.com>
To: Uladzislau Rezki <urezki@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@suse.com>,
	Matthew Wilcox <willy@infradead.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Thomas Garnier <thgarnie@google.com>,
	Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Joel Fernandes <joelaf@google.com>,
	Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@elte.hu>,
	Tejun Heo <tj@kernel.org>
Subject: Re: [RESEND PATCH 1/3] mm/vmap: keep track of free blocks for vmap allocation
Date: Thu, 4 Apr 2019 16:52:45 +0000	[thread overview]
Message-ID: <20190404165240.GA9713@tower.DHCP.thefacebook.com> (raw)
In-Reply-To: <20190404154320.pf3lkwm5zcblvsfv@pc636>

On Thu, Apr 04, 2019 at 05:43:20PM +0200, Uladzislau Rezki wrote:
> Hello, Roman!
> 
> > 
> > The patch looks really good to me! I've tried hard, but didn't find
> > any serious issues/bugs. Some small nits below.
> > 
> > Thank you for working on it!
> > 
> I try my best, thank you for your review!
> 
> > BTW, when sending a new iteration, please use "[PATCH vX]" subject prefix,
> > e.g. "[PATCH v3 1/3] mm/vmap: keep track of free blocks for vmap allocation".
> > RESEND usually means that you're sending the same version, e.g. when
> > you need to cc more people.
> > 
> Thank you for the clarification. I will fix that next time.
> 
> > 
> > On Tue, Apr 02, 2019 at 06:25:29PM +0200, Uladzislau Rezki (Sony) wrote:
> > > Currently an allocation of a new vmap area is done by iterating
> > > over the busy list (complexity O(n)) until a suitable hole is
> > > found between two busy areas. Each new allocation therefore grows
> > > the list. Due to an over-fragmented list and different permissive
> > > parameters, an allocation can take a long time; on embedded
> > > devices it can be milliseconds.
> > > 
> > > This patch organizes the KVA memory layout into free areas of the
> > > 1-ULONG_MAX range. It uses an augmented red-black tree that keeps
> > > blocks sorted by their offsets, paired with a linked list that
> > > keeps the free blocks in order of increasing addresses.
> > > 
> > > Each vmap_area object contains a "subtree_max_size" field that
> > > reflects the maximum available free block in its left or right
> > > sub-tree. This makes it possible to decide, at each node, which
> > > way to traverse toward the block that will fit and will have the
> > > lowest start address, i.e. sequential allocation.
> > 
> > I'd add here that an augmented red-black tree is used, and nodes
> > are augmented with the size of the maximum available free block.
> > 
> Will add.
> 
> > > 
> > > Allocation: to allocate a new block, the tree is searched for the
> > > lowest (left-most) block that is large enough to satisfy the
> > > requested size, alignment and vstart point. If the block is bigger
> > > than the requested size, it is split.
> > > 
> > > De-allocation: when a busy vmap area is freed, it is either merged
> > > or inserted into the tree. The red-black tree allows a spot to be
> > > found efficiently, whereas the linked list provides constant-time
> > > access to the previous and next blocks to check whether merging is
> > > possible. When a de-allocated memory chunk is merged, a large
> > > coalesced area is created.
> > > 
> > > Complexity: ~O(log(N))
> > > 
> > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > ---
> > >  include/linux/vmalloc.h |    6 +-
> > >  mm/vmalloc.c            | 1004 +++++++++++++++++++++++++++++++++++------------
> > >  2 files changed, 762 insertions(+), 248 deletions(-)
> > > 
> > > diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> > > index 398e9c95cd61..ad483378fdd1 100644
> > > --- a/include/linux/vmalloc.h
> > > +++ b/include/linux/vmalloc.h
> > > @@ -45,12 +45,16 @@ struct vm_struct {
> > >  struct vmap_area {
> > >  	unsigned long va_start;
> > >  	unsigned long va_end;
> > > +
> > > +	/*
> > > +	 * Largest available free size in subtree.
> > > +	 */
> > > +	unsigned long subtree_max_size;
> > >  	unsigned long flags;
> > >  	struct rb_node rb_node;         /* address sorted rbtree */
> > >  	struct list_head list;          /* address sorted list */
> > >  	struct llist_node purge_list;    /* "lazy purge" list */
> > >  	struct vm_struct *vm;
> > > -	struct rcu_head rcu_head;
> > >  };
> > >  
> > >  /*
> > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > index 755b02983d8d..3adbad3fb6c1 100644
> > > --- a/mm/vmalloc.c
> > > +++ b/mm/vmalloc.c
> > > @@ -31,6 +31,7 @@
> > >  #include <linux/compiler.h>
> > >  #include <linux/llist.h>
> > >  #include <linux/bitops.h>
> > > +#include <linux/rbtree_augmented.h>
> > >  
> > >  #include <linux/uaccess.h>
> > >  #include <asm/tlbflush.h>
> > > @@ -320,9 +321,7 @@ unsigned long vmalloc_to_pfn(const void *vmalloc_addr)
> > >  }
> > >  EXPORT_SYMBOL(vmalloc_to_pfn);
> > >  
> > > -
> > >  /*** Global kva allocator ***/
> > > -
> > 
> > Do we need this change?
> >
> This patch is not intended to refactor the code. I removed the extra empty
> lines because I touched the code around them. I can either keep that change
> or remove it. What is your opinion?

Usually it's better to separate cosmetic changes from functional ones, if
you're not directly touching these lines. Not a big deal, of course.

> 
> > >  #define VM_LAZY_FREE	0x02
> > >  #define VM_VM_AREA	0x04
> > >  
> > > @@ -331,14 +330,76 @@ static DEFINE_SPINLOCK(vmap_area_lock);
> > >  LIST_HEAD(vmap_area_list);
> > >  static LLIST_HEAD(vmap_purge_list);
> > >  static struct rb_root vmap_area_root = RB_ROOT;
> > > +static bool vmap_initialized __read_mostly;
> > > +
> > > +/*
> > > + * This kmem_cache is used for vmap_area objects. Instead of
> > > + * allocating from slab each time, we reuse an object from this
> > > + * cache to make things faster, especially in the "no edge"
> > > + * splitting of a free block.
> > > + */
> > > +static struct kmem_cache *vmap_area_cachep;
> > > +
> > > +/*
> > > + * This linked list is used together with free_vmap_area_root.
> > > + * It gives O(1) access to prev/next to perform fast coalescing.
> > > + */
> > > +static LIST_HEAD(free_vmap_area_list);
> > > +
> > > +/*
> > > + * This augmented red-black tree represents the free vmap space.
> > > + * All vmap_area objects in this tree are sorted by va->va_start
> > > + * address. It is used for allocation and for merging when a vmap
> > > + * object is released.
> > > + *
> > > + * Each vmap_area node contains the maximum available free block
> > > + * size of its sub-tree, right or left. Therefore it is possible
> > > + * to find the lowest-address match of a free area.
> > > + */
> > > +static struct rb_root free_vmap_area_root = RB_ROOT;
> > >  
> > > -/* The vmap cache globals are protected by vmap_area_lock */
> > > -static struct rb_node *free_vmap_cache;
> > > -static unsigned long cached_hole_size;
> > > -static unsigned long cached_vstart;
> > > -static unsigned long cached_align;
> > > +static __always_inline unsigned long
> > > +__va_size(struct vmap_area *va)
> > > +{
> > > +	return (va->va_end - va->va_start);
> > > +}
> > > +
> > > +static __always_inline unsigned long
> > > +get_subtree_max_size(struct rb_node *node)
> > > +{
> > > +	struct vmap_area *va;
> > >  
> > > -static unsigned long vmap_area_pcpu_hole;
> > > +	va = rb_entry_safe(node, struct vmap_area, rb_node);
> > > +	return va ? va->subtree_max_size : 0;
> > > +}
> > > +
> > > +/*
> > > + * Gets called when removing a node or rotating.
> > > + */
> > > +static __always_inline unsigned long
> > > +compute_subtree_max_size(struct vmap_area *va)
> > > +{
> > > +	unsigned long max_size = __va_size(va);
> > > +	unsigned long child_max_size;
> > > +
> > > +	child_max_size = get_subtree_max_size(va->rb_node.rb_right);
> > > +	if (child_max_size > max_size)
> > > +		max_size = child_max_size;
> > > +
> > > +	child_max_size = get_subtree_max_size(va->rb_node.rb_left);
> > > +	if (child_max_size > max_size)
> > > +		max_size = child_max_size;
> > > +
> > > +	return max_size;
> > 
> > Nit: you can use max3 instead, e.g. :
> > 
> > return max3(__va_size(va),
> > 	    get_subtree_max_size(va->rb_node.rb_left),
> > 	    get_subtree_max_size(va->rb_node.rb_right));
> > 
> Good point. Will replace it!
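
For reference, here is a stand-alone, user-space sketch of the max3 form
discussed above. The max3 macro and the node type are simplified stand-ins,
not the kernel's (the kernel gets max3() from its own headers and uses
struct vmap_area):

```c
#include <stddef.h>

/* Simplified stand-in for struct vmap_area. */
struct node {
	unsigned long va_start, va_end;
	unsigned long subtree_max_size;	/* max free size in this subtree */
	struct node *left, *right;
};

/* Local stand-in for the kernel's max3(). */
#define max3(a, b, c) ((a) > (b) ? ((a) > (c) ? (a) : (c)) \
			       : ((b) > (c) ? (b) : (c)))

static unsigned long va_size(const struct node *va)
{
	return va->va_end - va->va_start;
}

static unsigned long get_subtree_max_size(const struct node *n)
{
	/* A missing child contributes no free space. */
	return n ? n->subtree_max_size : 0;
}

/* The max3 form: own size vs. the two children's subtree maxima. */
static unsigned long compute_subtree_max_size(const struct node *va)
{
	return max3(va_size(va),
		    get_subtree_max_size(va->left),
		    get_subtree_max_size(va->right));
}
```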
> 
> > > +}
> > > +
> > > +RB_DECLARE_CALLBACKS(static, free_vmap_area_rb_augment_cb,
> > > +	struct vmap_area, rb_node, unsigned long, subtree_max_size,
> > > +	compute_subtree_max_size)
> > > +
> > > +static void purge_vmap_area_lazy(void);
> > > +static BLOCKING_NOTIFIER_HEAD(vmap_notify_list);
> > > +static unsigned long lazy_max_pages(void);
> > >  
> > >  static struct vmap_area *__find_vmap_area(unsigned long addr)
> > >  {
> > > @@ -359,41 +420,520 @@ static struct vmap_area *__find_vmap_area(unsigned long addr)
> > >  	return NULL;
> > >  }
> > >  
> > > -static void __insert_vmap_area(struct vmap_area *va)
> > > -{
> > > -	struct rb_node **p = &vmap_area_root.rb_node;
> > > -	struct rb_node *parent = NULL;
> > > -	struct rb_node *tmp;
> > > +/*
> > > + * This function returns the addresses of the parent node
> > > + * and of its left or right link, for further processing.
> > > + */
> > > +static __always_inline struct rb_node **
> > > +__find_va_links(struct vmap_area *va,
> > > +	struct rb_root *root, struct rb_node *from,
> > > +	struct rb_node **parent)
> > 
> > The function looks much cleaner now, thank you!
> > 
> > But if I understand it correctly, it returns a node (via parent)
> > and a pointer to one of two links, so that the returned value
> > is always == parent + some constant offset.
> > If so, I wonder if it's cleaner to return a parent node
> > (as rb_node*) and a bool value which will indicate if the left
> > or the right link should be used.
> > 
> > Not a strong opinion, just an idea.
> > 
> I see your point. Yes, it is possible to return a "bool" value that
> indicates the left or right path, and detect the direction from it.
> 
> On the other hand, we end up accessing the correct link anyway while
> traversing the tree. With the "bool" approach, we would need some extra
> logic on top to check where to attach.

Sure, makes sense. I'd add some comments here then.
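
To illustrate the point for other readers, a minimal user-space sketch of
the link-pointer style (simplified stand-in types and names, not the actual
__find_va_links() code): the descent already computes the address of the
child slot where the new node belongs, so the caller can attach through it
directly with no extra left/right decision.

```c
#include <stddef.h>

/* Simplified stand-in for struct vmap_area / rb_node. */
struct node {
	unsigned long va_start, va_end;
	struct node *left, *right;
};

/*
 * Walk down from *root and return the address of the NULL child
 * slot where a node starting at 'start' belongs; the parent node
 * is reported via *parent.
 */
static struct node **find_links(struct node **root, unsigned long start,
				struct node **parent)
{
	struct node **link = root;

	*parent = NULL;
	while (*link) {
		*parent = *link;
		if (start < (*link)->va_start)
			link = &(*link)->left;
		else
			link = &(*link)->right;
	}
	return link;
}

static void insert(struct node **root, struct node *n)
{
	struct node *parent;
	struct node **link = find_links(root, n->va_start, &parent);

	*link = n;	/* attach directly through the returned link */
}
```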

Thanks!
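
P.S. For anyone skimming the thread, the lowest-address fit described in
the commit message can be sketched in user space roughly like this
(simplified stand-in types; alignment and the vstart constraint are
ignored for brevity, and subtree_max_size is assumed to be kept up to
date):

```c
#include <stddef.h>

/* Simplified stand-in for struct vmap_area. */
struct node {
	unsigned long va_start, va_end;
	unsigned long subtree_max_size;	/* max free size in this subtree */
	struct node *left, *right;
};

static unsigned long size_of(const struct node *n)
{
	return n->va_end - n->va_start;
}

/*
 * Find the free block with the lowest start address that is at least
 * 'size' bytes: prefer the left subtree (lower addresses) whenever it
 * can satisfy the request, otherwise try the current node, otherwise
 * descend right.
 */
static struct node *find_lowest_fit(struct node *n, unsigned long size)
{
	while (n) {
		if (n->left && n->left->subtree_max_size >= size)
			n = n->left;
		else if (size_of(n) >= size)
			return n;
		else
			n = n->right;
	}
	return NULL;
}
```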

Thread overview: 12+ messages
2019-04-02 16:25 [RESEND PATCH 0/3] improve vmap allocation Uladzislau Rezki (Sony)
2019-04-02 16:25 ` [RESEND PATCH 1/3] mm/vmap: keep track of free blocks for " Uladzislau Rezki (Sony)
2019-04-03 21:06   ` Roman Gushchin
2019-04-04 15:43     ` Uladzislau Rezki
2019-04-04 16:52       ` Roman Gushchin [this message]
2019-04-04 17:21         ` Uladzislau Rezki
2019-04-02 16:25 ` [RESEND PATCH 2/3] mm/vmap: add DEBUG_AUGMENT_PROPAGATE_CHECK macro Uladzislau Rezki (Sony)
2019-04-03 21:15   ` Roman Gushchin
2019-04-04 15:53     ` Uladzislau Rezki
2019-04-02 16:25 ` [RESEND PATCH 3/3] mm/vmap: add DEBUG_AUGMENT_LOWEST_MATCH_CHECK macro Uladzislau Rezki (Sony)
2019-04-03 21:10   ` Roman Gushchin
2019-04-02 19:26 ` [RESEND PATCH 0/3] improve vmap allocation Andrew Morton
