All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jaegeuk Kim <jaegeuk@kernel.org>
To: Chao Yu <chao2.yu@samsung.com>
Cc: "'Changman Lee'" <cm224.lee@samsung.com>,
	linux-f2fs-devel@lists.sourceforge.net,
	linux-kernel@vger.kernel.org
Subject: Re: [f2fs-dev][RFC PATCH 06/10] f2fs: add core functions for rb-tree extent cache
Date: Fri, 23 Jan 2015 11:43:28 -0800	[thread overview]
Message-ID: <20150123194328.GC22493@jaegeuk-mac02.mot.com> (raw)
In-Reply-To: <00ed01d036d4$28fb0310$7af10930$@samsung.com>

On Fri, Jan 23, 2015 at 02:15:56PM +0800, Chao Yu wrote:
> Hi Jaegeuk,
> 
> > -----Original Message-----
> > From: Jaegeuk Kim [mailto:jaegeuk@kernel.org]
> > Sent: Friday, January 23, 2015 9:48 AM
> > To: Chao Yu
> > Cc: Changman Lee; linux-f2fs-devel@lists.sourceforge.net; linux-kernel@vger.kernel.org
> > Subject: Re: [f2fs-dev][RFC PATCH 06/10] f2fs: add core functions for rb-tree extent cache
> > 
> > On Mon, Jan 12, 2015 at 03:14:48PM +0800, Chao Yu wrote:
> > > This patch adds core functions including slab cache init function and
> > > init/lookup/update/shrink/destroy function for rb-tree based extent cache.
> > >
> > > Thank Jaegeuk Kim and Changman Lee as they gave much suggestion about detail
> > > design and implementation of extent cache.
> > >
> > > Todo:
> > >  * add a cached_ei into struct extent_tree for a quick recent cache.
> > >  * register rb-based extent cache shrink with mm shrink interface.
> > >  * disable dir inode's extent cache.
> > >

snip

> 
> > 
> > > +	}
> > 
> > How about adding __attach_extent_node()?
> 
> That's more readable.
> 
> How do you think of the following functions:
> 
> struct extent_node *__attach_extent_node(struct f2fs_sb_info *sbi,
> 				struct extent_tree *et, struct extent_info *ei,
> 				struct rb_node *parent, struct rb_node **p)
> {
> 	struct extent_node *en;
> 
> 	en = kmem_cache_alloc(extent_node_slab, GFP_ATOMIC);
> 	if (!en)
> 		return NULL;
> 
> 	en->ei = *ei;
> 	INIT_LIST_HEAD(&en->list);
> 
> 	rb_link_node(&en->rb_node, parent, p);
> 	rb_insert_color(&en->rb_node, &et->root);
> 	et->count++;
> 	atomic_inc(&sbi->total_ext_node);
> 	return en;
> }
> 
> static void __detach_extent_node(struct f2fs_sb_info *sbi,
> 				struct extent_tree *et, struct extent_node *en)
> {
> 	rb_erase(&en->rb_node, &et->root);
> 	et->count--;
> 	atomic_dec(&sbi->total_ext_node);
> }

Looks good to me.

> 
> > {
> > > +
> > > +	en = kmem_cache_alloc(extent_node_slab, GFP_ATOMIC);
> > 
> > 	en = f2fs_kmem_cache_alloc(.., GFP_ATOMIC);
> 
> We should avoid cond_resched() in f2fs_kmem_cache_alloc when we are holding write_lock.
> 
> IMO, it's better to return NULL if we fail to alloc extent_node here.
> Otherwise we'd better alloc extent_node before write_lock.

Oh, it's write_lock. Okay.

> 
> > 
> > > +	en->ei = *ei;
> > > +	INIT_LIST_HEAD(&en->list);
> > > +
> > > +	rb_link_node(&en->rb_node, parent, p);
> > > +	rb_insert_color(&en->rb_node, &et->root);
> > > +	atomic_inc(&sbi->total_ext_node);
> > > +	et->count++;
> > 
> > }
> > 
> > > +
> > > +	return en;
> > > +}
> > > +
> > > +static struct extent_node *__remove_extent_tree(struct extent_tree *et,
> > > +							unsigned int fofs)
> > 
> > This is __detach_extent_node()?
> > 
> > > +{
> > > +	struct rb_node *p = et->root.rb_node;
> > > +	struct extent_node *en;
> > > +
> > > +	while (p) {
> > > +		en = rb_entry(p, struct extent_node, rb_node);
> > > +
> > > +		if (fofs < en->ei.fofs)
> > > +			p = p->rb_left;
> > 
> > Coding style.
> > 
> > if () {
> > } else {
> > }
> 
> will fix.
> 
> > 
> > > +		else if (fofs >= en->ei.fofs + en->ei.len)
> > > +			p = p->rb_right;
> > > +		else {
> > > +			rb_erase(&en->rb_node, &et->root);
> > > +			et->count--;
> > 
> > Add here?
> > 			atomic_dec(&sbi->total_ext_node);
> 
> Agree, see __detach_extent_node() implementation above.
> 
> > 
> > > +			return en;
> > > +		}
> > > +	}
> > > +	return NULL;
> > > +}
> > > +
> > > +static unsigned int __free_extent_tree(struct f2fs_sb_info *sbi,
> > > +					struct extent_tree *et, bool free_all)
> > > +{
> > > +	struct rb_node *node, *next;
> > > +	struct extent_node *en;
> > > +	unsigned int count = et->count;
> > > +
> > > +	node = rb_first(&et->root);
> > > +	while (node) {
> > > +		next = rb_next(node);
> > > +		en = rb_entry(node, struct extent_node, rb_node);
> > > +
> > > +		if (free_all) {
> > > +			spin_lock(&sbi->extent_lock);
> > > +			if (!list_empty(&en->list))
> > > +				list_del_init(&en->list);
> > > +			spin_unlock(&sbi->extent_lock);
> > > +		}
> > > +
> > > +		if (free_all || list_empty(&en->list)) {
> > > +			rb_erase(node, &et->root);
> > > +			kmem_cache_free(extent_node_slab, en);
> > > +			atomic_dec(&sbi->total_ext_node);
> > > +			et->count--;
> > > +		}
> > > +		node = next;
> > > +	}
> > > +
> > > +	return count - et->count;
> > > +}
> > > +
> > > +static bool f2fs_lookup_extent_tree(struct inode *inode, pgoff_t pgofs,
> > > +							struct extent_info *ei)
> > > +{
> > > +	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> > > +	struct extent_tree *et;
> > > +	struct extent_node *en;
> > > +
> > > +	if (is_inode_flag_set(F2FS_I(inode), FI_NO_EXTENT))
> > > +		return false;
> > > +
> > > +	down_read(&sbi->extent_tree_lock);
> > > +	et = radix_tree_lookup(&sbi->extent_tree_root, inode->i_ino);
> > > +	if (!et) {
> > > +		up_read(&sbi->extent_tree_lock);
> > > +		return false;
> > > +	}
> > > +	atomic_inc(&et->refcount);
> > > +	up_read(&sbi->extent_tree_lock);
> > > +
> > > +	read_lock(&et->lock);
> > > +	en = __lookup_extent_tree(et, pgofs);
> > > +	if (en) {
> > > +		*ei = en->ei;
> > > +		spin_lock(&sbi->extent_lock);
> > > +		if (!list_empty(&en->list))
> > > +			list_move_tail(&en->list, &sbi->extent_list);
> > > +		spin_unlock(&sbi->extent_lock);
> > > +		stat_inc_read_hit(sbi->sb);
> > > +	}
> > > +	stat_inc_total_hit(sbi->sb);
> > > +	read_unlock(&et->lock);
> > > +
> > > +	atomic_dec(&et->refcount);
> > > +	return en ? true : false;
> > > +}
> > > +
> > > +static void f2fs_update_extent_tree(struct inode *inode, pgoff_t fofs,
> > > +							block_t blkaddr)
> > > +{
> > > +	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> > > +	nid_t ino = inode->i_ino;
> > > +	struct extent_tree *et;
> > > +	struct extent_node *en = NULL, *en1 = NULL, *en2 = NULL, *en3 = NULL;
> > > +	struct extent_node *den = NULL;
> > > +	struct extent_info *pei;
> > > +	struct extent_info ei;
> > > +	unsigned int endofs;
> > > +
> > > +	if (is_inode_flag_set(F2FS_I(inode), FI_NO_EXTENT))
> > > +		return;
> > > +
> > > +retry:
> > > +	down_write(&sbi->extent_tree_lock);
> > > +	et = radix_tree_lookup(&sbi->extent_tree_root, ino);
> > > +	if (!et) {
> > > +		et = kmem_cache_alloc(extent_tree_slab, GFP_ATOMIC);
> > 
> > et = f2fs_kmem_cache_alloc(.., GFP_ATOMIC);
> 
> How about modifying as below to avoid holding extent_tree_lock for long time?

Hmm. I don't think it doesn't take such the long time.
It's GFP_ATOMIC.

Moreover, for radix_tree, I prefer to use f2fs_radix_tree_insert with GFP_NOIO.
Since, we actually don't need to call kmem_cache_free.

> 
> et = kmem_cache_alloc(extent_tree_slab, GFP_ATOMIC);
> if (!et) {
> 	up_write(&sbi->extent_tree_lock);
> 	cond_resched();
> 	goto retry;
> }
> 
> > 
> > > +		if (!et) {
> > > +			up_write(&sbi->extent_tree_lock);
> > > +			goto retry;
> > > +		}
> > > +		if (radix_tree_insert(&sbi->extent_tree_root, ino, et)) {
> > > +			up_write(&sbi->extent_tree_lock);
> > > +			kmem_cache_free(extent_tree_slab, et);
> 
> cond_resched()?

I'm not sure why this should be needed.
There is rw_semaphore, so we have already a rescheduling point.

> 
> > > +			goto retry;
> > > +		}
> > > +		memset(et, 0, sizeof(struct extent_tree));
> > > +		et->ino = ino;
> > > +		et->root = RB_ROOT;
> > > +		rwlock_init(&et->lock);
> > > +		atomic_set(&et->refcount, 0);
> > > +		et->count = 0;
> > > +		sbi->total_ext_tree++;
> > > +	}
> > > +	atomic_inc(&et->refcount);
> > > +	up_write(&sbi->extent_tree_lock);
> > > +
> > > +	write_lock(&et->lock);
> > > +
> > > +	/* 1. lookup and remove exist extent info in cache */
> > 
> >                                 existing
> 
> got it.
> 
> > 
> > > +	en = __remove_extent_tree(et, fofs);
> > 
> > en = __detach_extent_node();?
> 
> OK, it seems that most codes of __remove_extent_tree and __lookup_extent_tree 
> are the same, so here I want to remove extent node like this:
> 
> 	en = __lookup_extent_tree(et, fofs);
> 	if (!en)
> 		goto update_extent;
> 
> 	dei = en->ei;
> 	__detach_extent_node(sbi, et, en);
> 
> How do you think?

Looks good to me.

> 
> > 
> > > +	if (!en)
> > > +		goto update_extent;
> > > +
> > > +	pei = &en->ei;
> > > +	/* 2. if extent can be split more, split and insert the left part */
> > > +	if (pei->len > 1) {
> > > +		/*  insert left part of split extent into cache */
> > > +		if (pei->fofs < fofs) {
> > > +			set_extent_info(&ei, pei->fofs, pei->blk,
> > > +							fofs - pei->fofs);
> > > +			en1 = __insert_extent_tree(sbi, et, &ei, NULL);
> > > +		}
> > > +
> > > +		/* insert right part of split extent into cache */
> > > +		endofs = pei->fofs + pei->len - 1;
> > > +		if (endofs > fofs) {
> > > +			set_extent_info(&ei, fofs + 1,
> > > +				fofs - pei->fofs + pei->blk, endofs - fofs);
> > > +			en2 = __insert_extent_tree(sbi, et, &ei, NULL);
> > > +		}
> > > +	}
> > > +
> > > +update_extent:
> > > +	/* 3. update extent in extent cache */
> > > +	if (blkaddr) {
> > > +		set_extent_info(&ei, fofs, blkaddr, 1);
> > > +		en3 = __insert_extent_tree(sbi, et, &ei, &den);
> > > +	}
> > > +
> > > +	/* 4. update in global extent list */
> > > +	spin_lock(&sbi->extent_lock);
> > > +	if (en && !list_empty(&en->list))
> > > +		list_del_init(&en->list);
> > > +	/*
> > > +	 * en1 and en2 split from en, they will become more and more smaller
> > > +	 * fragments after splitting several times. So if the length is smaller
> > > +	 * than F2FS_MIN_EXTENT_LEN, we will not add them into extent_list,
> > > +	 * but just waiting shrinker to free them for reclaiming when OOM.
> > > +	 */
> > 
> > Can we just remove en1 and en2 in __insert_extent_tree?
> 
> en1 and en2 is newly added, in __attach_extent_node we do not add en1 and en2 into
> lru list, so if you do not want to keep split part in lru list, let's just remove
> the below codes.

What I meant was, how about avoiding attaching en1 and en2, if they are splits whose
lens are less than F2FS_MIN_EXTENT_LEN in advance?

> 
> > 
> > > +	if (en1 && en1->ei.len >= F2FS_MIN_EXTENT_LEN)
> > > +		list_add_tail(&en1->list, &sbi->extent_list);
> > > +	if (en2 && en2->ei.len >= F2FS_MIN_EXTENT_LEN)
> > > +		list_add_tail(&en2->list, &sbi->extent_list);
> > > +	if (en3) {
> > > +		if (list_empty(&en3->list))
> > > +			list_add_tail(&en3->list, &sbi->extent_list);
> > > +		else
> > > +			list_move_tail(&en3->list, &sbi->extent_list);
> > > +	}
> > > +	if (den && !list_empty(&den->list))
> > > +		list_del_init(&den->list);
> > > +	spin_unlock(&sbi->extent_lock);
> > > +
> > > +	if (en) {
> > > +		kmem_cache_free(extent_node_slab, en);
> > > +		atomic_dec(&sbi->total_ext_node);
> > 
> > 		--> move into __detach_extent_node().
> 
> will do.
> 
> > 
> > > +	}
> > > +	if (den) {
> > > +		kmem_cache_free(extent_node_slab, den);
> > > +		atomic_dec(&sbi->total_ext_node);
> > > +	}
> > > +
> > > +	write_unlock(&et->lock);
> > > +	atomic_dec(&et->refcount);
> > > +}
> > > +
> > > +void f2fs_shrink_extent_tree(struct f2fs_sb_info *sbi, int nr_shrink)
> > > +{
> > > +	struct extent_tree *treevec[EXT_TREE_VEC_SIZE];
> > > +	struct extent_node *en, *tmp;
> > > +	unsigned long ino = F2FS_ROOT_INO(sbi);
> > > +	struct radix_tree_iter iter;
> > > +	void **slot;
> > > +	unsigned int found;
> > > +	unsigned int node_cnt = 0, tree_cnt = 0;
> > > +
> > > +	if (available_free_memory(sbi, EXTENT_CACHE))
> > > +		return;
> > > +
> > > +	spin_lock(&sbi->extent_lock);
> > > +	list_for_each_entry_safe(en, tmp, &sbi->extent_list, list) {
> > > +		if (!nr_shrink--)
> > > +			break;
> > > +		list_del_init(&en->list);
> > > +	}
> > > +	spin_unlock(&sbi->extent_lock);
> > > +
> > > +	down_read(&sbi->extent_tree_lock);
> > > +	while ((found = radix_tree_gang_lookup(&sbi->extent_tree_root,
> > > +				(void **)treevec, ino, EXT_TREE_VEC_SIZE))) {
> > > +		unsigned i;
> > > +
> > > +		ino = treevec[found - 1]->ino + 1;
> > > +		for (i = 0; i < found; i++) {
> > > +			struct extent_tree *et = treevec[i];
> > > +
> > > +			atomic_inc(&et->refcount);
> > > +			write_lock(&et->lock);
> > > +			node_cnt += __free_extent_tree(sbi, et, false);
> > > +			write_unlock(&et->lock);
> > > +			atomic_dec(&et->refcount);
> > > +		}
> > > +	}
> > > +	up_read(&sbi->extent_tree_lock);
> > > +
> > > +	down_write(&sbi->extent_tree_lock);
> > > +	radix_tree_for_each_slot(slot, &sbi->extent_tree_root, &iter,
> > > +							F2FS_ROOT_INO(sbi)) {
> > > +		struct extent_tree *et = (struct extent_tree *)*slot;
> > > +
> > > +		if (!atomic_read(&et->refcount) && !et->count) {
> > > +			radix_tree_delete(&sbi->extent_tree_root, et->ino);
> > > +			kmem_cache_free(extent_tree_slab, et);
> > > +			sbi->total_ext_tree--;
> > > +			tree_cnt++;
> > 
> > No use of tree_cnt.
> 
> This is used by trace function in RFC PATCH 10/10.

Well, then, it needs to add tree_cnt in Patch #10. :)

Thanks,

> 
> Thanks for your review! :)
> 
> Regards,
> Yu
> 
> > 
> > Thanks,
> 
> [snip]

  reply	other threads:[~2015-01-23 19:43 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-01-12  7:14 [f2fs-dev][RFC PATCH 06/10] f2fs: add core functions for rb-tree extent cache Chao Yu
2015-01-20 15:06 ` Changman Lee
2015-01-21  8:41   ` Chao Yu
2015-01-21 22:25     ` Changman Lee
2015-01-22  1:32       ` Chao Yu
2015-01-23  1:48 ` Jaegeuk Kim
2015-01-23  1:48   ` [RFC " Jaegeuk Kim
2015-01-23  6:15   ` [f2fs-dev][RFC " Chao Yu
2015-01-23 19:43     ` Jaegeuk Kim [this message]
2015-01-26  5:39       ` Chao Yu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150123194328.GC22493@jaegeuk-mac02.mot.com \
    --to=jaegeuk@kernel.org \
    --cc=chao2.yu@samsung.com \
    --cc=cm224.lee@samsung.com \
    --cc=linux-f2fs-devel@lists.sourceforge.net \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.