linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
To: LKML <linux-kernel@vger.kernel.org>
Cc: Adrian Hunter <ext-adrian.hunter@nokia.com>,
	Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Subject: [PATCH take 2 16/28] UBIFS: add TNC shrinker
Date: Tue,  6 May 2008 13:35:47 +0300	[thread overview]
Message-ID: <1210070159-22794-17-git-send-email-Artem.Bityutskiy@nokia.com> (raw)
In-Reply-To: <1210070159-22794-1-git-send-email-Artem.Bityutskiy@nokia.com>

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=unknown-8bit, Size: 10558 bytes --]

The TNC cache grows with time, because UBIFS caches the indexing nodes
when the indexing B-tree is looked-up. But if the the file-system is
large enough, the TNC may consume a lot of memory, in which UBIFS prunes
it. Namely, it register memory shrinker for these purposes.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
---
 fs/ubifs/shrinker.c |  322 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 322 insertions(+), 0 deletions(-)

diff --git a/fs/ubifs/shrinker.c b/fs/ubifs/shrinker.c
new file mode 100644
index 0000000..b100ed9
--- /dev/null
+++ b/fs/ubifs/shrinker.c
@@ -0,0 +1,322 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Artem Bityutskiy (Битюцкий Артём)
+ *          Adrian Hunter
+ */
+
+/*
+ * This file implements UBIFS shrinker which evicts clean znodes from the TNC
+ * tree when Linux VM needs more RAM.
+ *
+ * We do not implement any LRU lists to find oldest znodes to free because it
+ * would add additional overhead to the file system fast paths. So the shrinker
+ * just walks the TNC tree when searching for znodes to free.
+ *
+ * If the root of a TNC sub-tree is clean and old enough, then the children are
+ * also clean and old enough. So the shrinker walks the TNC in level order and
+ * dumps entire sub-trees.
+ *
+ * The age of znodes is just the time-stamp when they were last looked at.
+ * The current shrinker first tries to evict old znodes, then young ones.
+ *
+ * Since the shrinker is global, it has to protect against races with FS
+ * un-mounts, which is done by the 'ubifs_infos_lock' and 'c->umount_mutex'.
+ */
+
+#include "ubifs.h"
+
+/* List of all UBIFS file-system instances */
+LIST_HEAD(ubifs_infos);
+
+/*
+ * We number each shrinker run and record the number on the ubifs_info structure
+ * so that we can easily work out which ubifs_info structures have already been
+ * done by the current run.
+ */
+static unsigned int shrinker_run_no;
+
+/* Protects 'ubifs_infos' list */
+DEFINE_SPINLOCK(ubifs_infos_lock);
+
+/* Global clean znode counter (for all mounted UBIFS instances) */
+atomic_long_t ubifs_clean_zn_cnt;
+
+/**
+ * shrink_tnc - shrink TNC tree.
+ * @c: UBIFS file-system description object
+ * @nr: number of znodes to free
+ * @age: the age of znodes to free
+ * @contention: if any contention, this is set to %1
+ *
+ * This function traverses TNC tree and frees clean znodes. It does not free
+ * clean znodes which younger then @age. Returns number of freed znodes.
+ */
+static int shrink_tnc(struct ubifs_info *c, int nr, int age, int *contention)
+{
+	int total_freed = 0;
+	struct ubifs_znode *znode, *zprev;
+	int time = get_seconds();
+
+	ubifs_assert(mutex_is_locked(&c->umount_mutex));
+	ubifs_assert(mutex_is_locked(&c->tnc_mutex));
+
+	if (!c->zroot.znode || atomic_long_read(&c->clean_zn_cnt) == 0)
+		return 0;
+
+	/*
+	 * Traverse the TNC tree in levelorder manner, so that it is possible
+	 * to destroy large sub-trees. Indeed, if a znode is old, then all its
+	 * children are older or of the same age.
+	 *
+	 * Note, we are holding 'c->tnc_mutex', so we do not have to lock the
+	 * 'c->space_lock' when _reading_ 'c->clean_zn_cnt', because it is
+	 * changed only when the 'c->tnc_mutex' is held.
+	 */
+	zprev = NULL;
+	znode = ubifs_tnc_levelorder_next(c->zroot.znode, NULL);
+	while (znode && total_freed < nr &&
+	       atomic_long_read(&c->clean_zn_cnt) > 0) {
+		int freed;
+
+		/*
+		 * If the znode is clean, but it is in the 'c->cnext' list, this
+		 * means that this znode has just been written to flash as a
+		 * part of commit and was marked clean. They will be removed
+		 * from the list at end commit. We cannot change the list,
+		 * because it is not protected by any mutex (design decision to
+		 * make commit really independent and parallel to main I/O). So
+		 * we just skip these znodes.
+		 *
+		 * Note, the 'clean_zn_cnt' counters are not updated until
+		 * after the commit, so the UBIFS shrinker does not report
+		 * the znodes which are in the 'c->cnext' list as freeable.
+		 *
+		 * Also note, if the root of a sub-tree is not in 'c->cnext',
+		 * then the whole sub-tree is not in 'c->cnext' as well, so it
+		 * is safe to dump whole sub-tree.
+		 */
+
+		if (znode->cnext) {
+			/*
+			 * Very soon these znodes will be removed from the list
+			 * and become freeable.
+			 */
+			*contention = 1;
+		} else if (!ubifs_zn_dirty(znode) &&
+			   abs(time - znode->time) >= age) {
+			if (znode->parent)
+				znode->parent->zbranch[znode->iip].znode = NULL;
+			else
+				c->zroot.znode = NULL;
+
+			freed = ubifs_destroy_tnc_subtree(znode);
+			atomic_long_sub(freed, &ubifs_clean_zn_cnt);
+			atomic_long_sub(freed, &c->clean_zn_cnt);
+			ubifs_assert(atomic_long_read(&c->clean_zn_cnt) >= 0);
+			total_freed += freed;
+			znode = zprev;
+		}
+
+		if (unlikely(!c->zroot.znode))
+			break;
+
+		zprev = znode;
+		znode = ubifs_tnc_levelorder_next(c->zroot.znode, znode);
+		cond_resched();
+	}
+
+	return total_freed;
+}
+
+/**
+ * shrink_tnc_trees - shrink UBIFS TNC trees.
+ * @nr: number of znodes to free
+ * @age: the age of znodes to free
+ * @contention: if any contention, this is set to %1
+ *
+ * This function walks the list of mounted UBIFS file-systems and frees clean
+ * znodes which are older then @age, until at least @nr znodes are freed.
+ * Returns the number of freed znodes.
+ */
+static int shrink_tnc_trees(int nr, int age, int *contention)
+{
+	struct ubifs_info *c;
+	struct list_head *p;
+	unsigned int run_no;
+	int freed = 0;
+
+	spin_lock(&ubifs_infos_lock);
+	do
+		run_no = ++shrinker_run_no;
+	while (run_no == 0);
+	/* Iterate over all mounted UBIFS file-systems and try to shrink them */
+	p = ubifs_infos.next;
+	while (p != &ubifs_infos) {
+		c = list_entry(p, struct ubifs_info, infos_list);
+		/*
+		 * We move the ones we do to the end of the list, so we stop
+		 * when we see one we have already done.
+		 */
+		if (c->shrinker_run_no == run_no)
+			break;
+		if (!mutex_trylock(&c->umount_mutex)) {
+			/* Some un-mount is in progress, try next FS */
+			*contention = 1;
+			p = p->next;
+			continue;
+		}
+		/*
+		 * We're holding 'c->umount_mutex', so the file-system won't go
+		 * away.
+		 */
+		if (!mutex_trylock(&c->tnc_mutex)) {
+			mutex_unlock(&c->umount_mutex);
+			*contention = 1;
+			p = p->next;
+			continue;
+		}
+		spin_unlock(&ubifs_infos_lock);
+		/*
+		 * OK, now we have TNC locked, the file-system cannot go away -
+		 * it is safe to reap the cache.
+		 */
+		c->shrinker_run_no = run_no;
+		freed += shrink_tnc(c, nr, age, contention);
+		mutex_unlock(&c->tnc_mutex);
+		spin_lock(&ubifs_infos_lock);
+		/* Get the next list element before we move this one */
+		p = p->next;
+		/*
+		 * Move this one to the end of the list to provide some
+		 * fairness.
+		 */
+		list_del(&c->infos_list);
+		list_add_tail(&c->infos_list, &ubifs_infos);
+		mutex_unlock(&c->umount_mutex);
+		if (freed >= nr)
+			break;
+	}
+	spin_unlock(&ubifs_infos_lock);
+	return freed;
+}
+
+/**
+ * kick_a_thread - kick a background thread to start commit.
+ *
+ * This function kicks a background thread to start background commit. Returns
+ * %-1 if a thread was kicked or there is another reason to assume the memory
+ * will soon be freed or become freeable. If there are no dirty znodes, returns
+ * %0.
+ */
+static int kick_a_thread(void)
+{
+	int i;
+	struct ubifs_info *c;
+
+	/*
+	 * Iterate over all mounted UBIFS file-systems and find out if there is
+	 * already an ongoing commit operation there. If no, then iterate for
+	 * the second time and initiate background commit.
+	 */
+	spin_lock(&ubifs_infos_lock);
+	for (i = 0; i < 2; i++) {
+		list_for_each_entry(c, &ubifs_infos, infos_list) {
+			long dirty_zn_cnt;
+
+			if (!mutex_trylock(&c->umount_mutex)) {
+				/*
+				 * Some un-mount is in progress, it will
+				 * certainly free memory, so just return.
+				 */
+				spin_unlock(&ubifs_infos_lock);
+				return -1;
+			}
+
+			dirty_zn_cnt = atomic_long_read(&c->dirty_zn_cnt);
+
+			if (!dirty_zn_cnt || c->cmt_state == COMMIT_BROKEN ||
+			    c->ro_media) {
+				mutex_unlock(&c->umount_mutex);
+				continue;
+			}
+
+			if (c->cmt_state != COMMIT_RESTING) {
+				spin_unlock(&ubifs_infos_lock);
+				mutex_unlock(&c->umount_mutex);
+				return -1;
+			}
+
+			if (i == 1) {
+				list_del(&c->infos_list);
+				list_add_tail(&c->infos_list, &ubifs_infos);
+				spin_unlock(&ubifs_infos_lock);
+
+				ubifs_request_bg_commit(c);
+				mutex_unlock(&c->umount_mutex);
+				return -1;
+			}
+			mutex_unlock(&c->umount_mutex);
+		}
+	}
+	spin_unlock(&ubifs_infos_lock);
+
+	return 0;
+}
+
+int ubifs_shrinker(int nr, gfp_t gfp_mask)
+{
+	int freed, contention = 0;
+	long clean_zn_cnt = atomic_long_read(&ubifs_clean_zn_cnt);
+
+	if (nr == 0)
+		return clean_zn_cnt;
+
+	if (!clean_zn_cnt) {
+		/*
+		 * No clean znodes, nothing to reap. All we can do in this case
+		 * is to kick background threads to start commit, which will
+		 * probably make clean znodes which, in turn, will be freeable.
+		 * And we return -1 which means will make VM call us again
+		 * later.
+		 */
+		dbg_tnc("no clean znodes, kick a thread");
+		return kick_a_thread();
+	}
+
+	freed = shrink_tnc_trees(nr, OLD_ZNODE_AGE, &contention);
+	if (freed >= nr)
+		goto out;
+
+	dbg_tnc("not enough old znodes, try to free young ones");
+	freed += shrink_tnc_trees(nr - freed, YOUNG_ZNODE_AGE, &contention);
+	if (freed >= nr)
+		goto out;
+
+	dbg_tnc("not enough young znodes, free all");
+	freed += shrink_tnc_trees(nr - freed, 0, &contention);
+
+	if (!freed && contention) {
+		dbg_tnc("freed nothing, but contention");
+		return -1;
+	}
+
+out:
+	dbg_tnc("%d znodes were freed, requested %d", freed, nr);
+	return freed;
+}
-- 
1.5.4.1


  parent reply	other threads:[~2008-05-06  8:54 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-05-06 10:35 [PATCH take 2] UBIFS - new flash file system Artem Bityutskiy
2008-05-06 10:35 ` [PATCH take 2 01/28] VFS: introduce writeback_inodes_sb() Artem Bityutskiy
2008-05-07  7:23   ` Andrew Morton
2008-05-07  8:00     ` Christoph Hellwig
2008-05-13  8:31       ` Artem Bityutskiy
2008-05-13  8:59         ` Christoph Hellwig
2008-05-13  9:22           ` Artem Bityutskiy
2008-05-07 11:14     ` Artem Bityutskiy
2008-05-07 12:01       ` Artem Bityutskiy
2008-05-07 15:38       ` Andrew Morton
2008-05-07 16:55         ` Artem Bityutskiy
2008-05-08 10:11           ` Artem Bityutskiy
2008-05-20 12:45         ` Artem Bityutskiy
2008-05-20 17:49           ` Andrew Morton
2008-05-22 10:53             ` Artem Bityutskiy
2008-05-22 11:00               ` Artem Bityutskiy
2008-05-06 10:35 ` [PATCH take 2 02/28] do_mounts: allow UBI root device name Artem Bityutskiy
2008-05-06 10:35 ` [PATCH take 2 03/28] UBIFS: add brief documentation Artem Bityutskiy
2008-05-06 10:35 ` [PATCH take 2 04/28] UBIFS: add I/O sub-system Artem Bityutskiy
2008-05-06 10:35 ` [PATCH take 2 05/28] UBIFS: add flash scanning Artem Bityutskiy
2008-05-06 10:35 ` [PATCH take 2 06/28] UBIFS: add journal replay Artem Bityutskiy
2008-05-06 22:05   ` Marcin Slusarz
2008-05-07  5:57     ` Artem Bityutskiy
2008-05-06 10:35 ` [PATCH take 2 07/28] UBIFS: add file-system build Artem Bityutskiy
2008-05-06 10:35 ` [PATCH take 2 08/28] UBIFS: add superblock and master node Artem Bityutskiy
2008-05-06 23:48   ` Kyungmin Park
2008-05-07  6:07     ` Artem Bityutskiy
2008-05-06 10:35 ` [PATCH take 2 09/28] UBIFS: add file-system recovery Artem Bityutskiy
2008-05-06 10:35 ` [PATCH take 2 10/28] UBIFS: add compression support Artem Bityutskiy
2008-05-06 10:35 ` [PATCH take 2 11/28] UBIFS: add key helpers Artem Bityutskiy
2008-05-06 10:35 ` [PATCH take 2 12/28] UBIFS: add the journal Artem Bityutskiy
2008-05-06 10:35 ` [PATCH take 2 13/28] UBIFS: add commit functionality Artem Bityutskiy
2008-05-06 10:35 ` [PATCH take 2 14/28] UBIFS: add TNC implementation Artem Bityutskiy
2008-05-06 10:35 ` [PATCH take 2 15/28] UBIFS: add TNC commit implementation Artem Bityutskiy
2008-05-06 10:35 ` Artem Bityutskiy [this message]
2008-05-06 10:35 ` [PATCH take 2 17/28] UBIFS: add LEB properties Artem Bityutskiy
2008-05-06 10:35 ` [PATCH take 2 18/28] UBIFS: add LEB properties tree Artem Bityutskiy
2008-05-06 10:35 ` [PATCH take 2 19/28] " Artem Bityutskiy
2008-05-06 10:35 ` [PATCH take 2 20/28] UBIFS: add LEB find subsystem Artem Bityutskiy
2008-05-06 10:35 ` [PATCH take 2 21/28] UBIFS: add Garbage Collector Artem Bityutskiy
2008-05-06 10:35 ` [PATCH take 2 22/28] UBIFS: add VFS operations Artem Bityutskiy
2008-05-06 10:35 ` [PATCH take 2 23/28] UBIFS: add budgeting Artem Bityutskiy
2008-05-06 10:35 ` [PATCH take 2 24/28] UBIFS: add extended attribute support Artem Bityutskiy
2008-05-06 10:35 ` [PATCH take 2 25/28] UBIFS: add orphans handling sub-system Artem Bityutskiy
2008-05-06 10:35 ` [PATCH take 2 26/28] UBIFS: add header files Artem Bityutskiy
2008-05-06 10:35 ` [PATCH take 2 27/28] UBIFS: add debugging stuff Artem Bityutskiy
2008-05-06 10:35 ` [PATCH take 2 28/28] UBIFS: include FS to compilation Artem Bityutskiy
2008-05-07  8:01 ` [PATCH take 2] UBIFS - new flash file system Christoph Hellwig
2008-05-07  8:07   ` Artem Bityutskiy
2008-05-07  8:10     ` Christoph Hellwig
2008-05-07  8:11       ` Artem Bityutskiy
2008-05-07  8:32       ` Artem Bityutskiy
2008-05-07 10:31       ` Artem Bityutskiy
2008-05-16 10:40 ` Christoph Hellwig
2008-05-16 13:10   ` Adrian Hunter
2008-05-19 11:30   ` Artem Bityutskiy
2008-05-19 12:36     ` Andi Kleen
2008-05-23 15:18   ` Artem Bityutskiy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1210070159-22794-17-git-send-email-Artem.Bityutskiy@nokia.com \
    --to=artem.bityutskiy@nokia.com \
    --cc=ext-adrian.hunter@nokia.com \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).