* fs: break out inode operations from inode_lock V3
@ 2010-10-29 3:23 Dave Chinner
2010-10-29 3:23 ` [PATCH 1/8] fs: protect inode->i_state with inode->i_lock Dave Chinner
` (7 more replies)
0 siblings, 8 replies; 16+ messages in thread
From: Dave Chinner @ 2010-10-29 3:23 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, linux-kernel
Hi Al,
This series is a merge of the two previous sets of patches that
broke the inode LRU operations and the first half of iput_final()
out from under the inode_lock and then split the lists under
separate locks. The patches should apply to your current merge-stem
tree. I can rebase it on a different tree if you want.
Version 3:
- remove up wake_up_inode() inode_lock avoidance optimisation as the
inode_lock is no longer protecting i_state.
- Added a BUG_ON() check to evict() to ensure that inodes are removed
from the LRU before being evicted.
- removed inode_lock from igrab() when removing inode_lock from the
first half of iput_final()
- added 3 patches to convert the per-sb inode list, the writeback lists and the
inode hashes under their own locks. The changes from the first version of
these patches (posted separately) are:
- cleaned up comment for writeback_single_inode()
- dropped __mark_inode_dirty() cleanup and just moved the minimum
amount of code around to handle the locking changes cleanly.
- moved adding inodes to sb list under the hash lock as well so we
don't have inodes that are not on the per-sb lists on the hash.
- added a patch to clean up inode_lock references in documentation and
comments.
Version 2:
- make i_state checks/__iget() atomic under inode->i_lock
- merge dispose_one_inode() into evict()
- move inode hash removal after ->evict_inode call.
- always take the inode_lru_lock() when checking whether the inode
is on the lru or not.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 1/8] fs: protect inode->i_state with inode->i_lock
2010-10-29 3:23 fs: break out inode operations from inode_lock V3 Dave Chinner
@ 2010-10-29 3:23 ` Dave Chinner
2010-10-29 4:29 ` Al Viro
2010-10-29 3:23 ` [PATCH 2/8] fs: factor inode disposal Dave Chinner
` (6 subsequent siblings)
7 siblings, 1 reply; 16+ messages in thread
From: Dave Chinner @ 2010-10-29 3:23 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, linux-kernel
From: Dave Chinner <dchinner@redhat.com>
Protect inode state transitions and validity checks with the
inode->i_lock. This enables us to make inode state transitions
independently of the inode_lock and is the first step to peeling
away the inode_lock from the code.
This requires that __iget() is done atomically with i_state checks
during list traversals so that we don't race with another thread
marking the inode I_FREEING between the state check and grabbing the
reference.
Also remove the unlock_new_inode() memory barrier optimisation
required to avoid taking the inode_lock when clearing I_NEW.
Simplify the code by simply taking the inode->i_lock around the
state change and wakeup. Because the wakeup is no longer tricky,
remove the wake_up_inode() function and open code the wakeup where
necessary.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/drop_caches.c | 9 ++-
fs/fs-writeback.c | 42 +++++++++++---
fs/inode.c | 138 +++++++++++++++++++++++++++++++++---------------
fs/notify/inode_mark.c | 21 +++++--
fs/quota/dquot.c | 13 +++--
5 files changed, 157 insertions(+), 66 deletions(-)
diff --git a/fs/drop_caches.c b/fs/drop_caches.c
index 2195c21..62dd8ee 100644
--- a/fs/drop_caches.c
+++ b/fs/drop_caches.c
@@ -18,11 +18,14 @@ static void drop_pagecache_sb(struct super_block *sb, void *unused)
spin_lock(&inode_lock);
list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
- if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW))
- continue;
- if (inode->i_mapping->nrpages == 0)
+ spin_lock(&inode->i_lock);
+ if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) ||
+ (inode->i_mapping->nrpages == 0)) {
+ spin_unlock(&inode->i_lock);
continue;
+ }
__iget(inode);
+ spin_unlock(&inode->i_lock);
spin_unlock(&inode_lock);
invalidate_mapping_pages(inode->i_mapping, 0, -1);
iput(toput_inode);
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index f6af81a..50de4f1 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -293,9 +293,11 @@ static void inode_wait_for_writeback(struct inode *inode)
wqh = bit_waitqueue(&inode->i_state, __I_SYNC);
while (inode->i_state & I_SYNC) {
+ spin_unlock(&inode->i_lock);
spin_unlock(&inode_lock);
__wait_on_bit(wqh, &wq, inode_wait, TASK_UNINTERRUPTIBLE);
spin_lock(&inode_lock);
+ spin_lock(&inode->i_lock);
}
}
@@ -319,6 +321,7 @@ writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
unsigned dirty;
int ret;
+ spin_lock(&inode->i_lock);
if (!atomic_read(&inode->i_count))
WARN_ON(!(inode->i_state & (I_WILL_FREE|I_FREEING)));
else
@@ -334,6 +337,7 @@ writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
* completed a full scan of b_io.
*/
if (wbc->sync_mode != WB_SYNC_ALL) {
+ spin_unlock(&inode->i_lock);
requeue_io(inode);
return 0;
}
@@ -349,6 +353,7 @@ writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
/* Set I_SYNC, reset I_DIRTY_PAGES */
inode->i_state |= I_SYNC;
inode->i_state &= ~I_DIRTY_PAGES;
+ spin_unlock(&inode->i_lock);
spin_unlock(&inode_lock);
ret = do_writepages(mapping, wbc);
@@ -370,8 +375,10 @@ writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
* write_inode()
*/
spin_lock(&inode_lock);
+ spin_lock(&inode->i_lock);
dirty = inode->i_state & I_DIRTY;
inode->i_state &= ~(I_DIRTY_SYNC | I_DIRTY_DATASYNC);
+ spin_unlock(&inode->i_lock);
spin_unlock(&inode_lock);
/* Don't write the inode if only I_DIRTY_PAGES was set */
if (dirty & (I_DIRTY_SYNC | I_DIRTY_DATASYNC)) {
@@ -381,6 +388,7 @@ writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
}
spin_lock(&inode_lock);
+ spin_lock(&inode->i_lock);
inode->i_state &= ~I_SYNC;
if (!(inode->i_state & I_FREEING)) {
if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
@@ -422,6 +430,7 @@ writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
}
}
inode_sync_complete(inode);
+ spin_unlock(&inode->i_lock);
return ret;
}
@@ -492,7 +501,9 @@ static int writeback_sb_inodes(struct super_block *sb, struct bdi_writeback *wb,
* kind does not need peridic writeout yet, and for the latter
* kind writeout is handled by the freer.
*/
+ spin_lock(&inode->i_lock);
if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
+ spin_unlock(&inode->i_lock);
requeue_io(inode);
continue;
}
@@ -501,10 +512,14 @@ static int writeback_sb_inodes(struct super_block *sb, struct bdi_writeback *wb,
* Was this inode dirtied after sync_sb_inodes was called?
* This keeps sync from extra jobs and livelock.
*/
- if (inode_dirtied_after(inode, wbc->wb_start))
+ if (inode_dirtied_after(inode, wbc->wb_start)) {
+ spin_unlock(&inode->i_lock);
return 1;
+ }
__iget(inode);
+ spin_unlock(&inode->i_lock);
+
pages_skipped = wbc->pages_skipped;
writeback_single_inode(inode, wbc);
if (wbc->pages_skipped != pages_skipped) {
@@ -681,7 +696,9 @@ static long wb_writeback(struct bdi_writeback *wb,
if (!list_empty(&wb->b_more_io)) {
inode = wb_inode(wb->b_more_io.prev);
trace_wbc_writeback_wait(&wbc, wb->bdi);
+ spin_lock(&inode->i_lock);
inode_wait_for_writeback(inode);
+ spin_unlock(&inode->i_lock);
}
spin_unlock(&inode_lock);
}
@@ -947,6 +964,7 @@ void __mark_inode_dirty(struct inode *inode, int flags)
block_dump___mark_inode_dirty(inode);
spin_lock(&inode_lock);
+ spin_lock(&inode->i_lock);
if ((inode->i_state & flags) != flags) {
const int was_dirty = inode->i_state & I_DIRTY;
@@ -958,7 +976,7 @@ void __mark_inode_dirty(struct inode *inode, int flags)
* superblock list, based upon its state.
*/
if (inode->i_state & I_SYNC)
- goto out;
+ goto out_unlock_inode;
/*
* Only add valid (hashed) inodes to the superblock's
@@ -966,11 +984,12 @@ void __mark_inode_dirty(struct inode *inode, int flags)
*/
if (!S_ISBLK(inode->i_mode)) {
if (inode_unhashed(inode))
- goto out;
+ goto out_unlock_inode;
}
if (inode->i_state & I_FREEING)
- goto out;
+ goto out_unlock_inode;
+ spin_unlock(&inode->i_lock);
/*
* If the inode was already on b_dirty/b_io/b_more_io, don't
* reposition it (that would break b_dirty time-ordering).
@@ -995,7 +1014,10 @@ void __mark_inode_dirty(struct inode *inode, int flags)
inode->dirtied_when = jiffies;
list_move(&inode->i_wb_list, &bdi->wb.b_dirty);
}
+ goto out;
}
+out_unlock_inode:
+ spin_unlock(&inode->i_lock);
out:
spin_unlock(&inode_lock);
@@ -1041,14 +1063,16 @@ static void wait_sb_inodes(struct super_block *sb)
* we still have to wait for that writeout.
*/
list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
- struct address_space *mapping;
+ struct address_space *mapping = inode->i_mapping;
- if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW))
- continue;
- mapping = inode->i_mapping;
- if (mapping->nrpages == 0)
+ spin_lock(&inode->i_lock);
+ if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) ||
+ (mapping->nrpages == 0)) {
+ spin_unlock(&inode->i_lock);
continue;
+ }
__iget(inode);
+ spin_unlock(&inode->i_lock);
spin_unlock(&inode_lock);
/*
* We hold a reference to 'inode' so it couldn't have
diff --git a/fs/inode.c b/fs/inode.c
index a6d6068..4c84acb 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -26,6 +26,17 @@
#include <linux/posix_acl.h>
/*
+ * inode locking rules.
+ *
+ * inode->i_lock protects:
+ * inode->i_state, __iget()
+ *
+ * Lock ordering:
+ * inode_lock
+ * inode->i_lock
+ */
+
+/*
* This is needed for the following functions:
* - inode_has_buffers
* - invalidate_bdev
@@ -136,15 +147,6 @@ int proc_nr_inodes(ctl_table *table, int write,
}
#endif
-static void wake_up_inode(struct inode *inode)
-{
- /*
- * Prevent speculative execution through spin_unlock(&inode_lock);
- */
- smp_mb();
- wake_up_bit(&inode->i_state, __I_NEW);
-}
-
/**
* inode_init_always - perform inode structure intialisation
* @sb: superblock inode belongs to
@@ -314,7 +316,7 @@ static void init_once(void *foo)
}
/*
- * inode_lock must be held
+ * inode->i_lock must be held
*/
void __iget(struct inode *inode)
{
@@ -429,7 +431,9 @@ void end_writeback(struct inode *inode)
BUG_ON(!(inode->i_state & I_FREEING));
BUG_ON(inode->i_state & I_CLEAR);
inode_sync_wait(inode);
+ spin_lock(&inode->i_lock);
inode->i_state = I_FREEING | I_CLEAR;
+ spin_unlock(&inode->i_lock);
}
EXPORT_SYMBOL(end_writeback);
@@ -472,7 +476,9 @@ static void dispose_list(struct list_head *head)
__inode_sb_list_del(inode);
spin_unlock(&inode_lock);
- wake_up_inode(inode);
+ spin_lock(&inode->i_lock);
+ wake_up_bit(&inode->i_state, __I_NEW);
+ spin_unlock(&inode->i_lock);
destroy_inode(inode);
}
}
@@ -498,12 +504,17 @@ void evict_inodes(struct super_block *sb)
if (atomic_read(&inode->i_count))
continue;
+ spin_lock(&inode->i_lock);
if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
+ spin_unlock(&inode->i_lock);
WARN_ON(1);
continue;
}
inode->i_state |= I_FREEING;
+ if (!(inode->i_state & (I_DIRTY | I_SYNC)))
+ percpu_counter_dec(&nr_inodes_unused);
+ spin_unlock(&inode->i_lock);
/*
* Move the inode off the IO lists and LRU once I_FREEING is
@@ -511,8 +522,6 @@ void evict_inodes(struct super_block *sb)
*/
list_move(&inode->i_lru, &dispose);
list_del_init(&inode->i_wb_list);
- if (!(inode->i_state & (I_DIRTY | I_SYNC)))
- percpu_counter_dec(&nr_inodes_unused);
}
spin_unlock(&inode_lock);
@@ -537,14 +546,21 @@ int invalidate_inodes(struct super_block *sb)
spin_lock(&inode_lock);
list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
- if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE))
+ spin_lock(&inode->i_lock);
+ if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
+ spin_unlock(&inode->i_lock);
continue;
+ }
if (atomic_read(&inode->i_count)) {
+ spin_unlock(&inode->i_lock);
busy = 1;
continue;
}
inode->i_state |= I_FREEING;
+ if (!(inode->i_state & (I_DIRTY | I_SYNC)))
+ percpu_counter_dec(&nr_inodes_unused);
+ spin_unlock(&inode->i_lock);
/*
* Move the inode off the IO lists and LRU once I_FREEING is
@@ -552,8 +568,6 @@ int invalidate_inodes(struct super_block *sb)
*/
list_move(&inode->i_lru, &dispose);
list_del_init(&inode->i_wb_list);
- if (!(inode->i_state & (I_DIRTY | I_SYNC)))
- percpu_counter_dec(&nr_inodes_unused);
}
spin_unlock(&inode_lock);
@@ -612,8 +626,10 @@ static void prune_icache(int nr_to_scan)
* Referenced or dirty inodes are still in use. Give them
* another pass through the LRU as we canot reclaim them now.
*/
+ spin_lock(&inode->i_lock);
if (atomic_read(&inode->i_count) ||
(inode->i_state & ~I_REFERENCED)) {
+ spin_unlock(&inode->i_lock);
list_del_init(&inode->i_lru);
percpu_counter_dec(&nr_inodes_unused);
continue;
@@ -621,12 +637,14 @@ static void prune_icache(int nr_to_scan)
/* recently referenced inodes get one more pass */
if (inode->i_state & I_REFERENCED) {
- list_move(&inode->i_lru, &inode_lru);
inode->i_state &= ~I_REFERENCED;
+ spin_unlock(&inode->i_lock);
+ list_move(&inode->i_lru, &inode_lru);
continue;
}
if (inode_has_buffers(inode) || inode->i_data.nrpages) {
__iget(inode);
+ spin_unlock(&inode->i_lock);
spin_unlock(&inode_lock);
if (remove_inode_buffers(inode))
reap += invalidate_mapping_pages(&inode->i_data,
@@ -637,11 +655,15 @@ static void prune_icache(int nr_to_scan)
if (inode != list_entry(inode_lru.next,
struct inode, i_lru))
continue; /* wrong inode or list_empty */
- if (!can_unuse(inode))
+ spin_lock(&inode->i_lock);
+ if (!can_unuse(inode)) {
+ spin_unlock(&inode->i_lock);
continue;
+ }
}
WARN_ON(inode->i_state & I_NEW);
inode->i_state |= I_FREEING;
+ spin_unlock(&inode->i_lock);
/*
* Move the inode off the IO lists and LRU once I_FREEING is
@@ -708,11 +730,13 @@ repeat:
continue;
if (!test(inode, data))
continue;
+ spin_lock(&inode->i_lock);
if (inode->i_state & (I_FREEING|I_WILL_FREE)) {
__wait_on_freeing_inode(inode);
goto repeat;
}
__iget(inode);
+ spin_unlock(&inode->i_lock);
return inode;
}
return NULL;
@@ -734,11 +758,13 @@ repeat:
continue;
if (inode->i_sb != sb)
continue;
+ spin_lock(&inode->i_lock);
if (inode->i_state & (I_FREEING|I_WILL_FREE)) {
__wait_on_freeing_inode(inode);
goto repeat;
}
__iget(inode);
+ spin_unlock(&inode->i_lock);
return inode;
}
return NULL;
@@ -803,14 +829,23 @@ struct inode *new_inode(struct super_block *sb)
inode = alloc_inode(sb);
if (inode) {
spin_lock(&inode_lock);
- __inode_sb_list_add(inode);
+ spin_lock(&inode->i_lock);
inode->i_state = 0;
+ spin_unlock(&inode->i_lock);
+ __inode_sb_list_add(inode);
spin_unlock(&inode_lock);
}
return inode;
}
EXPORT_SYMBOL(new_inode);
+/**
+ * unlock_new_inode - clear the I_NEW state and wake up any waiters
+ * @inode: new inode to unlock
+ *
+ * Called when the inode is fully initialised to clear the new state of the
+ * inode and wake up anyone waiting for the inode to finish initialisation.
+ */
void unlock_new_inode(struct inode *inode)
{
#ifdef CONFIG_DEBUG_LOCK_ALLOC
@@ -830,19 +865,11 @@ void unlock_new_inode(struct inode *inode)
}
}
#endif
- /*
- * This is special! We do not need the spinlock when clearing I_NEW,
- * because we're guaranteed that nobody else tries to do anything about
- * the state of the inode when it is locked, as we just created it (so
- * there can be no old holders that haven't tested I_NEW).
- * However we must emit the memory barrier so that other CPUs reliably
- * see the clearing of I_NEW after the other inode initialisation has
- * completed.
- */
- smp_mb();
+ spin_lock(&inode->i_lock);
WARN_ON(!(inode->i_state & I_NEW));
inode->i_state &= ~I_NEW;
- wake_up_inode(inode);
+ wake_up_bit(&inode->i_state, __I_NEW);
+ spin_unlock(&inode->i_lock);
}
EXPORT_SYMBOL(unlock_new_inode);
@@ -871,9 +898,11 @@ static struct inode *get_new_inode(struct super_block *sb,
if (set(inode, data))
goto set_failed;
+ spin_lock(&inode->i_lock);
+ inode->i_state = I_NEW;
+ spin_unlock(&inode->i_lock);
hlist_add_head(&inode->i_hash, head);
__inode_sb_list_add(inode);
- inode->i_state = I_NEW;
spin_unlock(&inode_lock);
/* Return the locked inode with I_NEW set, the
@@ -917,10 +946,12 @@ static struct inode *get_new_inode_fast(struct super_block *sb,
/* We released the lock, so.. */
old = find_inode_fast(sb, head, ino);
if (!old) {
+ spin_lock(&inode->i_lock);
inode->i_ino = ino;
+ inode->i_state = I_NEW;
+ spin_unlock(&inode->i_lock);
hlist_add_head(&inode->i_hash, head);
__inode_sb_list_add(inode);
- inode->i_state = I_NEW;
spin_unlock(&inode_lock);
/* Return the locked inode with I_NEW set, the
@@ -1005,15 +1036,19 @@ EXPORT_SYMBOL(iunique);
struct inode *igrab(struct inode *inode)
{
spin_lock(&inode_lock);
- if (!(inode->i_state & (I_FREEING|I_WILL_FREE)))
+ spin_lock(&inode->i_lock);
+ if (!(inode->i_state & (I_FREEING|I_WILL_FREE))) {
__iget(inode);
- else
+ spin_unlock(&inode->i_lock);
+ } else {
+ spin_unlock(&inode->i_lock);
/*
* Handle the case where s_op->clear_inode is not been
* called yet, and somebody is calling igrab
* while the inode is getting freed.
*/
inode = NULL;
+ }
spin_unlock(&inode_lock);
return inode;
}
@@ -1242,7 +1277,9 @@ int insert_inode_locked(struct inode *inode)
ino_t ino = inode->i_ino;
struct hlist_head *head = inode_hashtable + hash(sb, ino);
+ spin_lock(&inode->i_lock);
inode->i_state |= I_NEW;
+ spin_unlock(&inode->i_lock);
while (1) {
struct hlist_node *node;
struct inode *old = NULL;
@@ -1252,8 +1289,11 @@ int insert_inode_locked(struct inode *inode)
continue;
if (old->i_sb != sb)
continue;
- if (old->i_state & (I_FREEING|I_WILL_FREE))
+ spin_lock(&old->i_lock);
+ if (old->i_state & (I_FREEING|I_WILL_FREE)) {
+ spin_unlock(&old->i_lock);
continue;
+ }
break;
}
if (likely(!node)) {
@@ -1262,6 +1302,7 @@ int insert_inode_locked(struct inode *inode)
return 0;
}
__iget(old);
+ spin_unlock(&old->i_lock);
spin_unlock(&inode_lock);
wait_on_inode(old);
if (unlikely(!inode_unhashed(old))) {
@@ -1279,7 +1320,9 @@ int insert_inode_locked4(struct inode *inode, unsigned long hashval,
struct super_block *sb = inode->i_sb;
struct hlist_head *head = inode_hashtable + hash(sb, hashval);
+ spin_lock(&inode->i_lock);
inode->i_state |= I_NEW;
+ spin_unlock(&inode->i_lock);
while (1) {
struct hlist_node *node;
@@ -1291,8 +1334,11 @@ int insert_inode_locked4(struct inode *inode, unsigned long hashval,
continue;
if (!test(old, data))
continue;
- if (old->i_state & (I_FREEING|I_WILL_FREE))
+ spin_lock(&old->i_lock);
+ if (old->i_state & (I_FREEING|I_WILL_FREE)) {
+ spin_unlock(&old->i_lock);
continue;
+ }
break;
}
if (likely(!node)) {
@@ -1301,6 +1347,7 @@ int insert_inode_locked4(struct inode *inode, unsigned long hashval,
return 0;
}
__iget(old);
+ spin_unlock(&old->i_lock);
spin_unlock(&inode_lock);
wait_on_inode(old);
if (unlikely(!inode_unhashed(old))) {
@@ -1346,6 +1393,9 @@ static void iput_final(struct inode *inode)
const struct super_operations *op = inode->i_sb->s_op;
int drop;
+ spin_lock(&inode->i_lock);
+ WARN_ON(inode->i_state & I_NEW);
+
if (op && op->drop_inode)
drop = op->drop_inode(inode);
else
@@ -1357,21 +1407,23 @@ static void iput_final(struct inode *inode)
if (!(inode->i_state & (I_DIRTY|I_SYNC))) {
inode_lru_list_add(inode);
}
+ spin_unlock(&inode->i_lock);
spin_unlock(&inode_lock);
return;
}
- WARN_ON(inode->i_state & I_NEW);
inode->i_state |= I_WILL_FREE;
+ spin_unlock(&inode->i_lock);
spin_unlock(&inode_lock);
write_inode_now(inode, 1);
spin_lock(&inode_lock);
+ spin_lock(&inode->i_lock);
WARN_ON(inode->i_state & I_NEW);
inode->i_state &= ~I_WILL_FREE;
__remove_inode_hash(inode);
}
- WARN_ON(inode->i_state & I_NEW);
inode->i_state |= I_FREEING;
+ spin_unlock(&inode->i_lock);
/*
* Move the inode off the IO lists and LRU once I_FREEING is
@@ -1384,8 +1436,10 @@ static void iput_final(struct inode *inode)
spin_unlock(&inode_lock);
evict(inode);
remove_inode_hash(inode);
- wake_up_inode(inode);
+ spin_lock(&inode->i_lock);
+ wake_up_bit(&inode->i_state, __I_NEW);
BUG_ON(inode->i_state != (I_FREEING | I_CLEAR));
+ spin_unlock(&inode->i_lock);
destroy_inode(inode);
}
@@ -1582,9 +1636,8 @@ EXPORT_SYMBOL(inode_wait);
* to recheck inode state.
*
* It doesn't matter if I_NEW is not set initially, a call to
- * wake_up_inode() after removing from the hash list will DTRT.
- *
- * This is called with inode_lock held.
+ * wake_up_bit(&inode->i_state, __I_NEW) after removing from the hash list
+ * will DTRT.
*/
static void __wait_on_freeing_inode(struct inode *inode)
{
@@ -1592,6 +1645,7 @@ static void __wait_on_freeing_inode(struct inode *inode)
DEFINE_WAIT_BIT(wait, &inode->i_state, __I_NEW);
wq = bit_waitqueue(&inode->i_state, __I_NEW);
prepare_to_wait(wq, &wait.wait, TASK_UNINTERRUPTIBLE);
+ spin_unlock(&inode->i_lock);
spin_unlock(&inode_lock);
schedule();
finish_wait(wq, &wait.wait);
diff --git a/fs/notify/inode_mark.c b/fs/notify/inode_mark.c
index 21ed106..9c0fb09 100644
--- a/fs/notify/inode_mark.c
+++ b/fs/notify/inode_mark.c
@@ -249,8 +249,11 @@ void fsnotify_unmount_inodes(struct list_head *list)
* I_WILL_FREE, or I_NEW which is fine because by that point
* the inode cannot have any associated watches.
*/
- if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW))
+ spin_lock(&inode->i_lock);
+ if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) {
+ spin_unlock(&inode->i_lock);
continue;
+ }
/*
* If i_count is zero, the inode cannot have any watches and
@@ -258,8 +261,10 @@ void fsnotify_unmount_inodes(struct list_head *list)
* evict all inodes with zero i_count from icache which is
* unnecessarily violent and may in fact be illegal to do.
*/
- if (!atomic_read(&inode->i_count))
+ if (!atomic_read(&inode->i_count)) {
+ spin_unlock(&inode->i_lock);
continue;
+ }
need_iput_tmp = need_iput;
need_iput = NULL;
@@ -269,13 +274,17 @@ void fsnotify_unmount_inodes(struct list_head *list)
__iget(inode);
else
need_iput_tmp = NULL;
+ spin_unlock(&inode->i_lock);
/* In case the dropping of a reference would nuke next_i. */
if ((&next_i->i_sb_list != list) &&
- atomic_read(&next_i->i_count) &&
- !(next_i->i_state & (I_FREEING | I_WILL_FREE))) {
- __iget(next_i);
- need_iput = next_i;
+ atomic_read(&next_i->i_count)) {
+ spin_lock(&next_i->i_lock);
+ if (!(next_i->i_state & (I_FREEING | I_WILL_FREE))) {
+ __iget(next_i);
+ need_iput = next_i;
+ }
+ spin_unlock(&next_i->i_lock);
}
/*
diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index aad1316..a2d89f8 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -898,18 +898,19 @@ static void add_dquot_ref(struct super_block *sb, int type)
spin_lock(&inode_lock);
list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
- if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW))
+ spin_lock(&inode->i_lock);
+ if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) ||
+ !atomic_read(&inode->i_writecount) ||
+ !dqinit_needed(inode, type)) {
+ spin_unlock(&inode->i_lock);
continue;
+ }
#ifdef CONFIG_QUOTA_DEBUG
if (unlikely(inode_get_rsv_space(inode) > 0))
reserved = 1;
#endif
- if (!atomic_read(&inode->i_writecount))
- continue;
- if (!dqinit_needed(inode, type))
- continue;
-
__iget(inode);
+ spin_unlock(&inode->i_lock);
spin_unlock(&inode_lock);
iput(old_inode);
--
1.7.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 2/8] fs: factor inode disposal
2010-10-29 3:23 fs: break out inode operations from inode_lock V3 Dave Chinner
2010-10-29 3:23 ` [PATCH 1/8] fs: protect inode->i_state with inode->i_lock Dave Chinner
@ 2010-10-29 3:23 ` Dave Chinner
2010-10-29 4:45 ` Al Viro
2010-10-29 3:23 ` [PATCH 3/8] fs: Lock the inode LRU list separately Dave Chinner
` (5 subsequent siblings)
7 siblings, 1 reply; 16+ messages in thread
From: Dave Chinner @ 2010-10-29 3:23 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, linux-kernel
From: Dave Chinner <dchinner@redhat.com>
We have a couple of places that dispose of inodes. factor the
disposal into evict() to isolate this code and make it simpler to
peel away the inode_lock from the code.
While doing this, change the logic flow in iput_final() to separate
the different cases that need to be handled to make the transitions
the inode goes through more obvious.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/inode.c | 102 +++++++++++++++++++++++++++--------------------------------
1 files changed, 47 insertions(+), 55 deletions(-)
diff --git a/fs/inode.c b/fs/inode.c
index 4c84acb..2e218ac 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -399,17 +399,6 @@ void __insert_inode_hash(struct inode *inode, unsigned long hashval)
EXPORT_SYMBOL(__insert_inode_hash);
/**
- * __remove_inode_hash - remove an inode from the hash
- * @inode: inode to unhash
- *
- * Remove an inode from the superblock.
- */
-static void __remove_inode_hash(struct inode *inode)
-{
- hlist_del_init(&inode->i_hash);
-}
-
-/**
* remove_inode_hash - remove an inode from the hash
* @inode: inode to unhash
*
@@ -437,10 +426,31 @@ void end_writeback(struct inode *inode)
}
EXPORT_SYMBOL(end_writeback);
+/*
+ * Free the inode passed in, removing it from the lists it is still connected
+ * to. We remove any pages still attached to the inode and wait for any IO that
+ * is still in progress before finally destroying the inode.
+ *
+ * An inode must already be marked I_FREEING so that we avoid the inode being
+ * moved back onto lists if we race with other code that manipulates the lists
+ * (e.g. writeback_single_inode). The caller is responsible for setting this.
+ *
+ * An inode must already be removed from the LRU list before being evicted from
+ * the cache. This should occur atomically with setting the I_FREEING state
+ * flag, so no inodes here should ever be on the LRU.
+ */
static void evict(struct inode *inode)
{
const struct super_operations *op = inode->i_sb->s_op;
+ BUG_ON(!(inode->i_state & I_FREEING));
+ BUG_ON(!list_empty(&inode->i_lru));
+
+ spin_lock(&inode_lock);
+ list_del_init(&inode->i_wb_list);
+ __inode_sb_list_del(inode);
+ spin_unlock(&inode_lock);
+
if (op->evict_inode) {
op->evict_inode(inode);
} else {
@@ -452,6 +462,15 @@ static void evict(struct inode *inode)
bd_forget(inode);
if (S_ISCHR(inode->i_mode) && inode->i_cdev)
cd_forget(inode);
+
+ remove_inode_hash(inode);
+
+ spin_lock(&inode->i_lock);
+ wake_up_bit(&inode->i_state, __I_NEW);
+ BUG_ON(inode->i_state != (I_FREEING | I_CLEAR));
+ spin_unlock(&inode->i_lock);
+
+ destroy_inode(inode);
}
/*
@@ -470,16 +489,6 @@ static void dispose_list(struct list_head *head)
list_del_init(&inode->i_lru);
evict(inode);
-
- spin_lock(&inode_lock);
- __remove_inode_hash(inode);
- __inode_sb_list_del(inode);
- spin_unlock(&inode_lock);
-
- spin_lock(&inode->i_lock);
- wake_up_bit(&inode->i_state, __I_NEW);
- spin_unlock(&inode->i_lock);
- destroy_inode(inode);
}
}
@@ -517,11 +526,10 @@ void evict_inodes(struct super_block *sb)
spin_unlock(&inode->i_lock);
/*
- * Move the inode off the IO lists and LRU once I_FREEING is
- * set so that it won't get moved back on there if it is dirty.
+ * Move the inode off the LRU once I_FREEING is set so that it
+ * won't get moved back on.
*/
list_move(&inode->i_lru, &dispose);
- list_del_init(&inode->i_wb_list);
}
spin_unlock(&inode_lock);
@@ -563,11 +571,10 @@ int invalidate_inodes(struct super_block *sb)
spin_unlock(&inode->i_lock);
/*
- * Move the inode off the IO lists and LRU once I_FREEING is
- * set so that it won't get moved back on there if it is dirty.
+ * Move the inode off the LRU once I_FREEING is
+ * set so that it won't get moved back on.
*/
list_move(&inode->i_lru, &dispose);
- list_del_init(&inode->i_wb_list);
}
spin_unlock(&inode_lock);
@@ -666,11 +673,10 @@ static void prune_icache(int nr_to_scan)
spin_unlock(&inode->i_lock);
/*
- * Move the inode off the IO lists and LRU once I_FREEING is
- * set so that it won't get moved back on there if it is dirty.
+ * Move the inode off the LRU once I_FREEING is
+ * set so that it won't get moved back on.
*/
list_move(&inode->i_lru, &freeable);
- list_del_init(&inode->i_wb_list);
percpu_counter_dec(&nr_inodes_unused);
}
if (current_is_kswapd())
@@ -1401,16 +1407,16 @@ static void iput_final(struct inode *inode)
else
drop = generic_drop_inode(inode);
+ if (!drop && (sb->s_flags & MS_ACTIVE)) {
+ inode->i_state |= I_REFERENCED;
+ if (!(inode->i_state & (I_DIRTY|I_SYNC)))
+ inode_lru_list_add(inode);
+ spin_unlock(&inode->i_lock);
+ spin_unlock(&inode_lock);
+ return;
+ }
+
if (!drop) {
- if (sb->s_flags & MS_ACTIVE) {
- inode->i_state |= I_REFERENCED;
- if (!(inode->i_state & (I_DIRTY|I_SYNC))) {
- inode_lru_list_add(inode);
- }
- spin_unlock(&inode->i_lock);
- spin_unlock(&inode_lock);
- return;
- }
inode->i_state |= I_WILL_FREE;
spin_unlock(&inode->i_lock);
spin_unlock(&inode_lock);
@@ -1419,28 +1425,14 @@ static void iput_final(struct inode *inode)
spin_lock(&inode->i_lock);
WARN_ON(inode->i_state & I_NEW);
inode->i_state &= ~I_WILL_FREE;
- __remove_inode_hash(inode);
}
inode->i_state |= I_FREEING;
- spin_unlock(&inode->i_lock);
-
- /*
- * Move the inode off the IO lists and LRU once I_FREEING is
- * set so that it won't get moved back on there if it is dirty.
- */
inode_lru_list_del(inode);
- list_del_init(&inode->i_wb_list);
-
- __inode_sb_list_del(inode);
+ spin_unlock(&inode->i_lock);
spin_unlock(&inode_lock);
+
evict(inode);
- remove_inode_hash(inode);
- spin_lock(&inode->i_lock);
- wake_up_bit(&inode->i_state, __I_NEW);
- BUG_ON(inode->i_state != (I_FREEING | I_CLEAR));
- spin_unlock(&inode->i_lock);
- destroy_inode(inode);
}
/**
--
1.7.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 3/8] fs: Lock the inode LRU list separately
2010-10-29 3:23 fs: break out inode operations from inode_lock V3 Dave Chinner
2010-10-29 3:23 ` [PATCH 1/8] fs: protect inode->i_state with inode->i_lock Dave Chinner
2010-10-29 3:23 ` [PATCH 2/8] fs: factor inode disposal Dave Chinner
@ 2010-10-29 3:23 ` Dave Chinner
2010-10-29 3:23 ` [PATCH 4/8] fs: remove inode_lock from iput_final and prune_icache Dave Chinner
` (4 subsequent siblings)
7 siblings, 0 replies; 16+ messages in thread
From: Dave Chinner @ 2010-10-29 3:23 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, linux-kernel
From: Dave Chinner <dchinner@redhat.com>
Introduce the inode_lru_lock to protect the inode_lru list. This
lock is nested inside the inode->i_lock to allow the inode to be
added to the LRU list in iput_final without needing to deal with
lock inversions. This keeps iput_final() clean and neat.
Further, where marking the inode I_FREEING and removing it from the
LRU, move the LRU list manipulation within the inode->i_lock to keep
the list manipulation consistent with iput_final. This also means
that most of the open coded LRU list removal + unused inode
accounting can now use the inode_lru_list_del() wrappers which
cleans the code up further.
However, this locking change means what the LRU traversal in
prune_icache() inverts this lock ordering and needs to use trylock
semantics on the inode->i_lock to avoid deadlocking. In these cases,
if we fail to lock the inode we move it to the back of the LRU to
prevent spinning on it.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/inode.c | 47 ++++++++++++++++++++++++++++++-----------------
1 files changed, 30 insertions(+), 17 deletions(-)
diff --git a/fs/inode.c b/fs/inode.c
index 2e218ac..f00ae32 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -30,10 +30,13 @@
*
* inode->i_lock protects:
* inode->i_state, __iget()
+ * inode_lru_lock protects:
+ * inode_lru, inode->i_lru
*
* Lock ordering:
* inode_lock
* inode->i_lock
+ * inode_lru_lock
*/
/*
@@ -83,6 +86,7 @@ static unsigned int i_hash_shift __read_mostly;
*/
static LIST_HEAD(inode_lru);
+static DEFINE_SPINLOCK(inode_lru_lock);
static struct hlist_head *inode_hashtable __read_mostly;
/*
@@ -334,18 +338,22 @@ EXPORT_SYMBOL(ihold);
static void inode_lru_list_add(struct inode *inode)
{
+ spin_lock(&inode_lru_lock);
if (list_empty(&inode->i_lru)) {
list_add(&inode->i_lru, &inode_lru);
percpu_counter_inc(&nr_inodes_unused);
}
+ spin_unlock(&inode_lru_lock);
}
static void inode_lru_list_del(struct inode *inode)
{
+ spin_lock(&inode_lru_lock);
if (!list_empty(&inode->i_lru)) {
list_del_init(&inode->i_lru);
percpu_counter_dec(&nr_inodes_unused);
}
+ spin_unlock(&inode_lru_lock);
}
static inline void __inode_sb_list_add(struct inode *inode)
@@ -521,15 +529,10 @@ void evict_inodes(struct super_block *sb)
}
inode->i_state |= I_FREEING;
- if (!(inode->i_state & (I_DIRTY | I_SYNC)))
- percpu_counter_dec(&nr_inodes_unused);
+ inode_lru_list_del(inode);
spin_unlock(&inode->i_lock);
- /*
- * Move the inode off the LRU once I_FREEING is set so that it
- * won't get moved back on.
- */
- list_move(&inode->i_lru, &dispose);
+ list_add(&inode->i_lru, &dispose);
}
spin_unlock(&inode_lock);
@@ -566,15 +569,10 @@ int invalidate_inodes(struct super_block *sb)
}
inode->i_state |= I_FREEING;
- if (!(inode->i_state & (I_DIRTY | I_SYNC)))
- percpu_counter_dec(&nr_inodes_unused);
+ inode_lru_list_del(inode);
spin_unlock(&inode->i_lock);
- /*
- * Move the inode off the LRU once I_FREEING is
- * set so that it won't get moved back on.
- */
- list_move(&inode->i_lru, &dispose);
+ list_add(&inode->i_lru, &dispose);
}
spin_unlock(&inode_lock);
@@ -599,7 +597,7 @@ static int can_unuse(struct inode *inode)
/*
* Scan `goal' inodes on the unused list for freeable ones. They are moved to a
- * temporary list and then are freed outside inode_lock by dispose_list().
+ * temporary list and then are freed outside inode_lru_lock by dispose_list().
*
* Any inodes which are pinned purely because of attached pagecache have their
* pagecache removed. If the inode has metadata buffers attached to
@@ -621,6 +619,7 @@ static void prune_icache(int nr_to_scan)
down_read(&iprune_sem);
spin_lock(&inode_lock);
+ spin_lock(&inode_lru_lock);
for (nr_scanned = 0; nr_scanned < nr_to_scan; nr_scanned++) {
struct inode *inode;
@@ -630,10 +629,19 @@ static void prune_icache(int nr_to_scan)
inode = list_entry(inode_lru.prev, struct inode, i_lru);
/*
+ * we are inverting the inode_lru_lock/inode->i_lock here,
+ * so use a trylock. If we fail to get the lock, just move the
+ * inode to the back of the list so we don't spin on it.
+ */
+ if (!spin_trylock(&inode->i_lock)) {
+ list_move(&inode->i_lru, &inode_lru);
+ continue;
+ }
+
+ /*
* Referenced or dirty inodes are still in use. Give them
* another pass through the LRU as we canot reclaim them now.
*/
- spin_lock(&inode->i_lock);
if (atomic_read(&inode->i_count) ||
(inode->i_state & ~I_REFERENCED)) {
spin_unlock(&inode->i_lock);
@@ -652,17 +660,21 @@ static void prune_icache(int nr_to_scan)
if (inode_has_buffers(inode) || inode->i_data.nrpages) {
__iget(inode);
spin_unlock(&inode->i_lock);
+ spin_unlock(&inode_lru_lock);
spin_unlock(&inode_lock);
if (remove_inode_buffers(inode))
reap += invalidate_mapping_pages(&inode->i_data,
0, -1);
iput(inode);
spin_lock(&inode_lock);
+ spin_lock(&inode_lru_lock);
if (inode != list_entry(inode_lru.next,
struct inode, i_lru))
continue; /* wrong inode or list_empty */
- spin_lock(&inode->i_lock);
+ /* avoid lock inversions with trylock */
+ if (!spin_trylock(&inode->i_lock))
+ continue;
if (!can_unuse(inode)) {
spin_unlock(&inode->i_lock);
continue;
@@ -683,6 +695,7 @@ static void prune_icache(int nr_to_scan)
__count_vm_events(KSWAPD_INODESTEAL, reap);
else
__count_vm_events(PGINODESTEAL, reap);
+ spin_unlock(&inode_lru_lock);
spin_unlock(&inode_lock);
dispose_list(&freeable);
--
1.7.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 4/8] fs: remove inode_lock from iput_final and prune_icache
2010-10-29 3:23 fs: break out inode operations from inode_lock V3 Dave Chinner
` (2 preceding siblings ...)
2010-10-29 3:23 ` [PATCH 3/8] fs: Lock the inode LRU list separately Dave Chinner
@ 2010-10-29 3:23 ` Dave Chinner
2010-10-29 5:14 ` Al Viro
2010-10-29 3:23 ` [PATCH 5/8] fs: move i_sb_list out from under inode_lock Dave Chinner
` (3 subsequent siblings)
7 siblings, 1 reply; 16+ messages in thread
From: Dave Chinner @ 2010-10-29 3:23 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, linux-kernel
From: Dave Chinner <dchinner@redhat.com>
Now that inode state changes are protected by the inode->i_lock and
the inode LRU manipulations by the inode_lru_lock, we can remove the
inode_lock from prune_icache and the initial part of iput_final().
instead of using the inode_lock to protect the inode during
iput_final, use the inode->i_lock instead. This protects the inode
against new references being taken while we change the inode state
to I_FREEING, as well as preventing prune_icache from grabbing the
inode while we are manipulating it. Hence we no longer need the
inode_lock in iput_final prior to setting I_FREEING on the inode.
For prune_icache, we no longer need the inode_lock to protect the
LRU list, and the inodes themselves are protected against freeing
races by the inode->i_lock. Hence we can lift the inode_lock from
prune_icache as well.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/inode.c | 17 +++--------------
1 files changed, 3 insertions(+), 14 deletions(-)
diff --git a/fs/inode.c b/fs/inode.c
index f00ae32..b6a5dfe 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -618,7 +618,6 @@ static void prune_icache(int nr_to_scan)
unsigned long reap = 0;
down_read(&iprune_sem);
- spin_lock(&inode_lock);
spin_lock(&inode_lru_lock);
for (nr_scanned = 0; nr_scanned < nr_to_scan; nr_scanned++) {
struct inode *inode;
@@ -644,8 +643,8 @@ static void prune_icache(int nr_to_scan)
*/
if (atomic_read(&inode->i_count) ||
(inode->i_state & ~I_REFERENCED)) {
- spin_unlock(&inode->i_lock);
list_del_init(&inode->i_lru);
+ spin_unlock(&inode->i_lock);
percpu_counter_dec(&nr_inodes_unused);
continue;
}
@@ -653,20 +652,18 @@ static void prune_icache(int nr_to_scan)
/* recently referenced inodes get one more pass */
if (inode->i_state & I_REFERENCED) {
inode->i_state &= ~I_REFERENCED;
- spin_unlock(&inode->i_lock);
list_move(&inode->i_lru, &inode_lru);
+ spin_unlock(&inode->i_lock);
continue;
}
if (inode_has_buffers(inode) || inode->i_data.nrpages) {
__iget(inode);
spin_unlock(&inode->i_lock);
spin_unlock(&inode_lru_lock);
- spin_unlock(&inode_lock);
if (remove_inode_buffers(inode))
reap += invalidate_mapping_pages(&inode->i_data,
0, -1);
iput(inode);
- spin_lock(&inode_lock);
spin_lock(&inode_lru_lock);
if (inode != list_entry(inode_lru.next,
@@ -696,7 +693,6 @@ static void prune_icache(int nr_to_scan)
else
__count_vm_events(PGINODESTEAL, reap);
spin_unlock(&inode_lru_lock);
- spin_unlock(&inode_lock);
dispose_list(&freeable);
up_read(&iprune_sem);
@@ -1054,7 +1050,6 @@ EXPORT_SYMBOL(iunique);
struct inode *igrab(struct inode *inode)
{
- spin_lock(&inode_lock);
spin_lock(&inode->i_lock);
if (!(inode->i_state & (I_FREEING|I_WILL_FREE))) {
__iget(inode);
@@ -1068,7 +1063,6 @@ struct inode *igrab(struct inode *inode)
*/
inode = NULL;
}
- spin_unlock(&inode_lock);
return inode;
}
EXPORT_SYMBOL(igrab);
@@ -1412,7 +1406,6 @@ static void iput_final(struct inode *inode)
const struct super_operations *op = inode->i_sb->s_op;
int drop;
- spin_lock(&inode->i_lock);
WARN_ON(inode->i_state & I_NEW);
if (op && op->drop_inode)
@@ -1425,16 +1418,13 @@ static void iput_final(struct inode *inode)
if (!(inode->i_state & (I_DIRTY|I_SYNC)))
inode_lru_list_add(inode);
spin_unlock(&inode->i_lock);
- spin_unlock(&inode_lock);
return;
}
if (!drop) {
inode->i_state |= I_WILL_FREE;
spin_unlock(&inode->i_lock);
- spin_unlock(&inode_lock);
write_inode_now(inode, 1);
- spin_lock(&inode_lock);
spin_lock(&inode->i_lock);
WARN_ON(inode->i_state & I_NEW);
inode->i_state &= ~I_WILL_FREE;
@@ -1443,7 +1433,6 @@ static void iput_final(struct inode *inode)
inode->i_state |= I_FREEING;
inode_lru_list_del(inode);
spin_unlock(&inode->i_lock);
- spin_unlock(&inode_lock);
evict(inode);
}
@@ -1462,7 +1451,7 @@ void iput(struct inode *inode)
if (inode) {
BUG_ON(inode->i_state & I_CLEAR);
- if (atomic_dec_and_lock(&inode->i_count, &inode_lock))
+ if (atomic_dec_and_lock(&inode->i_count, &inode->i_lock))
iput_final(inode);
}
}
--
1.7.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 5/8] fs: move i_sb_list out from under inode_lock
2010-10-29 3:23 fs: break out inode operations from inode_lock V3 Dave Chinner
` (3 preceding siblings ...)
2010-10-29 3:23 ` [PATCH 4/8] fs: remove inode_lock from iput_final and prune_icache Dave Chinner
@ 2010-10-29 3:23 ` Dave Chinner
2010-10-29 3:23 ` [PATCH 6/8] fs: move i_wb_list " Dave Chinner
` (2 subsequent siblings)
7 siblings, 0 replies; 16+ messages in thread
From: Dave Chinner @ 2010-10-29 3:23 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, linux-kernel
From: Dave Chinner <dchinner@redhat.com>
Protect the per-sb inode list with a new global lock
inode_sb_list_lock and use it to protect the list manipulations and
traversals. This lock replaces the inode_lock as the inodes on the
list can be validity checked while holding the inode->i_lock and
hence the inode_lock is no longer needed to protect the list.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/drop_caches.c | 9 +++++----
fs/fs-writeback.c | 21 +++++++++++----------
fs/inode.c | 43 +++++++++++++++++++++++--------------------
fs/internal.h | 2 ++
fs/notify/inode_mark.c | 15 ++++++++-------
fs/quota/dquot.c | 28 ++++++++++++++++------------
6 files changed, 65 insertions(+), 53 deletions(-)
diff --git a/fs/drop_caches.c b/fs/drop_caches.c
index 62dd8ee..86cd2f1 100644
--- a/fs/drop_caches.c
+++ b/fs/drop_caches.c
@@ -8,6 +8,7 @@
#include <linux/writeback.h>
#include <linux/sysctl.h>
#include <linux/gfp.h>
+#include "internal.h"
/* A global variable is a bit ugly, but it keeps the code simple */
int sysctl_drop_caches;
@@ -16,7 +17,7 @@ static void drop_pagecache_sb(struct super_block *sb, void *unused)
{
struct inode *inode, *toput_inode = NULL;
- spin_lock(&inode_lock);
+ spin_lock(&inode_sb_list_lock);
list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
spin_lock(&inode->i_lock);
if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) ||
@@ -26,13 +27,13 @@ static void drop_pagecache_sb(struct super_block *sb, void *unused)
}
__iget(inode);
spin_unlock(&inode->i_lock);
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_sb_list_lock);
invalidate_mapping_pages(inode->i_mapping, 0, -1);
iput(toput_inode);
toput_inode = inode;
- spin_lock(&inode_lock);
+ spin_lock(&inode_sb_list_lock);
}
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_sb_list_lock);
iput(toput_inode);
}
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 50de4f1..7707a62 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1053,7 +1053,7 @@ static void wait_sb_inodes(struct super_block *sb)
*/
WARN_ON(!rwsem_is_locked(&sb->s_umount));
- spin_lock(&inode_lock);
+ spin_lock(&inode_sb_list_lock);
/*
* Data integrity sync. Must wait for all pages under writeback,
@@ -1073,14 +1073,15 @@ static void wait_sb_inodes(struct super_block *sb)
}
__iget(inode);
spin_unlock(&inode->i_lock);
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_sb_list_lock);
+
/*
- * We hold a reference to 'inode' so it couldn't have
- * been removed from s_inodes list while we dropped the
- * inode_lock. We cannot iput the inode now as we can
- * be holding the last reference and we cannot iput it
- * under inode_lock. So we keep the reference and iput
- * it later.
+ * We hold a reference to 'inode' so it couldn't have been
+ * removed from s_inodes list while we dropped the
+ * inode_sb_list_lock. We cannot iput the inode now as we can
+ * be holding the last reference and we cannot iput it under
+ * inode_sb_list_lock. So we keep the reference and iput it
+ * later.
*/
iput(old_inode);
old_inode = inode;
@@ -1089,9 +1090,9 @@ static void wait_sb_inodes(struct super_block *sb)
cond_resched();
- spin_lock(&inode_lock);
+ spin_lock(&inode_sb_list_lock);
}
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_sb_list_lock);
iput(old_inode);
}
diff --git a/fs/inode.c b/fs/inode.c
index b6a5dfe..52946b6 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -32,10 +32,15 @@
* inode->i_state, __iget()
* inode_lru_lock protects:
* inode_lru, inode->i_lru
+ * inode_sb_list_lock protects:
+ * sb->s_inodes, inode->i_sb_list
*
* Lock ordering:
* inode_lock
* inode->i_lock
+ *
+ * inode_sb_list_lock
+ * inode->i_lock
* inode_lru_lock
*/
@@ -97,6 +102,8 @@ static struct hlist_head *inode_hashtable __read_mostly;
*/
DEFINE_SPINLOCK(inode_lock);
+__cacheline_aligned_in_smp DEFINE_SPINLOCK(inode_sb_list_lock);
+
/*
* iprune_sem provides exclusion between the kswapd or try_to_free_pages
* icache shrinking path, and the umount path. Without this exclusion,
@@ -356,26 +363,23 @@ static void inode_lru_list_del(struct inode *inode)
spin_unlock(&inode_lru_lock);
}
-static inline void __inode_sb_list_add(struct inode *inode)
-{
- list_add(&inode->i_sb_list, &inode->i_sb->s_inodes);
-}
-
/**
* inode_sb_list_add - add inode to the superblock list of inodes
* @inode: inode to add
*/
void inode_sb_list_add(struct inode *inode)
{
- spin_lock(&inode_lock);
- __inode_sb_list_add(inode);
- spin_unlock(&inode_lock);
+ spin_lock(&inode_sb_list_lock);
+ list_add(&inode->i_sb_list, &inode->i_sb->s_inodes);
+ spin_unlock(&inode_sb_list_lock);
}
EXPORT_SYMBOL_GPL(inode_sb_list_add);
-static inline void __inode_sb_list_del(struct inode *inode)
+static inline void inode_sb_list_del(struct inode *inode)
{
+ spin_lock(&inode_sb_list_lock);
list_del_init(&inode->i_sb_list);
+ spin_unlock(&inode_sb_list_lock);
}
static unsigned long hash(struct super_block *sb, unsigned long hashval)
@@ -456,9 +460,10 @@ static void evict(struct inode *inode)
spin_lock(&inode_lock);
list_del_init(&inode->i_wb_list);
- __inode_sb_list_del(inode);
spin_unlock(&inode_lock);
+ inode_sb_list_del(inode);
+
if (op->evict_inode) {
op->evict_inode(inode);
} else {
@@ -516,7 +521,7 @@ void evict_inodes(struct super_block *sb)
down_write(&iprune_sem);
- spin_lock(&inode_lock);
+ spin_lock(&inode_sb_list_lock);
list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
if (atomic_read(&inode->i_count))
continue;
@@ -534,7 +539,7 @@ void evict_inodes(struct super_block *sb)
list_add(&inode->i_lru, &dispose);
}
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_sb_list_lock);
dispose_list(&dispose);
up_write(&iprune_sem);
@@ -555,7 +560,7 @@ int invalidate_inodes(struct super_block *sb)
down_write(&iprune_sem);
- spin_lock(&inode_lock);
+ spin_lock(&inode_sb_list_lock);
list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
spin_lock(&inode->i_lock);
if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
@@ -574,7 +579,7 @@ int invalidate_inodes(struct super_block *sb)
list_add(&inode->i_lru, &dispose);
}
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_sb_list_lock);
dispose_list(&dispose);
up_write(&iprune_sem);
@@ -839,16 +844,14 @@ struct inode *new_inode(struct super_block *sb)
{
struct inode *inode;
- spin_lock_prefetch(&inode_lock);
+ spin_lock_prefetch(&inode_sb_list_lock);
inode = alloc_inode(sb);
if (inode) {
- spin_lock(&inode_lock);
spin_lock(&inode->i_lock);
inode->i_state = 0;
spin_unlock(&inode->i_lock);
- __inode_sb_list_add(inode);
- spin_unlock(&inode_lock);
+ inode_sb_list_add(inode);
}
return inode;
}
@@ -917,7 +920,7 @@ static struct inode *get_new_inode(struct super_block *sb,
inode->i_state = I_NEW;
spin_unlock(&inode->i_lock);
hlist_add_head(&inode->i_hash, head);
- __inode_sb_list_add(inode);
+ inode_sb_list_add(inode);
spin_unlock(&inode_lock);
/* Return the locked inode with I_NEW set, the
@@ -966,7 +969,7 @@ static struct inode *get_new_inode_fast(struct super_block *sb,
inode->i_state = I_NEW;
spin_unlock(&inode->i_lock);
hlist_add_head(&inode->i_hash, head);
- __inode_sb_list_add(inode);
+ inode_sb_list_add(inode);
spin_unlock(&inode_lock);
/* Return the locked inode with I_NEW set, the
diff --git a/fs/internal.h b/fs/internal.h
index ebad3b9..493baa1 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -108,3 +108,5 @@ extern void release_open_intent(struct nameidata *);
extern int get_nr_dirty_inodes(void);
extern int evict_inodes(struct super_block *);
extern int invalidate_inodes(struct super_block *);
+
+extern spinlock_t inode_sb_list_lock;
diff --git a/fs/notify/inode_mark.c b/fs/notify/inode_mark.c
index 9c0fb09..06c8216 100644
--- a/fs/notify/inode_mark.c
+++ b/fs/notify/inode_mark.c
@@ -29,6 +29,8 @@
#include <linux/fsnotify_backend.h>
#include "fsnotify.h"
+#include "../internal.h"
+
/*
* Recalculate the mask of events relevant to a given inode locked.
*/
@@ -232,15 +234,14 @@ out:
* fsnotify_unmount_inodes - an sb is unmounting. handle any watched inodes.
* @list: list of inodes being unmounted (sb->s_inodes)
*
- * Called with inode_lock held, protecting the unmounting super block's list
- * of inodes, and with iprune_mutex held, keeping shrink_icache_memory() at bay.
- * We temporarily drop inode_lock, however, and CAN block.
+ * Called during unmount with no locks held, so needs to be safe against
+ * concurrent modifiers. We temporarily drop inode_sb_list_lock and CAN block.
*/
void fsnotify_unmount_inodes(struct list_head *list)
{
struct inode *inode, *next_i, *need_iput = NULL;
- spin_lock(&inode_lock);
+ spin_lock(&inode_sb_list_lock);
list_for_each_entry_safe(inode, next_i, list, i_sb_list) {
struct inode *need_iput_tmp;
@@ -293,7 +294,7 @@ void fsnotify_unmount_inodes(struct list_head *list)
* will be added since the umount has begun. Finally,
* iprune_mutex keeps shrink_icache_memory() away.
*/
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_sb_list_lock);
if (need_iput_tmp)
iput(need_iput_tmp);
@@ -305,7 +306,7 @@ void fsnotify_unmount_inodes(struct list_head *list)
iput(inode);
- spin_lock(&inode_lock);
+ spin_lock(&inode_sb_list_lock);
}
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_sb_list_lock);
}
diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index a2d89f8..440f64c 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -76,7 +76,7 @@
#include <linux/buffer_head.h>
#include <linux/capability.h>
#include <linux/quotaops.h>
-#include <linux/writeback.h> /* for inode_lock, oddly enough.. */
+#include "../internal.h" /* ugh */
#include <asm/uaccess.h>
@@ -896,7 +896,7 @@ static void add_dquot_ref(struct super_block *sb, int type)
int reserved = 0;
#endif
- spin_lock(&inode_lock);
+ spin_lock(&inode_sb_list_lock);
list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
spin_lock(&inode->i_lock);
if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) ||
@@ -911,19 +911,23 @@ static void add_dquot_ref(struct super_block *sb, int type)
#endif
__iget(inode);
spin_unlock(&inode->i_lock);
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_sb_list_lock);
iput(old_inode);
__dquot_initialize(inode, type);
- /* We hold a reference to 'inode' so it couldn't have been
- * removed from s_inodes list while we dropped the inode_lock.
- * We cannot iput the inode now as we can be holding the last
- * reference and we cannot iput it under inode_lock. So we
- * keep the reference and iput it later. */
+
+ /*
+ * We hold a reference to 'inode' so it couldn't have been
+ * removed from s_inodes list while we dropped the
+ * inode_sb_list_lock We cannot iput the inode now as we can be
+ * holding the last reference and we cannot iput it under
+ * inode_sb_list_lock. So we keep the reference and iput it
+ * later.
+ */
old_inode = inode;
- spin_lock(&inode_lock);
+ spin_lock(&inode_sb_list_lock);
}
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_sb_list_lock);
iput(old_inode);
#ifdef CONFIG_QUOTA_DEBUG
@@ -1004,7 +1008,7 @@ static void remove_dquot_ref(struct super_block *sb, int type,
struct inode *inode;
int reserved = 0;
- spin_lock(&inode_lock);
+ spin_lock(&inode_sb_list_lock);
list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
/*
* We have to scan also I_NEW inodes because they can already
@@ -1018,7 +1022,7 @@ static void remove_dquot_ref(struct super_block *sb, int type,
remove_inode_dquot_ref(inode, type, tofree_head);
}
}
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_sb_list_lock);
#ifdef CONFIG_QUOTA_DEBUG
if (reserved) {
printk(KERN_WARNING "VFS (%s): Writes happened after quota"
--
1.7.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 6/8] fs: move i_wb_list out from under inode_lock
2010-10-29 3:23 fs: break out inode operations from inode_lock V3 Dave Chinner
` (4 preceding siblings ...)
2010-10-29 3:23 ` [PATCH 5/8] fs: move i_sb_list out from under inode_lock Dave Chinner
@ 2010-10-29 3:23 ` Dave Chinner
2010-10-29 3:23 ` [PATCH 7/8] fs: rename inode_lock to inode_hash_lock Dave Chinner
2010-10-29 3:23 ` [PATCH 8/8] fs: Clean up documentation references to inode_lock Dave Chinner
7 siblings, 0 replies; 16+ messages in thread
From: Dave Chinner @ 2010-10-29 3:23 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, linux-kernel
From: Dave Chinner <dchinner@redhat.com>
Protect the inode writeback list with a new global lock
inode_wb_list_lock and use it to protect the list manipulations and
traversals. This lock replaces the inode_lock as the inodes on the
list can be validity checked while holding the inode->i_lock and
hence the inode_lock is no longer needed to protect the list.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/block_dev.c | 4 +-
fs/fs-writeback.c | 76 ++++++++++++++++++++++++++-------------------
fs/inode.c | 12 +++++--
fs/internal.h | 7 +++-
include/linux/writeback.h | 1 +
mm/backing-dev.c | 8 ++--
6 files changed, 65 insertions(+), 43 deletions(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index dea3b62..a94cbf0 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -56,11 +56,11 @@ EXPORT_SYMBOL(I_BDEV);
static void bdev_inode_switch_bdi(struct inode *inode,
struct backing_dev_info *dst)
{
- spin_lock(&inode_lock);
+ spin_lock(&inode_wb_list_lock);
inode->i_data.backing_dev_info = dst;
if (inode->i_state & I_DIRTY)
list_move(&inode->i_wb_list, &dst->wb.b_dirty);
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_wb_list_lock);
}
static sector_t max_block(struct block_device *bdev)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 7707a62..27c029e 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -162,6 +162,17 @@ void bdi_start_background_writeback(struct backing_dev_info *bdi)
}
/*
+ * Remove the inode from the writeback list it is on.
+ */
+void inode_wb_list_del(struct inode *inode)
+{
+ spin_lock(&inode_wb_list_lock);
+ list_del_init(&inode->i_wb_list);
+ spin_unlock(&inode_wb_list_lock);
+}
+
+
+/*
* Redirty an inode: set its when-it-was dirtied timestamp and move it to the
* furthest end of its superblock's dirty-inode list.
*
@@ -174,6 +185,7 @@ static void redirty_tail(struct inode *inode)
{
struct bdi_writeback *wb = &inode_to_bdi(inode)->wb;
+ assert_spin_locked(&inode_wb_list_lock);
if (!list_empty(&wb->b_dirty)) {
struct inode *tail;
@@ -191,14 +203,17 @@ static void requeue_io(struct inode *inode)
{
struct bdi_writeback *wb = &inode_to_bdi(inode)->wb;
+ assert_spin_locked(&inode_wb_list_lock);
list_move(&inode->i_wb_list, &wb->b_more_io);
}
static void inode_sync_complete(struct inode *inode)
{
/*
- * Prevent speculative execution through spin_unlock(&inode_lock);
+ * Prevent speculative execution through
+ * spin_unlock(&inode_wb_list_lock);
*/
+
smp_mb();
wake_up_bit(&inode->i_state, __I_SYNC);
}
@@ -272,6 +287,7 @@ static void move_expired_inodes(struct list_head *delaying_queue,
*/
static void queue_io(struct bdi_writeback *wb, unsigned long *older_than_this)
{
+ assert_spin_locked(&inode_wb_list_lock);
list_splice_init(&wb->b_more_io, &wb->b_io);
move_expired_inodes(&wb->b_dirty, &wb->b_io, older_than_this);
}
@@ -294,25 +310,23 @@ static void inode_wait_for_writeback(struct inode *inode)
wqh = bit_waitqueue(&inode->i_state, __I_SYNC);
while (inode->i_state & I_SYNC) {
spin_unlock(&inode->i_lock);
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_wb_list_lock);
__wait_on_bit(wqh, &wq, inode_wait, TASK_UNINTERRUPTIBLE);
- spin_lock(&inode_lock);
+ spin_lock(&inode_wb_list_lock);
spin_lock(&inode->i_lock);
}
}
/*
- * Write out an inode's dirty pages. Called under inode_lock. Either the
- * caller has ref on the inode (either via __iget or via syscall against an fd)
- * or the inode has I_WILL_FREE set (via generic_forget_inode)
+ * Write out an inode's dirty pages. Called under inode_wb_list_lock. Either
+ * the caller has an active reference on the inode or the inode has I_WILL_FREE
+ * set.
*
* If `wait' is set, wait on the writeout.
*
* The whole writeout design is quite complex and fragile. We want to avoid
* starvation of particular inodes when others are being redirtied, prevent
* livelocks, etc.
- *
- * Called under inode_lock.
*/
static int
writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
@@ -354,7 +368,7 @@ writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
inode->i_state |= I_SYNC;
inode->i_state &= ~I_DIRTY_PAGES;
spin_unlock(&inode->i_lock);
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_wb_list_lock);
ret = do_writepages(mapping, wbc);
@@ -374,12 +388,10 @@ writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
* due to delalloc, clear dirty metadata flags right before
* write_inode()
*/
- spin_lock(&inode_lock);
spin_lock(&inode->i_lock);
dirty = inode->i_state & I_DIRTY;
inode->i_state &= ~(I_DIRTY_SYNC | I_DIRTY_DATASYNC);
spin_unlock(&inode->i_lock);
- spin_unlock(&inode_lock);
/* Don't write the inode if only I_DIRTY_PAGES was set */
if (dirty & (I_DIRTY_SYNC | I_DIRTY_DATASYNC)) {
int err = write_inode(inode, wbc);
@@ -387,7 +399,7 @@ writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
ret = err;
}
- spin_lock(&inode_lock);
+ spin_lock(&inode_wb_list_lock);
spin_lock(&inode->i_lock);
inode->i_state &= ~I_SYNC;
if (!(inode->i_state & I_FREEING)) {
@@ -529,10 +541,10 @@ static int writeback_sb_inodes(struct super_block *sb, struct bdi_writeback *wb,
*/
redirty_tail(inode);
}
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_wb_list_lock);
iput(inode);
cond_resched();
- spin_lock(&inode_lock);
+ spin_lock(&inode_wb_list_lock);
if (wbc->nr_to_write <= 0) {
wbc->more_io = 1;
return 1;
@@ -551,7 +563,7 @@ void writeback_inodes_wb(struct bdi_writeback *wb,
if (!wbc->wb_start)
wbc->wb_start = jiffies; /* livelock avoidance */
- spin_lock(&inode_lock);
+ spin_lock(&inode_wb_list_lock);
if (!wbc->for_kupdate || list_empty(&wb->b_io))
queue_io(wb, wbc->older_than_this);
@@ -569,7 +581,7 @@ void writeback_inodes_wb(struct bdi_writeback *wb,
if (ret)
break;
}
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_wb_list_lock);
/* Leave any unwritten inodes on b_io */
}
@@ -578,11 +590,11 @@ static void __writeback_inodes_sb(struct super_block *sb,
{
WARN_ON(!rwsem_is_locked(&sb->s_umount));
- spin_lock(&inode_lock);
+ spin_lock(&inode_wb_list_lock);
if (!wbc->for_kupdate || list_empty(&wb->b_io))
queue_io(wb, wbc->older_than_this);
writeback_sb_inodes(sb, wb, wbc, true);
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_wb_list_lock);
}
/*
@@ -692,7 +704,7 @@ static long wb_writeback(struct bdi_writeback *wb,
* become available for writeback. Otherwise
* we'll just busyloop.
*/
- spin_lock(&inode_lock);
+ spin_lock(&inode_wb_list_lock);
if (!list_empty(&wb->b_more_io)) {
inode = wb_inode(wb->b_more_io.prev);
trace_wbc_writeback_wait(&wbc, wb->bdi);
@@ -700,7 +712,7 @@ static long wb_writeback(struct bdi_writeback *wb,
inode_wait_for_writeback(inode);
spin_unlock(&inode->i_lock);
}
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_wb_list_lock);
}
return wrote;
@@ -939,7 +951,6 @@ void __mark_inode_dirty(struct inode *inode, int flags)
{
struct super_block *sb = inode->i_sb;
struct backing_dev_info *bdi = NULL;
- bool wakeup_bdi = false;
/*
* Don't do this for I_DIRTY_PAGES - that doesn't actually
@@ -963,7 +974,6 @@ void __mark_inode_dirty(struct inode *inode, int flags)
if (unlikely(block_dump))
block_dump___mark_inode_dirty(inode);
- spin_lock(&inode_lock);
spin_lock(&inode->i_lock);
if ((inode->i_state & flags) != flags) {
const int was_dirty = inode->i_state & I_DIRTY;
@@ -989,12 +999,12 @@ void __mark_inode_dirty(struct inode *inode, int flags)
if (inode->i_state & I_FREEING)
goto out_unlock_inode;
- spin_unlock(&inode->i_lock);
/*
* If the inode was already on b_dirty/b_io/b_more_io, don't
* reposition it (that would break b_dirty time-ordering).
*/
if (!was_dirty) {
+ bool wakeup_bdi = false;
bdi = inode_to_bdi(inode);
if (bdi_cap_writeback_dirty(bdi)) {
@@ -1011,18 +1021,20 @@ void __mark_inode_dirty(struct inode *inode, int flags)
wakeup_bdi = true;
}
+ spin_unlock(&inode->i_lock);
+ spin_lock(&inode_wb_list_lock);
inode->dirtied_when = jiffies;
list_move(&inode->i_wb_list, &bdi->wb.b_dirty);
+ spin_unlock(&inode_wb_list_lock);
+
+ if (wakeup_bdi)
+ bdi_wakeup_thread_delayed(bdi);
+ return;
}
- goto out;
}
out_unlock_inode:
spin_unlock(&inode->i_lock);
-out:
- spin_unlock(&inode_lock);
- if (wakeup_bdi)
- bdi_wakeup_thread_delayed(bdi);
}
EXPORT_SYMBOL(__mark_inode_dirty);
@@ -1195,9 +1207,9 @@ int write_inode_now(struct inode *inode, int sync)
wbc.nr_to_write = 0;
might_sleep();
- spin_lock(&inode_lock);
+ spin_lock(&inode_wb_list_lock);
ret = writeback_single_inode(inode, &wbc);
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_wb_list_lock);
if (sync)
inode_sync_wait(inode);
return ret;
@@ -1219,9 +1231,9 @@ int sync_inode(struct inode *inode, struct writeback_control *wbc)
{
int ret;
- spin_lock(&inode_lock);
+ spin_lock(&inode_wb_list_lock);
ret = writeback_single_inode(inode, wbc);
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_wb_list_lock);
return ret;
}
EXPORT_SYMBOL(sync_inode);
diff --git a/fs/inode.c b/fs/inode.c
index 52946b6..e9c7157 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -24,6 +24,7 @@
#include <linux/mount.h>
#include <linux/async.h>
#include <linux/posix_acl.h>
+#include "internal.h"
/*
* inode locking rules.
@@ -34,6 +35,8 @@
* inode_lru, inode->i_lru
* inode_sb_list_lock protects:
* sb->s_inodes, inode->i_sb_list
+ * inode_sb_list_lock protects:
+ * bdi->wb.b_{dirty,io,more_io}, inode->i_wb_list
*
* Lock ordering:
* inode_lock
@@ -42,6 +45,9 @@
* inode_sb_list_lock
* inode->i_lock
* inode_lru_lock
+ *
+ * inode_wb_list_lock
+ * inode->i_lock
*/
/*
@@ -103,6 +109,7 @@ static struct hlist_head *inode_hashtable __read_mostly;
DEFINE_SPINLOCK(inode_lock);
__cacheline_aligned_in_smp DEFINE_SPINLOCK(inode_sb_list_lock);
+__cacheline_aligned_in_smp DEFINE_SPINLOCK(inode_wb_list_lock);
/*
* iprune_sem provides exclusion between the kswapd or try_to_free_pages
@@ -458,10 +465,7 @@ static void evict(struct inode *inode)
BUG_ON(!(inode->i_state & I_FREEING));
BUG_ON(!list_empty(&inode->i_lru));
- spin_lock(&inode_lock);
- list_del_init(&inode->i_wb_list);
- spin_unlock(&inode_lock);
-
+ inode_wb_list_del(inode);
inode_sb_list_del(inode);
if (op->evict_inode) {
diff --git a/fs/internal.h b/fs/internal.h
index 493baa1..fd3f700 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -106,7 +106,12 @@ extern void release_open_intent(struct nameidata *);
* inode.c
*/
extern int get_nr_dirty_inodes(void);
-extern int evict_inodes(struct super_block *);
+extern void evict_inodes(struct super_block *);
extern int invalidate_inodes(struct super_block *);
extern spinlock_t inode_sb_list_lock;
+
+/*
+ * fs-writeback.c
+ */
+extern void inode_wb_list_del(struct inode *inode);
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 242b6f8..e78a240 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -10,6 +10,7 @@
struct backing_dev_info;
extern spinlock_t inode_lock;
+extern spinlock_t inode_wb_list_lock;
/*
* fs/fs-writeback.c
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 15d5097..168ba5e 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -73,14 +73,14 @@ static int bdi_debug_stats_show(struct seq_file *m, void *v)
struct inode *inode;
nr_wb = nr_dirty = nr_io = nr_more_io = 0;
- spin_lock(&inode_lock);
+ spin_lock(&inode_wb_list_lock);
list_for_each_entry(inode, &wb->b_dirty, i_wb_list)
nr_dirty++;
list_for_each_entry(inode, &wb->b_io, i_wb_list)
nr_io++;
list_for_each_entry(inode, &wb->b_more_io, i_wb_list)
nr_more_io++;
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_wb_list_lock);
global_dirty_limits(&background_thresh, &dirty_thresh);
bdi_thresh = bdi_dirty_limit(bdi, dirty_thresh);
@@ -682,11 +682,11 @@ void bdi_destroy(struct backing_dev_info *bdi)
if (bdi_has_dirty_io(bdi)) {
struct bdi_writeback *dst = &default_backing_dev_info.wb;
- spin_lock(&inode_lock);
+ spin_lock(&inode_wb_list_lock);
list_splice(&bdi->wb.b_dirty, &dst->b_dirty);
list_splice(&bdi->wb.b_io, &dst->b_io);
list_splice(&bdi->wb.b_more_io, &dst->b_more_io);
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_wb_list_lock);
}
bdi_unregister(bdi);
--
1.7.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 7/8] fs: rename inode_lock to inode_hash_lock
2010-10-29 3:23 fs: break out inode operations from inode_lock V3 Dave Chinner
` (5 preceding siblings ...)
2010-10-29 3:23 ` [PATCH 6/8] fs: move i_wb_list " Dave Chinner
@ 2010-10-29 3:23 ` Dave Chinner
2010-10-29 3:23 ` [PATCH 8/8] fs: Clean up documentation references to inode_lock Dave Chinner
7 siblings, 0 replies; 16+ messages in thread
From: Dave Chinner @ 2010-10-29 3:23 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, linux-kernel
From: Dave Chinner <dchinner@redhat.com>
All that remains of the inode_lock is protecting the inode hash list
manipulation and traversals. Rename the inode_lock to
inode_hash_lock to reflect it's actual function.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/inode.c | 86 +++++++++++++++++++++++---------------------
fs/notify/inode_mark.c | 1 -
fs/notify/mark.c | 1 -
fs/notify/vfsmount_mark.c | 1 -
include/linux/writeback.h | 1 -
5 files changed, 45 insertions(+), 45 deletions(-)
diff --git a/fs/inode.c b/fs/inode.c
index e9c7157..468a34c 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -37,10 +37,10 @@
* sb->s_inodes, inode->i_sb_list
* inode_sb_list_lock protects:
* bdi->wb.b_{dirty,io,more_io}, inode->i_wb_list
+ * inode_hash_lock protects:
+ * inode_hashtable, inode->i_hash
*
* Lock ordering:
- * inode_lock
- * inode->i_lock
*
* inode_sb_list_lock
* inode->i_lock
@@ -48,6 +48,13 @@
*
* inode_wb_list_lock
* inode->i_lock
+ *
+ * inode_hash_lock
+ * inode_sb_list_lock
+ * inode->i_lock
+ *
+ * iunique_lock
+ * inode_hash_lock
*/
/*
@@ -83,6 +90,8 @@
static unsigned int i_hash_mask __read_mostly;
static unsigned int i_hash_shift __read_mostly;
+static struct hlist_head *inode_hashtable __read_mostly;
+static __cacheline_aligned_in_smp DEFINE_SPINLOCK(inode_hash_lock);
/*
* Each inode can be on two separate lists. One is
@@ -98,15 +107,6 @@ static unsigned int i_hash_shift __read_mostly;
static LIST_HEAD(inode_lru);
static DEFINE_SPINLOCK(inode_lru_lock);
-static struct hlist_head *inode_hashtable __read_mostly;
-
-/*
- * A simple spinlock to protect the list manipulations.
- *
- * NOTE! You also have to own the lock if you change
- * the i_state of an inode while it is in use..
- */
-DEFINE_SPINLOCK(inode_lock);
__cacheline_aligned_in_smp DEFINE_SPINLOCK(inode_sb_list_lock);
__cacheline_aligned_in_smp DEFINE_SPINLOCK(inode_wb_list_lock);
@@ -411,9 +411,9 @@ void __insert_inode_hash(struct inode *inode, unsigned long hashval)
{
struct hlist_head *b = inode_hashtable + hash(inode->i_sb, hashval);
- spin_lock(&inode_lock);
+ spin_lock(&inode_hash_lock);
hlist_add_head(&inode->i_hash, b);
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
}
EXPORT_SYMBOL(__insert_inode_hash);
@@ -425,9 +425,9 @@ EXPORT_SYMBOL(__insert_inode_hash);
*/
void remove_inode_hash(struct inode *inode)
{
- spin_lock(&inode_lock);
+ spin_lock(&inode_hash_lock);
hlist_del_init(&inode->i_hash);
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
}
EXPORT_SYMBOL(remove_inode_hash);
@@ -913,7 +913,7 @@ static struct inode *get_new_inode(struct super_block *sb,
if (inode) {
struct inode *old;
- spin_lock(&inode_lock);
+ spin_lock(&inode_hash_lock);
/* We released the lock, so.. */
old = find_inode(sb, head, test, data);
if (!old) {
@@ -923,9 +923,10 @@ static struct inode *get_new_inode(struct super_block *sb,
spin_lock(&inode->i_lock);
inode->i_state = I_NEW;
spin_unlock(&inode->i_lock);
- hlist_add_head(&inode->i_hash, head);
+
inode_sb_list_add(inode);
- spin_unlock(&inode_lock);
+ hlist_add_head(&inode->i_hash, head);
+ spin_unlock(&inode_hash_lock);
/* Return the locked inode with I_NEW set, the
* caller is responsible for filling in the contents
@@ -938,7 +939,7 @@ static struct inode *get_new_inode(struct super_block *sb,
* us. Use the old inode instead of the one we just
* allocated.
*/
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
destroy_inode(inode);
inode = old;
wait_on_inode(inode);
@@ -946,7 +947,7 @@ static struct inode *get_new_inode(struct super_block *sb,
return inode;
set_failed:
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
destroy_inode(inode);
return NULL;
}
@@ -964,7 +965,7 @@ static struct inode *get_new_inode_fast(struct super_block *sb,
if (inode) {
struct inode *old;
- spin_lock(&inode_lock);
+ spin_lock(&inode_hash_lock);
/* We released the lock, so.. */
old = find_inode_fast(sb, head, ino);
if (!old) {
@@ -972,9 +973,10 @@ static struct inode *get_new_inode_fast(struct super_block *sb,
inode->i_ino = ino;
inode->i_state = I_NEW;
spin_unlock(&inode->i_lock);
- hlist_add_head(&inode->i_hash, head);
+
inode_sb_list_add(inode);
- spin_unlock(&inode_lock);
+ hlist_add_head(&inode->i_hash, head);
+ spin_unlock(&inode_hash_lock);
/* Return the locked inode with I_NEW set, the
* caller is responsible for filling in the contents
@@ -987,7 +989,7 @@ static struct inode *get_new_inode_fast(struct super_block *sb,
* us. Use the old inode instead of the one we just
* allocated.
*/
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
destroy_inode(inode);
inode = old;
wait_on_inode(inode);
@@ -1008,10 +1010,14 @@ static int test_inode_iunique(struct super_block *sb, unsigned long ino)
struct hlist_node *node;
struct inode *inode;
+ spin_lock(&inode_hash_lock);
hlist_for_each_entry(inode, node, b, i_hash) {
- if (inode->i_ino == ino && inode->i_sb == sb)
+ if (inode->i_ino == ino && inode->i_sb == sb) {
+ spin_unlock(&inode_hash_lock);
return 0;
+ }
}
+ spin_unlock(&inode_hash_lock);
return 1;
}
@@ -1041,7 +1047,6 @@ ino_t iunique(struct super_block *sb, ino_t max_reserved)
static unsigned int counter;
ino_t res;
- spin_lock(&inode_lock);
spin_lock(&iunique_lock);
do {
if (counter <= max_reserved)
@@ -1049,7 +1054,6 @@ ino_t iunique(struct super_block *sb, ino_t max_reserved)
res = counter++;
} while (!test_inode_iunique(sb, res));
spin_unlock(&iunique_lock);
- spin_unlock(&inode_lock);
return res;
}
@@ -1099,15 +1103,15 @@ static struct inode *ifind(struct super_block *sb,
{
struct inode *inode;
- spin_lock(&inode_lock);
+ spin_lock(&inode_hash_lock);
inode = find_inode(sb, head, test, data);
if (inode) {
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
if (likely(wait))
wait_on_inode(inode);
return inode;
}
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
return NULL;
}
@@ -1131,14 +1135,14 @@ static struct inode *ifind_fast(struct super_block *sb,
{
struct inode *inode;
- spin_lock(&inode_lock);
+ spin_lock(&inode_hash_lock);
inode = find_inode_fast(sb, head, ino);
if (inode) {
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
wait_on_inode(inode);
return inode;
}
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
return NULL;
}
@@ -1303,7 +1307,7 @@ int insert_inode_locked(struct inode *inode)
while (1) {
struct hlist_node *node;
struct inode *old = NULL;
- spin_lock(&inode_lock);
+ spin_lock(&inode_hash_lock);
hlist_for_each_entry(old, node, head, i_hash) {
if (old->i_ino != ino)
continue;
@@ -1318,12 +1322,12 @@ int insert_inode_locked(struct inode *inode)
}
if (likely(!node)) {
hlist_add_head(&inode->i_hash, head);
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
return 0;
}
__iget(old);
spin_unlock(&old->i_lock);
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
wait_on_inode(old);
if (unlikely(!inode_unhashed(old))) {
iput(old);
@@ -1348,7 +1352,7 @@ int insert_inode_locked4(struct inode *inode, unsigned long hashval,
struct hlist_node *node;
struct inode *old = NULL;
- spin_lock(&inode_lock);
+ spin_lock(&inode_hash_lock);
hlist_for_each_entry(old, node, head, i_hash) {
if (old->i_sb != sb)
continue;
@@ -1363,12 +1367,12 @@ int insert_inode_locked4(struct inode *inode, unsigned long hashval,
}
if (likely(!node)) {
hlist_add_head(&inode->i_hash, head);
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
return 0;
}
__iget(old);
spin_unlock(&old->i_lock);
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
wait_on_inode(old);
if (unlikely(!inode_unhashed(old))) {
iput(old);
@@ -1647,10 +1651,10 @@ static void __wait_on_freeing_inode(struct inode *inode)
wq = bit_waitqueue(&inode->i_state, __I_NEW);
prepare_to_wait(wq, &wait.wait, TASK_UNINTERRUPTIBLE);
spin_unlock(&inode->i_lock);
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
schedule();
finish_wait(wq, &wait.wait);
- spin_lock(&inode_lock);
+ spin_lock(&inode_hash_lock);
}
static __initdata unsigned long ihash_entries;
diff --git a/fs/notify/inode_mark.c b/fs/notify/inode_mark.c
index 06c8216..a33b0f9 100644
--- a/fs/notify/inode_mark.c
+++ b/fs/notify/inode_mark.c
@@ -22,7 +22,6 @@
#include <linux/module.h>
#include <linux/mutex.h>
#include <linux/spinlock.h>
-#include <linux/writeback.h> /* for inode_lock */
#include <asm/atomic.h>
diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index 325185e..50c0085 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -91,7 +91,6 @@
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/srcu.h>
-#include <linux/writeback.h> /* for inode_lock */
#include <asm/atomic.h>
diff --git a/fs/notify/vfsmount_mark.c b/fs/notify/vfsmount_mark.c
index 56772b5..6f8eefe 100644
--- a/fs/notify/vfsmount_mark.c
+++ b/fs/notify/vfsmount_mark.c
@@ -23,7 +23,6 @@
#include <linux/mount.h>
#include <linux/mutex.h>
#include <linux/spinlock.h>
-#include <linux/writeback.h> /* for inode_lock */
#include <asm/atomic.h>
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index e78a240..949bb4a 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -9,7 +9,6 @@
struct backing_dev_info;
-extern spinlock_t inode_lock;
extern spinlock_t inode_wb_list_lock;
/*
--
1.7.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 8/8] fs: Clean up documentation references to inode_lock
2010-10-29 3:23 fs: break out inode operations from inode_lock V3 Dave Chinner
` (6 preceding siblings ...)
2010-10-29 3:23 ` [PATCH 7/8] fs: rename inode_lock to inode_hash_lock Dave Chinner
@ 2010-10-29 3:23 ` Dave Chinner
2010-10-29 5:17 ` Al Viro
2010-10-29 7:46 ` Christoph Hellwig
7 siblings, 2 replies; 16+ messages in thread
From: Dave Chinner @ 2010-10-29 3:23 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, linux-kernel
From: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
Documentation/filesystems/Locking | 2 +-
Documentation/filesystems/porting | 11 ++++++-----
Documentation/filesystems/vfs.txt | 2 +-
fs/buffer.c | 2 +-
fs/logfs/inode.c | 2 +-
fs/notify/inode_mark.c | 5 ++---
fs/ntfs/inode.c | 4 ++--
include/linux/fs.h | 2 +-
include/linux/quotaops.h | 2 +-
mm/filemap.c | 8 +++++---
mm/rmap.c | 7 ++++---
11 files changed, 25 insertions(+), 22 deletions(-)
diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 8a817f6..5478e1f 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -114,7 +114,7 @@ alloc_inode:
destroy_inode:
dirty_inode: (must not sleep)
write_inode:
-drop_inode: !!!inode_lock!!!
+drop_inode: !!!inode->i_lock!!!
evict_inode:
put_super: write
write_super: read
diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting
index b12c895..2da4b44 100644
--- a/Documentation/filesystems/porting
+++ b/Documentation/filesystems/porting
@@ -299,11 +299,12 @@ be used instead. It gets called whenever the inode is evicted, whether it has
remaining links or not. Caller does *not* evict the pagecache or inode-associated
metadata buffers; getting rid of those is responsibility of method, as it had
been for ->delete_inode().
- ->drop_inode() returns int now; it's called on final iput() with inode_lock
-held and it returns true if filesystems wants the inode to be dropped. As before,
-generic_drop_inode() is still the default and it's been updated appropriately.
-generic_delete_inode() is also alive and it consists simply of return 1. Note that
-all actual eviction work is done by caller after ->drop_inode() returns.
+ ->drop_inode() returns int now; it's called on final iput() with
+inode->i_lock held and it returns true if the filesystem wants the inode to be
+dropped. As before, generic_drop_inode() is still the default and it's been
+updated appropriately. generic_delete_inode() is also alive and it consists
+simply of return 1. Note that all actual eviction work is done by caller after
+->drop_inode() returns.
clear_inode() is gone; use end_writeback() instead. As before, it must
be called exactly once on each call of ->evict_inode() (as it used to be for
each call of ->delete_inode()). Unlike before, if you are using inode-associated
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index ed7e5ef..046792a 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -246,7 +246,7 @@ or bottom half).
should be synchronous or not, not all filesystems check this flag.
drop_inode: called when the last access to the inode is dropped,
- with the inode_lock spinlock held.
+ with the inode->i_lock spinlock held.
This method should be either NULL (normal UNIX filesystem
semantics) or "generic_delete_inode" (for filesystems that do not
diff --git a/fs/buffer.c b/fs/buffer.c
index d895d9f..dd24081 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1144,7 +1144,7 @@ __getblk_slow(struct block_device *bdev, sector_t block, int size)
* inode list.
*
* mark_buffer_dirty() is atomic. It takes bh->b_page->mapping->private_lock,
- * mapping->tree_lock and the global inode_lock.
+ * mapping->tree_lock, inode->i_lock and the global inode_wb_list_lock.
*/
void mark_buffer_dirty(struct buffer_head *bh)
{
diff --git a/fs/logfs/inode.c b/fs/logfs/inode.c
index d8c71ec..38c559e 100644
--- a/fs/logfs/inode.c
+++ b/fs/logfs/inode.c
@@ -286,7 +286,7 @@ static int logfs_write_inode(struct inode *inode, struct writeback_control *wbc)
return ret;
}
-/* called with inode_lock held */
+/* called with inode->i_lock held */
static int logfs_drop_inode(struct inode *inode)
{
struct logfs_super *super = logfs_super(inode->i_sb);
diff --git a/fs/notify/inode_mark.c b/fs/notify/inode_mark.c
index a33b0f9..b6beba0 100644
--- a/fs/notify/inode_mark.c
+++ b/fs/notify/inode_mark.c
@@ -288,10 +288,9 @@ void fsnotify_unmount_inodes(struct list_head *list)
}
/*
- * We can safely drop inode_lock here because we hold
+ * We can safely drop inode_sb_list_lock here because we hold
* references on both inode and next_i. Also no new inodes
- * will be added since the umount has begun. Finally,
- * iprune_mutex keeps shrink_icache_memory() away.
+ * will be added since the umount has begun.
*/
spin_unlock(&inode_sb_list_lock);
diff --git a/fs/ntfs/inode.c b/fs/ntfs/inode.c
index 93622b1..a7b12d4 100644
--- a/fs/ntfs/inode.c
+++ b/fs/ntfs/inode.c
@@ -54,7 +54,7 @@
*
* Return 1 if the attributes match and 0 if not.
*
- * NOTE: This function runs with the inode_lock spin lock held so it is not
+ * NOTE: This function runs with the inode->i_lock spin lock held so it is not
* allowed to sleep.
*/
int ntfs_test_inode(struct inode *vi, ntfs_attr *na)
@@ -98,7 +98,7 @@ int ntfs_test_inode(struct inode *vi, ntfs_attr *na)
*
* Return 0 on success and -errno on error.
*
- * NOTE: This function runs with the inode_lock spin lock held so it is not
+ * NOTE: This function runs with the inode->i_lock spin lock held so it is not
* allowed to sleep. (Hence the GFP_ATOMIC allocation.)
*/
static int ntfs_init_locked_inode(struct inode *vi, ntfs_attr *na)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f300a65..913f0ad 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1595,7 +1595,7 @@ struct super_operations {
};
/*
- * Inode state bits. Protected by inode_lock.
+ * Inode state bits. Protected by inode->i_lock.
*
* Three bits determine the dirty state of the inode, I_DIRTY_SYNC,
* I_DIRTY_DATASYNC and I_DIRTY_PAGES.
diff --git a/include/linux/quotaops.h b/include/linux/quotaops.h
index d1a9193..bf230b6 100644
--- a/include/linux/quotaops.h
+++ b/include/linux/quotaops.h
@@ -278,7 +278,7 @@ static inline int dquot_alloc_space(struct inode *inode, qsize_t nr)
/*
* Mark inode fully dirty. Since we are allocating blocks, inode
* would become fully dirty soon anyway and it reportedly
- * reduces inode_lock contention.
+ * reduces lock contention.
*/
mark_inode_dirty(inode);
}
diff --git a/mm/filemap.c b/mm/filemap.c
index 3d4df44..ae5b156 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -80,7 +80,7 @@
* ->i_mutex
* ->i_alloc_sem (various)
*
- * ->inode_lock
+ * ->inode_wb_list_lock
* ->sb_lock (fs/fs-writeback.c)
* ->mapping->tree_lock (__sync_single_inode)
*
@@ -98,8 +98,10 @@
* ->zone.lru_lock (check_pte_range->isolate_lru_page)
* ->private_lock (page_remove_rmap->set_page_dirty)
* ->tree_lock (page_remove_rmap->set_page_dirty)
- * ->inode_lock (page_remove_rmap->set_page_dirty)
- * ->inode_lock (zap_pte_range->set_page_dirty)
+ * ->inode->i_lock (page_remove_rmap->set_page_dirty)
+ * ->inode_wb_list_lock (page_remove_rmap->set_page_dirty)
+ * ->inode->i_lock (zap_pte_range->set_page_dirty)
+ * ->inode_wb_list_lock (zap_pte_range->set_page_dirty)
* ->private_lock (zap_pte_range->__set_page_dirty_buffers)
*
* ->task->proc_lock
diff --git a/mm/rmap.c b/mm/rmap.c
index 5f17fad..abdf401 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -31,11 +31,12 @@
* swap_lock (in swap_duplicate, swap_info_get)
* mmlist_lock (in mmput, drain_mmlist and others)
* mapping->private_lock (in __set_page_dirty_buffers)
- * inode_lock (in set_page_dirty's __mark_inode_dirty)
- * sb_lock (within inode_lock in fs/fs-writeback.c)
+ * inode->i_lock (in set_page_dirty's __mark_inode_dirty)
+ * inode_wb_list_lock (in set_page_dirty's __mark_inode_dirty)
+ * sb_lock (within inode_wb_list_lock in fs/fs-writeback.c)
* mapping->tree_lock (widely used, in set_page_dirty,
* in arch-dependent flush_dcache_mmap_lock,
- * within inode_lock in __sync_single_inode)
+ * within inode_wb_list_lock in __sync_single_inode)
*
* (code doesn't rely on that order so it could be switched around)
* ->tasklist_lock
--
1.7.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 1/8] fs: protect inode->i_state with inode->i_lock
2010-10-29 3:23 ` [PATCH 1/8] fs: protect inode->i_state with inode->i_lock Dave Chinner
@ 2010-10-29 4:29 ` Al Viro
0 siblings, 0 replies; 16+ messages in thread
From: Al Viro @ 2010-10-29 4:29 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-fsdevel, linux-kernel
On Fri, Oct 29, 2010 at 02:23:33PM +1100, Dave Chinner wrote:
> @@ -293,9 +293,11 @@ static void inode_wait_for_writeback(struct inode *inode)
>
> wqh = bit_waitqueue(&inode->i_state, __I_SYNC);
> while (inode->i_state & I_SYNC) {
> + spin_unlock(&inode->i_lock);
Minor annoyance, but... Let's replace spaces with tab in that while.
> + spin_unlock(&inode->i_lock);
> +
> pages_skipped = wbc->pages_skipped;
> writeback_single_inode(inode, wbc);
Might make sense to lift locking i_lock into callers of
writeback_single_inode() (it has to grab the damn thing as soon as it's
called) and collapse it with spin_unlock() here. Separate patch.
> + spin_unlock(&inode->i_lock);
> /*
> * If the inode was already on b_dirty/b_io/b_more_io, don't
> * reposition it (that would break b_dirty time-ordering).
I'm not sure there's any point in dropping it here, actually.
> + spin_lock(&inode->i_lock);
> inode->i_state = I_FREEING | I_CLEAR;
> + spin_unlock(&inode->i_lock);
_Probably_ not needed here; we already had I_FREEING set by the time
we'd called that and no other thread will modify ->i_state of that
inode at that point. Check for I_CLEAR will be later in the same
thread, so no barriers are needed. Separate patch.
> @@ -552,8 +568,6 @@ int invalidate_inodes(struct super_block *sb)
> */
> list_move(&inode->i_lru, &dispose);
> list_del_init(&inode->i_wb_list);
> - if (!(inode->i_state & (I_DIRTY | I_SYNC)))
> - percpu_counter_dec(&nr_inodes_unused);
> }
> spin_unlock(&inode_lock);
Ho-hum...
I'm not sure we need that list_del_init() here, actually, seeing that
I_FREEING is already set... For later patch, anyway.
> @@ -917,10 +946,12 @@ static struct inode *get_new_inode_fast(struct super_block *sb,
> /* We released the lock, so.. */
> old = find_inode_fast(sb, head, ino);
> if (!old) {
> + spin_lock(&inode->i_lock);
> inode->i_ino = ino;
> + inode->i_state = I_NEW;
> + spin_unlock(&inode->i_lock);
Almost certainly not needed; nobody can find this inode at that point.
> + * wake_up_bit(&inode->i_state, __I_NEW) after removing from the hash list
> + * will DTRT.
Add that i_lock is not regained.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/8] fs: factor inode disposal
2010-10-29 3:23 ` [PATCH 2/8] fs: factor inode disposal Dave Chinner
@ 2010-10-29 4:45 ` Al Viro
0 siblings, 0 replies; 16+ messages in thread
From: Al Viro @ 2010-10-29 4:45 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-fsdevel, linux-kernel
On Fri, Oct 29, 2010 at 02:23:34PM +1100, Dave Chinner wrote:
> /*
> - * Move the inode off the IO lists and LRU once I_FREEING is
> - * set so that it won't get moved back on there if it is dirty.
> + * Move the inode off the LRU once I_FREEING is set so that it
> + * won't get moved back on.
> */
> list_move(&inode->i_lru, &dispose);
> - list_del_init(&inode->i_wb_list);
a) OK, so you've killed that list_del_init(). Good.
b) The comment is completely misleading. We don't put I_FREEING inodes back on
LRU (or IO) lists, no matter where we find them. What's happening here is
that we are collecting inodes to evict, period.
> - * Move the inode off the IO lists and LRU once I_FREEING is
> - * set so that it won't get moved back on there if it is dirty.
> + * Move the inode off the LRU once I_FREEING is
> + * set so that it won't get moved back on.
Ditto.
> /*
> - * Move the inode off the IO lists and LRU once I_FREEING is
> - * set so that it won't get moved back on there if it is dirty.
> + * Move the inode off the LRU once I_FREEING is
> + * set so that it won't get moved back on.
> */
Ditto.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 4/8] fs: remove inode_lock from iput_final and prune_icache
2010-10-29 3:23 ` [PATCH 4/8] fs: remove inode_lock from iput_final and prune_icache Dave Chinner
@ 2010-10-29 5:14 ` Al Viro
0 siblings, 0 replies; 16+ messages in thread
From: Al Viro @ 2010-10-29 5:14 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-fsdevel, linux-kernel
On Fri, Oct 29, 2010 at 02:23:36PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> Now that inode state changes are protected by the inode->i_lock and
> the inode LRU manipulations by the inode_lru_lock, we can remove the
> inode_lock from prune_icache and the initial part of iput_final().
>
> instead of using the inode_lock to protect the inode during
> iput_final, use the inode->i_lock instead. This protects the inode
> against new references being taken while we change the inode state
> to I_FREEING, as well as preventing prune_icache from grabbing the
> inode while we are manipulating it. Hence we no longer need the
> inode_lock in iput_final prior to setting I_FREEING on the inode.
>
> For prune_icache, we no longer need the inode_lock to protect the
> LRU list, and the inodes themselves are protected against freeing
> races by the inode->i_lock. Hence we can lift the inode_lock from
> prune_icache as well.
Careful. At that point we still rely on inode_lock to protect
inode_unhashed(). Note that ->drop_inode() uses it a lot and this step
moves it from inode_lock to ->i_lock.
What you need to do is pretty simple - make remove_inode_hash()
take both inode_lock (later - inode_hash_lock) and ->i_lock. That's
enough for inode_unhashed() protection, but I'd also hold ->i_lock on
insertions into hash. It's trivial (we hold ->i_lock just next to that
insertion) and would make for more consistent rules.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 8/8] fs: Clean up documentation references to inode_lock
2010-10-29 3:23 ` [PATCH 8/8] fs: Clean up documentation references to inode_lock Dave Chinner
@ 2010-10-29 5:17 ` Al Viro
2010-10-29 7:46 ` Christoph Hellwig
1 sibling, 0 replies; 16+ messages in thread
From: Al Viro @ 2010-10-29 5:17 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-fsdevel, linux-kernel
On Fri, Oct 29, 2010 at 02:23:40PM +1100, Dave Chinner wrote:
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 3d4df44..ae5b156 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -80,7 +80,7 @@
> * ->i_mutex
> * ->i_alloc_sem (various)
> *
> - * ->inode_lock
> + * ->inode_wb_list_lock
Huh? Please, kill -> in these two lines, while we are at it.
> * ->sb_lock (fs/fs-writeback.c)
> * ->mapping->tree_lock (__sync_single_inode)
> *
> @@ -98,8 +98,10 @@
> * ->zone.lru_lock (check_pte_range->isolate_lru_page)
> * ->private_lock (page_remove_rmap->set_page_dirty)
> * ->tree_lock (page_remove_rmap->set_page_dirty)
> - * ->inode_lock (page_remove_rmap->set_page_dirty)
> - * ->inode_lock (zap_pte_range->set_page_dirty)
> + * ->inode->i_lock (page_remove_rmap->set_page_dirty)
> + * ->inode_wb_list_lock (page_remove_rmap->set_page_dirty)
> + * ->inode->i_lock (zap_pte_range->set_page_dirty)
> + * ->inode_wb_list_lock (zap_pte_range->set_page_dirty)
Again, bogus ->
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 8/8] fs: Clean up documentation references to inode_lock
2010-10-29 3:23 ` [PATCH 8/8] fs: Clean up documentation references to inode_lock Dave Chinner
2010-10-29 5:17 ` Al Viro
@ 2010-10-29 7:46 ` Christoph Hellwig
1 sibling, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2010-10-29 7:46 UTC (permalink / raw)
To: Dave Chinner; +Cc: viro, linux-fsdevel, linux-kernel
Most of this should probably simply be folded into the relevant patches.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 7/8] fs: rename inode_lock to inode_hash_lock
2011-03-22 11:23 vfs: inode lock breakup Dave Chinner
@ 2011-03-22 11:23 ` Dave Chinner
0 siblings, 0 replies; 16+ messages in thread
From: Dave Chinner @ 2011-03-22 11:23 UTC (permalink / raw)
To: viro; +Cc: linux-kernel, linux-fsdevel
From: Dave Chinner <dchinner@redhat.com>
All that remains of the inode_lock is protecting the inode hash list
manipulation and traversals. Rename the inode_lock to
inode_hash_lock to reflect it's actual function.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/inode.c | 111 +++++++++++++++++++++++++--------------------
fs/notify/inode_mark.c | 1 -
fs/notify/mark.c | 1 -
fs/notify/vfsmount_mark.c | 1 -
fs/ntfs/inode.c | 4 +-
include/linux/writeback.h | 1 -
6 files changed, 63 insertions(+), 56 deletions(-)
diff --git a/fs/inode.c b/fs/inode.c
index 5a7f8ef..730ddd6 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -38,10 +38,10 @@
* sb->s_inodes, inode->i_sb_list
* inode_wb_list_lock protects:
* bdi->wb.b_{dirty,io,more_io}, inode->i_wb_list
+ * inode_hash_lock protects:
+ * inode_hashtable, inode->i_hash
*
* Lock ordering:
- * inode_lock
- * inode->i_lock
*
* inode_sb_list_lock
* inode->i_lock
@@ -49,6 +49,13 @@
*
* inode_wb_list_lock
* inode->i_lock
+ *
+ * inode_hash_lock
+ * inode_sb_list_lock
+ * inode->i_lock
+ *
+ * iunique_lock
+ * inode_hash_lock
*/
/*
@@ -84,6 +91,8 @@
static unsigned int i_hash_mask __read_mostly;
static unsigned int i_hash_shift __read_mostly;
+static struct hlist_head *inode_hashtable __read_mostly;
+static __cacheline_aligned_in_smp DEFINE_SPINLOCK(inode_hash_lock);
/*
* Each inode can be on two separate lists. One is
@@ -99,15 +108,6 @@ static unsigned int i_hash_shift __read_mostly;
static LIST_HEAD(inode_lru);
static DEFINE_SPINLOCK(inode_lru_lock);
-static struct hlist_head *inode_hashtable __read_mostly;
-
-/*
- * A simple spinlock to protect the list manipulations.
- *
- * NOTE! You also have to own the lock if you change
- * the i_state of an inode while it is in use..
- */
-DEFINE_SPINLOCK(inode_lock);
__cacheline_aligned_in_smp DEFINE_SPINLOCK(inode_sb_list_lock);
__cacheline_aligned_in_smp DEFINE_SPINLOCK(inode_wb_list_lock);
@@ -432,11 +432,11 @@ void __insert_inode_hash(struct inode *inode, unsigned long hashval)
{
struct hlist_head *b = inode_hashtable + hash(inode->i_sb, hashval);
- spin_lock(&inode_lock);
+ spin_lock(&inode_hash_lock);
spin_lock(&inode->i_lock);
hlist_add_head(&inode->i_hash, b);
spin_unlock(&inode->i_lock);
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
}
EXPORT_SYMBOL(__insert_inode_hash);
@@ -448,11 +448,11 @@ EXPORT_SYMBOL(__insert_inode_hash);
*/
void remove_inode_hash(struct inode *inode)
{
- spin_lock(&inode_lock);
+ spin_lock(&inode_hash_lock);
spin_lock(&inode->i_lock);
hlist_del_init(&inode->i_hash);
spin_unlock(&inode->i_lock);
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
}
EXPORT_SYMBOL(remove_inode_hash);
@@ -777,11 +777,15 @@ static struct inode *find_inode(struct super_block *sb,
repeat:
hlist_for_each_entry(inode, node, head, i_hash) {
- if (inode->i_sb != sb)
+ spin_lock(&inode->i_lock);
+ if (inode->i_sb != sb) {
+ spin_unlock(&inode->i_lock);
continue;
- if (!test(inode, data))
+ }
+ if (!test(inode, data)) {
+ spin_unlock(&inode->i_lock);
continue;
- spin_lock(&inode->i_lock);
+ }
if (inode->i_state & (I_FREEING|I_WILL_FREE)) {
__wait_on_freeing_inode(inode);
goto repeat;
@@ -805,11 +809,15 @@ static struct inode *find_inode_fast(struct super_block *sb,
repeat:
hlist_for_each_entry(inode, node, head, i_hash) {
- if (inode->i_ino != ino)
+ spin_lock(&inode->i_lock);
+ if (inode->i_ino != ino) {
+ spin_unlock(&inode->i_lock);
continue;
- if (inode->i_sb != sb)
+ }
+ if (inode->i_sb != sb) {
+ spin_unlock(&inode->i_lock);
continue;
- spin_lock(&inode->i_lock);
+ }
if (inode->i_state & (I_FREEING|I_WILL_FREE)) {
__wait_on_freeing_inode(inode);
goto repeat;
@@ -923,7 +931,7 @@ void unlock_new_inode(struct inode *inode)
EXPORT_SYMBOL(unlock_new_inode);
/*
- * This is called without the inode lock held.. Be careful.
+ * This is called without the inode hash lock held.. Be careful.
*
* We no longer cache the sb_flags in i_flags - see fs.h
* -- rmk@arm.uk.linux.org
@@ -940,7 +948,7 @@ static struct inode *get_new_inode(struct super_block *sb,
if (inode) {
struct inode *old;
- spin_lock(&inode_lock);
+ spin_lock(&inode_hash_lock);
/* We released the lock, so.. */
old = find_inode(sb, head, test, data);
if (!old) {
@@ -952,7 +960,7 @@ static struct inode *get_new_inode(struct super_block *sb,
hlist_add_head(&inode->i_hash, head);
spin_unlock(&inode->i_lock);
inode_sb_list_add(inode);
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
/* Return the locked inode with I_NEW set, the
* caller is responsible for filling in the contents
@@ -965,7 +973,7 @@ static struct inode *get_new_inode(struct super_block *sb,
* us. Use the old inode instead of the one we just
* allocated.
*/
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
destroy_inode(inode);
inode = old;
wait_on_inode(inode);
@@ -973,7 +981,7 @@ static struct inode *get_new_inode(struct super_block *sb,
return inode;
set_failed:
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
destroy_inode(inode);
return NULL;
}
@@ -991,7 +999,7 @@ static struct inode *get_new_inode_fast(struct super_block *sb,
if (inode) {
struct inode *old;
- spin_lock(&inode_lock);
+ spin_lock(&inode_hash_lock);
/* We released the lock, so.. */
old = find_inode_fast(sb, head, ino);
if (!old) {
@@ -1001,7 +1009,7 @@ static struct inode *get_new_inode_fast(struct super_block *sb,
hlist_add_head(&inode->i_hash, head);
spin_unlock(&inode->i_lock);
inode_sb_list_add(inode);
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
/* Return the locked inode with I_NEW set, the
* caller is responsible for filling in the contents
@@ -1014,7 +1022,7 @@ static struct inode *get_new_inode_fast(struct super_block *sb,
* us. Use the old inode instead of the one we just
* allocated.
*/
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
destroy_inode(inode);
inode = old;
wait_on_inode(inode);
@@ -1035,10 +1043,14 @@ static int test_inode_iunique(struct super_block *sb, unsigned long ino)
struct hlist_node *node;
struct inode *inode;
+ spin_lock(&inode_hash_lock);
hlist_for_each_entry(inode, node, b, i_hash) {
- if (inode->i_ino == ino && inode->i_sb == sb)
+ if (inode->i_ino == ino && inode->i_sb == sb) {
+ spin_unlock(&inode_hash_lock);
return 0;
+ }
}
+ spin_unlock(&inode_hash_lock);
return 1;
}
@@ -1068,7 +1080,6 @@ ino_t iunique(struct super_block *sb, ino_t max_reserved)
static unsigned int counter;
ino_t res;
- spin_lock(&inode_lock);
spin_lock(&iunique_lock);
do {
if (counter <= max_reserved)
@@ -1076,7 +1087,6 @@ ino_t iunique(struct super_block *sb, ino_t max_reserved)
res = counter++;
} while (!test_inode_iunique(sb, res));
spin_unlock(&iunique_lock);
- spin_unlock(&inode_lock);
return res;
}
@@ -1118,7 +1128,7 @@ EXPORT_SYMBOL(igrab);
*
* Otherwise NULL is returned.
*
- * Note, @test is called with the inode_lock held, so can't sleep.
+ * Note, @test is called with the inode_hash_lock held, so can't sleep.
*/
static struct inode *ifind(struct super_block *sb,
struct hlist_head *head, int (*test)(struct inode *, void *),
@@ -1126,15 +1136,15 @@ static struct inode *ifind(struct super_block *sb,
{
struct inode *inode;
- spin_lock(&inode_lock);
+ spin_lock(&inode_hash_lock);
inode = find_inode(sb, head, test, data);
if (inode) {
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
if (likely(wait))
wait_on_inode(inode);
return inode;
}
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
return NULL;
}
@@ -1158,14 +1168,14 @@ static struct inode *ifind_fast(struct super_block *sb,
{
struct inode *inode;
- spin_lock(&inode_lock);
+ spin_lock(&inode_hash_lock);
inode = find_inode_fast(sb, head, ino);
if (inode) {
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
wait_on_inode(inode);
return inode;
}
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
return NULL;
}
@@ -1188,7 +1198,7 @@ static struct inode *ifind_fast(struct super_block *sb,
*
* Otherwise NULL is returned.
*
- * Note, @test is called with the inode_lock held, so can't sleep.
+ * Note, @test is called with the inode_hash_lock held, so can't sleep.
*/
struct inode *ilookup5_nowait(struct super_block *sb, unsigned long hashval,
int (*test)(struct inode *, void *), void *data)
@@ -1216,7 +1226,7 @@ EXPORT_SYMBOL(ilookup5_nowait);
*
* Otherwise NULL is returned.
*
- * Note, @test is called with the inode_lock held, so can't sleep.
+ * Note, @test is called with the inode_hash_lock held, so can't sleep.
*/
struct inode *ilookup5(struct super_block *sb, unsigned long hashval,
int (*test)(struct inode *, void *), void *data)
@@ -1267,7 +1277,8 @@ EXPORT_SYMBOL(ilookup);
* inode and this is returned locked, hashed, and with the I_NEW flag set. The
* file system gets to fill it in before unlocking it via unlock_new_inode().
*
- * Note both @test and @set are called with the inode_lock held, so can't sleep.
+ * Note both @test and @set are called with the inode_hash_lock held, so can't
+ * sleep.
*/
struct inode *iget5_locked(struct super_block *sb, unsigned long hashval,
int (*test)(struct inode *, void *),
@@ -1327,7 +1338,7 @@ int insert_inode_locked(struct inode *inode)
while (1) {
struct hlist_node *node;
struct inode *old = NULL;
- spin_lock(&inode_lock);
+ spin_lock(&inode_hash_lock);
hlist_for_each_entry(old, node, head, i_hash) {
if (old->i_ino != ino)
continue;
@@ -1345,12 +1356,12 @@ int insert_inode_locked(struct inode *inode)
inode->i_state |= I_NEW;
hlist_add_head(&inode->i_hash, head);
spin_unlock(&inode->i_lock);
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
return 0;
}
__iget(old);
spin_unlock(&old->i_lock);
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
wait_on_inode(old);
if (unlikely(!inode_unhashed(old))) {
iput(old);
@@ -1371,7 +1382,7 @@ int insert_inode_locked4(struct inode *inode, unsigned long hashval,
struct hlist_node *node;
struct inode *old = NULL;
- spin_lock(&inode_lock);
+ spin_lock(&inode_hash_lock);
hlist_for_each_entry(old, node, head, i_hash) {
if (old->i_sb != sb)
continue;
@@ -1389,12 +1400,12 @@ int insert_inode_locked4(struct inode *inode, unsigned long hashval,
inode->i_state |= I_NEW;
hlist_add_head(&inode->i_hash, head);
spin_unlock(&inode->i_lock);
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
return 0;
}
__iget(old);
spin_unlock(&old->i_lock);
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
wait_on_inode(old);
if (unlikely(!inode_unhashed(old))) {
iput(old);
@@ -1673,10 +1684,10 @@ static void __wait_on_freeing_inode(struct inode *inode)
wq = bit_waitqueue(&inode->i_state, __I_NEW);
prepare_to_wait(wq, &wait.wait, TASK_UNINTERRUPTIBLE);
spin_unlock(&inode->i_lock);
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
schedule();
finish_wait(wq, &wait.wait);
- spin_lock(&inode_lock);
+ spin_lock(&inode_hash_lock);
}
static __initdata unsigned long ihash_entries;
diff --git a/fs/notify/inode_mark.c b/fs/notify/inode_mark.c
index fb3b3c5..07ea8d3 100644
--- a/fs/notify/inode_mark.c
+++ b/fs/notify/inode_mark.c
@@ -22,7 +22,6 @@
#include <linux/module.h>
#include <linux/mutex.h>
#include <linux/spinlock.h>
-#include <linux/writeback.h> /* for inode_lock */
#include <asm/atomic.h>
diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index 325185e..50c0085 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -91,7 +91,6 @@
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/srcu.h>
-#include <linux/writeback.h> /* for inode_lock */
#include <asm/atomic.h>
diff --git a/fs/notify/vfsmount_mark.c b/fs/notify/vfsmount_mark.c
index 85eebff..e86577d 100644
--- a/fs/notify/vfsmount_mark.c
+++ b/fs/notify/vfsmount_mark.c
@@ -23,7 +23,6 @@
#include <linux/mount.h>
#include <linux/mutex.h>
#include <linux/spinlock.h>
-#include <linux/writeback.h> /* for inode_lock */
#include <asm/atomic.h>
diff --git a/fs/ntfs/inode.c b/fs/ntfs/inode.c
index a627ed8..0b56c6b 100644
--- a/fs/ntfs/inode.c
+++ b/fs/ntfs/inode.c
@@ -54,7 +54,7 @@
*
* Return 1 if the attributes match and 0 if not.
*
- * NOTE: This function runs with the inode_lock spin lock held so it is not
+ * NOTE: This function runs with the inode->i_lock spin lock held so it is not
* allowed to sleep.
*/
int ntfs_test_inode(struct inode *vi, ntfs_attr *na)
@@ -98,7 +98,7 @@ int ntfs_test_inode(struct inode *vi, ntfs_attr *na)
*
* Return 0 on success and -errno on error.
*
- * NOTE: This function runs with the inode_lock spin lock held so it is not
+ * NOTE: This function runs with the inode->i_lock spin lock held so it is not
* allowed to sleep. (Hence the GFP_ATOMIC allocation.)
*/
static int ntfs_init_locked_inode(struct inode *vi, ntfs_attr *na)
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 3f5fee7..17e7ccc 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -9,7 +9,6 @@
struct backing_dev_info;
-extern spinlock_t inode_lock;
extern spinlock_t inode_wb_list_lock;
/*
--
1.7.2.3
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 7/8] fs: rename inode_lock to inode_hash_lock
2010-10-29 8:59 fs: break out inode operations from inode_lock V4 Dave Chinner
@ 2010-10-29 9:00 ` Dave Chinner
0 siblings, 0 replies; 16+ messages in thread
From: Dave Chinner @ 2010-10-29 9:00 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, linux-kernel
From: Dave Chinner <dchinner@redhat.com>
All that remains of the inode_lock is protecting the inode hash list
manipulation and traversals. Rename the inode_lock to
inode_hash_lock to reflect it's actual function.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/inode.c | 80 +++++++++++++++++++++++----------------------
fs/notify/inode_mark.c | 1 -
fs/notify/mark.c | 1 -
fs/notify/vfsmount_mark.c | 1 -
include/linux/writeback.h | 1 -
5 files changed, 41 insertions(+), 43 deletions(-)
diff --git a/fs/inode.c b/fs/inode.c
index 94e5a5c..c5baf18 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -37,10 +37,10 @@
* sb->s_inodes, inode->i_sb_list
* inode_wb_list_lock protects:
* bdi->wb.b_{dirty,io,more_io}, inode->i_wb_list
+ * inode_hash_lock protects:
+ * inode_hashtable, inode->i_hash
*
* Lock ordering:
- * inode_lock
- * inode->i_lock
*
* inode_sb_list_lock
* inode->i_lock
@@ -48,6 +48,13 @@
*
* inode_wb_list_lock
* inode->i_lock
+ *
+ * inode_hash_lock
+ * inode_sb_list_lock
+ * inode->i_lock
+ *
+ * iunique_lock
+ * inode_hash_lock
*/
/*
@@ -83,6 +90,8 @@
static unsigned int i_hash_mask __read_mostly;
static unsigned int i_hash_shift __read_mostly;
+static struct hlist_head *inode_hashtable __read_mostly;
+static __cacheline_aligned_in_smp DEFINE_SPINLOCK(inode_hash_lock);
/*
* Each inode can be on two separate lists. One is
@@ -98,15 +107,6 @@ static unsigned int i_hash_shift __read_mostly;
static LIST_HEAD(inode_lru);
static DEFINE_SPINLOCK(inode_lru_lock);
-static struct hlist_head *inode_hashtable __read_mostly;
-
-/*
- * A simple spinlock to protect the list manipulations.
- *
- * NOTE! You also have to own the lock if you change
- * the i_state of an inode while it is in use..
- */
-DEFINE_SPINLOCK(inode_lock);
__cacheline_aligned_in_smp DEFINE_SPINLOCK(inode_sb_list_lock);
__cacheline_aligned_in_smp DEFINE_SPINLOCK(inode_wb_list_lock);
@@ -411,11 +411,11 @@ void __insert_inode_hash(struct inode *inode, unsigned long hashval)
{
struct hlist_head *b = inode_hashtable + hash(inode->i_sb, hashval);
- spin_lock(&inode_lock);
+ spin_lock(&inode_hash_lock);
spin_lock(&inode->i_lock);
hlist_add_head(&inode->i_hash, b);
spin_unlock(&inode->i_lock);
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
}
EXPORT_SYMBOL(__insert_inode_hash);
@@ -427,11 +427,11 @@ EXPORT_SYMBOL(__insert_inode_hash);
*/
void remove_inode_hash(struct inode *inode)
{
- spin_lock(&inode_lock);
+ spin_lock(&inode_hash_lock);
spin_lock(&inode->i_lock);
hlist_del_init(&inode->i_hash);
spin_unlock(&inode->i_lock);
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
}
EXPORT_SYMBOL(remove_inode_hash);
@@ -911,7 +911,7 @@ static struct inode *get_new_inode(struct super_block *sb,
if (inode) {
struct inode *old;
- spin_lock(&inode_lock);
+ spin_lock(&inode_hash_lock);
/* We released the lock, so.. */
old = find_inode(sb, head, test, data);
if (!old) {
@@ -923,7 +923,7 @@ static struct inode *get_new_inode(struct super_block *sb,
hlist_add_head(&inode->i_hash, head);
spin_unlock(&inode->i_lock);
inode_sb_list_add(inode);
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
/* Return the locked inode with I_NEW set, the
* caller is responsible for filling in the contents
@@ -936,7 +936,7 @@ static struct inode *get_new_inode(struct super_block *sb,
* us. Use the old inode instead of the one we just
* allocated.
*/
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
destroy_inode(inode);
inode = old;
wait_on_inode(inode);
@@ -944,7 +944,7 @@ static struct inode *get_new_inode(struct super_block *sb,
return inode;
set_failed:
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
destroy_inode(inode);
return NULL;
}
@@ -962,7 +962,7 @@ static struct inode *get_new_inode_fast(struct super_block *sb,
if (inode) {
struct inode *old;
- spin_lock(&inode_lock);
+ spin_lock(&inode_hash_lock);
/* We released the lock, so.. */
old = find_inode_fast(sb, head, ino);
if (!old) {
@@ -972,7 +972,7 @@ static struct inode *get_new_inode_fast(struct super_block *sb,
hlist_add_head(&inode->i_hash, head);
spin_unlock(&inode->i_lock);
inode_sb_list_add(inode);
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
/* Return the locked inode with I_NEW set, the
* caller is responsible for filling in the contents
@@ -985,7 +985,7 @@ static struct inode *get_new_inode_fast(struct super_block *sb,
* us. Use the old inode instead of the one we just
* allocated.
*/
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
destroy_inode(inode);
inode = old;
wait_on_inode(inode);
@@ -1006,10 +1006,14 @@ static int test_inode_iunique(struct super_block *sb, unsigned long ino)
struct hlist_node *node;
struct inode *inode;
+ spin_lock(&inode_hash_lock);
hlist_for_each_entry(inode, node, b, i_hash) {
- if (inode->i_ino == ino && inode->i_sb == sb)
+ if (inode->i_ino == ino && inode->i_sb == sb) {
+ spin_unlock(&inode_hash_lock);
return 0;
+ }
}
+ spin_unlock(&inode_hash_lock);
return 1;
}
@@ -1039,7 +1043,6 @@ ino_t iunique(struct super_block *sb, ino_t max_reserved)
static unsigned int counter;
ino_t res;
- spin_lock(&inode_lock);
spin_lock(&iunique_lock);
do {
if (counter <= max_reserved)
@@ -1047,7 +1050,6 @@ ino_t iunique(struct super_block *sb, ino_t max_reserved)
res = counter++;
} while (!test_inode_iunique(sb, res));
spin_unlock(&iunique_lock);
- spin_unlock(&inode_lock);
return res;
}
@@ -1097,15 +1099,15 @@ static struct inode *ifind(struct super_block *sb,
{
struct inode *inode;
- spin_lock(&inode_lock);
+ spin_lock(&inode_hash_lock);
inode = find_inode(sb, head, test, data);
if (inode) {
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
if (likely(wait))
wait_on_inode(inode);
return inode;
}
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
return NULL;
}
@@ -1129,14 +1131,14 @@ static struct inode *ifind_fast(struct super_block *sb,
{
struct inode *inode;
- spin_lock(&inode_lock);
+ spin_lock(&inode_hash_lock);
inode = find_inode_fast(sb, head, ino);
if (inode) {
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
wait_on_inode(inode);
return inode;
}
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
return NULL;
}
@@ -1298,7 +1300,7 @@ int insert_inode_locked(struct inode *inode)
while (1) {
struct hlist_node *node;
struct inode *old = NULL;
- spin_lock(&inode_lock);
+ spin_lock(&inode_hash_lock);
hlist_for_each_entry(old, node, head, i_hash) {
if (old->i_ino != ino)
continue;
@@ -1316,12 +1318,12 @@ int insert_inode_locked(struct inode *inode)
inode->i_state |= I_NEW;
hlist_add_head(&inode->i_hash, head);
spin_unlock(&inode->i_lock);
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
return 0;
}
__iget(old);
spin_unlock(&old->i_lock);
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
wait_on_inode(old);
if (unlikely(!inode_unhashed(old))) {
iput(old);
@@ -1342,7 +1344,7 @@ int insert_inode_locked4(struct inode *inode, unsigned long hashval,
struct hlist_node *node;
struct inode *old = NULL;
- spin_lock(&inode_lock);
+ spin_lock(&inode_hash_lock);
hlist_for_each_entry(old, node, head, i_hash) {
if (old->i_sb != sb)
continue;
@@ -1360,12 +1362,12 @@ int insert_inode_locked4(struct inode *inode, unsigned long hashval,
inode->i_state |= I_NEW;
hlist_add_head(&inode->i_hash, head);
spin_unlock(&inode->i_lock);
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
return 0;
}
__iget(old);
spin_unlock(&old->i_lock);
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
wait_on_inode(old);
if (unlikely(!inode_unhashed(old))) {
iput(old);
@@ -1644,10 +1646,10 @@ static void __wait_on_freeing_inode(struct inode *inode)
wq = bit_waitqueue(&inode->i_state, __I_NEW);
prepare_to_wait(wq, &wait.wait, TASK_UNINTERRUPTIBLE);
spin_unlock(&inode->i_lock);
- spin_unlock(&inode_lock);
+ spin_unlock(&inode_hash_lock);
schedule();
finish_wait(wq, &wait.wait);
- spin_lock(&inode_lock);
+ spin_lock(&inode_hash_lock);
}
static __initdata unsigned long ihash_entries;
diff --git a/fs/notify/inode_mark.c b/fs/notify/inode_mark.c
index 06c8216..a33b0f9 100644
--- a/fs/notify/inode_mark.c
+++ b/fs/notify/inode_mark.c
@@ -22,7 +22,6 @@
#include <linux/module.h>
#include <linux/mutex.h>
#include <linux/spinlock.h>
-#include <linux/writeback.h> /* for inode_lock */
#include <asm/atomic.h>
diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index 325185e..50c0085 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -91,7 +91,6 @@
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/srcu.h>
-#include <linux/writeback.h> /* for inode_lock */
#include <asm/atomic.h>
diff --git a/fs/notify/vfsmount_mark.c b/fs/notify/vfsmount_mark.c
index 56772b5..6f8eefe 100644
--- a/fs/notify/vfsmount_mark.c
+++ b/fs/notify/vfsmount_mark.c
@@ -23,7 +23,6 @@
#include <linux/mount.h>
#include <linux/mutex.h>
#include <linux/spinlock.h>
-#include <linux/writeback.h> /* for inode_lock */
#include <asm/atomic.h>
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index e78a240..949bb4a 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -9,7 +9,6 @@
struct backing_dev_info;
-extern spinlock_t inode_lock;
extern spinlock_t inode_wb_list_lock;
/*
--
1.7.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
end of thread, other threads:[~2011-03-22 11:23 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-10-29 3:23 fs: break out inode operations from inode_lock V3 Dave Chinner
2010-10-29 3:23 ` [PATCH 1/8] fs: protect inode->i_state with inode->i_lock Dave Chinner
2010-10-29 4:29 ` Al Viro
2010-10-29 3:23 ` [PATCH 2/8] fs: factor inode disposal Dave Chinner
2010-10-29 4:45 ` Al Viro
2010-10-29 3:23 ` [PATCH 3/8] fs: Lock the inode LRU list separately Dave Chinner
2010-10-29 3:23 ` [PATCH 4/8] fs: remove inode_lock from iput_final and prune_icache Dave Chinner
2010-10-29 5:14 ` Al Viro
2010-10-29 3:23 ` [PATCH 5/8] fs: move i_sb_list out from under inode_lock Dave Chinner
2010-10-29 3:23 ` [PATCH 6/8] fs: move i_wb_list " Dave Chinner
2010-10-29 3:23 ` [PATCH 7/8] fs: rename inode_lock to inode_hash_lock Dave Chinner
2010-10-29 3:23 ` [PATCH 8/8] fs: Clean up documentation references to inode_lock Dave Chinner
2010-10-29 5:17 ` Al Viro
2010-10-29 7:46 ` Christoph Hellwig
2010-10-29 8:59 fs: break out inode operations from inode_lock V4 Dave Chinner
2010-10-29 9:00 ` [PATCH 7/8] fs: rename inode_lock to inode_hash_lock Dave Chinner
2011-03-22 11:23 vfs: inode lock breakup Dave Chinner
2011-03-22 11:23 ` [PATCH 7/8] fs: rename inode_lock to inode_hash_lock Dave Chinner
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.