* [PATCH] ext2: Fix data corruption for racing writes @ 2009-04-02 23:36 Jan Kara 2009-04-02 23:36 ` [PATCH] ext3: Fix chain verification in ext3_get_blocks() Jan Kara 2009-04-04 0:38 ` [PATCH] ext2: Fix data corruption for racing writes Ying Han 0 siblings, 2 replies; 7+ messages in thread From: Jan Kara @ 2009-04-02 23:36 UTC (permalink / raw) To: linux-ext4; +Cc: Andrew Morton, LKML, Jan Kara, Ying Han, Nick Piggin If two writers allocating blocks to file race with each other (e.g. because writepages races with ordinary write or two writepages race with each other), ext2_getblock() can be called on the same inode in parallel. Before we are going to allocate new blocks, we have to recheck the block chain we have obtained so far without holding truncate_mutex. Otherwise we could overwrite the indirect block pointer set by the other writer leading to data loss. The below test program by Ying is able to reproduce the data loss with ext2 on in BRD in a few minutes if the machine is under memory pressure: long kMemSize = 50 << 20; int kPageSize = 4096; int main(int argc, char **argv) { int status; int count = 0; int i; char *fname = "/mnt/test.mmap"; char *mem; unlink(fname); int fd = open(fname, O_CREAT | O_EXCL | O_RDWR, 0600); status = ftruncate(fd, kMemSize); mem = mmap(0, kMemSize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); // Fill the memory with 1s. memset(mem, 1, kMemSize); sleep(2); for (i = 0; i < kMemSize; i++) { int byte_good = mem[i] != 0; if (!byte_good && ((i % kPageSize) == 0)) { //printf("%d ", i / kPageSize); count++; } } munmap(mem, kMemSize); close(fd); unlink(fname); if (count > 0) { printf("Running %d bad page\n", count); return 1; } return 0; } CC: Ying Han <yinghan@google.com> CC: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Jan Kara <jack@suse.cz> --- fs/ext2/inode.c | 44 +++++++++++++++++++++++++++++++++----------- 1 files changed, 33 insertions(+), 11 deletions(-) diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c index b43b955..acf6788 100644 --- a/fs/ext2/inode.c +++ b/fs/ext2/inode.c @@ -590,9 +590,8 @@ static int ext2_get_blocks(struct inode *inode, if (depth == 0) return (err); -reread: - partial = ext2_get_branch(inode, depth, offsets, chain, &err); + partial = ext2_get_branch(inode, depth, offsets, chain, &err); /* Simplest case - block found, no allocation needed */ if (!partial) { first_block = le32_to_cpu(chain[depth - 1].key); @@ -602,15 +601,16 @@ reread: while (count < maxblocks && count <= blocks_to_boundary) { ext2_fsblk_t blk; - if (!verify_chain(chain, partial)) { + if (!verify_chain(chain, chain + depth - 1)) { /* * Indirect block might be removed by * truncate while we were reading it. * Handling of that case: forget what we've * got now, go to reread. */ + err = -EAGAIN; count = 0; - goto changed; + break; } blk = le32_to_cpu(*(chain[depth-1].p + count)); if (blk == first_block + count) @@ -618,7 +618,8 @@ reread: else break; } - goto got_it; + if (err != -EAGAIN) + goto got_it; } /* Next simple case - plain lookup or failed read of indirect block */ @@ -626,6 +627,33 @@ reread: goto cleanup; mutex_lock(&ei->truncate_mutex); + /* + * If the indirect block is missing while we are reading + * the chain(ext3_get_branch() returns -EAGAIN err), or + * if the chain has been changed after we grab the semaphore, + * (either because another process truncated this branch, or + * another get_block allocated this branch) re-grab the chain to see if + * the request block has been allocated or not. + * + * Since we already block the truncate/other get_block + * at this point, we will have the current copy of the chain when we + * splice the branch into the tree. + */ + if (err == -EAGAIN || !verify_chain(chain, partial)) { + while (partial > chain) { + brelse(partial->bh); + partial--; + } + partial = ext2_get_branch(inode, depth, offsets, chain, &err); + if (!partial) { + count++; + mutex_unlock(&ei->truncate_mutex); + if (err) + goto cleanup; + clear_buffer_new(bh_result); + goto got_it; + } + } /* * Okay, we need to do block allocation. Lazily initialize the block @@ -683,12 +711,6 @@ cleanup: partial--; } return err; -changed: - while (partial > chain) { - brelse(partial->bh); - partial--; - } - goto reread; } int ext2_get_block(struct inode *inode, sector_t iblock, struct buffer_head *bh_result, int create) -- 1.6.0.2 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH] ext3: Fix chain verification in ext3_get_blocks() 2009-04-02 23:36 [PATCH] ext2: Fix data corruption for racing writes Jan Kara @ 2009-04-02 23:36 ` Jan Kara 2009-04-04 0:38 ` [PATCH] ext2: Fix data corruption for racing writes Ying Han 1 sibling, 0 replies; 7+ messages in thread From: Jan Kara @ 2009-04-02 23:36 UTC (permalink / raw) To: linux-ext4; +Cc: Andrew Morton, LKML, Jan Kara Chain verification in ext3_get_blocks() has been hosed since it called verify_chain(chain, NULL) which always returns success. As a result readers could in theory race with truncate. On the other hand the race probably cannot happen with the current locking scheme, since by the time ext3_truncate() is called all the pages are already removed and hence get_block() shouldn't be called on such pages... Signed-off-by: Jan Kara <jack@suse.cz> --- fs/ext3/inode.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c index 4a09ff1..4bab705 100644 --- a/fs/ext3/inode.c +++ b/fs/ext3/inode.c @@ -820,7 +820,7 @@ int ext3_get_blocks_handle(handle_t *handle, struct inode *inode, while (count < maxblocks && count <= blocks_to_boundary) { ext3_fsblk_t blk; - if (!verify_chain(chain, partial)) { + if (!verify_chain(chain, chain + depth - 1)) { /* * Indirect block might be removed by * truncate while we were reading it. -- 1.6.0.2 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] ext2: Fix data corruption for racing writes 2009-04-02 23:36 [PATCH] ext2: Fix data corruption for racing writes Jan Kara 2009-04-02 23:36 ` [PATCH] ext3: Fix chain verification in ext3_get_blocks() Jan Kara @ 2009-04-04 0:38 ` Ying Han 2009-04-06 10:23 ` Jan Kara 1 sibling, 1 reply; 7+ messages in thread From: Ying Han @ 2009-04-04 0:38 UTC (permalink / raw) To: Jan Kara; +Cc: linux-ext4, Andrew Morton, LKML, Ying Han, Nick Piggin On Thu, Apr 2, 2009 at 4:36 PM, Jan Kara <jack@suse.cz> wrote: > > If two writers allocating blocks to file race with each other (e.g. because > writepages races with ordinary write or two writepages race with each other), > ext2_getblock() can be called on the same inode in parallel. Before we are > going to allocate new blocks, we have to recheck the block chain we have > obtained so far without holding truncate_mutex. Otherwise we could overwrite > the indirect block pointer set by the other writer leading to data loss. > > The below test program by Ying is able to reproduce the data loss with ext2 > on in BRD in a few minutes if the machine is under memory pressure: > > long kMemSize = 50 << 20; > int kPageSize = 4096; > > int main(int argc, char **argv) { > int status; > int count = 0; > int i; > char *fname = "/mnt/test.mmap"; > char *mem; > unlink(fname); > int fd = open(fname, O_CREAT | O_EXCL | O_RDWR, 0600); > status = ftruncate(fd, kMemSize); > mem = mmap(0, kMemSize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); > // Fill the memory with 1s. > memset(mem, 1, kMemSize); > sleep(2); > for (i = 0; i < kMemSize; i++) { > int byte_good = mem[i] != 0; > if (!byte_good && ((i % kPageSize) == 0)) { > //printf("%d ", i / kPageSize); > count++; > } > } > munmap(mem, kMemSize); > close(fd); > unlink(fname); > > if (count > 0) { > printf("Running %d bad page\n", count); > return 1; > } > return 0; > } > > CC: Ying Han <yinghan@google.com> > CC: Nick Piggin <nickpiggin@yahoo.com.au> > Signed-off-by: Jan Kara <jack@suse.cz> > --- > fs/ext2/inode.c | 44 +++++++++++++++++++++++++++++++++----------- > 1 files changed, 33 insertions(+), 11 deletions(-) > > diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c > index b43b955..acf6788 100644 > --- a/fs/ext2/inode.c > +++ b/fs/ext2/inode.c > @@ -590,9 +590,8 @@ static int ext2_get_blocks(struct inode *inode, > > if (depth == 0) > return (err); > -reread: > - partial = ext2_get_branch(inode, depth, offsets, chain, &err); > > + partial = ext2_get_branch(inode, depth, offsets, chain, &err); > /* Simplest case - block found, no allocation needed */ > if (!partial) { > first_block = le32_to_cpu(chain[depth - 1].key); > @@ -602,15 +601,16 @@ reread: > while (count < maxblocks && count <= blocks_to_boundary) { > ext2_fsblk_t blk; > > - if (!verify_chain(chain, partial)) { > + if (!verify_chain(chain, chain + depth - 1)) { > /* > * Indirect block might be removed by > * truncate while we were reading it. > * Handling of that case: forget what we've > * got now, go to reread. > */ > + err = -EAGAIN; > count = 0; > - goto changed; > + break; > } > blk = le32_to_cpu(*(chain[depth-1].p + count)); > if (blk == first_block + count) > @@ -618,7 +618,8 @@ reread: > else > break; > } > - goto got_it; > + if (err != -EAGAIN) > + goto got_it; > } > > /* Next simple case - plain lookup or failed read of indirect block */ > @@ -626,6 +627,33 @@ reread: > goto cleanup; > > mutex_lock(&ei->truncate_mutex); > + /* > + * If the indirect block is missing while we are reading > + * the chain(ext3_get_branch() returns -EAGAIN err), or > + * if the chain has been changed after we grab the semaphore, > + * (either because another process truncated this branch, or > + * another get_block allocated this branch) re-grab the chain to see if > + * the request block has been allocated or not. > + * > + * Since we already block the truncate/other get_block > + * at this point, we will have the current copy of the chain when we > + * splice the branch into the tree. > + */ > + if (err == -EAGAIN || !verify_chain(chain, partial)) { > + while (partial > chain) { > + brelse(partial->bh); > + partial--; > + } > + partial = ext2_get_branch(inode, depth, offsets, chain, &err); > + if (!partial) { > + count++; > + mutex_unlock(&ei->truncate_mutex); > + if (err) > + goto cleanup; > + clear_buffer_new(bh_result); > + goto got_it; > + } > + } > > /* > * Okay, we need to do block allocation. Lazily initialize the block > @@ -683,12 +711,6 @@ cleanup: > partial--; > } > return err; > -changed: > - while (partial > chain) { > - brelse(partial->bh); > - partial--; > - } > - goto reread; > } > > int ext2_get_block(struct inode *inode, sector_t iblock, struct buffer_head *bh_result, int create) I tried this patch and seems i got deadlock on the truncate_mutex. Here is the message after enabling lockdep. I pasted the same message on the origianal thread. kernel: 1 lock held by kswapd1/264: kernel: #0: (&ei->truncate_mutex){--..}, at: [<ffffffff8031d529>] ext2_get_block+0x109/0x960 kernel: INFO: task ftruncate_mmap:2950 blocked for more than 120 seconds. kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kernel: ftruncate_mma D ffff81047e733a80 0 2950 2858 kernel: ffff8101798516f8 0000000000000092 0000000000000000 0000000000000046 kernel: ffff81047e0a1260 ffff81047f070000 ffff81047e0a15c0 0000000100130c66 kernel: 00000000ffffffff ffffffff8025740d 0000000000000000 0000000000000000 kernel: Call Trace: kernel: [<ffffffff8025740d>] mark_held_locks+0x3d/0x80 kernel: [<ffffffff804d78bd>] mutex_lock_nested+0x14d/0x280 kernel: [<ffffffff804d7855>] mutex_lock_nested+0xe5/0x280 kernel: [<ffffffff8031d529>] ext2_get_block+0x109/0x960 kernel: [<ffffffff802ca2e3>] create_empty_buffers+0x43/0xb0 kernel: [<ffffffff802ca2e3>] create_empty_buffers+0x43/0xb0 kernel: [<ffffffff802ca217>] alloc_page_buffers+0x97/0x120 kernel: [<ffffffff802cbfb6>] __block_write_full_page+0x206/0x320 kernel: [<ffffffff802cbe70>] __block_write_full_page+0xc0/0x320 kernel: [<ffffffff8031d420>] ext2_get_block+0x0/0x960 kernel: [<ffffffff8027c74e>] shrink_page_list+0x4fe/0x650 kernel: [<ffffffff80257ee8>] __lock_acquire+0x3b8/0x1080 kernel: [<ffffffff8027be18>] isolate_lru_pages+0x88/0x230 kernel: [<ffffffff8027c9ea>] shrink_inactive_list+0x14a/0x3f0 kernel: [<ffffffff8027cd43>] shrink_zone+0xb3/0x130 kernel: [<ffffffff80249e90>] autoremove_wake_function+0x0/0x30 kernel: [<ffffffff8027d1a8>] try_to_free_pages+0x268/0x3e0 kernel: [<ffffffff8027bfc0>] isolate_pages_global+0x0/0x40 kernel: [<ffffffff802774f7>] __alloc_pages_internal+0x1d7/0x4a0 kernel: [<ffffffff80279b94>] __do_page_cache_readahead+0x124/0x270 kernel: [<ffffffff8027314f>] filemap_fault+0x18f/0x400 kernel: [<ffffffff80280925>] __do_fault+0x65/0x450 kernel: [<ffffffff80257ee8>] __lock_acquire+0x3b8/0x1080 kernel: [<ffffffff803475dd>] __down_read_trylock+0x1d/0x60 kernel: [<ffffffff8028389a>] handle_mm_fault+0x18a/0x7a0 kernel: [<ffffffff804dba1c>] do_page_fault+0x29c/0x930 kernel: [<ffffffff804d8b46>] trace_hardirqs_on_thunk+0x35/0x3a kernel: [<ffffffff804d94dd>] error_exit+0x0/0xa9 kernel: kernel: 2 locks held by ftruncate_mmap/2950: kernel: #0: (&mm->mmap_sem){----}, at: [<ffffffff804db9af>] do_page_fault+0x22f/0x930 kernel: #1: (&ei->truncate_mutex){--..}, at: [<ffffffff8031d529>] ext2_get_block+0x109/0x960 Besides than that, does this patch fix the problem while moving the mutex up to the beginning of ext_get_blocks() does too? Like the following one diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c index 384fc0d..bef3ef7 100644 --- a/fs/ext2/inode.c +++ b/fs/ext2/inode.c @@ -586,11 +586,13 @@ static int ext2_get_blocks(struct inode *inode, int count = 0; ext2_fsblk_t first_block = 0; + mutex_lock(&ei->truncate_mutex); depth = ext2_block_to_path(inode,iblock,offsets,&blocks_to_boundary); - if (depth == 0) + if (depth == 0) { + mutex_unlock(&ei->truncate_mutex); return (err); -reread: + } partial = ext2_get_branch(inode, depth, offsets, chain, &err); /* Simplest case - block found, no allocation needed */ @@ -602,16 +604,6 @@ reread: while (count < maxblocks && count <= blocks_to_boundary) { ext2_fsblk_t blk; - if (!verify_chain(chain, partial)) { - /* - * Indirect block might be removed by - * truncate while we were reading it. - * Handling of that case: forget what we've - * got now, go to reread. - */ - count = 0; - goto changed; - } blk = le32_to_cpu(*(chain[depth-1].p + count)); if (blk == first_block + count) count++; @@ -625,7 +617,6 @@ reread: if (!create || err == -EIO) goto cleanup; - mutex_lock(&ei->truncate_mutex); /* * Okay, we need to do block allocation. Lazily initialize the block @@ -651,7 +642,6 @@ reread: offsets + (partial - chain), partial); if (err) { - mutex_unlock(&ei->truncate_mutex); goto cleanup; } @@ -662,13 +652,11 @@ reread: err = ext2_clear_xip_target (inode, le32_to_cpu(chain[depth-1].key)); if (err) { - mutex_unlock(&ei->truncate_mutex); goto cleanup; } } ext2_splice_branch(inode, iblock, partial, indirect_blks, count); - mutex_unlock(&ei->truncate_mutex); set_buffer_new(bh_result); got_it: map_bh(bh_result, inode->i_sb, le32_to_cpu(chain[depth-1].key)); @@ -678,17 +666,12 @@ got_it: /* Clean up and exit */ partial = chain + depth - 1; /* the whole chain */ cleanup: + mutex_unlock(&ei->truncate_mutex); while (partial > chain) { brelse(partial->bh); partial--; } return err; -changed: - while (partial > chain) { - brelse(partial->bh); - partial--; - } - goto reread; } int ext2_get_block(struct inode *inode, sector_t iblock, struct buffer_head *bh_result, int create) --Ying > > -- > 1.6.0.2 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] ext2: Fix data corruption for racing writes 2009-04-04 0:38 ` [PATCH] ext2: Fix data corruption for racing writes Ying Han @ 2009-04-06 10:23 ` Jan Kara 2009-04-09 20:30 ` Andrew Morton 0 siblings, 1 reply; 7+ messages in thread From: Jan Kara @ 2009-04-06 10:23 UTC (permalink / raw) To: Ying Han; +Cc: linux-ext4, Andrew Morton, LKML, Ying Han, Nick Piggin On Fri 03-04-09 17:38:33, Ying Han wrote: > On Thu, Apr 2, 2009 at 4:36 PM, Jan Kara <jack@suse.cz> wrote: > > > > If two writers allocating blocks to file race with each other (e.g. because > > writepages races with ordinary write or two writepages race with each other), > > ext2_getblock() can be called on the same inode in parallel. Before we are > > going to allocate new blocks, we have to recheck the block chain we have > > obtained so far without holding truncate_mutex. Otherwise we could overwrite > > the indirect block pointer set by the other writer leading to data loss. > > > > The below test program by Ying is able to reproduce the data loss with ext2 > > on in BRD in a few minutes if the machine is under memory pressure: > > > > long kMemSize = 50 << 20; > > int kPageSize = 4096; > > > > int main(int argc, char **argv) { > > int status; > > int count = 0; > > int i; > > char *fname = "/mnt/test.mmap"; > > char *mem; > > unlink(fname); > > int fd = open(fname, O_CREAT | O_EXCL | O_RDWR, 0600); > > status = ftruncate(fd, kMemSize); > > mem = mmap(0, kMemSize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); > > // Fill the memory with 1s. > > memset(mem, 1, kMemSize); > > sleep(2); > > for (i = 0; i < kMemSize; i++) { > > int byte_good = mem[i] != 0; > > if (!byte_good && ((i % kPageSize) == 0)) { > > //printf("%d ", i / kPageSize); > > count++; > > } > > } > > munmap(mem, kMemSize); > > close(fd); > > unlink(fname); > > > > if (count > 0) { > > printf("Running %d bad page\n", count); > > return 1; > > } > > return 0; > > } > > > > CC: Ying Han <yinghan@google.com> > > CC: Nick Piggin <nickpiggin@yahoo.com.au> > > Signed-off-by: Jan Kara <jack@suse.cz> > > --- > > fs/ext2/inode.c | 44 +++++++++++++++++++++++++++++++++----------- > > 1 files changed, 33 insertions(+), 11 deletions(-) > > > > diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c > > index b43b955..acf6788 100644 > > --- a/fs/ext2/inode.c > > +++ b/fs/ext2/inode.c > > @@ -590,9 +590,8 @@ static int ext2_get_blocks(struct inode *inode, > > > > if (depth == 0) > > return (err); > > -reread: > > - partial = ext2_get_branch(inode, depth, offsets, chain, &err); > > > > + partial = ext2_get_branch(inode, depth, offsets, chain, &err); > > /* Simplest case - block found, no allocation needed */ > > if (!partial) { > > first_block = le32_to_cpu(chain[depth - 1].key); > > @@ -602,15 +601,16 @@ reread: > > while (count < maxblocks && count <= blocks_to_boundary) { > > ext2_fsblk_t blk; > > > > - if (!verify_chain(chain, partial)) { > > + if (!verify_chain(chain, chain + depth - 1)) { > > /* > > * Indirect block might be removed by > > * truncate while we were reading it. > > * Handling of that case: forget what we've > > * got now, go to reread. > > */ > > + err = -EAGAIN; > > count = 0; > > - goto changed; > > + break; > > } > > blk = le32_to_cpu(*(chain[depth-1].p + count)); > > if (blk == first_block + count) > > @@ -618,7 +618,8 @@ reread: > > else > > break; > > } > > - goto got_it; > > + if (err != -EAGAIN) > > + goto got_it; > > } > > > > /* Next simple case - plain lookup or failed read of indirect block */ > > @@ -626,6 +627,33 @@ reread: > > goto cleanup; > > > > mutex_lock(&ei->truncate_mutex); > > + /* > > + * If the indirect block is missing while we are reading > > + * the chain(ext3_get_branch() returns -EAGAIN err), or > > + * if the chain has been changed after we grab the semaphore, > > + * (either because another process truncated this branch, or > > + * another get_block allocated this branch) re-grab the chain to see if > > + * the request block has been allocated or not. > > + * > > + * Since we already block the truncate/other get_block > > + * at this point, we will have the current copy of the chain when we > > + * splice the branch into the tree. > > + */ > > + if (err == -EAGAIN || !verify_chain(chain, partial)) { > > + while (partial > chain) { > > + brelse(partial->bh); > > + partial--; > > + } > > + partial = ext2_get_branch(inode, depth, offsets, chain, &err); > > + if (!partial) { > > + count++; > > + mutex_unlock(&ei->truncate_mutex); > > + if (err) > > + goto cleanup; > > + clear_buffer_new(bh_result); > > + goto got_it; > > + } > > + } > > > > /* > > * Okay, we need to do block allocation. Lazily initialize the block > > @@ -683,12 +711,6 @@ cleanup: > > partial--; > > } > > return err; > > -changed: > > - while (partial > chain) { > > - brelse(partial->bh); > > - partial--; > > - } > > - goto reread; > > } > > > > int ext2_get_block(struct inode *inode, sector_t iblock, struct buffer_head *bh_result, int create) > > I tried this patch and seems i got deadlock on the truncate_mutex. > Here is the message after enabling lockdep. I pasted the same message > on the origianal thread. > > kernel: 1 lock held by kswapd1/264: > kernel: #0: (&ei->truncate_mutex){--..}, at: [<ffffffff8031d529>] > ext2_get_block+0x109/0x960 > kernel: INFO: task ftruncate_mmap:2950 blocked for more than 120 seconds. > kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > this message. > kernel: ftruncate_mma D ffff81047e733a80 0 2950 2858 > kernel: ffff8101798516f8 0000000000000092 0000000000000000 0000000000000046 > kernel: ffff81047e0a1260 ffff81047f070000 ffff81047e0a15c0 0000000100130c66 > kernel: 00000000ffffffff ffffffff8025740d 0000000000000000 0000000000000000 > kernel: Call Trace: > kernel: [<ffffffff8025740d>] mark_held_locks+0x3d/0x80 > kernel: [<ffffffff804d78bd>] mutex_lock_nested+0x14d/0x280 > kernel: [<ffffffff804d7855>] mutex_lock_nested+0xe5/0x280 > kernel: [<ffffffff8031d529>] ext2_get_block+0x109/0x960 > kernel: [<ffffffff802ca2e3>] create_empty_buffers+0x43/0xb0 > kernel: [<ffffffff802ca2e3>] create_empty_buffers+0x43/0xb0 > kernel: [<ffffffff802ca217>] alloc_page_buffers+0x97/0x120 > kernel: [<ffffffff802cbfb6>] __block_write_full_page+0x206/0x320 > kernel: [<ffffffff802cbe70>] __block_write_full_page+0xc0/0x320 > kernel: [<ffffffff8031d420>] ext2_get_block+0x0/0x960 > kernel: [<ffffffff8027c74e>] shrink_page_list+0x4fe/0x650 > kernel: [<ffffffff80257ee8>] __lock_acquire+0x3b8/0x1080 > kernel: [<ffffffff8027be18>] isolate_lru_pages+0x88/0x230 > kernel: [<ffffffff8027c9ea>] shrink_inactive_list+0x14a/0x3f0 > kernel: [<ffffffff8027cd43>] shrink_zone+0xb3/0x130 > kernel: [<ffffffff80249e90>] autoremove_wake_function+0x0/0x30 > kernel: [<ffffffff8027d1a8>] try_to_free_pages+0x268/0x3e0 > kernel: [<ffffffff8027bfc0>] isolate_pages_global+0x0/0x40 > kernel: [<ffffffff802774f7>] __alloc_pages_internal+0x1d7/0x4a0 > kernel: [<ffffffff80279b94>] __do_page_cache_readahead+0x124/0x270 > kernel: [<ffffffff8027314f>] filemap_fault+0x18f/0x400 > kernel: [<ffffffff80280925>] __do_fault+0x65/0x450 > kernel: [<ffffffff80257ee8>] __lock_acquire+0x3b8/0x1080 > kernel: [<ffffffff803475dd>] __down_read_trylock+0x1d/0x60 > kernel: [<ffffffff8028389a>] handle_mm_fault+0x18a/0x7a0 > kernel: [<ffffffff804dba1c>] do_page_fault+0x29c/0x930 > kernel: [<ffffffff804d8b46>] trace_hardirqs_on_thunk+0x35/0x3a > kernel: [<ffffffff804d94dd>] error_exit+0x0/0xa9 > kernel: > kernel: 2 locks held by ftruncate_mmap/2950: > kernel: #0: (&mm->mmap_sem){----}, at: [<ffffffff804db9af>] > do_page_fault+0x22f/0x930 > kernel: #1: (&ei->truncate_mutex){--..}, at: [<ffffffff8031d529>] > ext2_get_block+0x109/0x960 I don't think this is a deadlock (or is the machine hung?). The thread was just waiting for a long time. I'd think that you'll occasionally get exactly the same message even without my patch if you stress the machine like you do. > Besides than that, does this patch fix the problem while moving the > mutex up to the beginning of ext_get_blocks() does too? Like the > following one Yes, moving truncate_mutex also fixes the problem but it would then synchronize readers of one file and we definitely don't want to do that. > diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c > index 384fc0d..bef3ef7 100644 > --- a/fs/ext2/inode.c > +++ b/fs/ext2/inode.c > @@ -586,11 +586,13 @@ static int ext2_get_blocks(struct inode *inode, > int count = 0; > ext2_fsblk_t first_block = 0; > > + mutex_lock(&ei->truncate_mutex); > depth = ext2_block_to_path(inode,iblock,offsets,&blocks_to_boundary); > > - if (depth == 0) > + if (depth == 0) { > + mutex_unlock(&ei->truncate_mutex); > return (err); > -reread: > + } > partial = ext2_get_branch(inode, depth, offsets, chain, &err); > > /* Simplest case - block found, no allocation needed */ > @@ -602,16 +604,6 @@ reread: > while (count < maxblocks && count <= blocks_to_boundary) { > ext2_fsblk_t blk; > > - if (!verify_chain(chain, partial)) { > - /* > - * Indirect block might be removed by > - * truncate while we were reading it. > - * Handling of that case: forget what we've > - * got now, go to reread. > - */ > - count = 0; > - goto changed; > - } > blk = le32_to_cpu(*(chain[depth-1].p + count)); > if (blk == first_block + count) > count++; > @@ -625,7 +617,6 @@ reread: > if (!create || err == -EIO) > goto cleanup; > > - mutex_lock(&ei->truncate_mutex); > > /* > * Okay, we need to do block allocation. Lazily initialize the block > @@ -651,7 +642,6 @@ reread: > offsets + (partial - chain), partial); > > if (err) { > - mutex_unlock(&ei->truncate_mutex); > goto cleanup; > } > > @@ -662,13 +652,11 @@ reread: > err = ext2_clear_xip_target (inode, > le32_to_cpu(chain[depth-1].key)); > if (err) { > - mutex_unlock(&ei->truncate_mutex); > goto cleanup; > } > } > > ext2_splice_branch(inode, iblock, partial, indirect_blks, count); > - mutex_unlock(&ei->truncate_mutex); > set_buffer_new(bh_result); > got_it: > map_bh(bh_result, inode->i_sb, le32_to_cpu(chain[depth-1].key)); > @@ -678,17 +666,12 @@ got_it: > /* Clean up and exit */ > partial = chain + depth - 1; /* the whole chain */ > cleanup: > + mutex_unlock(&ei->truncate_mutex); > while (partial > chain) { > brelse(partial->bh); > partial--; > } > return err; > -changed: > - while (partial > chain) { > - brelse(partial->bh); > - partial--; > - } > - goto reread; > } > > int ext2_get_block(struct inode *inode, sector_t iblock, struct > buffer_head *bh_result, int create) Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] ext2: Fix data corruption for racing writes 2009-04-06 10:23 ` Jan Kara @ 2009-04-09 20:30 ` Andrew Morton 2009-04-09 20:59 ` Ying Han 0 siblings, 1 reply; 7+ messages in thread From: Andrew Morton @ 2009-04-09 20:30 UTC (permalink / raw) To: Jan Kara; +Cc: yinghan, linux-ext4, linux-kernel, yinghan, nickpiggin On Mon, 6 Apr 2009 12:23:29 +0200 Jan Kara <jack@suse.cz> wrote: > > > > I tried this patch and seems i got deadlock on the truncate_mutex. > > Here is the message after enabling lockdep. I pasted the same message > > on the origianal thread. > > ... > > I don't think this is a deadlock (or is the machine hung?). The thread > was just waiting for a long time. I'd think that you'll occasionally get > exactly the same message even without my patch if you stress the machine > like you do. > Well, it's easy to tell the difference deadlock: system never recovers long-sucky-delay: system eventually recovers. Which was it?? ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] ext2: Fix data corruption for racing writes 2009-04-09 20:30 ` Andrew Morton @ 2009-04-09 20:59 ` Ying Han 2009-04-10 18:10 ` Ying Han 0 siblings, 1 reply; 7+ messages in thread From: Ying Han @ 2009-04-09 20:59 UTC (permalink / raw) To: Andrew Morton; +Cc: Jan Kara, yinghan, linux-ext4, linux-kernel, nickpiggin On Thu, Apr 9, 2009 at 1:30 PM, Andrew Morton <akpm@linux-foundation.org> wrote: > On Mon, 6 Apr 2009 12:23:29 +0200 > Jan Kara <jack@suse.cz> wrote: > >> > >> > I tried this patch and seems i got deadlock on the truncate_mutex. >> > Here is the message after enabling lockdep. I pasted the same message >> > on the origianal thread. >> >> ... >> >> I don't think this is a deadlock (or is the machine hung?). The thread >> was just waiting for a long time. I'd think that you'll occasionally get >> exactly the same message even without my patch if you stress the machine >> like you do. >> > > Well, it's easy to tell the difference > > deadlock: system never recovers > long-sucky-delay: system eventually recovers. > > Which was it?? Guess i have to retest it. I didn't wait long enough to see what happened on the machine. However, i do see the machine got rebooted later, but i am not sure what is the reason it is got reboot. I will rerun the patch and keep an eye on it this time. --Ying > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] ext2: Fix data corruption for racing writes 2009-04-09 20:59 ` Ying Han @ 2009-04-10 18:10 ` Ying Han 0 siblings, 0 replies; 7+ messages in thread From: Ying Han @ 2009-04-10 18:10 UTC (permalink / raw) To: Andrew Morton; +Cc: Jan Kara, yinghan, linux-ext4, linux-kernel, nickpiggin On Thu, Apr 9, 2009 at 1:59 PM, Ying Han <yinghan@google.com> wrote: > On Thu, Apr 9, 2009 at 1:30 PM, Andrew Morton <akpm@linux-foundation.org> wrote: >> On Mon, 6 Apr 2009 12:23:29 +0200 >> Jan Kara <jack@suse.cz> wrote: >> >>> > >>> > I tried this patch and seems i got deadlock on the truncate_mutex. >>> > Here is the message after enabling lockdep. I pasted the same message >>> > on the origianal thread. >>> >>> ... >>> >>> I don't think this is a deadlock (or is the machine hung?). The thread >>> was just waiting for a long time. I'd think that you'll occasionally get >>> exactly the same message even without my patch if you stress the machine >>> like you do. >>> >> >> Well, it's easy to tell the difference >> >> deadlock: system never recovers >> long-sucky-delay: system eventually recovers. >> >> Which was it?? > Guess i have to retest it. I didn't wait long enough to see what > happened on the machine. However, i do see the machine got rebooted > later, but i am not sure what is the reason it is got reboot. > > I will rerun the patch and keep an eye on it this time. Ok, i rerun the patch and i don't see the message poping this time and the machine is up healthy. > > --Ying >> > ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2009-04-10 18:10 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2009-04-02 23:36 [PATCH] ext2: Fix data corruption for racing writes Jan Kara 2009-04-02 23:36 ` [PATCH] ext3: Fix chain verification in ext3_get_blocks() Jan Kara 2009-04-04 0:38 ` [PATCH] ext2: Fix data corruption for racing writes Ying Han 2009-04-06 10:23 ` Jan Kara 2009-04-09 20:30 ` Andrew Morton 2009-04-09 20:59 ` Ying Han 2009-04-10 18:10 ` Ying Han
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).