All of lore.kernel.org
 help / color / mirror / Atom feed
* ext4 unkillable lseek.
@ 2016-01-12 14:53 Dave Jones
  2016-01-12 21:17 ` Andreas Dilger
  0 siblings, 1 reply; 4+ messages in thread
From: Dave Jones @ 2016-01-12 14:53 UTC (permalink / raw)
  To: Linux Kernel; +Cc: linux-ext4

I was investigating a case where it looked like Trinity was getting
into a deadlock.

The running task is doing an lseek(fd, <bignum>, SEEK_DATA) on a sparse
file that looks like this..

$ ll trinity-testfile4 
--wxrwx--- 1 davej davej 4947802326691 Jan 12 09:14 trinity-testfile4*
$ sudo filefrag trinity-testfile4 
trinity-testfile4: 3 extents found

The kernel trace for that process looks like..

trinity-c11     R  running task    22192 11483   2439 0x00080004
 ffff8800428a7c98 ffff8800a2ef87dc ffff8800a3bdf758 ffff8800a3bdf730
 ffff8800a2ef8008 ffff8800a2ef8340 ffff88009f8e9980 ffff8800a2ef8000
 ffff8800428a0000 ffffed0008514001 ffff8800428a0008 ffff8800935499e0
Call Trace:
 [<ffffffff8f5e8bd2>] preempt_schedule_common+0x42/0x70
 [<ffffffff8f5e8c1f>] preempt_schedule+0x1f/0x30
 [<ffffffff8e003058>] ___preempt_schedule+0x12/0x14
 [<ffffffff8e7a1e90>] ? ext4_es_find_delayed_extent_range+0x2a0/0x780
 [<ffffffff8f5f6f81>] ? _raw_read_unlock+0x31/0x50
 [<ffffffff8f5f6f94>] ? _raw_read_unlock+0x44/0x50
 [<ffffffff8e7a1e90>] ext4_es_find_delayed_extent_range+0x2a0/0x780
 [<ffffffff8e69c307>] ext4_llseek+0x567/0x870
 [<ffffffff8e69bda0>] ? ext4_find_unwritten_pgoff.isra.12+0x790/0x790
 [<ffffffff8f5edafc>] ? mutex_lock_nested+0x51c/0x8e0
 [<ffffffff8e20e5f9>] ? trace_hardirqs_on_caller+0x3f9/0x580
 [<ffffffff8e56e1a5>] ? __fdget_pos+0xd5/0x110
 [<ffffffff8e20e78d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff8f5ed5e0>] ? mutex_lock_interruptible_nested+0x9f0/0x9f0
 [<ffffffff8e00508f>] ? enter_from_user_mode+0x1f/0x50
 [<ffffffff8e005338>] ? syscall_trace_enter_phase1+0x278/0x470
 [<ffffffff8e248527>] ? debug_lockdep_rcu_enabled+0x77/0x90
 [<ffffffff8e518acd>] SyS_lseek+0x10d/0x180
 [<ffffffff8f5f7457>] entry_SYSCALL_64_fastpath+0x12/0x6b

It's currently been running for a hour.
Even though it's preempting back to userspace, it's ignoring
all the SIGKILLs that trinity has been sending it for taking too long.

Meanwhile all the other processes are backing up on the f_pos lock.

trinity-c7      D ffff880066857d50 24240 11628   2439 0x00080004
 ffff880066857d50 0000000000000007 ffff8800a3bdf758 ffff8800a3bdf730
 ffff880045286608 ffff880045286940 ffff8800a0150000 ffff880045286600
 ffff880066850000 ffffed000cd0a001 ffff880066850008 dffffc0000000000
Call Trace:
 [<ffffffff8f5e8e0f>] schedule+0x9f/0x1c0
 [<ffffffff8f5e9588>] schedule_preempt_disabled+0x18/0x30
 [<ffffffff8f5ed92d>] mutex_lock_nested+0x34d/0x8e0
 [<ffffffff8e56e1a5>] ? __fdget_pos+0xd5/0x110
 [<ffffffff8e337fe3>] ? acct_account_cputime+0x63/0x80
 [<ffffffff8e56e1a5>] ? __fdget_pos+0xd5/0x110
 [<ffffffff8f5ed5e0>] ? mutex_lock_interruptible_nested+0x9f0/0x9f0
 [<ffffffff8e248527>] ? debug_lockdep_rcu_enabled+0x77/0x90
 [<ffffffff8e56e1a5>] __fdget_pos+0xd5/0x110
 [<ffffffff8e51c029>] SyS_read+0x79/0x230
 [<ffffffff8e51bfb0>] ? do_sendfile+0x1280/0x1280
 [<ffffffff8e20e5f9>] ? trace_hardirqs_on_caller+0x3f9/0x580
 [<ffffffff8e003017>] ? trace_hardirqs_on_thunk+0x17/0x19
 [<ffffffff8f5f7457>] entry_SYSCALL_64_fastpath+0x12/0x6b

Eventually it does complete, but waiting a half hour every time
trinity picks lseek as a syscall is kinda crappy.

Shouldn't lseek be a killable operation ?

I notice this doesn't seem to happen with btrfs, suggesting it's
an ext'ism.   This has probably been there for a while, I've not
been doing fuzz runs on ext4 enabled systems for a long time.

	Dave

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: ext4 unkillable lseek.
  2016-01-12 14:53 ext4 unkillable lseek Dave Jones
@ 2016-01-12 21:17 ` Andreas Dilger
  2016-01-13  7:36   ` Dmitry Monakhov
  2016-01-13 17:00   ` Theodore Ts'o
  0 siblings, 2 replies; 4+ messages in thread
From: Andreas Dilger @ 2016-01-12 21:17 UTC (permalink / raw)
  To: Dave Jones, Ted Tso, Dmitry Monakhov; +Cc: Linux Kernel, linux-ext4

[-- Attachment #1: Type: text/plain, Size: 5179 bytes --]

On Jan 12, 2016, at 7:53 AM, Dave Jones <davej@codemonkey.org.uk> wrote:
> 
> I was investigating a case where it looked like Trinity was getting
> into a deadlock.
> 
> The running task is doing an lseek(fd, <bignum>, SEEK_DATA) on a sparse
> file that looks like this..
> 
> $ ll trinity-testfile4
> --wxrwx--- 1 davej davej 4947802326691 Jan 12 09:14 trinity-testfile4*
> $ sudo filefrag trinity-testfile4
> trinity-testfile4: 3 extents found
> 
> The kernel trace for that process looks like..
> 
> trinity-c11     R  running task    22192 11483   2439 0x00080004
> ffff8800428a7c98 ffff8800a2ef87dc ffff8800a3bdf758 ffff8800a3bdf730
> ffff8800a2ef8008 ffff8800a2ef8340 ffff88009f8e9980 ffff8800a2ef8000
> ffff8800428a0000 ffffed0008514001 ffff8800428a0008 ffff8800935499e0
> Call Trace:
> [<ffffffff8f5e8bd2>] preempt_schedule_common+0x42/0x70
> [<ffffffff8f5e8c1f>] preempt_schedule+0x1f/0x30
> [<ffffffff8e003058>] ___preempt_schedule+0x12/0x14
> [<ffffffff8e7a1e90>] ? ext4_es_find_delayed_extent_range+0x2a0/0x780
> [<ffffffff8f5f6f81>] ? _raw_read_unlock+0x31/0x50
> [<ffffffff8f5f6f94>] ? _raw_read_unlock+0x44/0x50
> [<ffffffff8e7a1e90>] ext4_es_find_delayed_extent_range+0x2a0/0x780

It looks like ext4_es_find_delayed_extent_range() is being called once
for every block in the file looking for any delalloc data, which is
pretty awful.  Checking the git history for this code, it seems it was
fixed once upon a time in commit 14516bb7bb:

    ext4: fix suboptimal seek_{data,hole} extents traversial

    It is ridiculous practice to scan inode block by block, this technique
    applicable only for old indirect files. This takes significant amount
    of time for really large files. Let's reuse ext4_fiemap which already
    traverse inode-tree in most optimal meaner.

    TESTCASE:
    ftruncate64(fd, 0);
    ftruncate64(fd, 1ULL << 40);
    /* lseek will spin very long time */
    lseek64(fd, 0, SEEK_DATA);
    lseek64(fd, 0, SEEK_HOLE);

    Original report: https://lkml.org/lkml/2014/10/16/620

    Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>

but it was later reverted in ad7fefb10 because of a problem with ext3 and
never restored.

    Revert "ext4: fix suboptimal seek_{data,hole} extents traversial"

    This reverts commit 14516bb7bb6ffbd49f35389f9ece3b2045ba5815.

    This was causing regression test failures with generic/285 with an ext3
    filesystem using CONFIG_EXT4_USE_FOR_EXT23.

    Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Looks like that patch needs to be revived.

> [<ffffffff8e69c307>] ext4_llseek+0x567/0x870
> [<ffffffff8e69bda0>] ? ext4_find_unwritten_pgoff.isra.12+0x790/0x790
> [<ffffffff8f5edafc>] ? mutex_lock_nested+0x51c/0x8e0
> [<ffffffff8e20e5f9>] ? trace_hardirqs_on_caller+0x3f9/0x580
> [<ffffffff8e56e1a5>] ? __fdget_pos+0xd5/0x110
> [<ffffffff8e20e78d>] ? trace_hardirqs_on+0xd/0x10
> [<ffffffff8f5ed5e0>] ? mutex_lock_interruptible_nested+0x9f0/0x9f0
> [<ffffffff8e00508f>] ? enter_from_user_mode+0x1f/0x50
> [<ffffffff8e005338>] ? syscall_trace_enter_phase1+0x278/0x470
> [<ffffffff8e248527>] ? debug_lockdep_rcu_enabled+0x77/0x90
> [<ffffffff8e518acd>] SyS_lseek+0x10d/0x180
> [<ffffffff8f5f7457>] entry_SYSCALL_64_fastpath+0x12/0x6b
> 
> It's currently been running for a hour.
> Even though it's preempting back to userspace, it's ignoring
> all the SIGKILLs that trinity has been sending it for taking too long.
> 
> Meanwhile all the other processes are backing up on the f_pos lock.
> 
> trinity-c7      D ffff880066857d50 24240 11628   2439 0x00080004
> ffff880066857d50 0000000000000007 ffff8800a3bdf758 ffff8800a3bdf730
> ffff880045286608 ffff880045286940 ffff8800a0150000 ffff880045286600
> ffff880066850000 ffffed000cd0a001 ffff880066850008 dffffc0000000000
> Call Trace:
> [<ffffffff8f5e8e0f>] schedule+0x9f/0x1c0
> [<ffffffff8f5e9588>] schedule_preempt_disabled+0x18/0x30
> [<ffffffff8f5ed92d>] mutex_lock_nested+0x34d/0x8e0
> [<ffffffff8e56e1a5>] ? __fdget_pos+0xd5/0x110
> [<ffffffff8e337fe3>] ? acct_account_cputime+0x63/0x80
> [<ffffffff8e56e1a5>] ? __fdget_pos+0xd5/0x110
> [<ffffffff8f5ed5e0>] ? mutex_lock_interruptible_nested+0x9f0/0x9f0
> [<ffffffff8e248527>] ? debug_lockdep_rcu_enabled+0x77/0x90
> [<ffffffff8e56e1a5>] __fdget_pos+0xd5/0x110
> [<ffffffff8e51c029>] SyS_read+0x79/0x230
> [<ffffffff8e51bfb0>] ? do_sendfile+0x1280/0x1280
> [<ffffffff8e20e5f9>] ? trace_hardirqs_on_caller+0x3f9/0x580
> [<ffffffff8e003017>] ? trace_hardirqs_on_thunk+0x17/0x19
> [<ffffffff8f5f7457>] entry_SYSCALL_64_fastpath+0x12/0x6b
> 
> Eventually it does complete, but waiting a half hour every time
> trinity picks lseek as a syscall is kinda crappy.
> 
> Shouldn't lseek be a killable operation ?
> 
> I notice this doesn't seem to happen with btrfs, suggesting it's
> an ext'ism.   This has probably been there for a while, I've not
> been doing fuzz runs on ext4 enabled systems for a long time.
> 
> 	Dave
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Cheers, Andreas






[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: ext4 unkillable lseek.
  2016-01-12 21:17 ` Andreas Dilger
@ 2016-01-13  7:36   ` Dmitry Monakhov
  2016-01-13 17:00   ` Theodore Ts'o
  1 sibling, 0 replies; 4+ messages in thread
From: Dmitry Monakhov @ 2016-01-13  7:36 UTC (permalink / raw)
  To: Andreas Dilger, Dave Jones, Ted Tso; +Cc: Linux Kernel, linux-ext4

[-- Attachment #1: Type: text/plain, Size: 5531 bytes --]

Andreas Dilger <adilger@dilger.ca> writes:

> On Jan 12, 2016, at 7:53 AM, Dave Jones <davej@codemonkey.org.uk> wrote:
>> 
>> I was investigating a case where it looked like Trinity was getting
>> into a deadlock.
>> 
>> The running task is doing an lseek(fd, <bignum>, SEEK_DATA) on a sparse
>> file that looks like this..
>> 
>> $ ll trinity-testfile4
>> --wxrwx--- 1 davej davej 4947802326691 Jan 12 09:14 trinity-testfile4*
>> $ sudo filefrag trinity-testfile4
>> trinity-testfile4: 3 extents found
>> 
>> The kernel trace for that process looks like..
>> 
>> trinity-c11     R  running task    22192 11483   2439 0x00080004
>> ffff8800428a7c98 ffff8800a2ef87dc ffff8800a3bdf758 ffff8800a3bdf730
>> ffff8800a2ef8008 ffff8800a2ef8340 ffff88009f8e9980 ffff8800a2ef8000
>> ffff8800428a0000 ffffed0008514001 ffff8800428a0008 ffff8800935499e0
>> Call Trace:
>> [<ffffffff8f5e8bd2>] preempt_schedule_common+0x42/0x70
>> [<ffffffff8f5e8c1f>] preempt_schedule+0x1f/0x30
>> [<ffffffff8e003058>] ___preempt_schedule+0x12/0x14
>> [<ffffffff8e7a1e90>] ? ext4_es_find_delayed_extent_range+0x2a0/0x780
>> [<ffffffff8f5f6f81>] ? _raw_read_unlock+0x31/0x50
>> [<ffffffff8f5f6f94>] ? _raw_read_unlock+0x44/0x50
>> [<ffffffff8e7a1e90>] ext4_es_find_delayed_extent_range+0x2a0/0x780
>
> It looks like ext4_es_find_delayed_extent_range() is being called once
> for every block in the file looking for any delalloc data, which is
> pretty awful.  Checking the git history for this code, it seems it was
> fixed once upon a time in commit 14516bb7bb:
>
>     ext4: fix suboptimal seek_{data,hole} extents traversial
>
>     It is ridiculous practice to scan inode block by block, this technique
>     applicable only for old indirect files. This takes significant amount
>     of time for really large files. Let's reuse ext4_fiemap which already
>     traverse inode-tree in most optimal meaner.
>
>     TESTCASE:
>     ftruncate64(fd, 0);
>     ftruncate64(fd, 1ULL << 40);
>     /* lseek will spin very long time */
>     lseek64(fd, 0, SEEK_DATA);
>     lseek64(fd, 0, SEEK_HOLE);
>
>     Original report: https://lkml.org/lkml/2014/10/16/620
>
>     Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
>     Signed-off-by: Theodore Ts'o <tytso@mit.edu>
>
> but it was later reverted in ad7fefb10 because of a problem with ext3 and
> never restored.
>
>     Revert "ext4: fix suboptimal seek_{data,hole} extents traversial"
>
>     This reverts commit 14516bb7bb6ffbd49f35389f9ece3b2045ba5815.
>
>     This was causing regression test failures with generic/285 with an ext3
>     filesystem using CONFIG_EXT4_USE_FOR_EXT23.
>
>     Signed-off-by: Theodore Ts'o <tytso@mit.edu>
>
> Looks like that patch needs to be revived.
Yes. It is in my queue. I'll do it.
>
>> [<ffffffff8e69c307>] ext4_llseek+0x567/0x870
>> [<ffffffff8e69bda0>] ? ext4_find_unwritten_pgoff.isra.12+0x790/0x790
>> [<ffffffff8f5edafc>] ? mutex_lock_nested+0x51c/0x8e0
>> [<ffffffff8e20e5f9>] ? trace_hardirqs_on_caller+0x3f9/0x580
>> [<ffffffff8e56e1a5>] ? __fdget_pos+0xd5/0x110
>> [<ffffffff8e20e78d>] ? trace_hardirqs_on+0xd/0x10
>> [<ffffffff8f5ed5e0>] ? mutex_lock_interruptible_nested+0x9f0/0x9f0
>> [<ffffffff8e00508f>] ? enter_from_user_mode+0x1f/0x50
>> [<ffffffff8e005338>] ? syscall_trace_enter_phase1+0x278/0x470
>> [<ffffffff8e248527>] ? debug_lockdep_rcu_enabled+0x77/0x90
>> [<ffffffff8e518acd>] SyS_lseek+0x10d/0x180
>> [<ffffffff8f5f7457>] entry_SYSCALL_64_fastpath+0x12/0x6b
>> 
>> It's currently been running for a hour.
>> Even though it's preempting back to userspace, it's ignoring
>> all the SIGKILLs that trinity has been sending it for taking too long.
>> 
>> Meanwhile all the other processes are backing up on the f_pos lock.
>> 
>> trinity-c7      D ffff880066857d50 24240 11628   2439 0x00080004
>> ffff880066857d50 0000000000000007 ffff8800a3bdf758 ffff8800a3bdf730
>> ffff880045286608 ffff880045286940 ffff8800a0150000 ffff880045286600
>> ffff880066850000 ffffed000cd0a001 ffff880066850008 dffffc0000000000
>> Call Trace:
>> [<ffffffff8f5e8e0f>] schedule+0x9f/0x1c0
>> [<ffffffff8f5e9588>] schedule_preempt_disabled+0x18/0x30
>> [<ffffffff8f5ed92d>] mutex_lock_nested+0x34d/0x8e0
>> [<ffffffff8e56e1a5>] ? __fdget_pos+0xd5/0x110
>> [<ffffffff8e337fe3>] ? acct_account_cputime+0x63/0x80
>> [<ffffffff8e56e1a5>] ? __fdget_pos+0xd5/0x110
>> [<ffffffff8f5ed5e0>] ? mutex_lock_interruptible_nested+0x9f0/0x9f0
>> [<ffffffff8e248527>] ? debug_lockdep_rcu_enabled+0x77/0x90
>> [<ffffffff8e56e1a5>] __fdget_pos+0xd5/0x110
>> [<ffffffff8e51c029>] SyS_read+0x79/0x230
>> [<ffffffff8e51bfb0>] ? do_sendfile+0x1280/0x1280
>> [<ffffffff8e20e5f9>] ? trace_hardirqs_on_caller+0x3f9/0x580
>> [<ffffffff8e003017>] ? trace_hardirqs_on_thunk+0x17/0x19
>> [<ffffffff8f5f7457>] entry_SYSCALL_64_fastpath+0x12/0x6b
>> 
>> Eventually it does complete, but waiting a half hour every time
>> trinity picks lseek as a syscall is kinda crappy.
>> 
>> Shouldn't lseek be a killable operation ?
>> 
>> I notice this doesn't seem to happen with btrfs, suggesting it's
>> an ext'ism.   This has probably been there for a while, I've not
>> been doing fuzz runs on ext4 enabled systems for a long time.
>> 
>> 	Dave
>> 
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
> Cheers, Andreas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: ext4 unkillable lseek.
  2016-01-12 21:17 ` Andreas Dilger
  2016-01-13  7:36   ` Dmitry Monakhov
@ 2016-01-13 17:00   ` Theodore Ts'o
  1 sibling, 0 replies; 4+ messages in thread
From: Theodore Ts'o @ 2016-01-13 17:00 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: Dave Jones, Dmitry Monakhov, Linux Kernel, linux-ext4

On Tue, Jan 12, 2016 at 02:17:43PM -0700, Andreas Dilger wrote:
> 
> It looks like ext4_es_find_delayed_extent_range() is being called once
> for every block in the file looking for any delalloc data, which is
> pretty awful.  Checking the git history for this code, it seems it was
> fixed once upon a time in commit 14516bb7bb:
> 
>     ext4: fix suboptimal seek_{data,hole} extents traversial
> 
>     It is ridiculous practice to scan inode block by block, this technique
>     applicable only for old indirect files. This takes significant amount
>     of time for really large files. Let's reuse ext4_fiemap which already
>     traverse inode-tree in most optimal meaner.
> 
>     TESTCASE:
>     ftruncate64(fd, 0);
>     ftruncate64(fd, 1ULL << 40);
>     /* lseek will spin very long time */
>     lseek64(fd, 0, SEEK_DATA);
>     lseek64(fd, 0, SEEK_HOLE);
> 
>     Original report: https://lkml.org/lkml/2014/10/16/620
> 
>     Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
>     Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> 
> but it was later reverted in ad7fefb10 because of a problem with ext3 and
> never restored.

The relevant thread dates back to January 3, 2015 when it went
dead/dormant.  The last message from Dimitri was:

>Crap. I do not understand why I cant not reproduce this.
>I'm out of my normal dev environment for couple of days,
>so patch reverting looks reasonable.  But please add code which
>break the loop on signal because otherwise this result in DOS for huge file

We never did do that last bit, which is probably what we should do as
a short-term fix until we can debug the "fix suboptimal seek_{data,hole}
extents traversal" patch.

							- Ted

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2016-01-13 17:00 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-12 14:53 ext4 unkillable lseek Dave Jones
2016-01-12 21:17 ` Andreas Dilger
2016-01-13  7:36   ` Dmitry Monakhov
2016-01-13 17:00   ` Theodore Ts'o

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.