All of lore.kernel.org
 help / color / mirror / Atom feed
* [Bug 201461] New: ext4 journal stalls write system call
@ 2018-10-17 12:56 bugzilla-daemon
  2018-10-17 12:57 ` [Bug 201461] " bugzilla-daemon
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: bugzilla-daemon @ 2018-10-17 12:56 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=201461

            Bug ID: 201461
           Summary: ext4 journal stalls write system call
           Product: File System
           Version: 2.5
    Kernel Version: 3.10.0-229.el7.x86_64
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: high
          Priority: P1
         Component: ext4
          Assignee: fs_ext4@kernel-bugs.osdl.org
          Reporter: zhenbianshu@foxmail.com
        Regression: No

I am using CentOS 7.1.1503 with kernel linux 3.10.0-229.el7.x86_64, ext4 file
system with ordered journal mode, and delalloc enabled.

When my app writes logs to a file continually (about 6M/s), I can find the
write system call stalled for 100-700 ms occasionally, When I disable ext4's
journal feature, or set the journal mode to writeback, or disable delay
allocate, the stalling disappears. When I set linux's writeback more frequent,
dirty page expire time shorter, the problem reduced. 

I printed the process's stack when stalling happends, I got this:


[<ffffffff812e31f4>] call_rwsem_down_read_failed+0x14/0x30
[<ffffffffa0195854>] ext4_da_get_block_prep+0x1a4/0x4b0 [ext4]
[<ffffffff811fbe17>] __block_write_begin+0x1a7/0x490
[<ffffffffa019b71c>] ext4_da_write_begin+0x15c/0x340 [ext4]
[<ffffffff8115685e>] generic_file_buffered_write+0x11e/0x290
[<ffffffff811589c5>] __generic_file_aio_write+0x1d5/0x3e0
[<ffffffff81158c2d>] generic_file_aio_write+0x5d/0xc0
[<ffffffffa0190b75>] ext4_file_write+0xb5/0x460 [ext4]
[<ffffffff811c64cd>] do_sync_write+0x8d/0xd0
[<ffffffff811c6c6d>] vfs_write+0xbd/0x1e0
[<ffffffff811c76b8>] SyS_write+0x58/0xb0
[<ffffffff81614a29>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff


I read the source code of linux kernel, and found that
call_rwsem_down_read_failed will call rwsem_down_read_failed which will keep
waiting for rw_semaphore.

I think the reason is that metadata journal flushing must wait for related
dirty pages flushed, when dirty pages flushing took long time, the journal
blocked, and journal commit has this inode's rw_semaphore, write system call to
this inode is stalled.

I really hope I can find evidence to prove it.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug 201461] ext4 journal stalls write system call
  2018-10-17 12:56 [Bug 201461] New: ext4 journal stalls write system call bugzilla-daemon
@ 2018-10-17 12:57 ` bugzilla-daemon
  2018-10-17 15:39 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: bugzilla-daemon @ 2018-10-17 12:57 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=201461

zhenbianshu@foxmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|high                        |normal

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug 201461] ext4 journal stalls write system call
  2018-10-17 12:56 [Bug 201461] New: ext4 journal stalls write system call bugzilla-daemon
  2018-10-17 12:57 ` [Bug 201461] " bugzilla-daemon
@ 2018-10-17 15:39 ` bugzilla-daemon
  2018-10-18  2:02 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: bugzilla-daemon @ 2018-10-17 15:39 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=201461

Theodore Tso (tytso@mit.edu) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tytso@mit.edu

--- Comment #1 from Theodore Tso (tytso@mit.edu) ---
It shouldn't be the case that the journal's data=ordered writeback would be
taking the i_data semaphore for writing.   

I'm guessing what is happening is that journal is causing a lot of I/O
operations because of data=ordered mode (e.g., not data=writeback), and this is
delaying a read which a writeback thread was trying to do while it was holding
a write lock on the inode's semaphore.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug 201461] ext4 journal stalls write system call
  2018-10-17 12:56 [Bug 201461] New: ext4 journal stalls write system call bugzilla-daemon
  2018-10-17 12:57 ` [Bug 201461] " bugzilla-daemon
  2018-10-17 15:39 ` bugzilla-daemon
@ 2018-10-18  2:02 ` bugzilla-daemon
  2018-10-18  2:35 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: bugzilla-daemon @ 2018-10-18  2:02 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=201461

--- Comment #2 from zhenbianshu@foxmail.com ---
(In reply to Theodore Tso from comment #1)
> It shouldn't be the case that the journal's data=ordered writeback would be
> taking the i_data semaphore for writing.   
> 
> I'm guessing what is happening is that journal is causing a lot of I/O
> operations because of data=ordered mode (e.g., not data=writeback), and this
> is delaying a read which a writeback thread was trying to do while it was
> holding a write lock on the inode's semaphore.

can you please show me the related source code of writeback holding a write
lock?

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug 201461] ext4 journal stalls write system call
  2018-10-17 12:56 [Bug 201461] New: ext4 journal stalls write system call bugzilla-daemon
                   ` (2 preceding siblings ...)
  2018-10-18  2:02 ` bugzilla-daemon
@ 2018-10-18  2:35 ` bugzilla-daemon
  2018-10-18 13:04 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: bugzilla-daemon @ 2018-10-18  2:35 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=201461

--- Comment #3 from Theodore Tso (tytso@mit.edu) ---
The writeback code will end up calling ext4_writepages(), which will do block
allocations via ext4_map_blocks() with the flag EXT4_GET_BLOCKS_CREATE(), and
this will take a write lock on i_data_sem.   

I'll let you trace through the call chain that starts in the writeback thread,
through the call to ext4_writepages(), proceeds to mpage_map_one_extent(), and
from there to ext4_map_blocks() which actually holds the write block.

Note that this is distinct from the data=ordered forced writeblock which starts
in fs/jbd2's journal_submit_inode_data_buffer() calling generic_writepages().  
The difference is that generic_writepages() will only call writepage()
function.  For ext4 that's ext4_writepage(), which will never do block
allocation.

But the writeback threads call the address space operation's, writepages()
function, and ext4_writepages() *will* do block allocation for delayed
allocation writes.

This is a subtle point; ext4_writepage() and ext4_writepages() do very
different things.   The first is used by the commit kernel thread; the second
is used by the writeback kernel thread.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug 201461] ext4 journal stalls write system call
  2018-10-17 12:56 [Bug 201461] New: ext4 journal stalls write system call bugzilla-daemon
                   ` (3 preceding siblings ...)
  2018-10-18  2:35 ` bugzilla-daemon
@ 2018-10-18 13:04 ` bugzilla-daemon
  2018-10-18 14:12 ` bugzilla-daemon
  2018-10-18 14:22 ` bugzilla-daemon
  6 siblings, 0 replies; 8+ messages in thread
From: bugzilla-daemon @ 2018-10-18 13:04 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=201461

--- Comment #4 from zhenbianshu@foxmail.com ---
(In reply to Theodore Tso from comment #3)
> The writeback code will end up calling ext4_writepages(), which will do
> block allocations via ext4_map_blocks() with the flag
> EXT4_GET_BLOCKS_CREATE(), and this will take a write lock on i_data_sem.   
> 
> I'll let you trace through the call chain that starts in the writeback
> thread, through the call to ext4_writepages(), proceeds to
> mpage_map_one_extent(), and from there to ext4_map_blocks() which actually
> holds the write block.
> 
> Note that this is distinct from the data=ordered forced writeblock which
> starts in fs/jbd2's journal_submit_inode_data_buffer() calling
> generic_writepages().   The difference is that generic_writepages() will
> only call writepage() function.  For ext4 that's ext4_writepage(), which
> will never do block allocation.
> 
> But the writeback threads call the address space operation's, writepages()
> function, and ext4_writepages() *will* do block allocation for delayed
> allocation writes.
> 
> This is a subtle point; ext4_writepage() and ext4_writepages() do very
> different things.   The first is used by the commit kernel thread; the
> second is used by the writeback kernel thread.

I did not find ext4_writepages() or mpage_map_one_extent() in linux 3.10.0
source code, what I found is ext4_da_writepages().

And the call chain:
journal_submit_data_buffers()
journal_submit_inode_data_buffers()
generic_writepages()
ext4_da_writepages()
write_cache_page_da()
mpage_da_map_and_submit()
ext4_map_blocks()
will take the write lock of i_data_sem.

Will generic_writepages() call ext4_da_writepages() in jbd2 ?

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug 201461] ext4 journal stalls write system call
  2018-10-17 12:56 [Bug 201461] New: ext4 journal stalls write system call bugzilla-daemon
                   ` (4 preceding siblings ...)
  2018-10-18 13:04 ` bugzilla-daemon
@ 2018-10-18 14:12 ` bugzilla-daemon
  2018-10-18 14:22 ` bugzilla-daemon
  6 siblings, 0 replies; 8+ messages in thread
From: bugzilla-daemon @ 2018-10-18 14:12 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=201461

--- Comment #5 from Theodore Tso (tytso@mit.edu) ---
Keep in mind that 3.10 was released over ***five years*** ago.   Quite frankly,
using a 3.10 kernel borders on malpractice, since it has so many unpatched
security holes it isn't funny.  If you are talking about a RHEL kernel, it has
diverged significantly from mainline and whether or not it is a security
disaster is between you and your Red Hat support contract.  :-)

My comments apply mainly to the the upstream development kernel, but to answer
your questions:

No, in 3.10, generic_writepages() does not call ext4_da_writepages().   And
ext4_da_writepages roughly corresponds to what ext4_writepages() in
non-obsolete kernels.

A huge number of things have changed between 3.10 and more modern kernels,
including many performance enhancements.   So some of the details have changed.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug 201461] ext4 journal stalls write system call
  2018-10-17 12:56 [Bug 201461] New: ext4 journal stalls write system call bugzilla-daemon
                   ` (5 preceding siblings ...)
  2018-10-18 14:12 ` bugzilla-daemon
@ 2018-10-18 14:22 ` bugzilla-daemon
  6 siblings, 0 replies; 8+ messages in thread
From: bugzilla-daemon @ 2018-10-18 14:22 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=201461

zhenbianshu@foxmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |INVALID

--- Comment #6 from zhenbianshu@foxmail.com ---
(In reply to Theodore Tso from comment #5)
> Keep in mind that 3.10 was released over ***five years*** ago.   Quite
> frankly, using a 3.10 kernel borders on malpractice, since it has so many
> unpatched security holes it isn't funny.  If you are talking about a RHEL
> kernel, it has diverged significantly from mainline and whether or not it is
> a security disaster is between you and your Red Hat support contract.  :-)
> 
> My comments apply mainly to the the upstream development kernel, but to
> answer your questions:
> 
> No, in 3.10, generic_writepages() does not call ext4_da_writepages().   And
> ext4_da_writepages roughly corresponds to what ext4_writepages() in
> non-obsolete kernels.
> 
> A huge number of things have changed between 3.10 and more modern kernels,
> including many performance enhancements.   So some of the details have
> changed.

I got it, thank you very much!

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-10-18 22:24 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-17 12:56 [Bug 201461] New: ext4 journal stalls write system call bugzilla-daemon
2018-10-17 12:57 ` [Bug 201461] " bugzilla-daemon
2018-10-17 15:39 ` bugzilla-daemon
2018-10-18  2:02 ` bugzilla-daemon
2018-10-18  2:35 ` bugzilla-daemon
2018-10-18 13:04 ` bugzilla-daemon
2018-10-18 14:12 ` bugzilla-daemon
2018-10-18 14:22 ` bugzilla-daemon

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.