ext4: performance regression introduced by the cgroup writeback support

* ext4: performance regression introduced by the cgroup writeback support
@ 2015-09-23 13:49 Dexuan Cui
  2015-09-23 16:13 ` Chris Mason
  0 siblings, 1 reply; 6+ messages in thread
From: Dexuan Cui @ 2015-09-23 13:49 UTC (permalink / raw)
  To: Theodore Ts'o, Andreas Dilger, Tejun Heo, linux-ext4,
	linux-fsdevel, linux-kernel

Hi all,
Since some point between July and September, I have been suffering from a strange "very slow write" issue. On Sep 9 I reported it to LKML (but got no reply): https://lkml.org/lkml/2015/9/9/290

The issue is: under high CPU and disk I/O pressure, *some* processes can suffer from a very slow write speed (e.g., <1MB/s or even only 20KB/s), while the normal write speed should be at least dozens of MB/s.

I think I identified the commit which introduced the regression:
ext4: implement cgroup writeback support (https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=001e4a8775f6e8ad52a89e0072f09aee47d5d252)

This commit is already in the mainline tree, so I can reproduce the issue there too: with the latest mainline I can reproduce it; after I revert the patch, I can't.

When the issue happens:
1. The read speed is normal, e.g., it's still >100MB/s.
2. 'top' shows that both 'user' and 'sys' utilization are about 0%, but the IO-wait is always about 100%.
3. 'iotop' shows that the read speed is 0 (this is correct because there are indeed no read requests) and the write speed is very slow (the average is <1MB/s or even 20KB/s).
4. Sometimes every new process suffers from the slow write issue, but sometimes it looks like not all new processes are affected.
5. The "WARNING: CPU: 7 PID: 6782 at fs/inode.c:390 ihold+0x30/0x40()" in my Sep-9 mail may be a separate issue.
6. To reproduce the issue, I need to run my workload for a long enough time (see below).
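
The IO-wait figure in item 2 can also be sampled directly from the aggregate "cpu" line of /proc/stat instead of reading it off 'top'. The following is only a minimal sketch and was not part of my test setup; the function names are hypothetical, and it assumes the usual field order (user, nice, system, idle, iowait, irq, softirq, steal, ...).

```python
def parse_cpu_line(line):
    """Return the jiffy counters from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def read_cpu_sample(path="/proc/stat"):
    """Read the counters from the first line of /proc/stat (Linux only)."""
    with open(path) as stat_file:
        return parse_cpu_line(stat_file.readline())


def iowait_percent(before, after):
    """Percentage of elapsed CPU time spent in iowait between two samples."""
    deltas = [new - old for old, new in zip(before, after)]
    # Sum user..steal only: guest time is already accounted in user/nice.
    total = sum(deltas[:8])
    if total == 0:
        return 0.0
    # Index 4 is the iowait counter (assumed field order, see above).
    return 100.0 * deltas[4] / total
```

Taking one sample with read_cpu_sample(), sleeping for a second or so, taking another, and passing both to iowait_percent() should give a number close to what 'top' reports as "wa".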

My workload is simple: I just repeatedly build the kernel source ("make clean; make -j16"). My kernel config is attached FYI.

I can reproduce the issue on a physical machine: e.g., in my kernel build test with my .config, each of the first 176 runs took only ~5 minutes, but from the 177th run onward a run could take anywhere from 5 minutes to 10 hours - very unstable.
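
A timing loop along the following lines is enough to spot the point where the runs become unstable. This is a rough sketch rather than the exact script I used; time_runs, flag_slow_runs and the slow_factor threshold are hypothetical names and values chosen for illustration.

```python
import subprocess
import time


def flag_slow_runs(durations, slow_factor=2.0):
    """Return indices of runs slower than slow_factor x the fastest earlier run."""
    slow = []
    for i in range(1, len(durations)):
        baseline = min(durations[:i])
        if durations[i] > slow_factor * baseline:
            slow.append(i)
    return slow


def time_runs(command, runs):
    """Run a shell command `runs` times and return each wall-clock duration."""
    durations = []
    for _ in range(runs):
        start = time.monotonic()
        # check=True stops the loop if the build itself fails.
        subprocess.run(command, shell=True, check=True)
        durations.append(time.monotonic() - start)
    return durations
```

For example, flag_slow_runs(time_runs("make clean; make -j16", 200)) would list the runs that took more than twice as long as the fastest run before them.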

It looks like the issue is easier to reproduce in a Hyper-V VM: usually I can reproduce it within the first 10 or 20 runs.

Any idea?

Thanks,
-- Dexuan

[-- Attachment #2: kernel-config.txt.gz --]
[-- Type: application/x-gzip, Size: 46184 bytes --]

* Re: ext4: performance regression introduced by the cgroup writeback support
  2015-09-23 13:49 ext4: performance regression introduced by the cgroup writeback support Dexuan Cui
@ 2015-09-23 16:13 ` Chris Mason
  2015-09-23 18:53   ` Tejun Heo
  2015-09-24  0:12   ` Dexuan Cui
  0 siblings, 2 replies; 6+ messages in thread
From: Chris Mason @ 2015-09-23 16:13 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: Theodore Ts'o, Andreas Dilger, Tejun Heo, linux-ext4,
	linux-fsdevel, linux-kernel

On Wed, Sep 23, 2015 at 01:49:31PM +0000, Dexuan Cui wrote:
> Hi all,
> Since some point between July and September, I have been suffering from a strange "very slow write" issue. On Sep 9 I reported it to LKML (but got no reply): https://lkml.org/lkml/2015/9/9/290
> 
> The issue is: under high CPU and disk I/O pressure, *some* processes can suffer from a very slow write speed (e.g., <1MB/s or even only 20KB/s), while the normal write speed should be at least dozens of MB/s.
> 
> I think I identified the commit which introduced the regression:
> ext4: implement cgroup writeback support (https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=001e4a8775f6e8ad52a89e0072f09aee47d5d252)
> 
> This commit is already in the mainline tree, so I can reproduce the issue there too: with the latest mainline I can reproduce it; after I revert the patch, I can't.
> 
> When the issue happens:
> 1. The read speed is normal, e.g., it's still >100MB/s.
> 2. 'top' shows that both 'user' and 'sys' utilization are about 0%, but the IO-wait is always about 100%.
> 3. 'iotop' shows that the read speed is 0 (this is correct because there are indeed no read requests) and the write speed is very slow (the average is <1MB/s or even 20KB/s).
> 4. Sometimes every new process suffers from the slow write issue, but sometimes it looks like not all new processes are affected.
> 5. The "WARNING: CPU: 7 PID: 6782 at fs/inode.c:390 ihold+0x30/0x40()" in my Sep-9 mail may be a separate issue.
> 6. To reproduce the issue, I need to run my workload for a long enough time (see below).
> 
> My workload is simple: I just repeatedly build the kernel source ("make clean; make -j16"). My kernel config is attached FYI.
> 
> I can reproduce the issue on a physical machine: e.g., in my kernel build test with my .config, each of the first 176 runs took only ~5 minutes, but from the 177th run onward a run could take anywhere from 5 minutes to 10 hours - very unstable.
> 
> It looks like the issue is easier to reproduce in a Hyper-V VM: usually I can reproduce it within the first 10 or 20 runs.
> 
> Any idea?

Are you using cgroups?  That patch really shouldn't impact load unless
there are actual IO controls in place.

-chris


* Re: ext4: performance regression introduced by the cgroup writeback support
  2015-09-23 16:13 ` Chris Mason
@ 2015-09-23 18:53   ` Tejun Heo
  2015-09-24  0:15     ` Dexuan Cui
  2015-09-24  7:26     ` Dexuan Cui
  2015-09-24  0:12   ` Dexuan Cui
  1 sibling, 2 replies; 6+ messages in thread
From: Tejun Heo @ 2015-09-23 18:53 UTC (permalink / raw)
  To: Chris Mason, Dexuan Cui, Theodore Ts'o, Andreas Dilger,
	linux-ext4, linux-fsdevel, linux-kernel

On Wed, Sep 23, 2015 at 12:13:59PM -0400, Chris Mason wrote:
> > The issue is: under high CPU and disk I/O pressure, *some* processes can suffer from a very slow write speed (e.g., <1MB/s or even only 20KB/s), while the normal write speed should be at least dozens of MB/s.

So, I think I know what caused this regression.  Separate wb domains
shouldn't have been enabled on traditional hierarchies.  It doesn't
work there and leads to multiple wb domains competing on the same
blkcg and the bw estimation would go completely haywire.  Will update
soon.
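
For anyone who wants to check which kind of hierarchy the memory and blkio controllers are mounted on, a user-space sketch like the one below may help. It is not the kernel fix, only a diagnostic under stated assumptions: it parses text in /proc/mounts format, treats filesystem type "cgroup" as a traditional hierarchy and "cgroup2" as the unified one (the "cgroup2" type exists only on later kernels than the ones discussed here), and controller_mounts is a hypothetical name.

```python
def controller_mounts(mounts_text):
    """Locate memory/blkio on traditional hierarchies and list unified mounts.

    Returns (v1, v2): v1 maps 'memory'/'blkio' to the mount point of the
    traditional hierarchy carrying it; v2 lists cgroup2 mount points.
    """
    v1 = {}
    v2 = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        _device, mount_point, fstype, options = fields[:4]
        if fstype == "cgroup2":
            v2.append(mount_point)
        elif fstype == "cgroup":
            # On traditional hierarchies the controller names appear
            # among the mount options.
            for option in options.split(","):
                if option in ("memory", "blkio"):
                    v1[option] = mount_point
    return v1, v2
```

If both controllers show up in the first result with different mount points, they live on separate traditional hierarchies, which is the setup where the wb domains described above cannot line up with the blkcg.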

Thanks.

-- 
tejun

* RE: ext4: performance regression introduced by the cgroup writeback support
  2015-09-23 16:13 ` Chris Mason
  2015-09-23 18:53   ` Tejun Heo
@ 2015-09-24  0:12   ` Dexuan Cui
  1 sibling, 0 replies; 6+ messages in thread
From: Dexuan Cui @ 2015-09-24  0:12 UTC (permalink / raw)
  To: Chris Mason
  Cc: Theodore Ts'o, Andreas Dilger, Tejun Heo, linux-ext4,
	linux-fsdevel, linux-kernel

> -----Original Message-----
> From: Chris Mason [mailto:clm@fb.com]
> Sent: Thursday, September 24, 2015 0:14
> To: Dexuan Cui <decui@microsoft.com>
> Cc: Theodore Ts'o <tytso@mit.edu>; Andreas Dilger <adilger.kernel@dilger.ca>;
> Tejun Heo <tj@kernel.org>; linux-ext4@vger.kernel.org; linux-
> fsdevel@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: ext4: performance regression introduced by the cgroup writeback
> support
> 
> On Wed, Sep 23, 2015 at 01:49:31PM +0000, Dexuan Cui wrote:
> > Hi all,
> > Since some point between July and September, I have been suffering from a strange "very slow write" issue. On Sep 9 I reported it to LKML (but got no reply): https://lkml.org/lkml/2015/9/9/290
> >
> > The issue is: under high CPU and disk I/O pressure, *some* processes can suffer from a very slow write speed (e.g., <1MB/s or even only 20KB/s), while the normal write speed should be at least dozens of MB/s.
> >
> > I think I identified the commit which introduced the regression:
> > ext4: implement cgroup writeback support (https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=001e4a8775f6e8ad52a89e0072f09aee47d5d252)
> >
> > This commit is already in the mainline tree, so I can reproduce the issue there too: with the latest mainline I can reproduce it; after I revert the patch, I can't.
> >
> > When the issue happens:
> > 1. The read speed is normal, e.g., it's still >100MB/s.
> > 2. 'top' shows that both 'user' and 'sys' utilization are about 0%, but the IO-wait is always about 100%.
> > 3. 'iotop' shows that the read speed is 0 (this is correct because there are indeed no read requests) and the write speed is very slow (the average is <1MB/s or even 20KB/s).
> > 4. Sometimes every new process suffers from the slow write issue, but sometimes it looks like not all new processes are affected.
> > 5. The "WARNING: CPU: 7 PID: 6782 at fs/inode.c:390 ihold+0x30/0x40()" in my Sep-9 mail may be a separate issue.
> > 6. To reproduce the issue, I need to run my workload for a long enough time (see below).
> >
> > My workload is simple: I just repeatedly build the kernel source ("make clean; make -j16"). My kernel config is attached FYI.
> >
> > I can reproduce the issue on a physical machine: e.g., in my kernel build test with my .config, each of the first 176 runs took only ~5 minutes, but from the 177th run onward a run could take anywhere from 5 minutes to 10 hours - very unstable.
> >
> > It looks like the issue is easier to reproduce in a Hyper-V VM: usually I can reproduce it within the first 10 or 20 runs.
> >
> > Any idea?
> 
> Are you using cgroups?  That patch really shouldn't impact load unless
> there are actual IO controls in place.
> 
> -chris

I'm not using cgroups here.

Tejun has just found the root cause ("Separate wb domains shouldn't have been enabled on traditional hierarchies") and supplied a fix.

Thanks,
-- Dexuan

* RE: ext4: performance regression introduced by the cgroup writeback support
  2015-09-23 18:53   ` Tejun Heo
@ 2015-09-24  0:15     ` Dexuan Cui
  2015-09-24  7:26     ` Dexuan Cui
  1 sibling, 0 replies; 6+ messages in thread
From: Dexuan Cui @ 2015-09-24  0:15 UTC (permalink / raw)
  To: Tejun Heo, Chris Mason, Theodore Ts'o, Andreas Dilger,
	linux-ext4, linux-fsdevel, linux-kernel

> -----Original Message-----
> From: Tejun Heo [mailto:htejun@gmail.com] On Behalf Of Tejun Heo
> Sent: Thursday, September 24, 2015 2:54
> To: Chris Mason <clm@fb.com>; Dexuan Cui <decui@microsoft.com>;
> Theodore Ts'o <tytso@mit.edu>; Andreas Dilger <adilger.kernel@dilger.ca>;
> linux-ext4@vger.kernel.org; linux-fsdevel@vger.kernel.org; linux-
> kernel@vger.kernel.org
> Subject: Re: ext4: performance regression introduced by the cgroup writeback
> support
> 
> On Wed, Sep 23, 2015 at 12:13:59PM -0400, Chris Mason wrote:
> > > The issue is: under high CPU and disk I/O pressure, *some* processes can suffer from a very slow write speed (e.g., <1MB/s or even only 20KB/s), while the normal write speed should be at least dozens of MB/s.
> 
> So, I think I know what caused this regression.  Separate wb domains
> shouldn't have been enabled on traditional hierarchies.  It doesn't
> work there and leads to multiple wb domains competing on the same
> blkcg and the bw estimation would go completely haywire.  Will update
> soon.
> 
> Thanks.
> 
> --
> tejun

Thanks a lot for the quick fix, Tejun!

I'll test the fix.
I'll report back if it doesn't fix the issue, though I think that is unlikely. :-)

-- Dexuan

* RE: ext4: performance regression introduced by the cgroup writeback support
  2015-09-23 18:53   ` Tejun Heo
  2015-09-24  0:15     ` Dexuan Cui
@ 2015-09-24  7:26     ` Dexuan Cui
  1 sibling, 0 replies; 6+ messages in thread
From: Dexuan Cui @ 2015-09-24  7:26 UTC (permalink / raw)
  To: Tejun Heo, Chris Mason, Theodore Ts'o, Andreas Dilger,
	linux-ext4, linux-fsdevel, linux-kernel

> From: Dexuan Cui
> Sent: Thursday, September 24, 2015 8:16
> To: 'Tejun Heo' <tj@kernel.org>; Chris Mason <clm@fb.com>; Theodore Ts'o
> <tytso@mit.edu>; Andreas Dilger <adilger.kernel@dilger.ca>; linux-
> ext4@vger.kernel.org; linux-fsdevel@vger.kernel.org; linux-
> kernel@vger.kernel.org
> Subject: RE: ext4: performance regression introduced by the cgroup writeback
> support
> 
> > -----Original Message-----
> > From: Tejun Heo [mailto:htejun@gmail.com] On Behalf Of Tejun Heo
> > Sent: Thursday, September 24, 2015 2:54
> > To: Chris Mason <clm@fb.com>; Dexuan Cui <decui@microsoft.com>;
> > Theodore Ts'o <tytso@mit.edu>; Andreas Dilger <adilger.kernel@dilger.ca>;
> > linux-ext4@vger.kernel.org; linux-fsdevel@vger.kernel.org; linux-
> > kernel@vger.kernel.org
> > Subject: Re: ext4: performance regression introduced by the cgroup writeback
> > support
> >
> > On Wed, Sep 23, 2015 at 12:13:59PM -0400, Chris Mason wrote:
> > > > The issue is: under high CPU and disk I/O pressure, *some* processes can suffer from a very slow write speed (e.g., <1MB/s or even only 20KB/s), while the normal write speed should be at least dozens of MB/s.
> >
> > So, I think I know what caused this regression.  Separate wb domains
> > shouldn't have been enabled on traditional hierarchies.  It doesn't
> > work there and leads to multiple wb domains competing on the same
> > blkcg and the bw estimation would go completely haywire.  Will update
> > soon.
> >
> > Thanks.
> >
> > --
> > tejun
> 
> Thanks a lot for the quick fix, Tejun!
> 
> I'll test the fix.
> I'll report back if it doesn't fix the issue, though I think that is unlikely. :-)
> 
> -- Dexuan

Hi Tejun,
Thank you!
Based on my testing, I believe your patch fixes my issue.

-- Dexuan
