linux-kernel.vger.kernel.org archive mirror
* Sync semantics.
From: Rogier Wolff @ 2010-11-11 12:52 UTC
  To: linux-kernel


Hi, 

What should I expect from a "sync" system call?

The manual says: 

   sync() first commits inodes to buffers, and then buffers to disk.

and then goes on to state: 

   ... since  version  1.3.20 Linux does actually wait.

[for the buffers to be handed over to the drive]

So how long can I expect a "sync" call to take? 

I would expect that all buffers that are dirty at the time of the
"sync" call are written by the time that sync returns. I'm currently
bombarding my fileserver with some 40-60 MB per second of data to
be written (*). The fileserver has 8G of memory. So max 8000 MB of
dirty buffers can be stored, right? The server writes an average of
(at least) 40 MB/second to disk. According to my calculator, I will
have to wait up to 200 seconds for the sync system call to return....


# time sync
0.000u 0.220s 2:22:23.96 0.0%   0+0k 0+0io 2pf+0w

Two hours 22 minutes. 
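
The back-of-the-envelope estimate above can be written out as a short
Python sketch. The function names are made up for illustration; the
numbers are the ones quoted in this message (at most 8000 MB of dirty
buffers, at least 40 MB/second of sustained writes), and the model
assumes that only buffers dirty at the start of the call count.

```python
def worst_case_sync_seconds(dirty_mb: float, write_mb_per_s: float) -> float:
    """Time to flush `dirty_mb` at a steady rate, if nothing new is dirtied."""
    if write_mb_per_s <= 0:
        raise ValueError("write rate must be positive")
    return dirty_mb / write_mb_per_s


def parse_elapsed(hms: str) -> float:
    """Convert an h:mm:ss.ss elapsed time, as printed by `time`, to seconds."""
    hours, minutes, seconds = hms.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


expected = worst_case_sync_seconds(8000, 40)   # 200.0 seconds
observed = parse_elapsed("2:22:23.96")         # elapsed time reported above
print(f"expected at most {expected:.0f} s, observed {observed:.0f} s "
      f"({observed / expected:.0f}x longer)")
```

The gap between the two figures is far too large to be explained by a
somewhat lower write rate alone.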

I typed the "time sync" again, and it hasn't returned yet.... Actually
I don't expect it to before 6-hours-from-now because that's when the
clients will run out of data to send.

wolff    13706  0.0  0.0   1816   208 pts/12   D+   12:08   0:00 sync
wolff    14116  0.0  0.0   1908   520 pts/34   S+   13:48   0:00 grep sync

It's been running 100 minutes by now..... 


(*) The three clients are each sending 20-35 MB/second but are being
held up by the server, which doesn't seem to be handling more than about
40-50 MB/sec total.

-- 
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
**    Delftechpark 26 2628 XH  Delft, The Netherlands. KVK: 27239233    **
*-- BitWizard writes Linux device drivers for any device you may have! --*
Q: It doesn't work. A: Look buddy, doesn't work is an ambiguous statement. 
Does it sit on the couch all day? Is it unemployed? Please be specific! 
Define 'it' and what it isn't doing. --------- Adapted from lxrbot FAQ


* Re: Sync semantics.
From: Michal Svoboda @ 2010-11-12  7:11 UTC
  To: linux-kernel


Rogier Wolff wrote:
> What should I expect from a "sync" system call?
> # time sync
> 0.000u 0.220s 2:22:23.96 0.0%   0+0k 0+0io 2pf+0w
> Two hours 22 minutes. 

I would also be interested to hear the answer to this question. Sync
and ongoing I/O simply don't seem to play well together, and you don't
even need anything as heavy as mentioned above.


Michal Svoboda


* Re: Sync semantics.
From: Dave Chinner @ 2010-11-15  2:39 UTC
  To: Rogier Wolff; +Cc: linux-kernel

On Thu, Nov 11, 2010 at 01:52:19PM +0100, Rogier Wolff wrote:
> 
> Hi, 
> 
> What should I expect from a "sync" system call?
> 
> The manual says: 
> 
>    sync() first commits inodes to buffers, and then buffers to disk.
> 
> and then goes on to state: 
> 
>    ... since  version  1.3.20 Linux does actually wait.
> 
> [for the buffers to be handed over to the drive]
> 
> So how long can I expect a "sync" call to take? 
> 
> I would expect that all buffers that are dirty at the time of the
> "sync" call are written by the time that sync returns. I'm currently
> bombarding my fileserver with some 40-60 MB per second of data to
> be written (*). The fileserver has 8G of memory. So max 8000 MB of
> dirty buffers can be stored, right? The server writes an average of
> (at least) 40 MB/second to disk. According to my calculator, I will
> have to wait up to 200 seconds for the sync system call to return....
> 
> 
> # time sync
> 0.000u 0.220s 2:22:23.96 0.0%   0+0k 0+0io 2pf+0w

Depending on the kernel, sync will keep writing if you keep
dirtying. This should be mostly fixed in 2.6.36....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Sync semantics.
From: Michal Svoboda @ 2010-11-15  7:42 UTC
  To: linux-kernel


Dave Chinner wrote:
> Depending on the kernel, sync will keep writing if you keep
> dirtying. This should be mostly fixed in 2.6.36....

Is that a "we hope that it is so but we are not sure" kind of "mostly",
or a "there are known cases when this is not true" one?

Michal Svoboda


* Re: Sync semantics.
From: Dave Chinner @ 2010-11-16  1:16 UTC
  To: linux-kernel

On Mon, Nov 15, 2010 at 08:42:41AM +0100, Michal Svoboda wrote:
> Dave Chinner wrote:
> > Depending on the kernel, sync will keep writing if you keep
> > dirtying. This should be mostly fixed in 2.6.36....
> 
> Is that a "we hope that it is so but we are not sure" kind of "mostly",
> or a "there are known cases when this is not true" one?

The latter.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Sync semantics.
From: Pavel Machek @ 2010-11-16 14:31 UTC
  To: Rogier Wolff; +Cc: linux-kernel

Hi!

> What should I expect from a "sync" system call?
> 
> The manual says: 
> 
>    sync() first commits inodes to buffers, and then buffers to disk.
> 
> and then goes on to state: 
> 
>    ... since  version  1.3.20 Linux does actually wait.
> 
> [for the buffers to be handed over to the drive]
> 
> So how long can I expect a "sync" call to take? 
> 
> I would expect that all buffers that are dirty at the time of the
> "sync" call are written by the time that sync returns. I'm currently
> bombarding my fileserver with some 40-60 MB per second of data to
> be written (*). The fileserver has 8G of memory. So max 8000 MB of

Are you sure? Hitting 40MB/sec is hard when it involves seeking...

You may want to lower dirty_ratio...
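
For reference, the current setting can be read from
/proc/sys/vm/dirty_ratio. A minimal Python sketch follows; the helper
names and the `base` parameter are made up for illustration, and the
size estimate is only a rough upper bound, since the kernel applies the
percentage to available memory rather than to all of RAM.

```python
import os


def read_vm_tunable(name: str, base: str = "/proc/sys/vm") -> int:
    """Return an integer writeback tunable such as dirty_ratio."""
    with open(os.path.join(base, name)) as f:
        return int(f.read().strip())


def rough_dirty_limit_mb(total_mem_mb: float, dirty_ratio_percent: int) -> float:
    """Rough upper bound on dirty data. The kernel bases the real limit on
    available memory, so the true figure is lower."""
    return total_mem_mb * dirty_ratio_percent / 100


if __name__ == "__main__" and os.path.exists("/proc/sys/vm/dirty_ratio"):
    ratio = read_vm_tunable("dirty_ratio")
    # 8000 MB matches the 8G file server described in this thread.
    limit = rough_dirty_limit_mb(8000, ratio)
    print(f"dirty_ratio={ratio}%, at most roughly {limit:.0f} MB of dirty data")
```

Lowering the value means writing a smaller number to the same file as
root, for example with "sysctl -w vm.dirty_ratio=N".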
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: Sync semantics.
From: Rogier Wolff @ 2010-11-17  8:09 UTC
  To: Pavel Machek; +Cc: linux-kernel

On Tue, Nov 16, 2010 at 03:31:49PM +0100, Pavel Machek wrote:
> > I would expect that all buffers that are dirty at the time of the
> > "sync" call are written by the time that sync returns. I'm currently
> > bombarding my fileserver with some 40-60 MB per second of data to
> > be written (*). The fileserver has 8G of memory. So max 8000 MB of
> 
> Are you sure? Hitting 40MB/sec is hard when it involves seeking...

Yeah... It's about 10 times slower than when no seeking is involved,
so that makes sense, doesn't it? The machine can sustain over 400 MB
per second on linear reads:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 2  0      0  50908 6667292 502040    0    0 430064     0 2171 1677  0 23 66 11
 4  0      0  51280 6713952 501976    0    0 429596     0 2430 1889 16 28 44 12
 1  0      0  51768 6754884 502100    0    0 423388     0 2460 2100 13 28 47 13
 0  1      0  50760 6793392 502416    0    0 422892     0 2174 1796  0 21 68 10

Through the filesystem I get:

1073741824 bytes (1.1 GB) copied, 2.70151 s, 397 MB/s
1073741824 bytes (1.1 GB) copied, 2.62782 s, 409 MB/s

Which impresses me. In practice I seldom see the high
1xx MB/sec range (i.e. 120-150 MB per second happens, while 180-190 is rare).

On the other hand, in the same run I also get: 
1073741824 bytes (1.1 GB) copied, 6.82678 s, 157 MB/s
1073741824 bytes (1.1 GB) copied, 6.66133 s, 161 MB/s
1073741824 bytes (1.1 GB) copied, 6.58995 s, 163 MB/s

which apparently is caused by these files being more fragmented. These
files (1 GB each) were written linearly, but some might have been
written while others of these 1 GB files (in a different directory) were
being written at the same time. I'm guessing these ended up more or less
interleaved.

Checking up on the fragmentation of these files, the fast ones have
about 600-800 fragments, while the slow ones have 1300-2000 fragments.

MB/sec    #frags
 400       1252
 493        865
 391        755
 393        606
 395        819
 206        937
 159        901
 173       1940
 165       1806
 157       1481
 168       1351
 179       2692
 166       1541
 154       1151
 159        924
 149       1228
 155       1139
 151       1103
 150       1070
 155       1160

There is SOME correlation but not 100%. This is on an 8x1T RAID. 
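
To put a number on "SOME correlation", the table above can be fed
through a plain Pearson correlation. This is only a sketch: the helper
name is made up, the pairs are copied from the table, and a negative
result means that more fragments go together with lower throughput.

```python
import math

# (MB/sec, fragments) pairs copied from the table above.
samples = [
    (400, 1252), (493, 865), (391, 755), (393, 606), (395, 819),
    (206, 937), (159, 901), (173, 1940), (165, 1806), (157, 1481),
    (168, 1351), (179, 2692), (166, 1541), (154, 1151), (159, 924),
    (149, 1228), (155, 1139), (151, 1103), (150, 1070), (155, 1160),
]


def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)


r = pearson(samples)
print(f"Pearson r between throughput and fragment count: {r:.2f}")
```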

> You may want to lower dirty_ratio...

You know, what I would REALLY want is that when say 400 MB of dirty
buffers exist, the machine would start alternating between the two or
three areas that require writing. All these should be "linear". If you
switch only once every second or so, the "seeking time" is less than
1%. In that case, my server should be able to write up to 400 MB per
second, except that I can only supply 120 MB per second over the
Ethernet. But that would still be a 3x improvement over what the
machine can handle now.
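
The arithmetic behind that paragraph, as a small Python sketch. The
cost of one switch between areas (10 ms here) is an assumed,
illustrative figure and not a measurement from this machine; the other
numbers are the ones mentioned in this thread.

```python
def effective_write_rate(linear_mb_s, switch_interval_s, switch_cost_s):
    """Throughput when writing linearly and paying one seek per switch."""
    busy_fraction = switch_interval_s / (switch_interval_s + switch_cost_s)
    return linear_mb_s * busy_fraction


LINEAR_MB_S = 400      # linear rate reported in this message
NETWORK_MB_S = 120     # what the Ethernet can supply
CURRENT_MB_S = 40      # roughly what the server handles now
SWITCH_COST_S = 0.010  # ASSUMED cost of one switch between areas

disk = effective_write_rate(LINEAR_MB_S, 1.0, SWITCH_COST_S)
overhead = 1 - disk / LINEAR_MB_S
achievable = min(disk, NETWORK_MB_S)
print(f"seek overhead {overhead:.2%}, disk-limited rate {disk:.0f} MB/s, "
      f"achievable {achievable} MB/s ({achievable / CURRENT_MB_S:.0f}x current)")
```

With a switch every second, the disk stays far above the network rate,
so the network is the limit and the disk is not.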

In theory these things should work even better if things like
"dirty_ratio" are higher. 

In the current situation, the "sync" call will return when the IO
system falls to "idle". The chance of "nothing needing writing"
increases as the amount of allowed dirty buffers gets lower. But the
problem is that sync keeps on waiting for those new buffers that have
become dirty AFTER the start of the sync call.
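
The difference between the two behaviours can be shown with a toy
model in Python. This is not kernel code: the function names are made
up, time advances in one-second steps, and the "snapshot" variant
assumes the buffers that were dirty at the start get the full write
bandwidth.

```python
import math


def sync_until_idle(initial_dirty_mb, dirty_mb_s, write_mb_s, limit_s):
    """Seconds until no dirty data is left, or None if `limit_s` is hit."""
    dirty = float(initial_dirty_mb)
    for second in range(1, limit_s + 1):
        # Clients keep dirtying while the disk drains the backlog.
        dirty = max(0.0, dirty + dirty_mb_s - write_mb_s)
        if dirty == 0.0:
            return second
    return None


def sync_snapshot(initial_dirty_mb, write_mb_s):
    """Seconds to write only what was dirty when sync was called."""
    return math.ceil(initial_dirty_mb / write_mb_s)


# Clients dirty data as fast as the disk writes it: waiting for "idle"
# never finishes, while the snapshot variant is done in 10 seconds.
print(sync_until_idle(400, 40, 40, limit_s=3600))  # None
print(sync_snapshot(400, 40))                      # 10
```

If the clients are slightly slower than the disk, the first variant
does return, but only after the whole backlog has drained.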

Suppose we have a mail handling daemon that just received an email
from over the network. Instead of just saying "OK, I'll take over from
here", it prefers to write it to disk and call sync, so that should
the power fail, the email is on permanent storage and can be
correctly handled.

This works just great, until someone manages to get the server to
continue to get new dirty buffers, so that the sync takes over ten
minutes, and the other side's MTA will time out.....
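
A mail daemon that only cares about its own message can also ask for
just that file to be flushed. The Python sketch below is an
illustration of that idea and not something proposed in this thread:
the function name and file layout are made up, it assumes a POSIX
system, and how long fsync() itself takes under heavy writeback still
depends on the kernel and filesystem.

```python
import os


def store_durably(directory: str, name: str, data: bytes) -> str:
    """Write `data` to a new file and flush that file, not the whole system."""
    path = os.path.join(directory, name)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested.
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # flush this file's data and metadata
    finally:
        os.close(fd)

    # Flush the directory too, so the new entry itself is on disk.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
    return path
```

The daemon would acknowledge the message to the sending MTA only after
store_durably() has returned.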

Anyway, someone told me that it's been fixed, and sync won't behave
like this anymore.

	Roger. 

