All of lore.kernel.org
 help / color / mirror / Atom feed
* R: Re: Slow startup of systemd-journal on BTRFS
@ 2014-06-12 11:05 Goffredo Baroncelli <kreijack@libero.it>
  0 siblings, 0 replies; 6+ messages in thread
From: Goffredo Baroncelli <kreijack@libero.it> @ 2014-06-12 11:05 UTC (permalink / raw)
  To: lists; +Cc: systemd Mailing List, linux-btrfs



>----Messaggio originale----
>Da: lists@colorremedies.com
>Data: 12/06/2014 2.40
>A: <kreijack@inwind.it>, "Goffredo Baroncelli"<kreijack@libero.it>
>Cc: "systemd Mailing List"<systemd-devel@lists.freedesktop.org>, "linux-btrfs"
<linux-btrfs@vger.kernel.org>
>Ogg: Re: Slow startup of systemd-journal on BTRFS
>
>
>On Jun 11, 2014, at 3:28 PM, Goffredo Baroncelli <kreijack@libero.it> wrote:
>> 
>> If someone is able to suggest me how FRAGMENT the log file, I can try to 
collect more scientific data.
>
>So long as you're not using compression, filefrag will show you fragments of 
systemd-journald journals. I can vouch for the behavior 
> you experience without xattr +C or autodefrag, but further it also causes 
much slowness when reading journal contents. LIke if I want to 
> search all boots for a particular error message to see how far back it 
started, this takes quite a bit longer than on other file systems. 
> So far I'm not experiencing this problem with autodefrag or any other 
negative side effects, but my understanding is this code is still in flux.
>
>Since the journals have their own checksumming I'm not overly concerned about 
setting xattr +C.

This is true; but it can be a general solution: the checksum of the data are 
needed during a 
scrub and/or a RAID rebuilding.

I want to investigate doing an explicit defrag once a week.


>
>Chris Murphy
G.Baroncelli


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: R: Re: Slow startup of systemd-journal on BTRFS
  2014-06-12 23:24   ` Dave Chinner
@ 2014-06-13 22:19     ` Goffredo Baroncelli
  0 siblings, 0 replies; 6+ messages in thread
From: Goffredo Baroncelli @ 2014-06-13 22:19 UTC (permalink / raw)
  To: Dave Chinner, Duncan; +Cc: linux-btrfs, systemd-devel

Hi Dave

On 06/13/2014 01:24 AM, Dave Chinner wrote:
> On Thu, Jun 12, 2014 at 12:37:13PM +0000, Duncan wrote:
>> Goffredo Baroncelli <kreijack@libero.it> posted on Thu, 12 Jun 2014
>> 13:13:26 +0200 as excerpted:
>>
>>>> systemd has a very stupid journal write pattern. It checks if there is
>>>> space in the file for the write, and if not it fallocates the small
>>>> amount of space it needs (it does *4 byte* fallocate calls!) and then
>>>> does the write to it.  All this does is fragment the crap out of the log
>>>> files because the filesystems cannot optimise the allocation patterns.
>>>
>>> I checked the code, and to me it seems that the fallocate() are done in
>>> FILE_SIZE_INCREASE unit (actually 8MB).
>>
>> FWIW, either 4 byte or 8 MiB fallocate calls would be bad, I think 
>> actually pretty much equally bad without NOCOW set on the file.
> 
> So maybe it's been fixed in systemd since the last time I looked.
> Yup:
> 
> http://cgit.freedesktop.org/systemd/systemd/commit/src/journal/journal-file.c?id=eda4b58b50509dc8ad0428a46e20f6c5cf516d58
> 
> The reason it was changed? To "save a syscall per append", not to
> prevent fragmentation of the file, which was the problem everyone
> was complaining about...

thanks for pointing that. However I am performing my tests on a fedora 20 with systemd-208, which seems have this change
> 
>> Why?  Because btrfs data blocks are 4 KiB.  With COW, the effect for 
>> either 4 byte or 8 MiB file allocations is going to end up being the 
>> same, forcing (repeated until full) rewrite of each 4 KiB block into its 
>> own extent.


I am reaching the conclusion that fallocate is not the problem. The fallocate increase the filesize of about 8MB, which is enough for some logging. So it is not called very often.

I have to investigate more what happens when the log are copied from /run to /var/log/journal: this is when journald seems to slow all.

I am prepared a PC which reboot continuously; I am collecting the time required to finish the boot vs the fragmentation of the system.journal file vs the number of boot. The results are dramatic: after 20 reboot, the boot time increase of 20-30 seconds. Doing a defrag of system.journal reduces the boot time to the original one, but after another 20 reboot, the boot time still requires 20-30 seconds more....

It is a slow PC, but I saw the same behavior also on a more modern pc (i5 with 8GB).

For both PC the HD is a mechanical one...

> 
> And that's now a btrfs problem.... :/

Are you sure ?

ghigo@venice:/var/log$ sudo filefrag messages
messages: 29 extents found

ghigo@venice:/var/log$ sudo filefrag journal/*/system.journal
journal/41d686199835445395ac629d576dfcb9/system.journal: 1378 extents found

So the old rsyslog creates files with fewer fragments. BTRFS (but it seems also xfs) for sure highlights more this problem than other filesystem. But also systemd seems to create a lot of extens.

BR
G.Baroncelli



> 
> Cheers,
> 
> Dave.
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: R: Re: Slow startup of systemd-journal on BTRFS
  2014-06-12 12:37 ` Duncan
@ 2014-06-12 23:24   ` Dave Chinner
  2014-06-13 22:19     ` Goffredo Baroncelli
  0 siblings, 1 reply; 6+ messages in thread
From: Dave Chinner @ 2014-06-12 23:24 UTC (permalink / raw)
  To: Duncan; +Cc: linux-btrfs, systemd-devel

On Thu, Jun 12, 2014 at 12:37:13PM +0000, Duncan wrote:
> Goffredo Baroncelli <kreijack@libero.it> posted on Thu, 12 Jun 2014
> 13:13:26 +0200 as excerpted:
> 
> >>systemd has a very stupid journal write pattern. It checks if there is
> >>space in the file for the write, and if not it fallocates the small
> >>amount of space it needs (it does *4 byte* fallocate calls!) and then
> >>does the write to it.  All this does is fragment the crap out of the log
> >>files because the filesystems cannot optimise the allocation patterns.
> > 
> > I checked the code, and to me it seems that the fallocate() are done in
> > FILE_SIZE_INCREASE unit (actually 8MB).
> 
> FWIW, either 4 byte or 8 MiB fallocate calls would be bad, I think 
> actually pretty much equally bad without NOCOW set on the file.

So maybe it's been fixed in systemd since the last time I looked.
Yup:

http://cgit.freedesktop.org/systemd/systemd/commit/src/journal/journal-file.c?id=eda4b58b50509dc8ad0428a46e20f6c5cf516d58

The reason it was changed? To "save a syscall per append", not to
prevent fragmentation of the file, which was the problem everyone
was complaining about...

> Why?  Because btrfs data blocks are 4 KiB.  With COW, the effect for 
> either 4 byte or 8 MiB file allocations is going to end up being the 
> same, forcing (repeated until full) rewrite of each 4 KiB block into its 
> own extent.

And that's now a btrfs problem.... :/

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: R: Re: Slow startup of systemd-journal on BTRFS
  2014-06-12 11:13 Goffredo Baroncelli <kreijack@libero.it>
@ 2014-06-12 12:37 ` Duncan
  2014-06-12 23:24   ` Dave Chinner
  0 siblings, 1 reply; 6+ messages in thread
From: Duncan @ 2014-06-12 12:37 UTC (permalink / raw)
  To: linux-btrfs; +Cc: systemd-devel

Goffredo Baroncelli <kreijack@libero.it> posted on Thu, 12 Jun 2014
13:13:26 +0200 as excerpted:

>>systemd has a very stupid journal write pattern. It checks if there is
>>space in the file for the write, and if not it fallocates the small
>>amount of space it needs (it does *4 byte* fallocate calls!) and then
>>does the write to it.  All this does is fragment the crap out of the log
>>files because the filesystems cannot optimise the allocation patterns.
> 
> I checked the code, and to me it seems that the fallocate() are done in
> FILE_SIZE_INCREASE unit (actually 8MB).

FWIW, either 4 byte or 8 MiB fallocate calls would be bad, I think 
actually pretty much equally bad without NOCOW set on the file.

Why?  Because btrfs data blocks are 4 KiB.  With COW, the effect for 
either 4 byte or 8 MiB file allocations is going to end up being the 
same, forcing (repeated until full) rewrite of each 4 KiB block into its 
own extent.

Turning off the fallocate should allow btrfs to at least consolidate a 
bit, tho to the extent that multiple 4 KiB blocks cannot be written, 
repeated fsync will still cause issues.

80-100 MiB logs (size mentioned in another reply) should be reasonably 
well handled by btrfs autodefrag, however, if it's turned on.  I'd be 
worried if sizes were > 256 MiB and certainly as sizes approached a GiB, 
but it should handle 80-100 MiB just fine.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 6+ messages in thread

* R: Re: Slow startup of systemd-journal on BTRFS
@ 2014-06-12 11:13 Goffredo Baroncelli <kreijack@libero.it>
  2014-06-12 12:37 ` Duncan
  0 siblings, 1 reply; 6+ messages in thread
From: Goffredo Baroncelli <kreijack@libero.it> @ 2014-06-12 11:13 UTC (permalink / raw)
  To: david; +Cc: systemd Mailing List, linux-btrfs

>Da: david@fromorbit.com
>Data: 12/06/2014 3.21
>A: <kreijack@inwind.it>
>Cc: "systemd Mailing List"<systemd-devel@lists.freedesktop.org>, "linux-btrfs"
<linux-btrfs@vger.kernel.org>
>Ogg: Re: Slow startup of systemd-journal on BTRFS
>
>On Wed, Jun 11, 2014 at 11:28:54PM +0200, Goffredo Baroncelli wrote:
>> Hi all,
>> 
>> I would like to share a my experience about a slowness of systemd when used 
on BTRFS.
>> 
>> My boot time was very high (about ~50 seconds); most of time it was due to 
NetworkManager which took about 30-40 seconds to start (this data came from 
"systemd-analyze plot").
>> 
>> I make several attempts to address this issue. Also I noticed that sometime 
this problem disappeared; but I was never able to understand why.
>> 
>> However this link
>> 
>> 	https://bugzilla.redhat.com/show_bug.cgi?id=1006386
>> 
>> suggested me that the problem could be due to a bad interaction between 
systemd and btrfs. NetworkManager was innocent. 
>
>systemd has a very stupid journal write pattern. It checks if there
>is space in the file for the write, and if not it fallocates the
>small amount of space it needs (it does *4 byte* fallocate calls!)
>and then does the write to it.  All this does is fragment the crap
>out of the log files because the filesystems cannot optimise the
>allocation patterns.

I checked the code, and to me it seems that the fallocate() are
done in FILE_SIZE_INCREASE unit (actually 8MB). 

>
>Yup, it fragments journal files on XFS, too.
>
>http://oss.sgi.com/archives/xfs/2014-03/msg00322.html
>
>IIRC, the systemd developers consider this a filesystem problem and
>so refused to change the systemd code to be nice to the filesystem
>allocators, even though they don't actually need to use fallocate...

If I am able to start a decent setup I would like to play to change some
parameters like:
- remove fallocate at all (at the beginning only ?)
- increase the fallocate allocation unit
- change the file log size and rotation time
- periodically defragment
[...[
>
>Cheers,
>
>Dave.
>
>-- 
>Dave Chinner
>david@fromorbit.com
>--
>To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



^ permalink raw reply	[flat|nested] 6+ messages in thread

* R: Re: Slow startup of systemd-journal on BTRFS
@ 2014-06-12 11:07 Goffredo Baroncelli <kreijack@libero.it>
  0 siblings, 0 replies; 6+ messages in thread
From: Goffredo Baroncelli <kreijack@libero.it> @ 2014-06-12 11:07 UTC (permalink / raw)
  To: russell; +Cc: systemd Mailing List, linux-btrfs



>----Messaggio originale----
>Da: russell@coker.com.au
>Data: 12/06/2014 3.18
>A: <kreijack@inwind.it>
>Cc: "systemd Mailing List"<systemd-devel@lists.freedesktop.org>, "linux-btrfs"
<linux-btrfs@vger.kernel.org>
>Ogg: Re: Slow startup of systemd-journal on BTRFS
>
>On Wed, 11 Jun 2014 23:28:54 Goffredo Baroncelli wrote:
>>         https://bugzilla.redhat.com/show_bug.cgi?id=1006386
>> 
>> suggested me that the problem could be due to a bad interaction between
>> systemd and btrfs. NetworkManager was innocent.  It seems that
>> systemd-journal create a very hight fragmented files when it stores its
>> log. And BTRFS it is know to behave slowly when a file is highly
>> fragmented. This had caused a slow startup of systemd-journal, which in
>> turn had blocked the services which depend by the loggin system.
>
>On my BTRFS/systemd systems I edit /etc/systemd/journald.conf and put 
>"SystemMaxUse=50M".  That doesn't solve the fragmentation problem but 
reduces 
>it enough that it doesn't bother me.

IIRC my log files are about 80/100MB. So I am not sure if this could help.
I want to investigate also the option

    MaxFileSec=1d

which rotates the log file once a day (or a week)

>
>-- 
>My Main Blog         http://etbe.coker.com.au/
>My Documents Blog    http://doc.coker.com.au/
>
>--
>To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2014-06-13 22:15 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-12 11:05 R: Re: Slow startup of systemd-journal on BTRFS Goffredo Baroncelli <kreijack@libero.it>
2014-06-12 11:07 Goffredo Baroncelli <kreijack@libero.it>
2014-06-12 11:13 Goffredo Baroncelli <kreijack@libero.it>
2014-06-12 12:37 ` Duncan
2014-06-12 23:24   ` Dave Chinner
2014-06-13 22:19     ` Goffredo Baroncelli

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.