* nilfs2 doesn't garbage collect checkpoints for me
@ 2011-05-26 18:11 Dima Tisnek
       [not found] ` <BANLkTim4BBKwFJUzbnsKw0_Ru2k8ZW3MYw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Dima Tisnek @ 2011-05-26 18:11 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi,

I'm testing nilfs2 and other fs's for use on cheap flash cards, trying
to avoid writing the same location all the time.
My test program makes lots of small sqlite transactions, which sqlite
syncs to disk.
In fewer than 2000 transactions, a 1GB nilfs2 volume ran out of disk space.
I tried unmounting and mounting again; it didn't help.
The block device is nbd; it works fine with other fs's.

lscp shows there are 7121 checkpoints, and somehow the old ones are not
removed automatically.
df -h shows 944M used, 8M free, 100% use.
nilfs_cleanerd is running and its logs look OK (start, pause, resume,
shutdown, start, pause, resume).

kernel 2.6.38-arch
nilfs-utils 2.0.23

If someone is interested, I can put the block device image on Dropbox or something.

Thanks,
Dima Tisnek


* Re: nilfs2 doesn't garbage collect checkpoints for me
       [not found] ` <BANLkTim4BBKwFJUzbnsKw0_Ru2k8ZW3MYw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-05-26 18:32   ` dexen deVries
       [not found]     ` <201105262032.54257.dexen.devries-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: dexen deVries @ 2011-05-26 18:32 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi,


On Thursday 26 of May 2011 20:11:55 you wrote:
> I'm testing nilfs2 and other fs's for use on cheap flash cards, trying
> to avoid writing same location all the time.

I'm using nilfs2 on a small server with a cheap-o 16GB SSD extracted from
an Eee PC for the same reason; it works great.



> My test program makes lots of small sqlite transactions which sqlite
> syncs to disk.
> In less than 2000 transaction 1GB nilfs2 volume ran out of disk space.
> tried unmount, mount again, didn't help
> block device is nbd, works with with other fs's
> 
> lscp shows there are 7121 checkpoints and somehow old ones are not
> removed automatically.

First off, the default configuration of nilfs_cleanerd is to keep all
checkpoints for at least one hour (3600 seconds). See the file
/etc/nilfs_cleanerd.conf, option `protection_period'. For testing you may want
to change the protection period to just a few seconds and see if that helps,
either via the config file (then issue a SIGHUP so the daemon reloads it) or
via the `-p SECONDS' argument (see the manpage).
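
For example (just a sketch -- the new period and the device path are my
assumptions, adjust them to your setup):

# sed -i 's/^protection_period.*/protection_period 10/' /etc/nilfs_cleanerd.conf
# pkill -HUP nilfs_cleanerd

or, equivalently, restart the cleaner with the period on the command line:

# nilfs_cleanerd -p 10 /dev/YOUR_FILESYSTEM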

To see what's going on, you may want to temporarily change the
`log_priority' option in the config file to `debug'; in /var/log/debug you
should then see messages describing the actions of nilfs_cleanerd.


Example:

May 26 20:23:53 blitz nilfs_cleanerd[3198]: wake up
May 26 20:23:53 blitz nilfs_cleanerd[3198]: ncleansegs = 1175
May 26 20:23:53 blitz nilfs_cleanerd[3198]: 4 segments selected to be cleaned
May 26 20:23:53 blitz nilfs_cleanerd[3198]: protected checkpoints = 
[156725,157003] (protection period >= 1306430633)
May 26 20:23:53 blitz nilfs_cleanerd[3198]: segment 1844 cleaned
May 26 20:23:53 blitz nilfs_cleanerd[3198]: segment 1845 cleaned
May 26 20:23:53 blitz nilfs_cleanerd[3198]: segment 1846 cleaned
May 26 20:23:53 blitz nilfs_cleanerd[3198]: segment 1847 cleaned
May 26 20:23:53 blitz nilfs_cleanerd[3198]: wait 0.488223000


where `ncleansegs' is the number of clean (free) segments you already
have, and `protected checkpoints' indicates the range of checkpoint numbers
that are still under protection (due to the `protection_period' setting).


In any case, my understanding is that in a typical DB each transaction (which
may be each command, if you don't begin/commit transactions explicitly) causes
an fsync(), which creates a new checkpoint. On a small drive that *may* create
so many checkpoints in a short time that they don't get GC'd before the drive
fills up. Not sure yet how to work around that.



Two more possible sources of the problem:
1) GC used to break in a certain scenario: the FS could become internally
inconsistent (no data loss, but it wouldn't perform GC anymore) if two or more
nilfs_cleanerds were processing it at the same time. It's probably fixed in
the most recent patches. To check whether that's the case, see the output of
the `dmesg' command; it would indicate problems in NILFS.

2) a new `nilfs_cleanerd' process may become stuck on a semaphore if you kill
the old one hard (for example, with kill -9). That used to leave an aux file
in /dev/shm/, like /dev/shm/sem.nilfs-cleaner-2067. To check whether that's
the case, run nilfs_cleanerd through strace, like:

# strace -f nilfs_cleanerd /dev/YOUR_FILESYSTEM

If it hangs at one point on a futex() call, that's it. A brute-force but
sure-fire fix is to kill all instances of nilfs_cleanerd and remove the files
matching /dev/shm/sem.nilfs-cleaner-*.
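
For example (again just a sketch; substitute your own device):

# killall nilfs_cleanerd
# rm -f /dev/shm/sem.nilfs-cleaner-*
# nilfs_cleanerd /dev/YOUR_FILESYSTEM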


Hope that helps somehow~


-- 
dexen deVries

``One can't proceed from the informal to the formal by formal means.''


* Re: nilfs2 doesn't garbage collect checkpoints for me
       [not found]     ` <201105262032.54257.dexen.devries-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2011-05-26 20:24       ` Dima Tisnek
       [not found]         ` <BANLkTimNm6QcNOmc1Gwp2K+SVKoRV8+8Cg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Dima Tisnek @ 2011-05-26 20:24 UTC (permalink / raw)
  To: dexen deVries; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Dexen,

You were spot on; protection time was the culprit.

While on the subject, I can see many cases where more is written to
the disk in 1h than there is free space (my problem was with the entire
disk, but it's equivalent). Are there any plans to collect earlier,
rather than enforce the protection time, when running low on disk space?

Now I set the cleaner up like this (for accelerated testing):

protection_period       1
min_clean_segments      10%
max_clean_segments      90%
clean_check_interval    1
selection_policy        timestamp       # timestamp in ascend order
nsegments_per_clean     10
mc_nsegments_per_clean  100
cleaning_interval       10
mc_cleaning_interval    1
retry_interval          10
use_mmap
log_priority            info

I know some settings are ridiculous; do tell me if I did something
completely insane :)

Same test (~150K of disk traffic per transaction, 10 transactions a
second): eventually the system begins to swap and then nilfs2 dies, leaving
this in dmesg:

[66598.373596] nilfs_cleanerd: page allocation failure. order:0, mode:0x50
[66598.373596] Pid: 30708, comm: nilfs_cleanerd Tainted: G        W
2.6.38-ARCH #1
[66598.373596] Call Trace:
[66598.373596]  [<c10c563c>] ? __alloc_pages_nodemask+0x54c/0x750
[66598.373596]  [<c10ffab8>] ? mem_cgroup_charge_common+0x68/0xb0
[66598.373596]  [<c10bf923>] ? find_or_create_page+0x43/0x90
[66598.373596]  [<d2344733>] ? nilfs_grab_buffer+0x33/0xc0 [nilfs2]
[66598.381684] nbd4: Attempted send on closed socket
[66598.381684] end_request: I/O error, dev nbd4, sector 461512
[66598.394594] nbd4: Attempted send on closed socket
[66598.394594] end_request: I/O error, dev nbd4, sector 461760
[66598.394594] nbd4: Attempted send on closed socket
[66598.394594] end_request: I/O error, dev nbd4, sector 462008
[66598.404089]  [<d2357bcc>] ?
nilfs_gccache_submit_read_data+0x2c/0x140 [nilfs2]
[66598.404089]  [<d2358573>] ?
nilfs_ioctl_clean_segments.isra.8+0x2e3/0x7a0 [nilfs2]
[66598.404089]  [<d2348d09>] ? nilfs_btree_do_lookup+0x1f9/0x290 [nilfs2]
[66598.404089]  [<d2358e36>] ? nilfs_ioctl+0x1f6/0x40c [nilfs2]
[66598.404089]  [<c1033b14>] ? finish_task_switch+0x34/0xb0
[66598.404089]  [<c13192cd>] ? schedule+0x28d/0x9e0
[66598.404089]  [<d2358c40>] ? nilfs_ioctl+0x0/0x40c [nilfs2]
[66598.407101] nbd4: Attempted send on closed socket
[66598.407101] end_request: I/O error, dev nbd4, sector 462256
[66598.407101] nbd4: Attempted send on closed socket
[66598.407101] end_request: I/O error, dev nbd4, sector 462504
[66598.407101] nbd4: Attempted send on closed socket
[66598.407101] end_request: I/O error, dev nbd4, sector 462752
[66598.430221]  [<c1113b99>] ? do_vfs_ioctl+0x79/0x570
[66598.430221]  [<c11ac164>] ? copy_to_user+0x34/0x50
[66598.430221]  [<c11140f7>] ? sys_ioctl+0x67/0x80
[66598.430221]  [<c131c360>] ? syscall_call+0x7/0xb
[66598.430221] Mem-Info:
[66598.430221] DMA per-cpu:
[66598.430221] CPU    0: hi:    0, btch:   1 usd:   0
[66598.430221] Normal per-cpu:
[66598.430221] CPU    0: hi:   90, btch:  15 usd:  11
[66598.430221] active_anon:0 inactive_anon:1 isolated_anon:0
[66598.430221]  active_file:28328 inactive_file:28550 isolated_file:128
[66598.430221]  unevictable:0 dirty:0 writeback:0 unstable:0
[66598.430221]  free:0 slab_reclaimable:2787 slab_unreclaimable:1104
[66598.430221]  mapped:1 shmem:0 pagetables:27 bounce:0
[66598.430221] DMA free:0kB min:120kB low:148kB high:180kB
active_anon:0kB inactive_anon:4kB active_file:5984kB
inactive_file:7032kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB present:15804kB mlocked:0kB dirty:0kB writeback:0kB
mapped:0kB shmem:0kB slab_reclaimable:2876kB slab_unreclaimable:8kB
kernel_stack:8kB pagetables:0kB unstable:0kB bounce:0kB
writeback_tmp:0kB pages_scanned:18118 all_unreclaimable? yes
[66598.430221] lowmem_reserve[]: 0 238 238 238



If you give me some hints, I will try again tomorrow :)

Dima Tisnek

On 26 May 2011 11:32, dexen deVries <dexen.devries-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Hi,
>
>
> On Thursday 26 of May 2011 20:11:55 you wrote:
>> I'm testing nilfs2 and other fs's for use on cheap flash cards, trying
>> to avoid writing same location all the time.
>
> I'm using nilfs2 on a small server with a cheap-o 16GB SSD extracted from
> Eeepc for the same reason; works great.
>
>
>
>> My test program makes lots of small sqlite transactions which sqlite
>> syncs to disk.
>> In less than 2000 transaction 1GB nilfs2 volume ran out of disk space.
>> tried unmount, mount again, didn't help
>> block device is nbd, works with with other fs's
>>
>> lscp shows there are 7121 checkpoints and somehow old ones are not
>> removed automatically.
>
> First off, the default configuration of nilfs_cleanerd is to keep all
> checkpoints for at least one hour (3600 seconds). See file
> /etc/nilfs_cleanerd.conf, option `protection_period'. For testing you may want
> to change the protection period to just a few seconds and see if that helps.
> Either via the config file (and issue a SIGHUP so it reloads the config) or via
> the `-p SECONDS' argument (see manpage).
>
> To see what's going on, you may want to change (temporarily) the
> `log_priority' in config file to `debug'; in /var/log/debug you should then see
> statements describing actions of the nilfs_cleanerd.
>
>
> Example:
>
> May 26 20:23:53 blitz nilfs_cleanerd[3198]: wake up
> May 26 20:23:53 blitz nilfs_cleanerd[3198]: ncleansegs = 1175
> May 26 20:23:53 blitz nilfs_cleanerd[3198]: 4 segments selected to be cleaned
> May 26 20:23:53 blitz nilfs_cleanerd[3198]: protected checkpoints =
> [156725,157003] (protection period >= 1306430633)
> May 26 20:23:53 blitz nilfs_cleanerd[3198]: segment 1844 cleaned
> May 26 20:23:53 blitz nilfs_cleanerd[3198]: segment 1845 cleaned
> May 26 20:23:53 blitz nilfs_cleanerd[3198]: segment 1846 cleaned
> May 26 20:23:53 blitz nilfs_cleanerd[3198]: segment 1847 cleaned
> May 26 20:23:53 blitz nilfs_cleanerd[3198]: wait 0.488223000
>
>
> where the `ncleansegs' is the number of clean (free) segments you already
> have, and `protected checkpoints' indicates range of checkpoint numbers that
> are still under protection (due to the `protection_period' setting)
>
>
> In any case, my understanding is that in typical DB, each transaction (which
> may be each command, if you don't begin/commit transaction explicitly) causes
> an fsync() which creates a new checkpoint. On a small drive that *may* cause
> creation of so many checkpoints in a short time they don't get GC'd before the
> drive fills up. Not sure yet how to work around that.
>
>
>
> Two more possible sources of the problem:
> 1) GC used to break in certain scenario: the FS could become internally
> inconsistent (no data loss, but it wouldn't perform GC anymore) if two or more
> nilfs_cleanerds were processing it at the same time. It's probably fixed with
> the most recent patches. To check if that's the case, see output of `dmesg'
> command; it would indicate problems in NILFS.
>
> 2) new `nilfs_cleanerd' process may become stuck on semaphore if you kill the
> old one hard  (for example, kill -9). That used to leave aux file in /dev/shm/,
> like /dev/shm/sem.nilfs-cleaner-2067. To check if that's the case, run
> nilfs_cleanred through strace, like:
>
> # strace -f nilfs_cleanerd /dev/YOUR_FILESYSTEM
>
> if it hangs at one point on futex() call, that's it. A brute-force, but sure-
> fire way is to kill all instances of nilfs_cleanerd and remove files matching
> /dev/shm/sem.nilfs-cleaner-*
>
>
> Hope that helps somehow~
>
>
> --
> dexen deVries
>
> ``One can't proceed from the informal to the formal by formal means.''


* Re: nilfs2 doesn't garbage collect checkpoints for me
       [not found]         ` <BANLkTimNm6QcNOmc1Gwp2K+SVKoRV8+8Cg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-05-27  4:38           ` Ryusuke Konishi
       [not found]             ` <20110527.133803.215389018.ryusuke-sG5X7nlA6pw@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Ryusuke Konishi @ 2011-05-27  4:38 UTC (permalink / raw)
  To: dimaqq-Re5JQEeQqe8AvxtiuMwx3w
  Cc: dexen.devries-Re5JQEeQqe8AvxtiuMwx3w, linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi,
On Thu, 26 May 2011 13:24:22 -0700, Dima Tisnek wrote:
> Hi Dexen,
> 
> you were spot on, protection time was the culprit.
> 
> while on the subject, I can see many cases where more is written to
> the disk in 1h than there is free space (my problem was than entire
> disk, but it's equivalent), are there any plans to collect earlier
> rather than enfore protection time when running low on disk?
> 
> now I set cleaner like this (for accelerated testing):
> 
> protection_period       1
> min_clean_segments      10%
> max_clean_segments      90%
> clean_check_interval    1
> selection_policy        timestamp       # timestamp in ascend order
> nsegments_per_clean     10
> mc_nsegments_per_clean  100
> cleaning_interval       10
> mc_cleaning_interval    1
> retry_interval          10
> use_mmap
> log_priority            info
> 
> I know some settings are ridiculous, do tell me if I did something
> completely insane :)

The maximum value of nsegments_per_clean and mc_nsegments_per_clean is
32; anything greater than that is cut down to that value.

Setting a large value for these parameters is actually not sane
because cleaning one segment consumes up to 8MB of kernel memory.
Decreasing the cleaning intervals is a better way.
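
(As a rough worst-case illustration -- my own arithmetic, using the 8MB
figure above: a pass with mc_nsegments_per_clean capped at 32 may need up to
32 x 8MB = 256MB of kernel memory, and one with nsegments_per_clean = 10 up
to 10 x 8MB = 80MB.)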

But unfortunately the cleaner daemon does not handle subsecond values for
the interval parameters.

> same test (~150K disk traffic per transaction, 10 transactions a
> second), eventually system begins to swap and then nilfs2 dies leaving
> this in dmesg:
> 
> [66598.373596] nilfs_cleanerd: page allocation failure. order:0, mode:0x50
> [66598.373596] Pid: 30708, comm: nilfs_cleanerd Tainted: G        W
> 2.6.38-ARCH #1
> [66598.373596] Call Trace:
> [66598.373596]  [<c10c563c>] ? __alloc_pages_nodemask+0x54c/0x750
> [66598.373596]  [<c10ffab8>] ? mem_cgroup_charge_common+0x68/0xb0
> [66598.373596]  [<c10bf923>] ? find_or_create_page+0x43/0x90
> [66598.373596]  [<d2344733>] ? nilfs_grab_buffer+0x33/0xc0 [nilfs2]
> [66598.381684] nbd4: Attempted send on closed socket
> [66598.381684] end_request: I/O error, dev nbd4, sector 461512
> [66598.394594] nbd4: Attempted send on closed socket
> [66598.394594] end_request: I/O error, dev nbd4, sector 461760
> [66598.394594] nbd4: Attempted send on closed socket
> [66598.394594] end_request: I/O error, dev nbd4, sector 462008
> [66598.404089]  [<d2357bcc>] ?
> nilfs_gccache_submit_read_data+0x2c/0x140 [nilfs2]
> [66598.404089]  [<d2358573>] ?
> nilfs_ioctl_clean_segments.isra.8+0x2e3/0x7a0 [nilfs2]
> [66598.404089]  [<d2348d09>] ? nilfs_btree_do_lookup+0x1f9/0x290 [nilfs2]
> [66598.404089]  [<d2358e36>] ? nilfs_ioctl+0x1f6/0x40c [nilfs2]
> [66598.404089]  [<c1033b14>] ? finish_task_switch+0x34/0xb0
> [66598.404089]  [<c13192cd>] ? schedule+0x28d/0x9e0
> [66598.404089]  [<d2358c40>] ? nilfs_ioctl+0x0/0x40c [nilfs2]
> [66598.407101] nbd4: Attempted send on closed socket
> [66598.407101] end_request: I/O error, dev nbd4, sector 462256
> [66598.407101] nbd4: Attempted send on closed socket
> [66598.407101] end_request: I/O error, dev nbd4, sector 462504
> [66598.407101] nbd4: Attempted send on closed socket
> [66598.407101] end_request: I/O error, dev nbd4, sector 462752
> [66598.430221]  [<c1113b99>] ? do_vfs_ioctl+0x79/0x570
> [66598.430221]  [<c11ac164>] ? copy_to_user+0x34/0x50
> [66598.430221]  [<c11140f7>] ? sys_ioctl+0x67/0x80
> [66598.430221]  [<c131c360>] ? syscall_call+0x7/0xb
> [66598.430221] Mem-Info:
> [66598.430221] DMA per-cpu:
> [66598.430221] CPU    0: hi:    0, btch:   1 usd:   0
> [66598.430221] Normal per-cpu:
> [66598.430221] CPU    0: hi:   90, btch:  15 usd:  11
> [66598.430221] active_anon:0 inactive_anon:1 isolated_anon:0
> [66598.430221]  active_file:28328 inactive_file:28550 isolated_file:128
> [66598.430221]  unevictable:0 dirty:0 writeback:0 unstable:0
> [66598.430221]  free:0 slab_reclaimable:2787 slab_unreclaimable:1104
> [66598.430221]  mapped:1 shmem:0 pagetables:27 bounce:0
> [66598.430221] DMA free:0kB min:120kB low:148kB high:180kB
> active_anon:0kB inactive_anon:4kB active_file:5984kB
> inactive_file:7032kB unevictable:0kB isolated(anon):0kB
> isolated(file):0kB present:15804kB mlocked:0kB dirty:0kB writeback:0kB
> mapped:0kB shmem:0kB slab_reclaimable:2876kB slab_unreclaimable:8kB
> kernel_stack:8kB pagetables:0kB unstable:0kB bounce:0kB
> writeback_tmp:0kB pages_scanned:18118 all_unreclaimable? yes
> [66598.430221] lowmem_reserve[]: 0 238 238 238

This log actually shows that a kernel memory shortage happened.

You should decrease mc_nsegments_per_clean, at the very least to below 32.

Well, I'll consider changing the parser routine to allow subsecond
values for the interval parameters.

Thanks,
Ryusuke Konishi
 
 
> if you give me some hints and I will try again tomorrow :)
> 
> Dima Tisnek
> 
> On 26 May 2011 11:32, dexen deVries <dexen.devries-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> > Hi,
> >
> >
> > On Thursday 26 of May 2011 20:11:55 you wrote:
> >> I'm testing nilfs2 and other fs's for use on cheap flash cards, trying
> >> to avoid writing same location all the time.
> >
> > I'm using nilfs2 on a small server with a cheap-o 16GB SSD extracted from
> > Eeepc for the same reason; works great.
> >
> >
> >
> >> My test program makes lots of small sqlite transactions which sqlite
> >> syncs to disk.
> >> In less than 2000 transaction 1GB nilfs2 volume ran out of disk space.
> >> tried unmount, mount again, didn't help
> >> block device is nbd, works with with other fs's
> >>
> >> lscp shows there are 7121 checkpoints and somehow old ones are not
> >> removed automatically.
> >
> > First off, the default configuration of nilfs_cleanerd is to keep all
> > checkpoints for at least one hour (3600 seconds). See file
> > /etc/nilfs_cleanerd.conf, option `protection_period'. For testing you may want
> > to change the protection period to just a few seconds and see if that helps.
> > Either via the config file (and issue a SIGHUP so it reloads the config) or via
> > the `-p SECONDS' argument (see manpage).
> >
> > To see what's going on, you may want to change (temporarily) the
> > `log_priority' in config file to `debug'; in /var/log/debug you should then see
> > statements describing actions of the nilfs_cleanerd.
> >
> >
> > Example:
> >
> > May 26 20:23:53 blitz nilfs_cleanerd[3198]: wake up
> > May 26 20:23:53 blitz nilfs_cleanerd[3198]: ncleansegs = 1175
> > May 26 20:23:53 blitz nilfs_cleanerd[3198]: 4 segments selected to be cleaned
> > May 26 20:23:53 blitz nilfs_cleanerd[3198]: protected checkpoints =
> > [156725,157003] (protection period >= 1306430633)
> > May 26 20:23:53 blitz nilfs_cleanerd[3198]: segment 1844 cleaned
> > May 26 20:23:53 blitz nilfs_cleanerd[3198]: segment 1845 cleaned
> > May 26 20:23:53 blitz nilfs_cleanerd[3198]: segment 1846 cleaned
> > May 26 20:23:53 blitz nilfs_cleanerd[3198]: segment 1847 cleaned
> > May 26 20:23:53 blitz nilfs_cleanerd[3198]: wait 0.488223000
> >
> >
> > where the `ncleansegs' is the number of clean (free) segments you already
> > have, and `protected checkpoints' indicates range of checkpoint numbers that
> > are still under protection (due to the `protection_period' setting)
> >
> >
> > In any case, my understanding is that in typical DB, each transaction (which
> > may be each command, if you don't begin/commit transaction explicitly) causes
> > an fsync() which creates a new checkpoint. On a small drive that *may* cause
> > creation of so many checkpoints in a short time they don't get GC'd before the
> > drive fills up. Not sure yet how to work around that.
> >
> >
> >
> > Two more possible sources of the problem:
> > 1) GC used to break in certain scenario: the FS could become internally
> > inconsistent (no data loss, but it wouldn't perform GC anymore) if two or more
> > nilfs_cleanerds were processing it at the same time. It's probably fixed with
> > the most recent patches. To check if that's the case, see output of `dmesg'
> > command; it would indicate problems in NILFS.
> >
> > 2) new `nilfs_cleanerd' process may become stuck on semaphore if you kill the
> > old one hard  (for example, kill -9). That used to leave aux file in /dev/shm/,
> > like /dev/shm/sem.nilfs-cleaner-2067. To check if that's the case, run
> > nilfs_cleanred through strace, like:
> >
> > # strace -f nilfs_cleanerd /dev/YOUR_FILESYSTEM
> >
> > if it hangs at one point on futex() call, that's it. A brute-force, but sure-
> > fire way is to kill all instances of nilfs_cleanerd and remove files matching
> > /dev/shm/sem.nilfs-cleaner-*
> >
> >
> > Hope that helps somehow~
> >
> >
> > --
> > dexen deVries
> >
> > ``One can't proceed from the informal to the formal by formal means.''


* Re: nilfs2 doesn't garbage collect checkpoints for me
       [not found]             ` <20110527.133803.215389018.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2011-05-31 22:08               ` Dima Tisnek
       [not found]                 ` <BANLkTinnpFyrxeO2_DF5gXLgas2WLdqw4Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Dima Tisnek @ 2011-05-31 22:08 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Ryusuke,

You were right; cleanerd memory usage was the culprit. My accelerated
test runs under VirtualBox with only 256M of RAM, so high segment
counts eat up a large portion of that. I now have nsegments_per_clean 5
and mc_nsegments_per_clean 10, and the tests run to completion :)

While on the subject, does nilfs have to crash under low memory,
and will it always crash if the system is OOM?

Now my next problem: nilfs interacts rather badly with nbd. I get 100%
system CPU utilization in nbd (as seen in top). I actually get high CPU
usage with other filesystems too, but with nilfs it's just much, much
worse: the whole system becomes completely unresponsive and I can only get
4 small sqlite transactions per second. ext3/nbd gives me at least 10x
more tps.

Can I do anything to help track down this issue?

I tried loopback, but that doesn't suit my test case, as I can't get stats
out of the loopback device.
nilfs/loop works fine most of the time, but becomes sluggish (still
usable) when cleanerd has to do some work.

d.

On 26 May 2011 21:38, Ryusuke Konishi <konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org> wrote:
> Hi,
> On Thu, 26 May 2011 13:24:22 -0700, Dima Tisnek wrote:
>> Hi Dexen,
>>
>> you were spot on, protection time was the culprit.
>>
>> while on the subject, I can see many cases where more is written to
>> the disk in 1h than there is free space (my problem was than entire
>> disk, but it's equivalent), are there any plans to collect earlier
>> rather than enfore protection time when running low on disk?
>>
>> now I set cleaner like this (for accelerated testing):
>>
>> protection_period       1
>> min_clean_segments      10%
>> max_clean_segments      90%
>> clean_check_interval    1
>> selection_policy        timestamp       # timestamp in ascend order
>> nsegments_per_clean     10
>> mc_nsegments_per_clean  100
>> cleaning_interval       10
>> mc_cleaning_interval    1
>> retry_interval          10
>> use_mmap
>> log_priority            info
>>
>> I know some settings are ridiculous, do tell me if I did something
>> completely insane :)
>
> The maximum value of nsegments_per_clean and mc_nsegments_per_clean is
> is 32, and it is cut to the value if greater than that.
>
> Setting a large value for these parameters is actually not sane
> because cleaning one segment consumes 8MB kernel memory at maximum.
> Decreasing cleaning intervals is better way.
>
> But, unfortunately cleaner daemon does not handle subsecond value for
> these parameters.
>
>> same test (~150K disk traffic per transaction, 10 transactions a
>> second), eventually system begins to swap and then nilfs2 dies leaving
>> this in dmesg:
>>
>> [66598.373596] nilfs_cleanerd: page allocation failure. order:0, mode:0x50
>> [66598.373596] Pid: 30708, comm: nilfs_cleanerd Tainted: G        W
>> 2.6.38-ARCH #1
>> [66598.373596] Call Trace:
>> [66598.373596]  [<c10c563c>] ? __alloc_pages_nodemask+0x54c/0x750
>> [66598.373596]  [<c10ffab8>] ? mem_cgroup_charge_common+0x68/0xb0
>> [66598.373596]  [<c10bf923>] ? find_or_create_page+0x43/0x90
>> [66598.373596]  [<d2344733>] ? nilfs_grab_buffer+0x33/0xc0 [nilfs2]
>> [66598.381684] nbd4: Attempted send on closed socket
>> [66598.381684] end_request: I/O error, dev nbd4, sector 461512
>> [66598.394594] nbd4: Attempted send on closed socket
>> [66598.394594] end_request: I/O error, dev nbd4, sector 461760
>> [66598.394594] nbd4: Attempted send on closed socket
>> [66598.394594] end_request: I/O error, dev nbd4, sector 462008
>> [66598.404089]  [<d2357bcc>] ?
>> nilfs_gccache_submit_read_data+0x2c/0x140 [nilfs2]
>> [66598.404089]  [<d2358573>] ?
>> nilfs_ioctl_clean_segments.isra.8+0x2e3/0x7a0 [nilfs2]
>> [66598.404089]  [<d2348d09>] ? nilfs_btree_do_lookup+0x1f9/0x290 [nilfs2]
>> [66598.404089]  [<d2358e36>] ? nilfs_ioctl+0x1f6/0x40c [nilfs2]
>> [66598.404089]  [<c1033b14>] ? finish_task_switch+0x34/0xb0
>> [66598.404089]  [<c13192cd>] ? schedule+0x28d/0x9e0
>> [66598.404089]  [<d2358c40>] ? nilfs_ioctl+0x0/0x40c [nilfs2]
>> [66598.407101] nbd4: Attempted send on closed socket
>> [66598.407101] end_request: I/O error, dev nbd4, sector 462256
>> [66598.407101] nbd4: Attempted send on closed socket
>> [66598.407101] end_request: I/O error, dev nbd4, sector 462504
>> [66598.407101] nbd4: Attempted send on closed socket
>> [66598.407101] end_request: I/O error, dev nbd4, sector 462752
>> [66598.430221]  [<c1113b99>] ? do_vfs_ioctl+0x79/0x570
>> [66598.430221]  [<c11ac164>] ? copy_to_user+0x34/0x50
>> [66598.430221]  [<c11140f7>] ? sys_ioctl+0x67/0x80
>> [66598.430221]  [<c131c360>] ? syscall_call+0x7/0xb
>> [66598.430221] Mem-Info:
>> [66598.430221] DMA per-cpu:
>> [66598.430221] CPU    0: hi:    0, btch:   1 usd:   0
>> [66598.430221] Normal per-cpu:
>> [66598.430221] CPU    0: hi:   90, btch:  15 usd:  11
>> [66598.430221] active_anon:0 inactive_anon:1 isolated_anon:0
>> [66598.430221]  active_file:28328 inactive_file:28550 isolated_file:128
>> [66598.430221]  unevictable:0 dirty:0 writeback:0 unstable:0
>> [66598.430221]  free:0 slab_reclaimable:2787 slab_unreclaimable:1104
>> [66598.430221]  mapped:1 shmem:0 pagetables:27 bounce:0
>> [66598.430221] DMA free:0kB min:120kB low:148kB high:180kB
>> active_anon:0kB inactive_anon:4kB active_file:5984kB
>> inactive_file:7032kB unevictable:0kB isolated(anon):0kB
>> isolated(file):0kB present:15804kB mlocked:0kB dirty:0kB writeback:0kB
>> mapped:0kB shmem:0kB slab_reclaimable:2876kB slab_unreclaimable:8kB
>> kernel_stack:8kB pagetables:0kB unstable:0kB bounce:0kB
>> writeback_tmp:0kB pages_scanned:18118 all_unreclaimable? yes
>> [66598.430221] lowmem_reserve[]: 0 238 238 238
>
> This log actually shows that the kernel memory shortage happened.
>
> You should decrease mc_nsegments_per_clean at least less than 32.
>
> Well, I'll consider changing the parser routine to allow subsecond
> values for the interval parameters.
>
> Thanks,
> Ryusuke Konishi
>
>
>> if you give me some hints and I will try again tomorrow :)
>>
>> Dima Tisnek
>>
>> On 26 May 2011 11:32, dexen deVries <dexen.devries-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> > Hi,
>> >
>> >
>> > On Thursday 26 of May 2011 20:11:55 you wrote:
>> >> I'm testing nilfs2 and other fs's for use on cheap flash cards, trying
>> >> to avoid writing same location all the time.
>> >
>> > I'm using nilfs2 on a small server with a cheap-o 16GB SSD extracted from
>> > Eeepc for the same reason; works great.
>> >
>> >
>> >
>> >> My test program makes lots of small sqlite transactions which sqlite
>> >> syncs to disk.
>> >> In less than 2000 transaction 1GB nilfs2 volume ran out of disk space.
>> >> tried unmount, mount again, didn't help
>> >> block device is nbd, works with with other fs's
>> >>
>> >> lscp shows there are 7121 checkpoints and somehow old ones are not
>> >> removed automatically.
>> >
>> > First off, the default configuration of nilfs_cleanerd is to keep all
>> > checkpoints for at least one hour (3600 seconds). See file
>> > /etc/nilfs_cleanerd.conf, option `protection_period'. For testing you may want
>> > to change the protection period to just a few seconds and see if that helps.
>> > Either via the config file (and issue a SIGHUP so it reloads the config) or via
>> > the `-p SECONDS' argument (see manpage).
>> >
>> > To see what's going on, you may want to change (temporarily) the
>> > `log_priority' in config file to `debug'; in /var/log/debug you should then see
>> > statements describing actions of the nilfs_cleanerd.
>> >
>> >
>> > Example:
>> >
>> > May 26 20:23:53 blitz nilfs_cleanerd[3198]: wake up
>> > May 26 20:23:53 blitz nilfs_cleanerd[3198]: ncleansegs = 1175
>> > May 26 20:23:53 blitz nilfs_cleanerd[3198]: 4 segments selected to be cleaned
>> > May 26 20:23:53 blitz nilfs_cleanerd[3198]: protected checkpoints =
>> > [156725,157003] (protection period >= 1306430633)
>> > May 26 20:23:53 blitz nilfs_cleanerd[3198]: segment 1844 cleaned
>> > May 26 20:23:53 blitz nilfs_cleanerd[3198]: segment 1845 cleaned
>> > May 26 20:23:53 blitz nilfs_cleanerd[3198]: segment 1846 cleaned
>> > May 26 20:23:53 blitz nilfs_cleanerd[3198]: segment 1847 cleaned
>> > May 26 20:23:53 blitz nilfs_cleanerd[3198]: wait 0.488223000
>> >
>> >
>> > where the `ncleansegs' is the number of clean (free) segments you already
>> > have, and `protected checkpoints' indicates range of checkpoint numbers that
>> > are still under protection (due to the `protection_period' setting)
>> >
>> >
>> > In any case, my understanding is that in typical DB, each transaction (which
>> > may be each command, if you don't begin/commit transaction explicitly) causes
>> > an fsync() which creates a new checkpoint. On a small drive that *may* cause
>> > creation of so many checkpoints in a short time they don't get GC'd before the
>> > drive fills up. Not sure yet how to work around that.
>> >
>> >
>> >
>> > Two more possible sources of the problem:
>> > 1) GC used to break in certain scenario: the FS could become internally
>> > inconsistent (no data loss, but it wouldn't perform GC anymore) if two or more
>> > nilfs_cleanerds were processing it at the same time. It's probably fixed with
>> > the most recent patches. To check if that's the case, see output of `dmesg'
>> > command; it would indicate problems in NILFS.
>> >
>> > 2) new `nilfs_cleanerd' process may become stuck on semaphore if you kill the
>> > old one hard  (for example, kill -9). That used to leave aux file in /dev/shm/,
>> > like /dev/shm/sem.nilfs-cleaner-2067. To check if that's the case, run
>> > nilfs_cleanred through strace, like:
>> >
>> > # strace -f nilfs_cleanerd /dev/YOUR_FILESYSTEM
>> >
>> > if it hangs at one point on futex() call, that's it. A brute-force, but sure-
>> > fire way is to kill all instances of nilfs_cleanerd and remove files matching
>> > /dev/shm/sem.nilfs-cleaner-*
>> >
>> >
>> > Hope that helps somehow~
>> >
>> >
>> > --
>> > dexen deVries
>> >
>> > ``One can't proceed from the informal to the formal by formal means.''


* Re: nilfs2 doesn't garbage collect checkpoints for me
       [not found]                 ` <BANLkTinnpFyrxeO2_DF5gXLgas2WLdqw4Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-06-01  8:03                   ` dexen deVries
       [not found]                     ` <201106011003.24656.dexen.devries-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: dexen deVries @ 2011-06-01  8:03 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Dima,


On Wednesday 01 of June 2011 00:08:52 you wrote:
> (...)
> now my next problem, nilfs interacts rather badly with nbd(...)

What's the reason for nbd here anyway? I thought you wanted to access a
flash memory card...?

You may want to play with the `sync', `fua' and `flush' options of
nbd-server, if you can still ensure reliable operation.


You may also want to increase /proc/sys/vm/min_free_kbytes; if some
processes eat up most of the 256MB of RAM, it's they that leak memory and
should be shot by the OOM killer ;-)
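
For example (the value here is just an illustration, tune it to your box):

# sysctl -w vm.min_free_kbytes=16384

or echo the value into /proc/sys/vm/min_free_kbytes directly.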


regards,
-- 
dexen deVries

[[[↓][→]]]

For example, if the first thing in the file is:
   <?kzy irefvba="1.0" rapbqvat="ebg13"?>
an XML parser will recognize that the document is stored in the traditional 
ROT13 encoding.

(( Joe English, http://www.flightlab.com/~joe/sgml/faq-not.txt ))


* Re: nilfs2 doesn't garbage collect checkpoints for me
       [not found]                     ` <201106011003.24656.dexen.devries-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2011-06-01 17:32                       ` Dima Tisnek
       [not found]                         ` <BANLkTimhLPM3TU1uw10Ub5t-GEW_s-B_tQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Dima Tisnek @ 2011-06-01 17:32 UTC (permalink / raw)
  To: dexen deVries; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

I'm trying to analyze how different filesystems, settings and user-mode
settings interact with write amplification and write endurance on
flash.

Thus I need details of every write that the fs issues to the
underlying block device.

I copied/wrote my own nbd server in Python that logs writes, and then I
analyze the log against a flash model, e.g.:
block1, block2, block3         -> 1 sector, 1 erase
block1, timeout, block2        -> 1 sector, 2 erases
block1, block2, block1 again   -> 1 sector, 2 erases
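
To make the accounting concrete, here is a minimal sketch of that kind of
model (my own simplification: it assumes 128 KiB erase sectors and a log file
with one byte offset per line, and it ignores the timeout case;
write_log.txt is just a placeholder name):

awk -v esz=131072 '
  BEGIN { prev = -1 }                   # no erase sector touched yet
  { s = int($1 / esz) }                 # map write offset to erase sector
  s != prev { erases[s]++; prev = s }   # entering a sector costs one erase
  END { for (x in erases) print "sector", x, "erases", erases[x] }
' write_log.txt

Consecutive writes that stay inside one erase sector count as a single
erase; coming back to a sector after writing elsewhere counts again, which
matches the third case above. Handling the timeout case would also need a
timestamp column in the log.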

I'm still working on these models, as different flashes clearly work
differently and there's often no way to find out exactly how; for now
I assume reasonably dumb flash. I'm running the nbd server on the host and
the nbd client in the guest (virtual) OS, as I want to keep the server
separate from the test's memory pressure. My current test memory usage is
<40M used, 7M free, a few M of buffers, 200M cached, so I think I've solved
the memory issues.

I'll post the detailed results on my blog when I've made sense of them
and made sure I did everything correctly.
My critical workload is a lot of small transactions, which hits flash
much harder than large files or a few large transactions. I use sqlite;
it syncs at least once per transaction. Think of a regular user app like
Firefox.

In short:
ext3/ext4 with journal are actually quite reasonable -- the most-used
erase sector gets 1 erase per transaction.
ext4 without journal writes only 0.5x the total bytes, but is less safe.
ext4 stripe/stride size == erase block size doesn't help anything.
nilfs is very promising wrt write endurance, but total bytes written is 7x more.
nilfs and btrfs are too raw for production use.
d.

On 1 June 2011 01:03, dexen deVries <dexen.devries-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Hi Dima,
>
>
> On Wednesday 01 of June 2011 00:08:52 you wrote:
>> (...)
>> now my next problem, nilfs interacts rather badly with nbd(...)
>
> What's the reason for nbd here anyway? Thought you wanted to access a flash
> memory card...?
>
> You may want to play with `sync', `fua' and `flush' options of nbd-server, if
> you can still ensure reliable operation.
>
>
> You may want to increase /proc/sys/vm/min_free_kbytes; if some proces eat up
> most of the 256MB of RAM, it's it that leak memory and should be shot by OOM
> ;-)
>
>
> regards,
> --
> dexen deVries
>
> [[[↓][→]]]
>
> For example, if the first thing in the file is:
>   <?kzy irefvba="1.0" rapbqvat="ebg13"?>
> an XML parser will recognize that the document is stored in the traditional
> ROT13 encoding.
>
> (( Joe English, http://www.flightlab.com/~joe/sgml/faq-not.txt ))


* Re: nilfs2 doesn't garbage collect checkpoints for me
       [not found]                         ` <BANLkTimhLPM3TU1uw10Ub5t-GEW_s-B_tQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-06-01 17:34                           ` dexen deVries
       [not found]                             ` <201106011934.30725.dexen.devries-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: dexen deVries @ 2011-06-01 17:34 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Dima,


On Wednesday 01 June 2011 19:32:07 you wrote:
> I'm trying to analyze how different filesystems, settings and user
> mode settings interact with write amplification and write endurance on
> flash.
> 
> Thus I need to get details of every write that fs issues to the
> underlying block device.

Have you considered either of:


# iostat -d -k 1

http://linux.die.net/man/8/blktrace
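
For the per-write sector addresses, something like this should do (just a
sketch; substitute your own nbd device):

# blktrace -d /dev/nbd4 -o - | blkparse -i -

blkparse then prints one line per request, including the action flags (W for
writes) and the starting sector plus length.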


-- 
dexen deVries

``One can't proceed from the informal to the formal by formal means.''


* Re: nilfs2 doesn't garbage collect checkpoints for me
       [not found]                             ` <201106011934.30725.dexen.devries-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2011-06-01 18:21                               ` Dima Tisnek
       [not found]                                 ` <BANLkTinsZZeRDmQR8sAWrqUP1N4UK5ktAA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Dima Tisnek @ 2011-06-01 18:21 UTC (permalink / raw)
  To: dexen deVries; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Thanks for the tip; blktrace seems to give me what I want and much more!

(iostat doesn't give me block addresses)

On 1 June 2011 10:34, dexen deVries <dexen.devries-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Hi Dima,
>
>
> On Wednesday 01 June 2011 19:32:07 you wrote:
>> I'm trying to analyze how different filesystems, settings and user
>> mode settings interact with write amplification and write endurance on
>> flash.
>>
>> Thus I need to get details of every write that fs issues to the
>> underlying block device.
>
> Have you considered either of:
>
>
> # iostat -d -k 1
>
> http://linux.die.net/man/8/blktrace
>
>
> --
> dexen deVries
>
> ``One can't proceed from the informal to the formal by formal means.''


* Re: nilfs2 doesn't garbage collect checkpoints for me
       [not found]                                 ` <BANLkTinsZZeRDmQR8sAWrqUP1N4UK5ktAA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-06-01 19:19                                   ` dexen deVries
  0 siblings, 0 replies; 10+ messages in thread
From: dexen deVries @ 2011-06-01 19:19 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi again, Dima,


On Wednesday 01 June 2011 20:21:07 you wrote:
> Thanks for the tip, bkltrace seems to give me what I want and much more!

yay :)


Dunno if this will help you at all, but I've been observing a process
writing to NILFS2 today. It was issuing one (filesystem) transaction every
couple of seconds. About 250kB was written to the drive (both data and
metadata) -- and the interesting thing is, it was written as one I/O
operation (as reported by iostat).

Depending on the configuration and properties of your storage (possibly
including the multiple-sector I/O configuration; see hdparm's `-m' option),
a (filesystem-level) transaction on NILFS2 may issue just one drive-level
operation.

As opposed to other filesystems, which keep data and the various kinds of
metadata all separate.

Of course later on the GC will kick in, but it also writes large chunks at 
once, rather than spreading them all over the storage.



Cheers,
-- 
dexen deVries

``One can't proceed from the informal to the formal by formal means.''


end of thread

Thread overview: 10+ messages
2011-05-26 18:11 nilfs2 doesn't garbage collect checkpoints for me Dima Tisnek
     [not found] ` <BANLkTim4BBKwFJUzbnsKw0_Ru2k8ZW3MYw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-05-26 18:32   ` dexen deVries
     [not found]     ` <201105262032.54257.dexen.devries-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2011-05-26 20:24       ` Dima Tisnek
     [not found]         ` <BANLkTimNm6QcNOmc1Gwp2K+SVKoRV8+8Cg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-05-27  4:38           ` Ryusuke Konishi
     [not found]             ` <20110527.133803.215389018.ryusuke-sG5X7nlA6pw@public.gmane.org>
2011-05-31 22:08               ` Dima Tisnek
     [not found]                 ` <BANLkTinnpFyrxeO2_DF5gXLgas2WLdqw4Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-06-01  8:03                   ` dexen deVries
     [not found]                     ` <201106011003.24656.dexen.devries-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2011-06-01 17:32                       ` Dima Tisnek
     [not found]                         ` <BANLkTimhLPM3TU1uw10Ub5t-GEW_s-B_tQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-06-01 17:34                           ` dexen deVries
     [not found]                             ` <201106011934.30725.dexen.devries-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2011-06-01 18:21                               ` Dima Tisnek
     [not found]                                 ` <BANLkTinsZZeRDmQR8sAWrqUP1N4UK5ktAA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-06-01 19:19                                   ` dexen deVries
