* more gc experiences
@ 2015-10-04 12:49 Marc Lehmann
  2015-10-05  7:25 ` more gc / gc script refinements Marc Lehmann
  0 siblings, 1 reply; 7+ messages in thread
From: Marc Lehmann @ 2015-10-04 12:49 UTC (permalink / raw)
  To: linux-f2fs-devel

Still working on my nearly full volume.

After stopping at ~180GB free due to low performance, I reconfigured the GC
so it runs more often and let it run for less than two hours.

   echo   500 >gc_min_sleep_time
   echo   500 >gc_max_sleep_time
   echo   500 >gc_no_gc_sleep_time

status before/after:
http://ue.tst.eu/dc80b74b69bbb431f51d731d8b075324.txt
http://ue.tst.eu/821be8e5c227653a90596c1503b84567.txt

During GC I had a steady 45MB/s read + 45MB/s write. The number of Dirty
segments didn't decrease very much, but that is likely due to the structure
of the data:

Apart from fs metadata, I also did an rsync without -H, followed by an rsync
with -H. The latter replaces the physical file copies that the first rsync
created with hardlinks, if they were hardlinked in the source. The source had
a moderate amount of hardlinked files.

Since this probably created holes nicely spread all over the data, it is
expected that the GC has to copy a lot of data (at -s64). So overall, during
this time, the GC seemed to work quite well.

After that, I "stopped" the GC again and started the rsync, which then
proceeded to copy another 110GB (ending up at 70GB free), at which point the
performance again became unusably slow.

I then decided to give F2FS_IOC_GARBAGE_COLLECT a try with the following
script, and found some issues:

http://ue.tst.eu/9723851c87bb35e5899534123a5af497.txt
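
(The actual script is linked above; for illustration only, this C sketch
shows roughly the kind of loop I mean, timing each call. The ioctl number is
an assumption taken from fs/f2fs/f2fs.h of this era rather than from a
public header, so check it against your kernel:)

   #include <stdio.h>
   #include <fcntl.h>
   #include <time.h>
   #include <sys/ioctl.h>

   #define F2FS_IOCTL_MAGIC         0xf5
   #define F2FS_IOC_GARBAGE_COLLECT _IO(F2FS_IOCTL_MAGIC, 6) /* assumed */

   int main(int argc, char *argv[])
   {
      /* any open fd on the f2fs mount will do, e.g. the mountpoint itself */
      int fd = argc > 1 ? open(argv[1], O_RDONLY) : -1;
      if (fd < 0)
         return 1;

      for (;;) {
         struct timespec a, b;

         clock_gettime(CLOCK_MONOTONIC, &a);
         if (ioctl(fd, F2FS_IOC_GARBAGE_COLLECT) < 0)
            break; /* e.g. nothing left to collect, or not supported */
         clock_gettime(CLOCK_MONOTONIC, &b);

         printf("gc pass: %.6fs\n",
                b.tv_sec - a.tv_sec + (b.tv_nsec - a.tv_nsec) / 1e9);
      }

      return 0;
   }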

The first problem is that the ioctl seems to return instantly and
successfully while the background garbage collection thread runs. Or at
least, that's my theory: the symptom was that most of the time calls took
1-4 seconds, but regularly (presumably when the kernel GC runs), the call
returned within microseconds.

This causes unnecessarily high CPU usage - I think the call should just wait
for the GC lock in that case, or (less preferably) somehow signal this third
condition so the user code can do something else.

Which brings us to the next problem - calling the GC ioctl in a loop quickly
generated 23GB of dirty pages, which then more or less locked up the box: no
login was possible for 6 minutes after I killed the GC script, and no NFS
operations took place.

While this is a shortcoming of Linux in general, it highlights the principal
problem of not having any rate control in f2fs's GC - basically, the user has
to guess when the GC is done and when the next round can start, which is, in
general, impossible, as only the fs knows the real I/O load. In other words,
here, again, the script would have to contain a magic delay, just like
gc_min_sleep_time, after each round.

-- 
                The choice of a       Deliantra, the free code+content MORPG
      -----==-     _GNU_              http://www.deliantra.net
      ----==-- _       generation
      ---==---(_)__  __ ____  __      Marc Lehmann
      --==---/ / _ \/ // /\ \/ /      schmorp@schmorp.de
      -=====/_/_//_/\_,_/ /_/\_\


* Re: more gc / gc script refinements
  2015-10-04 12:49 more gc experiences Marc Lehmann
@ 2015-10-05  7:25 ` Marc Lehmann
  2015-10-05 15:02   ` Chao Yu
  0 siblings, 1 reply; 7+ messages in thread
From: Marc Lehmann @ 2015-10-05  7:25 UTC (permalink / raw)
  To: linux-f2fs-devel

After I successfully filled the disk, it was time to see how f2fs recovers
from this bad situation, by deleting a lot of files and filling it again.

To ease the load on the GC, but still present a bit of a challenge, I
deleted the first 12000 files out of every 80000 files (in directory order),
in the hope that this carves out comparatively big chunks.

I started with "Dirty: 30k" and "Free: 45k" and ended up with
"Dirty: 216k" and "Free: 968k", which to me seems to indicate it kind of
worked, although I am not sure how contiguous this free space really is
(I originally hoped this would be in the form of mostly free sections).

Then I worked on my GC script. Since the box became mostly unusable by just
calling the GC, I first tried this refinement:

http://ue.tst.eu/38809274b56fe9b161492f09b5411071.txt

(Used like "script </mountpoint" btw.)

In other words, only call the GC if there is less than 2GB of dirty pages.
For thresholds lower than 2GB, the GC often didn't run at all for 10-20
seconds.
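
(Again for illustration only - not the linked script, but a rough,
self-contained C equivalent of the idea: the 2GB limit is checked against
the "Dirty:" line of /proc/meminfo, and the ioctl define is the same
assumption as before:)

   #include <stdio.h>
   #include <unistd.h>
   #include <sys/ioctl.h>

   #define F2FS_IOCTL_MAGIC         0xf5
   #define F2FS_IOC_GARBAGE_COLLECT _IO(F2FS_IOCTL_MAGIC, 6) /* assumed */

   #define DIRTY_LIMIT_KB (2L * 1024 * 1024) /* 2GB */

   /* current "Dirty:" value from /proc/meminfo, in kB, or -1 on error */
   static long dirty_kb(void)
   {
      char line[128];
      long kb = -1;
      FILE *f = fopen("/proc/meminfo", "r");

      if (!f)
         return -1;

      while (fgets(line, sizeof line, f))
         if (sscanf(line, "Dirty: %ld", &kb) == 1)
            break;

      fclose(f);
      return kb;
   }

   int main(void)
   {
      int fd = 0; /* stdin is the mountpoint, as in "script </mountpoint" */

      for (;;) {
         if (dirty_kb() >= DIRTY_LIMIT_KB) {
            sleep(1); /* let writeback drain before the next round */
            continue;
         }

         if (ioctl(fd, F2FS_IOC_GARBAGE_COLLECT) < 0)
            break; /* nothing left to collect, or not supported */
      }

      return 0;
   }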

This helped a lot, but the box was still noticeably sluggish, and I realised
why the current GC I/O implementation is wrong - the purpose of the page
cache is (among other uses, such as being the normal way to do I/O) to cache
recently-requested data in the hope that it will be reused.

However, in the case of the GC, unless the data was in the cache before, the
chances that this data will be needed later are just as low as for the rest
of the device, and in general much lower than for the data that was in the
cache before f2fs evicted it.

Moreover, a lot of stress is put on the page cache because the f2fs GC
treats GCed data as normal data, leaving it in the cache and leaving it up
to the kernel to write out the pages.

What the GC should do is minimize its impact on the rest of the system, by
immediately flushing the data out and expiring the pages.

To improve the situation somewhat I decided to experiment with fdatasync on
the block device and/or a directory handle, but ended up calling syncfs on
the f2fs filesystem after every GC call, because fdatasync etc. seemed to be
equivalent to syncfs anyway:

http://ue.tst.eu/325b6ba70b1abe814dc6a5cb6c02730e.txt
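
(Relative to the earlier sketch, the only change is one extra call after
each successful GC pass; syncfs() needs _GNU_SOURCE with glibc:)

      if (ioctl(fd, F2FS_IOC_GARBAGE_COLLECT) < 0)
         break;
      syncfs(fd); /* flush out what the GC pass dirtied right away */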

The effect of syncfs was to make I/O a lot more "chunky" - first everything
was read, then everything was written (dstat output, btw; this is at
1-second intervals as always, but I never mentioned that before - sorry):

http://ue.tst.eu/9a552a4f41a4863133d3eceb90f1ec87.txt

Without it, reads and writes happen "at the same time" (when sampled at
1-second intervals).

This increased the average throughput considerably, from around 45MB/s
read+write to 66MB/s. Whether this actually sped up the GC at all I don't
know, because syncfs of course forces a sync on the whole fs, with its own
overhead.

So while this is a rather heavy-handed approach, the major result was that
the amount of dirty pages was notably reduced (it never reached 1GB), and
the box was much more usable during this time.

Right now, after about 9 hours, I am at "Dirty: 44k", and will start
writing to the device soon.

In any case, f2fs seems to hold up quite nicely near disk-full conditions,
and recovers nicely as well.

-- 
                The choice of a       Deliantra, the free code+content MORPG
      -----==-     _GNU_              http://www.deliantra.net
      ----==-- _       generation
      ---==---(_)__  __ ____  __      Marc Lehmann
      --==---/ / _ \/ // /\ \/ /      schmorp@schmorp.de
      -=====/_/_//_/\_,_/ /_/\_\


* Re: more gc / gc script refinements
  2015-10-05  7:25 ` more gc / gc script refinements Marc Lehmann
@ 2015-10-05 15:02   ` Chao Yu
  2015-10-05 23:16     ` Jaegeuk Kim
  0 siblings, 1 reply; 7+ messages in thread
From: Chao Yu @ 2015-10-05 15:02 UTC (permalink / raw)
  To: 'Marc Lehmann', linux-f2fs-devel

> -----Original Message-----
> From: Marc Lehmann [mailto:schmorp@schmorp.de]
> Sent: Monday, October 05, 2015 3:26 PM
> To: linux-f2fs-devel@lists.sourceforge.net; linux-f2fs-devel@lists.sourceforge.net
> Subject: Re: [f2fs-dev] more gc / gc script refinements
> 
> After I successfully filled the disk, it was time to see hope f2fs recovers
> from this bad situation, by deleting a lot of files and filling it again.
> 
> To ease the load on the gc, but still present a bit of a challenge, I
> deleted the first 12000 files out of every 80000 files (directory order), in the hope
> that this carves out comparatively big chunks.
> 
> I started with "Dirty: 30k" and "Free: 45k" and ended up with
> "Dirty: 216k" and "Free: 968k", which to me seems to indicate it kind of
> worked, although I am not suire how contiguous this free space really is
> (I oriignally hoped this would be in the form of mostly free sections).
> 
> Then I worked on my GC script. Since the box became mostly unusable by just
> calling the GC, I first tried this refinement:
> 
> http://ue.tst.eu/38809274b56fe9b161492f09b5411071.txt
> 
> (Used like "script </mountpoint" btw.)
> 
> Or in other words, only call the GC if there is less than 2GB of dirty
> pages. For lower values than 2GB the GC often didn't run at all for 10-20
> seconds.
> 
> This helped a lot, but the box was still noticably sluggish, and I
> realised why the current GC I/O implementation is wrong - the purpose of
> the cache is (among other uses, such as being the normal way to do I/O) to
> cache recently-requested data in the hope that it will be reused.
> 
> However, in the case of the GC, unless the data was in the cache before,
> chances that this data is required later are just as low as for the rest of
> the device, and in general, much lower then the data that was in the cache
> before f2fs evicted it.
> 
> Moreso, a lot of stress is put on the page cache because of the f2fs gc
> treating it as normal data and leaving it in the cache and up to the
> kernel to write out the pages.

IMO, the reasons for this behavior are: a) keeping GCed pages in the cache
in the hope of further hits; b) since kworker flushes all pages belonging to
one inode together, we expect that an inode's pages cached across multiple
background GC passes can be merged and then flushed to contiguous block
addresses, which can improve read performance afterwards.

> 
> What the GC should do is minimize the impact of the GC on the rest of the
> system, by immediately flushing the data out and expiring the pages.

I think f2fs should support a more flexible method of triggering GC, e.g. by
supporting sync/async GC ioctl commands:
1) synchronous GC: all GCed pages are persistent on the device after the
ioctl returns successfully.
2) asynchronous GC: we don't guarantee that all GCed pages will be
persistent after the ioctl returns successfully.

In your scenario, I think the GC flow can easily be controlled with:

while (n) {
	ioctl gc with sync mode
}
syncfs or ioctl write_checkpoint
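
In C that would look roughly like below - assuming the interface from those
patches, i.e. a __u32 sync flag passed to F2FS_IOC_GARBAGE_COLLECT plus a
new F2FS_IOC_WRITE_CHECKPOINT ioctl; the encodings here are only what the
patches propose and may still change:

   #include <sys/ioctl.h>
   #include <linux/types.h>

   #define F2FS_IOCTL_MAGIC          0xf5
   #define F2FS_IOC_GARBAGE_COLLECT  _IOW(F2FS_IOCTL_MAGIC, 6, __u32)
   #define F2FS_IOC_WRITE_CHECKPOINT _IO(F2FS_IOCTL_MAGIC, 7)

   /* run up to n synchronous GC passes on fd, then persist the result */
   static int gc_rounds(int fd, int n)
   {
      __u32 sync = 1; /* 1 = synchronous (foreground) GC, as proposed */

      while (n-- > 0)
         if (ioctl(fd, F2FS_IOC_GARBAGE_COLLECT, &sync) < 0)
            break; /* e.g. no victim section left */

      /* either syncfs(fd) or the checkpoint ioctl */
      return ioctl(fd, F2FS_IOC_WRITE_CHECKPOINT);
   }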

I wrote and sent patches supporting synchronous GC and triggering a
checkpoint by ioctl; I hope they can be helpful once we get Jaegeuk's Ack.

Thanks,

> 
> To improve the situaiton somewhat I decided to experiment with fdatasync
> on the block device and/or a directory handle, but ended up calling syncfs
> on the f2fs fs after every gc call, because fdatasync etc. seemed to be
> the equivalent of syncfs anyway:
> 
> http://ue.tst.eu/325b6ba70b1abe814dc6a5cb6c02730e.txt
> 
> The effect of syncfs was to make I/O a lot more "chunky" - first everything
> was read, then everything was written (dstat output, btw., this is 1 second
> intervals as always, but I never mentioned it - sorry):
> 
> http://ue.tst.eu/9a552a4f41a4863133d3eceb90f1ec87.txt
> 
> Without it, read and write happen "at the same time" (when sampled with 1
> second intervals).
> 
> This increased the average throughput considerably, from around 45MB
> read+write/s to 66MB/s. Whether this actually increased the GC process at
> all I don't know, because syncfs of course forces a sync on the fs, with
> its own overhead.
> 
> So while this is a rather heavy-handed approach, the major result was that
> the amount of dirty pages is notably reduced (it never reaches 1GB), and
> the box is much more usable during this time.
> 
> Right now, after about 9 hours, I am at "Dirty: 44k", and will start
> writing to the device soon.
> 
> In any case, it seems f2fs seems to hold up quite nicely near disk full
> conditions, and does recover nicely as well.
> 
> --
>                 The choice of a       Deliantra, the free code+content MORPG
>       -----==-     _GNU_              http://www.deliantra.net
>       ----==-- _       generation
>       ---==---(_)__  __ ____  __      Marc Lehmann
>       --==---/ / _ \/ // /\ \/ /      schmorp@schmorp.de
>       -=====/_/_//_/\_,_/ /_/\_\
> 
> ------------------------------------------------------------------------------
> _______________________________________________
> Linux-f2fs-devel mailing list
> Linux-f2fs-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


* Re: more gc / gc script refinements
  2015-10-05 15:02   ` Chao Yu
@ 2015-10-05 23:16     ` Jaegeuk Kim
  2015-10-06 16:41       ` Chao Yu
  0 siblings, 1 reply; 7+ messages in thread
From: Jaegeuk Kim @ 2015-10-05 23:16 UTC (permalink / raw)
  To: Chao Yu; +Cc: 'Marc Lehmann', linux-f2fs-devel

Thanks Chao,

On Mon, Oct 05, 2015 at 11:02:42PM +0800, Chao Yu wrote:
> > -----Original Message-----
> > From: Marc Lehmann [mailto:schmorp@schmorp.de]
> > Sent: Monday, October 05, 2015 3:26 PM
> > To: linux-f2fs-devel@lists.sourceforge.net; linux-f2fs-devel@lists.sourceforge.net
> > Subject: Re: [f2fs-dev] more gc / gc script refinements
> > 
> > After I successfully filled the disk, it was time to see hope f2fs recovers
> > from this bad situation, by deleting a lot of files and filling it again.
> > 
> > To ease the load on the gc, but still present a bit of a challenge, I
> > deleted the first 12000 files out of every 80000 files (directory order), in the hope
> > that this carves out comparatively big chunks.
> > 
> > I started with "Dirty: 30k" and "Free: 45k" and ended up with
> > "Dirty: 216k" and "Free: 968k", which to me seems to indicate it kind of
> > worked, although I am not suire how contiguous this free space really is
> > (I oriignally hoped this would be in the form of mostly free sections).
> > 
> > Then I worked on my GC script. Since the box became mostly unusable by just
> > calling the GC, I first tried this refinement:
> > 
> > http://ue.tst.eu/38809274b56fe9b161492f09b5411071.txt
> > 
> > (Used like "script </mountpoint" btw.)
> > 
> > Or in other words, only call the GC if there is less than 2GB of dirty
> > pages. For lower values than 2GB the GC often didn't run at all for 10-20
> > seconds.
> > 
> > This helped a lot, but the box was still noticably sluggish, and I
> > realised why the current GC I/O implementation is wrong - the purpose of
> > the cache is (among other uses, such as being the normal way to do I/O) to
> > cache recently-requested data in the hope that it will be reused.
> > 
> > However, in the case of the GC, unless the data was in the cache before,
> > chances that this data is required later are just as low as for the rest of
> > the device, and in general, much lower then the data that was in the cache
> > before f2fs evicted it.
> > 
> > Moreso, a lot of stress is put on the page cache because of the f2fs gc
> > treating it as normal data and leaving it in the cache and up to the
> > kernel to write out the pages.
> 
> IMO, the reason of the behavior is a) keeping gced pages in cache as we look
> forward further hits; b) as we know, kworker will flush all pages belong to
> one inode together, we expect that inode's pages cached from multiple background
> gc can be merged, and then be flushed with continuous block address, which
> can improve the read performance afterward.

Agreed to this.

> > 
> > What the GC should do is minimize the impact of the GC on the rest of the
> > system, by immediately flushing the data out and expiring the pages.
> 
> I think f2fs should support us more flexible method of triggering gc, mostly
> like supporting sync/async gc ioctl command.
> 1) synchronous gc: all gced pages should be persistent in device after ioctl
> returns successfully.
> 2) asynchronous gc: we don't guarantee all gced pages will be persistent
> after ioctl returns successfully.

Yeah, agreed.

I wrote some other patches on top of Chao's patches.
We can do "background_gc=sync" and "/sys/fs/f2fs/<dev>/cp_interval".
Currently, I'm not quite convinced that we need to flush user data
periodically. At least, I expect cp_interval could enhance the user
experience quite well.

> 
> In your scenario, I think gc flow can easily be controlled with:
> 
> while (n) {
> 	ioctl gc with sync mode
> }
> syncfs or ioctl write_checkpoint
> 
> I wrote and sent the patches for supporting synchronous gc and supporting
> triggering checkpoint by ioctl, I hope that can be helpful once we get
> Jaegeuk's Ack.

Thanks for the patches, :)

> 
> Thanks,
> 
> > 
> > To improve the situaiton somewhat I decided to experiment with fdatasync
> > on the block device and/or a directory handle, but ended up calling syncfs
> > on the f2fs fs after every gc call, because fdatasync etc. seemed to be
> > the equivalent of syncfs anyway:
> > 
> > http://ue.tst.eu/325b6ba70b1abe814dc6a5cb6c02730e.txt
> > 
> > The effect of syncfs was to make I/O a lot more "chunky" - first everything
> > was read, then everything was written (dstat output, btw., this is 1 second
> > intervals as always, but I never mentioned it - sorry):
> > 
> > http://ue.tst.eu/9a552a4f41a4863133d3eceb90f1ec87.txt
> > 
> > Without it, read and write happen "at the same time" (when sampled with 1
> > second intervals).
> > 
> > This increased the average throughput considerably, from around 45MB
> > read+write/s to 66MB/s. Whether this actually increased the GC process at
> > all I don't know, because syncfs of course forces a sync on the fs, with
> > its own overhead.
> > 
> > So while this is a rather heavy-handed approach, the major result was that
> > the amount of dirty pages is notably reduced (it never reaches 1GB), and
> > the box is much more usable during this time.
> > 
> > Right now, after about 9 hours, I am at "Dirty: 44k", and will start
> > writing to the device soon.
> > 
> > In any case, it seems f2fs seems to hold up quite nicely near disk full
> > conditions, and does recover nicely as well.
> > 
> > --
> >                 The choice of a       Deliantra, the free code+content MORPG
> >       -----==-     _GNU_              http://www.deliantra.net
> >       ----==-- _       generation
> >       ---==---(_)__  __ ____  __      Marc Lehmann
> >       --==---/ / _ \/ // /\ \/ /      schmorp@schmorp.de
> >       -=====/_/_//_/\_,_/ /_/\_\
> > 
> > ------------------------------------------------------------------------------
> > _______________________________________________
> > Linux-f2fs-devel mailing list
> > Linux-f2fs-devel@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
> 
> ------------------------------------------------------------------------------
> _______________________________________________
> Linux-f2fs-devel mailing list
> Linux-f2fs-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


* Re: more gc / gc script refinements
  2015-10-05 23:16     ` Jaegeuk Kim
@ 2015-10-06 16:41       ` Chao Yu
  2015-10-06 23:44         ` Jaegeuk Kim
  0 siblings, 1 reply; 7+ messages in thread
From: Chao Yu @ 2015-10-06 16:41 UTC (permalink / raw)
  To: 'Jaegeuk Kim'; +Cc: 'Marc Lehmann', linux-f2fs-devel

Hi Jaegeuk,

> -----Original Message-----
> From: Jaegeuk Kim [mailto:jaegeuk@kernel.org]
> Sent: Tuesday, October 06, 2015 7:17 AM
> To: Chao Yu
> Cc: 'Marc Lehmann'; linux-f2fs-devel@lists.sourceforge.net
> Subject: Re: [f2fs-dev] more gc / gc script refinements
> 
> Thanks Chao,
> 
> On Mon, Oct 05, 2015 at 11:02:42PM +0800, Chao Yu wrote:
> > > -----Original Message-----
> > > From: Marc Lehmann [mailto:schmorp@schmorp.de]
> > > Sent: Monday, October 05, 2015 3:26 PM
> > > To: linux-f2fs-devel@lists.sourceforge.net; linux-f2fs-devel@lists.sourceforge.net
> > > Subject: Re: [f2fs-dev] more gc / gc script refinements
> > >
> > > After I successfully filled the disk, it was time to see hope f2fs recovers
> > > from this bad situation, by deleting a lot of files and filling it again.
> > >
> > > To ease the load on the gc, but still present a bit of a challenge, I
> > > deleted the first 12000 files out of every 80000 files (directory order), in the hope
> > > that this carves out comparatively big chunks.
> > >
> > > I started with "Dirty: 30k" and "Free: 45k" and ended up with
> > > "Dirty: 216k" and "Free: 968k", which to me seems to indicate it kind of
> > > worked, although I am not suire how contiguous this free space really is
> > > (I oriignally hoped this would be in the form of mostly free sections).
> > >
> > > Then I worked on my GC script. Since the box became mostly unusable by just
> > > calling the GC, I first tried this refinement:
> > >
> > > http://ue.tst.eu/38809274b56fe9b161492f09b5411071.txt
> > >
> > > (Used like "script </mountpoint" btw.)
> > >
> > > Or in other words, only call the GC if there is less than 2GB of dirty
> > > pages. For lower values than 2GB the GC often didn't run at all for 10-20
> > > seconds.
> > >
> > > This helped a lot, but the box was still noticably sluggish, and I
> > > realised why the current GC I/O implementation is wrong - the purpose of
> > > the cache is (among other uses, such as being the normal way to do I/O) to
> > > cache recently-requested data in the hope that it will be reused.
> > >
> > > However, in the case of the GC, unless the data was in the cache before,
> > > chances that this data is required later are just as low as for the rest of
> > > the device, and in general, much lower then the data that was in the cache
> > > before f2fs evicted it.
> > >
> > > Moreso, a lot of stress is put on the page cache because of the f2fs gc
> > > treating it as normal data and leaving it in the cache and up to the
> > > kernel to write out the pages.
> >
> > IMO, the reason of the behavior is a) keeping gced pages in cache as we look
> > forward further hits; b) as we know, kworker will flush all pages belong to
> > one inode together, we expect that inode's pages cached from multiple background
> > gc can be merged, and then be flushed with continuous block address, which
> > can improve the read performance afterward.
> 
> Agreed to this.
> 
> > >
> > > What the GC should do is minimize the impact of the GC on the rest of the
> > > system, by immediately flushing the data out and expiring the pages.
> >
> > I think f2fs should support us more flexible method of triggering gc, mostly
> > like supporting sync/async gc ioctl command.
> > 1) synchronous gc: all gced pages should be persistent in device after ioctl
> > returns successfully.
> > 2) asynchronous gc: we don't guarantee all gced pages will be persistent
> > after ioctl returns successfully.
> 
> Yeah, agreed.
> 
> I wrote some other patches on top of Chao's patches.
> We can do "background_gc=sync" and "/sys/fs/f2fs/dev/cp_interval".

Nice! That usage is easier and more convenient than the ioctl approach.
I think it can really be helpful for Marc.

> Currently, I'm not quite convincing that we need to flush use data periodically.

IMHO, I prefer to support a data flush functionality in f2fs, for these
reasons:
a) In order to keep the consistency and integrity of user data, kworker
flushing and periodic checkpoints are hard to control and coordinate,
because kworker and the checkpoint can't be aware of each other. On the
contrary, a configurable internal periodic data flush + checkpoint provides
a more flexible way to keep user data persistent (e.g. configure the period
as n seconds, so at worst we lose only the user data written in the last n
seconds).
b) The user gets more options to choose from:
 - periodic data flush + checkpoint gives us better integrity but worse
performance, since within one period we can merge less dirty data for
writeback;
 - kworker flush + periodic checkpoint gives us better performance, as dirty
pages can be merged well, but worse user data integrity, as data that was
not flushed before a checkpoint will be unrecoverable forever.
With this functionality, f2fs becomes more configurable and should be
suitable for more workloads (especially those with a strong demand for user
data integrity).

So I hope we can add this feature to our pending list. What do you think?

Thanks,

> At least, I expect cp_interval could enhance user experiences quite well.
> 
> >
> > In your scenario, I think gc flow can easily be controlled with:
> >
> > while (n) {
> > 	ioctl gc with sync mode
> > }
> > syncfs or ioctl write_checkpoint
> >
> > I wrote and sent the patches for supporting synchronous gc and supporting
> > triggering checkpoint by ioctl, I hope that can be helpful once we get
> > Jaegeuk's Ack.
> 
> Thanks for the patches, :)

Thank you for the quick response! :)

Thanks,

> 
> >
> > Thanks,
> >
> > >
> > > To improve the situaiton somewhat I decided to experiment with fdatasync
> > > on the block device and/or a directory handle, but ended up calling syncfs
> > > on the f2fs fs after every gc call, because fdatasync etc. seemed to be
> > > the equivalent of syncfs anyway:
> > >
> > > http://ue.tst.eu/325b6ba70b1abe814dc6a5cb6c02730e.txt
> > >
> > > The effect of syncfs was to make I/O a lot more "chunky" - first everything
> > > was read, then everything was written (dstat output, btw., this is 1 second
> > > intervals as always, but I never mentioned it - sorry):
> > >
> > > http://ue.tst.eu/9a552a4f41a4863133d3eceb90f1ec87.txt
> > >
> > > Without it, read and write happen "at the same time" (when sampled with 1
> > > second intervals).
> > >
> > > This increased the average throughput considerably, from around 45MB
> > > read+write/s to 66MB/s. Whether this actually increased the GC process at
> > > all I don't know, because syncfs of course forces a sync on the fs, with
> > > its own overhead.
> > >
> > > So while this is a rather heavy-handed approach, the major result was that
> > > the amount of dirty pages is notably reduced (it never reaches 1GB), and
> > > the box is much more usable during this time.
> > >
> > > Right now, after about 9 hours, I am at "Dirty: 44k", and will start
> > > writing to the device soon.
> > >
> > > In any case, it seems f2fs seems to hold up quite nicely near disk full
> > > conditions, and does recover nicely as well.
> > >
> > > --
> > >                 The choice of a       Deliantra, the free code+content MORPG
> > >       -----==-     _GNU_              http://www.deliantra.net
> > >       ----==-- _       generation
> > >       ---==---(_)__  __ ____  __      Marc Lehmann
> > >       --==---/ / _ \/ // /\ \/ /      schmorp@schmorp.de
> > >       -=====/_/_//_/\_,_/ /_/\_\
> > >
> > > ------------------------------------------------------------------------------
> > > _______________________________________________
> > > Linux-f2fs-devel mailing list
> > > Linux-f2fs-devel@lists.sourceforge.net
> > > https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
> >
> > ------------------------------------------------------------------------------
> > _______________________________________________
> > Linux-f2fs-devel mailing list
> > Linux-f2fs-devel@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


* Re: more gc / gc script refinements
  2015-10-06 16:41       ` Chao Yu
@ 2015-10-06 23:44         ` Jaegeuk Kim
  2015-10-07 12:32           ` Chao Yu
  0 siblings, 1 reply; 7+ messages in thread
From: Jaegeuk Kim @ 2015-10-06 23:44 UTC (permalink / raw)
  To: Chao Yu; +Cc: 'Marc Lehmann', linux-f2fs-devel

On Wed, Oct 07, 2015 at 12:41:45AM +0800, Chao Yu wrote:
> Hi Jaegeuk,
> 
> > -----Original Message-----
> > From: Jaegeuk Kim [mailto:jaegeuk@kernel.org]
> > Sent: Tuesday, October 06, 2015 7:17 AM
> > To: Chao Yu
> > Cc: 'Marc Lehmann'; linux-f2fs-devel@lists.sourceforge.net
> > Subject: Re: [f2fs-dev] more gc / gc script refinements
> > 
> > Thanks Chao,
> > 
> > On Mon, Oct 05, 2015 at 11:02:42PM +0800, Chao Yu wrote:
> > > > -----Original Message-----
> > > > From: Marc Lehmann [mailto:schmorp@schmorp.de]
> > > > Sent: Monday, October 05, 2015 3:26 PM
> > > > To: linux-f2fs-devel@lists.sourceforge.net; linux-f2fs-devel@lists.sourceforge.net
> > > > Subject: Re: [f2fs-dev] more gc / gc script refinements
> > > >
> > > > After I successfully filled the disk, it was time to see hope f2fs recovers
> > > > from this bad situation, by deleting a lot of files and filling it again.
> > > >
> > > > To ease the load on the gc, but still present a bit of a challenge, I
> > > > deleted the first 12000 files out of every 80000 files (directory order), in the hope
> > > > that this carves out comparatively big chunks.
> > > >
> > > > I started with "Dirty: 30k" and "Free: 45k" and ended up with
> > > > "Dirty: 216k" and "Free: 968k", which to me seems to indicate it kind of
> > > > worked, although I am not suire how contiguous this free space really is
> > > > (I oriignally hoped this would be in the form of mostly free sections).
> > > >
> > > > Then I worked on my GC script. Since the box became mostly unusable by just
> > > > calling the GC, I first tried this refinement:
> > > >
> > > > http://ue.tst.eu/38809274b56fe9b161492f09b5411071.txt
> > > >
> > > > (Used like "script </mountpoint" btw.)
> > > >
> > > > Or in other words, only call the GC if there is less than 2GB of dirty
> > > > pages. For lower values than 2GB the GC often didn't run at all for 10-20
> > > > seconds.
> > > >
> > > > This helped a lot, but the box was still noticably sluggish, and I
> > > > realised why the current GC I/O implementation is wrong - the purpose of
> > > > the cache is (among other uses, such as being the normal way to do I/O) to
> > > > cache recently-requested data in the hope that it will be reused.
> > > >
> > > > However, in the case of the GC, unless the data was in the cache before,
> > > > chances that this data is required later are just as low as for the rest of
> > > > the device, and in general, much lower then the data that was in the cache
> > > > before f2fs evicted it.
> > > >
> > > > Moreso, a lot of stress is put on the page cache because of the f2fs gc
> > > > treating it as normal data and leaving it in the cache and up to the
> > > > kernel to write out the pages.
> > >
> > > IMO, the reason of the behavior is a) keeping gced pages in cache as we look
> > > forward further hits; b) as we know, kworker will flush all pages belong to
> > > one inode together, we expect that inode's pages cached from multiple background
> > > gc can be merged, and then be flushed with continuous block address, which
> > > can improve the read performance afterward.
> > 
> > Agreed to this.
> > 
> > > >
> > > > What the GC should do is minimize the impact of the GC on the rest of the
> > > > system, by immediately flushing the data out and expiring the pages.
> > >
> > > I think f2fs should support us more flexible method of triggering gc, mostly
> > > like supporting sync/async gc ioctl command.
> > > 1) synchronous gc: all gced pages should be persistent in device after ioctl
> > > returns successfully.
> > > 2) asynchronous gc: we don't guarantee all gced pages will be persistent
> > > after ioctl returns successfully.
> > 
> > Yeah, agreed.
> > 
> > I wrote some other patches on top of Chao's patches.
> > We can do "background_gc=sync" and "/sys/fs/f2fs/dev/cp_interval".
> 
> Nice! The usage is more easy and convenient than the way of ioctl.
> I think that can really be helpful for Marc.
> 
> > Currently, I'm not quite convincing that we need to flush use data periodically.
> 
> IMHO, I prefer to support data flush functionality in f2fs, the reason is that:
> a) In order to keep the consistency and integrity of user data, kworker flush
> and periodical checkpoint are hard to be controlled and cooperated because
> kworker and checkpoint can't be aware of each other, on the contrary, configurable
> inner periodical data flush + checkpoint can supply more flexible way to keep
> persistent of user data (e.g. config period time as n second, so at least we will
> lost the user data in recent n second).
> b) User can choose more options in log level:
>  - periodical data flush + checkpoint supply us with better integrity but worse
> performance since in the period we can merge less dirty data for writebacking.
>  - kworker flush + periodical checkpoint supply us with better performance as
> dirty pages can merged well, but worse integrity of user data as data didn't
> flushed before cp will be unrecoverable forever.
> With this functionality, f2fs will become more configurable, and it should be
> suitable for more workload (especially in strong demand on user data integrity).
> 
> So I hope we can add this feature in our pending list. How do you think?

I guess you can understand my concern, which is about having a kind of
redundant flusher in both the VFS and f2fs. We already have several sysfs
entries to control system-wide dirty pages and IO throttling strategies
through the VFS.
I agree that, if we can control our flush timings, we can do something more
for better IO behavior. However, if users want more stable data, they should
first take a look at the system-wide configuration, not at something special
to f2fs independently.

Of course, I have no objection to adding this feature to our pending list.
Oh, it would be good to make a wiki page, linked from [1], to describe
pending items too.

[1] https://en.wikipedia.org/wiki/F2FS

Thanks,

> 
> Thanks,
> 
> > At least, I expect cp_interval could enhance user experiences quite well.
> > 
> > >
> > > In your scenario, I think gc flow can easily be controlled with:
> > >
> > > while (n) {
> > > 	ioctl gc with sync mode
> > > }
> > > syncfs or ioctl write_checkpoint
> > >
> > > I wrote and sent the patches for supporting synchronous gc and supporting
> > > triggering checkpoint by ioctl, I hope that can be helpful once we get
> > > Jaegeuk's Ack.
> > 
> > Thanks for the patches, :)
> 
> Thank you for the quick response! :)
> 
> Thanks,
> 
> > 
> > >
> > > Thanks,
> > >
> > > >
> > > > To improve the situaiton somewhat I decided to experiment with fdatasync
> > > > on the block device and/or a directory handle, but ended up calling syncfs
> > > > on the f2fs fs after every gc call, because fdatasync etc. seemed to be
> > > > the equivalent of syncfs anyway:
> > > >
> > > > http://ue.tst.eu/325b6ba70b1abe814dc6a5cb6c02730e.txt
> > > >
> > > > The effect of syncfs was to make I/O a lot more "chunky" - first everything
> > > > was read, then everything was written (dstat output, btw., this is 1 second
> > > > intervals as always, but I never mentioned it - sorry):
> > > >
> > > > http://ue.tst.eu/9a552a4f41a4863133d3eceb90f1ec87.txt
> > > >
> > > > Without it, read and write happen "at the same time" (when sampled with 1
> > > > second intervals).
> > > >
> > > > This increased the average throughput considerably, from around 45MB
> > > > read+write/s to 66MB/s. Whether this actually increased the GC process at
> > > > all I don't know, because syncfs of course forces a sync on the fs, with
> > > > its own overhead.
> > > >
> > > > So while this is a rather heavy-handed approach, the major result was that
> > > > the amount of dirty pages is notably reduced (it never reaches 1GB), and
> > > > the box is much more usable during this time.
> > > >
> > > > Right now, after about 9 hours, I am at "Dirty: 44k", and will start
> > > > writing to the device soon.
> > > >
> > > > In any case, it seems f2fs seems to hold up quite nicely near disk full
> > > > conditions, and does recover nicely as well.
> > > >
> > > > --
> > > >                 The choice of a       Deliantra, the free code+content MORPG
> > > >       -----==-     _GNU_              http://www.deliantra.net
> > > >       ----==-- _       generation
> > > >       ---==---(_)__  __ ____  __      Marc Lehmann
> > > >       --==---/ / _ \/ // /\ \/ /      schmorp@schmorp.de
> > > >       -=====/_/_//_/\_,_/ /_/\_\
> > > >
> > > > ------------------------------------------------------------------------------
> > > > _______________________________________________
> > > > Linux-f2fs-devel mailing list
> > > > Linux-f2fs-devel@lists.sourceforge.net
> > > > https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
> > >
> > > ------------------------------------------------------------------------------
> > > _______________________________________________
> > > Linux-f2fs-devel mailing list
> > > Linux-f2fs-devel@lists.sourceforge.net
> > > https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


* Re: more gc / gc script refinements
  2015-10-06 23:44         ` Jaegeuk Kim
@ 2015-10-07 12:32           ` Chao Yu
  0 siblings, 0 replies; 7+ messages in thread
From: Chao Yu @ 2015-10-07 12:32 UTC (permalink / raw)
  To: 'Jaegeuk Kim'; +Cc: 'Marc Lehmann', linux-f2fs-devel

> -----Original Message-----
> From: Jaegeuk Kim [mailto:jaegeuk@kernel.org]
> Sent: Wednesday, October 07, 2015 7:45 AM
> To: Chao Yu
> Cc: 'Marc Lehmann'; linux-f2fs-devel@lists.sourceforge.net
> Subject: Re: [f2fs-dev] more gc / gc script refinements
> 
> On Wed, Oct 07, 2015 at 12:41:45AM +0800, Chao Yu wrote:
> > Hi Jaegeuk,
> >
> > > -----Original Message-----
> > > From: Jaegeuk Kim [mailto:jaegeuk@kernel.org]
> > > Sent: Tuesday, October 06, 2015 7:17 AM
> > > To: Chao Yu
> > > Cc: 'Marc Lehmann'; linux-f2fs-devel@lists.sourceforge.net
> > > Subject: Re: [f2fs-dev] more gc / gc script refinements
> > >
> > > Thanks Chao,
> > >
> > > On Mon, Oct 05, 2015 at 11:02:42PM +0800, Chao Yu wrote:
> > > > > -----Original Message-----
> > > > > From: Marc Lehmann [mailto:schmorp@schmorp.de]
> > > > > Sent: Monday, October 05, 2015 3:26 PM
> > > > > To: linux-f2fs-devel@lists.sourceforge.net; linux-f2fs-devel@lists.sourceforge.net
> > > > > Subject: Re: [f2fs-dev] more gc / gc script refinements
> > > > >
> > > > > After I successfully filled the disk, it was time to see hope f2fs recovers
> > > > > from this bad situation, by deleting a lot of files and filling it again.
> > > > >
> > > > > To ease the load on the gc, but still present a bit of a challenge, I
> > > > > deleted the first 12000 files out of every 80000 files (directory order), in the hope
> > > > > that this carves out comparatively big chunks.
> > > > >
> > > > > I started with "Dirty: 30k" and "Free: 45k" and ended up with
> > > > > "Dirty: 216k" and "Free: 968k", which to me seems to indicate it kind of
> > > > > worked, although I am not suire how contiguous this free space really is
> > > > > (I oriignally hoped this would be in the form of mostly free sections).
> > > > >
> > > > > Then I worked on my GC script. Since the box became mostly unusable by just
> > > > > calling the GC, I first tried this refinement:
> > > > >
> > > > > http://ue.tst.eu/38809274b56fe9b161492f09b5411071.txt
> > > > >
> > > > > (Used like "script </mountpoint" btw.)
> > > > >
> > > > > Or in other words, only call the GC if there is less than 2GB of dirty
> > > > > pages. For lower values than 2GB the GC often didn't run at all for 10-20
> > > > > seconds.
> > > > >
> > > > > This helped a lot, but the box was still noticably sluggish, and I
> > > > > realised why the current GC I/O implementation is wrong - the purpose of
> > > > > the cache is (among other uses, such as being the normal way to do I/O) to
> > > > > cache recently-requested data in the hope that it will be reused.
> > > > >
> > > > > However, in the case of the GC, unless the data was in the cache before,
> > > > > chances that this data is required later are just as low as for the rest of
> > > > > the device, and in general, much lower then the data that was in the cache
> > > > > before f2fs evicted it.
> > > > >
> > > > > Moreso, a lot of stress is put on the page cache because of the f2fs gc
> > > > > treating it as normal data and leaving it in the cache and up to the
> > > > > kernel to write out the pages.
> > > >
> > > > IMO, the reason of the behavior is a) keeping gced pages in cache as we look
> > > > forward further hits; b) as we know, kworker will flush all pages belong to
> > > > one inode together, we expect that inode's pages cached from multiple background
> > > > gc can be merged, and then be flushed with continuous block address, which
> > > > can improve the read performance afterward.
> > >
> > > Agreed to this.
> > >
> > > > >
> > > > > What the GC should do is minimize the impact of the GC on the rest of the
> > > > > system, by immediately flushing the data out and expiring the pages.
> > > >
> > > > I think f2fs should support us more flexible method of triggering gc, mostly
> > > > like supporting sync/async gc ioctl command.
> > > > 1) synchronous gc: all gced pages should be persistent in device after ioctl
> > > > returns successfully.
> > > > 2) asynchronous gc: we don't guarantee all gced pages will be persistent
> > > > after ioctl returns successfully.
> > >
> > > Yeah, agreed.
> > >
> > > I wrote some other patches on top of Chao's patches.
> > > We can do "background_gc=sync" and "/sys/fs/f2fs/dev/cp_interval".
> >
> > Nice! The usage is more easy and convenient than the way of ioctl.
> > I think that can really be helpful for Marc.
> >
> > > Currently, I'm not quite convincing that we need to flush use data periodically.
> >
> > IMHO, I prefer to support data flush functionality in f2fs, the reason is that:
> > a) In order to keep the consistency and integrity of user data, kworker flush
> > and periodical checkpoint are hard to be controlled and cooperated because
> > kworker and checkpoint can't be aware of each other, on the contrary, configurable
> > inner periodical data flush + checkpoint can supply more flexible way to keep
> > persistent of user data (e.g. config period time as n second, so at least we will
> > lost the user data in recent n second).
> > b) User can choose more options in log level:
> >  - periodical data flush + checkpoint supply us with better integrity but worse
> > performance since in the period we can merge less dirty data for writebacking.
> >  - kworker flush + periodical checkpoint supply us with better performance as
> > dirty pages can merged well, but worse integrity of user data as data didn't
> > flushed before cp will be unrecoverable forever.
> > With this functionality, f2fs will become more configurable, and it should be
> > suitable for more workload (especially in strong demand on user data integrity).
> >
> > So I hope we can add this feature in our pending list. How do you think?
> 
> I guess you can understand my concern which is a kind of redundunt flushers
> between VFS and f2fs. We already have several sysfs entries to control
> system-wide dirty pages and throttling IO strategies through VFS.

I can understand that.

> I agreed that, if we can control our flush timings, we can do something more
> for better IO behaviors. However, if users want more stable data, they need to
> first take a look at system-wide configurations, not something special in f2fs
> independently.

Yeah, but as we know, those are global configurations; each setting we make
at the VM level can impact other filesystems, so this approach has its
limits.

In brief, what I think is that it's not a bad thing to arm f2fs with
different weapons, like other filesystems (e.g. ext4): once we can't destroy
the enemy with an AK47, we can try the M16. :)

> 
> Of course, I have no objection to add this feature in our pending list.
> Oh, it would be good to make a wiki linked with [1] to describe pending
> items too.

Ah, thanks for reminding me of that! :) I have added this feature to the
planned list at link [1].

Thanks,

> 
> [1] https://en.wikipedia.org/wiki/F2FS
> 
> Thanks,
> 
> >
> > Thanks,
> >
> > > At least, I expect cp_interval could enhance user experiences quite well.
> > >
> > > >
> > > > In your scenario, I think gc flow can easily be controlled with:
> > > >
> > > > while (n) {
> > > > 	ioctl gc with sync mode
> > > > }
> > > > syncfs or ioctl write_checkpoint
> > > >
> > > > I wrote and sent the patches for supporting synchronous gc and supporting
> > > > triggering checkpoint by ioctl, I hope that can be helpful once we get
> > > > Jaegeuk's Ack.
> > >
> > > Thanks for the patches, :)
> >
> > Thank you for the quick response! :)
> >
> > Thanks,
> >
> > >
> > > >
> > > > Thanks,
> > > >
> > > > >
> > > > > To improve the situaiton somewhat I decided to experiment with fdatasync
> > > > > on the block device and/or a directory handle, but ended up calling syncfs
> > > > > on the f2fs fs after every gc call, because fdatasync etc. seemed to be
> > > > > the equivalent of syncfs anyway:
> > > > >
> > > > > http://ue.tst.eu/325b6ba70b1abe814dc6a5cb6c02730e.txt
> > > > >
> > > > > The effect of syncfs was to make I/O a lot more "chunky" - first everything
> > > > > was read, then everything was written (dstat output, btw., this is 1 second
> > > > > intervals as always, but I never mentioned it - sorry):
> > > > >
> > > > > http://ue.tst.eu/9a552a4f41a4863133d3eceb90f1ec87.txt
> > > > >
> > > > > Without it, read and write happen "at the same time" (when sampled with 1
> > > > > second intervals).
> > > > >
> > > > > This increased the average throughput considerably, from around 45MB
> > > > > read+write/s to 66MB/s. Whether this actually increased the GC process at
> > > > > all I don't know, because syncfs of course forces a sync on the fs, with
> > > > > its own overhead.
> > > > >
> > > > > So while this is a rather heavy-handed approach, the major result was that
> > > > > the amount of dirty pages is notably reduced (it never reaches 1GB), and
> > > > > the box is much more usable during this time.
> > > > >
> > > > > Right now, after about 9 hours, I am at "Dirty: 44k", and will start
> > > > > writing to the device soon.
> > > > >
> > > > > In any case, it seems f2fs seems to hold up quite nicely near disk full
> > > > > conditions, and does recover nicely as well.
> > > > >
> > > > > --
> > > > >                 The choice of a       Deliantra, the free code+content MORPG
> > > > >       -----==-     _GNU_              http://www.deliantra.net
> > > > >       ----==-- _       generation
> > > > >       ---==---(_)__  __ ____  __      Marc Lehmann
> > > > >       --==---/ / _ \/ // /\ \/ /      schmorp@schmorp.de
> > > > >       -=====/_/_//_/\_,_/ /_/\_\
> > > > >
> > > > > ------------------------------------------------------------------------------
> > > > > _______________________________________________
> > > > > Linux-f2fs-devel mailing list
> > > > > Linux-f2fs-devel@lists.sourceforge.net
> > > > > https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
> > > >
> > > > ------------------------------------------------------------------------------
> > > > _______________________________________________
> > > > Linux-f2fs-devel mailing list
> > > > Linux-f2fs-devel@lists.sourceforge.net
> > > > https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Thread overview (7 messages):
2015-10-04 12:49 more gc experiences Marc Lehmann
2015-10-05  7:25 ` more gc / gc script refinements Marc Lehmann
2015-10-05 15:02   ` Chao Yu
2015-10-05 23:16     ` Jaegeuk Kim
2015-10-06 16:41       ` Chao Yu
2015-10-06 23:44         ` Jaegeuk Kim
2015-10-07 12:32           ` Chao Yu
