* Work (really slow directory access on ext4)
@ 2014-08-06 14:49 Theodore Ts'o
  2014-08-06 18:26 ` Arlie Stephens
  0 siblings, 1 reply; 12+ messages in thread
From: Theodore Ts'o @ 2014-08-06 14:49 UTC (permalink / raw)
  To: kernelnewbies

I don't subscribe to kernelnewbies, but I came across this thread in
the mail archive while researching an unrelated issue.

Valdis' observations are on the mark here.  It's almost certain that
you are getting overwhelmed with other disk traffic, because your
directory isn't *that* big.

That being said, there are certainly issues with really really big
directories, and solving this is certainly not going to be a newbie
project (if it was easy to solve, it would have been addressed a long
time ago).   See:

http://en.it-usenet.org/thread/11916/10367/

for the background.  It's a little bit dated, in that we do use a
64-bit hash on 64-bit systems, but the fundamental issues are still
there.

If you sort the files returned by readdir() in inode order before
stat()ing them, this can help significantly.  Some userspace programs,
such as mutt, do this.  Unfortunately "ls" does not.  (That might be a
good newbie project, since it's a userspace-only change.  However, I'm
pretty sure the coreutils maintainers will also react negatively if
they are sent patches which don't compile.  :-)

A proof of concept of how this can be a win can be found here:

http://git.kernel.org/cgit/fs/ext2/e2fsprogs.git/tree/contrib/spd_readdir.c

LD_PRELOAD isn't guaranteed to work on all programs, so this is much
more of a hack than something I'd recommend for extended production
use.  But it shows that if you have a readdir+stat workload, sorting
by inode makes a huge difference.
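
If you want to try the preload trick, the usage is roughly as follows
-- the compile flags here are a sketch, so check the comments at the
top of spd_readdir.c before relying on them:

  $ gcc -o spd_readdir.so -fPIC -shared spd_readdir.c -ldl
  $ LD_PRELOAD=$PWD/spd_readdir.so ls -l big_dir

The library interposes on opendir()/readdir(), reads the whole
directory up front, and hands back the entries sorted by inode
number, so a subsequent stat() pass walks the inode table roughly in
disk order.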

As far as getting traces to better understand problems, I strongly
suggest that you try things like vmstat, iostat, and blktrace; system
call traces like strace aren't going to get you very far.  (See
http://brooker.co.za/blog/2013/07/14/io-performance.html for a nice
introduction to blktrace).  Use the scientific method: collect
baseline statistics using vmstat, iostat, and sar before you run your
test workload, so you know how much I/O is going on before you start
your test.  If you can run your test on a quiescent system, that's a
really good idea.  Then collect statistics as you run your workload,
tweak only one variable at a time, and record everything in a
systematic way.
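
A baseline collection might look something like the following; the
device name is only an example, so substitute whatever your file
system actually lives on (blktrace also needs root and debugfs):

  $ vmstat 5 > vmstat.baseline &
  $ iostat -x 5 > iostat.baseline &
    (let those run for a few minutes to establish the baseline)
  $ blktrace -d /dev/sda -o trace &
  $ <run the test workload>
  $ blkparse -i trace | less

Comparing the numbers seen during the test against the baseline tells
you how much of the I/O is actually yours.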

Finally, if you have more problems of a technical nature with respect
to ext4, there is the ext3-users at redhat.com list, or the
developer's list at linux-ext4 at vger.kernel.org.  It would be nice if
you tried the ext3-users or kernel-newbies lists, or tried googling to
see if anyone else has come across the problem and figured out the
solution already, but if you can't figure things out any other way, do
feel free to ask the linux-ext4 list.  We won't bite.  :-)

Cheers,

						- Ted

P.S.  If you have a large number of directories which are much larger
than you expect, and you don't want to do the "mkdir foo.new; mv foo/*
foo.new ; rmdir foo; mv foo.new foo" trick on a large number of
directories, you can also schedule downtime and while the file system
is unmounted, use "e2fsck -fD".  See the man page for more details.
It won't solve all of your problems, and it might not solve any of
your problems, but it will probably make the performance of large
directories somewhat better.
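
In shell terms, the offline rebuild amounts to something like the
following, where the device and mount point are placeholders for your
own:

  # umount /mnt/data
  # e2fsck -fD /dev/sdXN
  # mount /dev/sdXN /mnt/data

Here -f forces a full check even if the file system is marked clean,
and -D re-indexes and compacts the directories.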


* Work (really slow directory access on ext4)
  2014-08-06 14:49 Work (really slow directory access on ext4) Theodore Ts'o
@ 2014-08-06 18:26 ` Arlie Stephens
  2014-08-06 19:29   ` Nick Krause
  0 siblings, 1 reply; 12+ messages in thread
From: Arlie Stephens @ 2014-08-06 18:26 UTC (permalink / raw)
  To: kernelnewbies

On Aug 06 2014, Theodore Ts'o wrote:
> 
> I don't subscribe to kernelnewbies, but I came across this thread in
> the mail archive while researching an unrelated issue.
> 
> Valdis' observations are on the mark here.  It's almost certain that
> you are getting overwhelmed with other disk traffic, because your
> directory isn't *that* big.

Thank you very much. As the user in question, I'm afraid this one
turns out to be a clear case of "user is an idiot." 

I made a dumb mistake in the way I was measuring things. The situation
on this server is not as bad as it looked. 

> That being said, there are certainly issues with really really big
> directories, and solving this is certainly not going to be a newbie
> project (if it was easy to solve, it would have been addressed a long
> time ago).   See:
> 
> http://en.it-usenet.org/thread/11916/10367/

However, this response is precious. Suddenly a whole bunch of things
make sense from that posting alone. Last time I looked seriously at
file system code, it was the Berkeley Fast File System, also known as
UFS. I've never had the time or inclination to look at a modern file
system. That article managed to straighten out multiple misconceptions
for me and pointed me in good directions.

> for the background.  It's a little bit dated, in that we do use a
> 64-bit hash on 64-bit systems, but the fundamental issues are still
> there.

And that's in addition to what you covered here - including what
might be a useful workaround for the application that may or may not
be hitting the problem the ls test was intended to simplify. I'm
passing that on to the app developer.

Many, many thanks.  

> If you sort the files returned by readdir() in inode order before
> stat()ing them, this can help significantly.  Some userspace programs,
> such as mutt, do this.  Unfortunately "ls" does not.  (That might be a
> good newbie project, since it's a userspace-only change.  However, I'm
> pretty sure the coreutils maintainers will also react negatively if
> they are sent patches which don't compile.  :-)
> 
> A proof of concept of how this can be a win can be found here:
> 
> http://git.kernel.org/cgit/fs/ext2/e2fsprogs.git/tree/contrib/spd_readdir.c
> 
> LD_PRELOAD isn't guaranteed to work on all programs, so this is much
> more of a hack than something I'd recommend for extended production
> use.  But it shows that if you have a readdir+stat workload, sorting
> by inode makes a huge difference.
> 
> As far as getting traces to better understand problems, I strongly
> suggest that you try things like vmstat, iostat, and blktrace; system
> call traces like strace aren't going to get you very far.  (See
> http://brooker.co.za/blog/2013/07/14/io-performance.html for a nice
> introduction to blktrace).  Use the scientific method: collect
> baseline statistics using vmstat, iostat, and sar before you run your
> test workload, so you know how much I/O is going on before you start
> your test.  If you can run your test on a quiescent system, that's a
> really good idea.  Then collect statistics as you run your workload,
> tweak only one variable at a time, and record everything in a
> systematic way.

Another tool I didn't know about. Thank you very much. 
> 
> Finally, if you have more problems of a technical nature with respect
> to ext4, there is the ext3-users at redhat.com list, or the
> developer's list at linux-ext4 at vger.kernel.org.  It would be nice if
> you tried the ext3-users or kernel-newbies lists, or tried googling to
> see if anyone else has come across the problem and figured out the
> solution already, but if you can't figure things out any other way, do
> feel free to ask the linux-ext4 list.  We won't bite.  :-)

Thank you. I'll make sure to do my homework properly in future - and
never never believe things senior members of my team tell me without
verifying them first, at least not if I'm going to post about them :-( 

> 
> Cheers,
> 
> 						- Ted
> 
> P.S.  If you have a large number of directories which are much larger
> than you expect, and you don't want to do the "mkdir foo.new; mv foo/*
> foo.new ; rmdir foo; mv foo.new foo" trick on a large number of
> directories, you can also schedule downtime and while the file system
> is unmounted, use "e2fsck -fD".  See the man page for more details.
> It won't solve all of your problems, and it might not solve any of
> your problems, but it will probably make the performance of large
> directories somewhat better.

Another hint of substantially more value than everything I posted
about this topic. 

Thank you again.

-- 
Arlie

(Arlie Stephens					arlie at worldash.org)


* Work (really slow directory access on ext4)
  2014-08-06 18:26 ` Arlie Stephens
@ 2014-08-06 19:29   ` Nick Krause
  0 siblings, 0 replies; 12+ messages in thread
From: Nick Krause @ 2014-08-06 19:29 UTC (permalink / raw)
  To: kernelnewbies

On Wed, Aug 6, 2014 at 2:26 PM, Arlie Stephens <arlie@worldash.org> wrote:
> On Aug 06 2014, Theodore Ts'o wrote:
>>
>> I don't subscribe to kernelnewbies, but I came across this thread in
>> the mail archive while researching an unrelated issue.
>>
>> Valdis' observations are on the mark here.  It's almost certain that
>> you are getting overwhelmed with other disk traffic, because your
>> directory isn't *that* big.
>
> Thank you very much. As the user in question, I'm afraid this one
> turns out to be a clear case of "user is an idiot."
>
> I made a dumb mistake in the way I was measuring things. The situation
> on this server is not as bad as it looked.
>
>> That being said, there are certainly issues with really really big
>> directories, and solving this is certainly not going to be a newbie
>> project (if it was easy to solve, it would have been addressed a long
>> time ago).   See:
>>
>> http://en.it-usenet.org/thread/11916/10367/
>
> However, this response is precious. Suddenly a whole bunch of things
> make sense from that posting alone. Last time I looked seriously at
> file system code, it was the Berkeley Fast File System, also known as
> UFS. I've never had the time or inclination to look at a modern file
> system. That article managed to straighten out multiple misconceptions
> for me and pointed me in good directions.
>
>> for the background.  It's a little bit dated, in that we do use a
>> 64-bit hash on 64-bit systems, but the fundamental issues are still
>> there.
>
> And that's in addition to what you covered here - including what
> might be a useful workaround for the application that may or may not
> be hitting the problem the ls test was intended to simplify. I'm
> passing that on to the app developer.
>
> Many, many thanks.
>
>> If you sort the files returned by readdir() in inode order before
>> stat()ing them, this can help significantly.  Some userspace programs,
>> such as mutt, do this.  Unfortunately "ls" does not.  (That might be a
>> good newbie project, since it's a userspace-only change.  However, I'm
>> pretty sure the coreutils maintainers will also react negatively if
>> they are sent patches which don't compile.  :-)
>>
>> A proof of concept of how this can be a win can be found here:
>>
>> http://git.kernel.org/cgit/fs/ext2/e2fsprogs.git/tree/contrib/spd_readdir.c
>>
>> LD_PRELOAD isn't guaranteed to work on all programs, so this is much
>> more of a hack than something I'd recommend for extended production
>> use.  But it shows that if you have a readdir+stat workload, sorting
>> by inode makes a huge difference.
>>
>> As far as getting traces to better understand problems, I strongly
>> suggest that you try things like vmstat, iostat, and blktrace; system
>> call traces like strace aren't going to get you very far.  (See
>> http://brooker.co.za/blog/2013/07/14/io-performance.html for a nice
>> introduction to blktrace).  Use the scientific method: collect
>> baseline statistics using vmstat, iostat, and sar before you run your
>> test workload, so you know how much I/O is going on before you start
>> your test.  If you can run your test on a quiescent system, that's a
>> really good idea.  Then collect statistics as you run your workload,
>> tweak only one variable at a time, and record everything in a
>> systematic way.
>
> Another tool I didn't know about. Thank you very much.
>>
>> Finally, if you have more problems of a technical nature with respect
>> to ext4, there is the ext3-users at redhat.com list, or the
>> developer's list at linux-ext4 at vger.kernel.org.  It would be nice if
>> you tried the ext3-users or kernel-newbies lists, or tried googling to
>> see if anyone else has come across the problem and figured out the
>> solution already, but if you can't figure things out any other way, do
>> feel free to ask the linux-ext4 list.  We won't bite.  :-)
>
> Thank you. I'll make sure to do my homework properly in future - and
> never never believe things senior members of my team tell me without
> verifying them first, at least not if I'm going to post about them :-(
>
>>
>> Cheers,
>>
>>                                               - Ted
>>
>> P.S.  If you have a large number of directories which are much larger
>> than you expect, and you don't want to do the "mkdir foo.new; mv foo/*
>> foo.new ; rmdir foo; mv foo.new foo" trick on a large number of
>> directories, you can also schedule downtime and while the file system
>> is unmounted, use "e2fsck -fD".  See the man page for more details.
>> It won't solve all of your problems, and it might not solve any of
>> your problems, but it will probably make the performance of large
>> directories somewhat better.
>
> Another hint of substantially more value than everything I posted
> about this topic.
>
> Thank you again.
>
> --
> Arlie
>
> (Arlie Stephens                                 arlie at worldash.org)
>

Thanks, Ted, for clearing this up for me; it seems the issue was not
in ext4.  Would you mind ccing me on this conversation, as a learning
read?
Regards and Thanks,
Nick


* Work (really slow directory access on ext4)
  2014-07-31 23:41                             ` Henry Hallam
@ 2014-08-01  1:47                               ` Nick Krause
  0 siblings, 0 replies; 12+ messages in thread
From: Nick Krause @ 2014-08-01  1:47 UTC (permalink / raw)
  To: kernelnewbies

On Thu, Jul 31, 2014 at 7:41 PM, Henry Hallam <henry@pericynthion.org> wrote:
> Try redirecting the ls output to /dev/null or a file, thereby
> disabling its color highlighting and removing a bunch of syscalls.
> See if it's now the same no matter which choice of 'time' you use.
>
> On Thu, Jul 31, 2014 at 4:36 PM, Arlie Stephens <arlie@worldash.org> wrote:
>> Hi Nick,
>>
>> [Context - directory ls taking 4-15 seconds; directory large, with
>> long filenames, but nowhere near as huge as Valdis' mail directory.]
>>
>> I've now discovered a really bizarre pattern, and I'm inclined to stop
>> blaming the file system until some clarity develops. If I ever get it
>> to the point where I can produce a high quality bug report - with or
>> without patch - I will do so - but what I have now is anything but
>> clear and high quality.
>>
>> On Jul 30 2014, Nick Krause wrote:
>>> On Wed, Jul 30, 2014 at 3:48 PM,  <Valdis.Kletnieks@vt.edu> wrote:
>>> > On Wed, 30 Jul 2014 10:38:13 -0700, Arlie Stephens said:
>>> >
>>> >> On the good side, Valdis' observations of his mail directory have been
>>> >> a great help.
>>> >
>>> > And remember, that's on a single laptop-class hard drive, no fancy raid or
>>> > anything. (Though it *is* a hybrid, with 32G of flash cache on the front end).
>>> >
>>> > You throw some *real* hardware at it, and of course it would go even faster.
>>>
>>> Just send me the logs and anything else you think may help me.
>>> Please also cc the ext4 mailing list, as this will let the other
>>> ext4 developers and maintainers know about your problem.
>>> Cheers Nick
>>
>> I'm now in a state of complete bafflement.
>>
>> It turns out we have a whole collection of misbehaving directories,
>> making this testable without waiting for caches to clear.
>>
>> I have a couple of strace's of fast ls's, and a function ftrace that
>> captured about half of a 7 second ls. (The latter is huge, and
>> probably not suitable for posting.)
>>
>> I also have a really bizarre observation, the kind that makes you
>> wonder whether you are actually dreaming. It appears that the
>> misbehaviour is strongly influenced by the choice of "time" function.
>> The problem only occurs when using the shell built-in. /usr/bin/time
>> always produces a fast response.
>>
>> Stranger still - flat out impossible, I'd have said before seeing it -
>> a "fast" ls, run with /usr/bin/time, can be followed *immediately*
>> by a slow "ls", run with bash's time. It's as if the first one doesn't
>> warm the cache, which is completely absurd - except I've been able to
>> make this happen 5 times in a row, first with strace and then
>> without.
>>
>> # with /usr/bin/time the ls is fast
>> $ /usr/bin/time -p ls bad_dir
>> ...
>> real 0.21
>> user 0.00
>> sys 0.00
>>
>>
>> # with the builtin time, right *after* the strace run, the time can be
>> # horrible.
>> $ time -p ls bad_dir
>> ...
>> real 5.60
>> user 0.00
>> sys 0.17
>>
>> # run it again, and the directory is in cache as expected.
>> $ time -p ls bad_dir
>> ...
>> real 0.11
>> user 0.00
>> sys 0.02
>>
>>
>> This is not an artefact of one or other time reporting incorrectly -
>> I'm noticing a long pause before output occurs, but only on the middle
>> test of the three.
>>
>> I can't imagine any sane way for this to be happening, short of
>> coincidence or user error - and I've now seen this sequence 5 times in
>> a row, on 5 different directories created and populated by the same
>> app. (Three times with strace, twice without.)
>>
>>
>> --
>> Arlie
>>
>> (Arlie Stephens                                 arlie at worldash.org)
>>

I agree with Henry; it seems right to send me the output in a file to
read, to see if this actually is a bug in ext4.
Regards Nick


* Work (really slow directory access on ext4)
  2014-07-31 23:36                           ` Arlie Stephens
@ 2014-07-31 23:41                             ` Henry Hallam
  2014-08-01  1:47                               ` Nick Krause
  0 siblings, 1 reply; 12+ messages in thread
From: Henry Hallam @ 2014-07-31 23:41 UTC (permalink / raw)
  To: kernelnewbies

Try redirecting the ls output to /dev/null or a file, thereby
disabling its color highlighting and removing a bunch of syscalls.
See if it's now the same no matter which choice of 'time' you use.

On Thu, Jul 31, 2014 at 4:36 PM, Arlie Stephens <arlie@worldash.org> wrote:
> Hi Nick,
>
> [Context - directory ls taking 4-15 seconds; directory large, with
> long filenames, but nowhere near as huge as Valdis' mail directory.]
>
> I've now discovered a really bizarre pattern, and I'm inclined to stop
> blaming the file system until some clarity develops. If I ever get it
> to the point where I can produce a high quality bug report - with or
> without patch - I will do so - but what I have now is anything but
> clear and high quality.
>
> On Jul 30 2014, Nick Krause wrote:
>> On Wed, Jul 30, 2014 at 3:48 PM,  <Valdis.Kletnieks@vt.edu> wrote:
>> > On Wed, 30 Jul 2014 10:38:13 -0700, Arlie Stephens said:
>> >
>> >> On the good side, Valdis' observations of his mail directory have been
>> >> a great help.
>> >
>> > And remember, that's on a single laptop-class hard drive, no fancy raid or
>> > anything. (Though it *is* a hybrid, with 32G of flash cache on the front end).
>> >
>> > You throw some *real* hardware at it, and of course it would go even faster.
>>
>> Just send me the logs and anything else you think may help me.
>> Please also cc the ext4 mailing list, as this will let the other
>> ext4 developers and maintainers know about your problem.
>> Cheers Nick
>
> I'm now in a state of complete bafflement.
>
> It turns out we have a whole collection of misbehaving directories,
> making this testable without waiting for caches to clear.
>
> I have a couple of strace's of fast ls's, and a function ftrace that
> captured about half of a 7 second ls. (The latter is huge, and
> probably not suitable for posting.)
>
> I also have a really bizarre observation, the kind that makes you
> wonder whether you are actually dreaming. It appears that the
> misbehaviour is strongly influenced by the choice of "time" function.
> The problem only occurs when using the shell built-in. /usr/bin/time
> always produces a fast response.
>
> Stranger still - flat out impossible, I'd have said before seeing it -
> a "fast" ls, run with /usr/bin/time, can be followed *immediately*
> by a slow "ls", run with bash's time. It's as if the first one doesn't
> warm the cache, which is completely absurd - except I've been able to
> make this happen 5 times in a row, first with strace and then
> without.
>
> # with /usr/bin/time the ls is fast
> $ /usr/bin/time -p ls bad_dir
> ...
> real 0.21
> user 0.00
> sys 0.00
>
>
> # with the builtin time, right *after* the strace run, the time can be
> # horrible.
> $ time -p ls bad_dir
> ...
> real 5.60
> user 0.00
> sys 0.17
>
> # run it again, and the directory is in cache as expected.
> $ time -p ls bad_dir
> ...
> real 0.11
> user 0.00
> sys 0.02
>
>
> This is not an artefact of one or other time reporting incorrectly -
> I'm noticing a long pause before output occurs, but only on the middle
> test of the three.
>
> I can't imagine any sane way for this to be happening, short of
> coincidence or user error - and I've now seen this sequence 5 times in
> a row, on 5 different directories created and populated by the same
> app. (Three times with strace, twice without.)
>
>
> --
> Arlie
>
> (Arlie Stephens                                 arlie at worldash.org)
>


* Work (really slow directory access on ext4)
  2014-07-30 20:45                         ` Nick Krause
@ 2014-07-31 23:36                           ` Arlie Stephens
  2014-07-31 23:41                             ` Henry Hallam
  0 siblings, 1 reply; 12+ messages in thread
From: Arlie Stephens @ 2014-07-31 23:36 UTC (permalink / raw)
  To: kernelnewbies

Hi Nick,

[Context - directory ls taking 4-15 seconds; directory large, with
long filenames, but nowhere near as huge as Valdis' mail directory.]

I've now discovered a really bizarre pattern, and I'm inclined to stop
blaming the file system until some clarity develops. If I ever get it
to the point where I can produce a high quality bug report - with or
without patch - I will do so - but what I have now is anything but
clear and high quality. 

On Jul 30 2014, Nick Krause wrote:
> On Wed, Jul 30, 2014 at 3:48 PM,  <Valdis.Kletnieks@vt.edu> wrote:
> > On Wed, 30 Jul 2014 10:38:13 -0700, Arlie Stephens said:
> >
> >> On the good side, Valdis' observations of his mail directory have been
> >> a great help.
> >
> > And remember, that's on a single laptop-class hard drive, no fancy raid or
> > anything. (Though it *is* a hybrid, with 32G of flash cache on the front end).
> >
> > You throw some *real* hardware at it, and of course it would go even faster.
> 
> Just send me the logs and anything else you think may help me.
> Please also cc the ext4 mailing list, as this will let the other
> ext4 developers and maintainers know about your problem.
> Cheers Nick

I'm now in a state of complete bafflement.  

It turns out we have a whole collection of misbehaving directories, 
making this testable without waiting for caches to clear. 

I have a couple of strace's of fast ls's, and a function ftrace that
captured about half of a 7 second ls. (The latter is huge, and
probably not suitable for posting.)
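
(For anyone who wants to reproduce that kind of capture: a function
trace is gathered roughly as follows, assuming tracefs is mounted in
the usual place.

  # cd /sys/kernel/debug/tracing
  # echo function > current_tracer
  # echo 1 > tracing_on; ls bad_dir; echo 0 > tracing_on
  # cat trace > /tmp/ls-slow.ftrace

The function tracer logs every kernel function call, which is why the
output gets so huge.)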

I also have a really bizarre observation, the kind that makes you
wonder whether you are actually dreaming. It appears that the
misbehaviour is strongly influenced by the choice of "time" function. 
The problem only occurs when using the shell built-in. /usr/bin/time 
always produces a fast response. 

Stranger still - flat out impossible, I'd have said before seeing it -
a "fast" ls, run with /usr/bin/time, can be followed *immediately*
by a slow "ls", run with bash's time. It's as if the first one doesn't
warm the cache, which is completely absurd - except I've been able to
make this happen 5 times in a row, first with strace and then
without. 

# with /usr/bin/time the ls is fast
$ /usr/bin/time -p ls bad_dir
...
real 0.21
user 0.00
sys 0.00


# with the builtin time, right *after* the strace run, the time can be 
# horrible. 
$ time -p ls bad_dir
...
real 5.60
user 0.00
sys 0.17

# run it again, and the directory is in cache as expected.
$ time -p ls bad_dir
...
real 0.11
user 0.00
sys 0.02


This is not an artefact of one or other time reporting incorrectly -
I'm noticing a long pause before output occurs, but only on the middle
test of the three. 

I can't imagine any sane way for this to be happening, short of
coincidence or user error - and I've now seen this sequence 5 times in
a row, on 5 different directories created and populated by the same
app. (Three times with strace, twice without.) 


-- 
Arlie

(Arlie Stephens					arlie at worldash.org)


* Work (really slow directory access on ext4)
  2014-07-30 19:48                       ` Valdis.Kletnieks at vt.edu
@ 2014-07-30 20:45                         ` Nick Krause
  2014-07-31 23:36                           ` Arlie Stephens
  0 siblings, 1 reply; 12+ messages in thread
From: Nick Krause @ 2014-07-30 20:45 UTC (permalink / raw)
  To: kernelnewbies

On Wed, Jul 30, 2014 at 3:48 PM,  <Valdis.Kletnieks@vt.edu> wrote:
> On Wed, 30 Jul 2014 10:38:13 -0700, Arlie Stephens said:
>
>> On the good side, Valdis' observations of his mail directory have been
>> a great help.
>
> And remember, that's on a single laptop-class hard drive, no fancy raid or
> anything. (Though it *is* a hybrid, with 32G of flash cache on the front end).
>
> You throw some *real* hardware at it, and of course it would go even faster.

Just send me the logs and anything else you think may help me.
Please also cc the ext4 mailing list, as this will let the other
ext4 developers and maintainers know about your problem.
Cheers Nick


* Work (really slow directory access on ext4)
  2014-07-30 17:38                     ` Arlie Stephens
@ 2014-07-30 19:48                       ` Valdis.Kletnieks at vt.edu
  2014-07-30 20:45                         ` Nick Krause
  0 siblings, 1 reply; 12+ messages in thread
From: Valdis.Kletnieks at vt.edu @ 2014-07-30 19:48 UTC (permalink / raw)
  To: kernelnewbies

On Wed, 30 Jul 2014 10:38:13 -0700, Arlie Stephens said:

> On the good side, Valdis' observations of his mail directory have been
> a great help.

And remember, that's on a single laptop-class hard drive, no fancy raid or
anything. (Though it *is* a hybrid, with 32G of flash cache on the front end).

You throw some *real* hardware at it, and of course it would go even faster.


* Work (really slow directory access on ext4)
  2014-07-30  2:34                   ` Nick Krause
@ 2014-07-30 17:38                     ` Arlie Stephens
  2014-07-30 19:48                       ` Valdis.Kletnieks at vt.edu
  0 siblings, 1 reply; 12+ messages in thread
From: Arlie Stephens @ 2014-07-30 17:38 UTC (permalink / raw)
  To: kernelnewbies

Hi Nick,

On Jul 29 2014, Nick Krause wrote:
> >> I was doing a vanilla ls. So was the original reporter, unless he has
> >> some really strange aliases.
> >>
> >>
> >> I'm afraid I'll be rather unpopular if I drop the caches on the system
> >> in question, creating a burst of poor performance, so my best bet is
> >> probably to see what I can do with ftrace on Monday, or perhaps
> >> partway through the weekend.
> >>
> >> There is normally a fair amount of disk activity going on - much of it
> >> writes. So I can expect cached blocks to age out in a reasonable time.
> >>
> > Arlie,
> > Whenever you get around to it is fine.
> > Just send me a log.
> > Cheers Nick
> 
> Arlie,
> Just a friendly reminder: can you try to send me the log this week?
> Regards Nick

I was just going to post an apology for going dark on you. I made one
attempt to capture the data yesterday, and messed up - no useful data
saved. And then half the world invaded my workspace with higher
priority tasks ;-)  

I'm going to make another attempt at it this morning.

On the good side, Valdis' observations of his mail directory have been
a great help. Now I know that simply being a large ext4 directory is
not the problem ;-)  I.e. ext4 really isn't as brain damaged as I
feared. (We had someone here who was initially sure that was it, and
he has more experience in linux server space than I do, so I took his
initial opinion at face value.) 

More soon, I hope.

-- 
Arlie

(Arlie Stephens					arlie at worldash.org)


* Work (really slow directory access on ext4)
  2014-07-26  1:22                 ` Nick Krause
@ 2014-07-30  2:34                   ` Nick Krause
  2014-07-30 17:38                     ` Arlie Stephens
  0 siblings, 1 reply; 12+ messages in thread
From: Nick Krause @ 2014-07-30  2:34 UTC (permalink / raw)
  To: kernelnewbies

On Fri, Jul 25, 2014 at 9:22 PM, Nick Krause <xerofoify@gmail.com> wrote:
> On Fri, Jul 25, 2014 at 9:08 PM, Arlie Stephens <arlie@worldash.org> wrote:
>> On Jul 25 2014, Valdis.Kletnieks at vt.edu wrote:
>>> On Fri, 25 Jul 2014 15:23:42 -0700, Arlie Stephens said:
>>>
>>> > If you want an annoying problem, explain and/or fix directory
>>> > performance on ext4. I've got a server where an ls of a directory took
>>> > 5 seconds, according to "time", even though it only has 295 entries at
>>> > present.
>>>
>>> I don't suppose you could get a trace of where that ls is spending its
>>> time with the kernel's trace facilities, or even just getting a stack trace
>>> of where that ls is in the kernel?
>>
>> These are all very good questions.
>>
>> To my amazement, I found that no one had yet fixed the problem by
>> deleting and recreating the directory, and I do have sudo access.
>> This time it was only 4 seconds...
>>      real 0m3.992s
>>      user 0m0.005s
>>      sys  0m0.052s
>>
>>> I'll go out on a limb and ask if a *second* ls of the same directory runs
>>> quickly because it's now cache-hot.  If so, I'd start looking at whether
>>> there's large amounts of *other* disk activity going on, and the reads of the
>>> directory are getting hung in the I/O queue behind other disk
>>> read/writes.
>>
>> Sure enough, the cache saved me on a second read -
>>      real 0m0.010s
>>      user 0m0.000s
>>      sys  0m0.010s
>>
>>> Also, are you doing an 'ls' (which just requires reading the name/inode#
>>> pairs), or an 'ls -l' (which in addition requires a stat() call to read in the
>>> inode itself)?  That makes a lot of difference.  Cache-cold on my laptop, and a
>>> *huge* Mail/linux-kernel directory (yes, it really *is* an 11M directory,
>>> it's got a half-million entries in it):
>>
>> I was doing a vanilla ls. So was the original reporter, unless he has
>> some really strange aliases.
>>
>>
>> I'm afraid I'll be rather unpopular if I drop the caches on the system
>> in question, creating a burst of poor performance, so my best bet is
>> probably to see what I can do with ftrace on Monday, or perhaps
>> partway through the weekend.
>>
>> There is normally a fair amount of disk activity going on - much of it
>> writes. So I can expect cached blocks to age out in a reasonable time.
>>
>>
>>> [~] echo 3 >| /proc/sys/vm/drop_caches
>>> [~] cd Mail
>>> [~/Mail] time ls linux-kernel/ | wc -l
>>> 478401
>>>
>>> real    0m2.387s
>>> user    0m0.500s
>>> sys     0m0.433s
>>> [~/Mail] ls -ld linux-kernel/
>>> drwxr-xr-x. 2 valdis valdis 11005952 Jul 25 19:30 linux-kernel/
>>
>> Compared to your directory, mine is microscopic
>>
>> $ ls -ld xxxx
>> drwxr-xr-x 2 yyy yyy 36864 Jul 25 12:19 xxxx
>>
>>
>>> [~/Mail] time ls -l linux-kernel/ | wc -l
>>> 478402
>>>
>>> real    0m32.915s
>>> user    0m2.483s
>>> sys     0m20.787s
>>
>> --
>> Arlie
>>
>> (Arlie Stephens                                 arlie at worldash.org)
>
>
> Arlie,
> Whenever you get around to it is fine.
> Just send me a log.
> Cheers Nick

Arlie,
Just a friendly reminder: can you try to send me the log this week?
Regards Nick


* Work (really slow directory access on ext4)
  2014-07-26  1:08               ` Work (really slow directory access on ext4) Arlie Stephens
@ 2014-07-26  1:22                 ` Nick Krause
  2014-07-30  2:34                   ` Nick Krause
  0 siblings, 1 reply; 12+ messages in thread
From: Nick Krause @ 2014-07-26  1:22 UTC (permalink / raw)
  To: kernelnewbies

On Fri, Jul 25, 2014 at 9:08 PM, Arlie Stephens <arlie@worldash.org> wrote:
> On Jul 25 2014, Valdis.Kletnieks at vt.edu wrote:
>> On Fri, 25 Jul 2014 15:23:42 -0700, Arlie Stephens said:
>>
>> > If you want an annoying problem, explain and/or fix directory
>> > performance on ext4. I've got a server where an ls of a directory took
>> > 5 seconds, according to "time", even though it only has 295 entries at
>> > present.
>>
>> I don't suppose you could get a trace of where that ls is spending its
>> time with the kernel's trace facilities, or even just getting a stack trace
>> of where that ls is in the kernel?
>
> These are all very good questions.
>
> To my amazement, I found that no one had yet fixed the problem by
> deleting and recreating the directory, and I do have sudo access.
> This time it was only 4 seconds...
>      real 0m3.992s
>      user 0m0.005s
>      sys  0m0.052s
>
>> I'll go out on a limb and ask if a *second* ls of the same directory runs
>> quickly because it's now cache-hot.  If so, I'd start looking at whether
>> there's large amounts of *other* disk activity going on, and the reads of the
>> directory are getting hung in the I/O queue behind other disk
>> read/writes.
>
> Sure enough, the cache saved me on a second read -
>      real 0m0.010s
>      user 0m0.000s
>      sys  0m0.010s
>
>> Also, are you doing an 'ls' (which just requires reading the name/inode#
>> pairs), or an 'ls -l' (which in addition requires a stat() call to read in the
>> inode itself)?  That makes a lot of difference.  Cache-cold on my laptop, and a
>> *huge* Mail/linux-kernel directory (yes, it really *is* an 11M directory,
>> it's got a half-million entries in it):
>
> I was doing a vanilla ls. So was the original reporter, unless he has
> some really strange aliases.
>
>
> I'm afraid I'll be rather unpopular if I drop the caches on the system
> in question, creating a burst of poor performance, so my best bet is
> probably to see what I can do with ftrace on Monday, or perhaps
> partway through the weekend.
>
> There is normally a fair amount of disk activity going on - much of it
> writes. So I can expect cached blocks to age out in a reasonable time.
>
>
>> [~] echo 3 >| /proc/sys/vm/drop_caches
>> [~] cd Mail
>> [~/Mail] time ls linux-kernel/ | wc -l
>> 478401
>>
>> real    0m2.387s
>> user    0m0.500s
>> sys     0m0.433s
>> [~/Mail] ls -ld linux-kernel/
>> drwxr-xr-x. 2 valdis valdis 11005952 Jul 25 19:30 linux-kernel/
>
> Compared to your directory, mine is microscopic
>
> $ ls -ld xxxx
> drwxr-xr-x 2 yyy yyy 36864 Jul 25 12:19 xxxx
>
>
>> [~/Mail] time ls -l linux-kernel/ | wc -l
>> 478402
>>
>> real    0m32.915s
>> user    0m2.483s
>> sys     0m20.787s
>
> --
> Arlie
>
> (Arlie Stephens                                 arlie at worldash.org)


Arlie,
Whenever you get around to it is fine.
Just send me a log.
Cheers Nick


* Work (really slow directory access on ext4)
  2014-07-25 23:35             ` Work Valdis.Kletnieks at vt.edu
@ 2014-07-26  1:08               ` Arlie Stephens
  2014-07-26  1:22                 ` Nick Krause
  0 siblings, 1 reply; 12+ messages in thread
From: Arlie Stephens @ 2014-07-26  1:08 UTC (permalink / raw)
  To: kernelnewbies

On Jul 25 2014, Valdis.Kletnieks at vt.edu wrote:
> On Fri, 25 Jul 2014 15:23:42 -0700, Arlie Stephens said:
> 
> > If you want an annoying problem, explain and/or fix directory
> > performance on ext4. I've got a server where an ls of a directory took
> > 5 seconds, according to "time", even though it only has 295 entries at
> > present.
> 
> I don't suppose you could get a trace of where that ls is spending its
> time with the kernel's trace facilities, or even just getting a stack trace
> of where that ls is in the kernel?

These are all very good questions. 

To my amazement, I found that no one had yet fixed the problem by
deleting and recreating the directory, and I do have sudo access. 
This time it was only 4 seconds...
     real 0m3.992s
     user 0m0.005s
     sys  0m0.052s

> I'll go out on a limb and ask if a *second* ls of the same directory runs
> quickly because it's now cache-hot.  If so, I'd start looking at whether
> there's large amounts of *other* disk activity going on, and the reads of the
> directory are getting hung in the I/O queue behind other disk
> read/writes.

Sure enough, the cache saved me on a second read - 
     real 0m0.010s
     user 0m0.000s
     sys  0m0.010s

> Also, are you doing an 'ls' (which just requires reading the name/inode#
> pairs), or an 'ls -l' (which in addition requires a stat() call to read in the
> inode itself)?  That makes a lot of difference.  Cache-cold on my laptop, and a
> *huge* Mail/linux-kernel directory (yes, it really *is* an 11M directory,
> it's got a half-million entries in it):

I was doing a vanilla ls. So was the original reporter, unless he has
some really strange aliases.


I'm afraid I'll be rather unpopular if I drop the caches on the system
in question, creating a burst of poor performance, so my best bet is
probably to see what I can do with ftrace on Monday, or perhaps
partway through the weekend.  

There is normally a fair amount of disk activity going on - much of it
writes. So I can expect cached blocks to age out in a reasonable time. 


> [~] echo 3 >| /proc/sys/vm/drop_caches
> [~] cd Mail
> [~/Mail] time ls linux-kernel/ | wc -l
> 478401
> 
> real    0m2.387s
> user    0m0.500s
> sys     0m0.433s
> [~/Mail] ls -ld linux-kernel/
> drwxr-xr-x. 2 valdis valdis 11005952 Jul 25 19:30 linux-kernel/

Compared to your directory, mine is microscopic

$ ls -ld xxxx
drwxr-xr-x 2 yyy yyy 36864 Jul 25 12:19 xxxx


> [~/Mail] time ls -l linux-kernel/ | wc -l
> 478402
> 
> real    0m32.915s
> user    0m2.483s
> sys     0m20.787s

-- 
Arlie

(Arlie Stephens					arlie at worldash.org)


Thread overview: 12+ messages
2014-08-06 14:49 Work (really slow directory access on ext4) Theodore Ts'o
2014-08-06 18:26 ` Arlie Stephens
2014-08-06 19:29   ` Nick Krause
  -- strict thread matches above, loose matches on Subject: below --
2014-07-24 16:38 Work Nick Krause
2014-07-24 16:51 ` Work Andev
2014-07-24 17:10   ` Work Nick Krause
2014-07-25  2:23     ` Work Nick Krause
2014-07-25 17:42       ` Work Valdis.Kletnieks at vt.edu
2014-07-25 21:54         ` Work Nick Krause
2014-07-25 22:23           ` Work Arlie Stephens
2014-07-25 23:35             ` Work Valdis.Kletnieks at vt.edu
2014-07-26  1:08               ` Work (really slow directory access on ext4) Arlie Stephens
2014-07-26  1:22                 ` Nick Krause
2014-07-30  2:34                   ` Nick Krause
2014-07-30 17:38                     ` Arlie Stephens
2014-07-30 19:48                       ` Valdis.Kletnieks at vt.edu
2014-07-30 20:45                         ` Nick Krause
2014-07-31 23:36                           ` Arlie Stephens
2014-07-31 23:41                             ` Henry Hallam
2014-08-01  1:47                               ` Nick Krause
