* does having ~Ncore+1? kworkers flushing XFS to 1 disk improve throughput?
From: Linda Walsh @ 2013-08-24  2:33 UTC (permalink / raw)
  To: xfs-oss


I just untarred several (8) copies of a 3.4GB test dir into separate
test directories.

The untarring runs in the background, but I wait for each "tar" to
return so I can list out the times.

I untar the same image 8 times into 8 different dirs.
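
The driver loop is roughly the sketch below -- not the actual script;
the tarball name and the directory names here are placeholders:

    #!/bin/bash
    # Rough sketch of the test driver: unpack the same tarball into 8
    # separate dirs, backgrounding each tar but waiting for it so the
    # times print in order.
    TARBALL=/tmp/testimage.tar     # placeholder for the real 3.4GB image
    for i in 1 2 3 4 5 6 7 8; do
        mkdir -p "test$i"
        ( cd "test$i" && time tar -xf "$TARBALL" ) &
        wait $!
    done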

The initial tars go very quickly, consuming 100% cpu,
since the first tars basically extract into buffer memory;
but after the 4th, the times begin increasing
almost geometrically:


1...5.03sec 0.12usr 4.90sys (99.84% cpu)
2...5.17sec 0.12usr 5.03sys (99.71% cpu)
3...5.36sec 0.13usr 5.21sys (99.68% cpu)
4...5.35sec 0.15usr 5.17sys (99.66% cpu)
5...7.36sec 0.12usr 5.69sys (79.14% cpu)
6...27.81sec 0.16usr 6.76sys (24.93% cpu)
7...85.54sec 0.21usr 7.33sys (8.83% cpu)
8...101.64sec 0.25usr 7.88sys (8.00% cpu)
2nd run:
1...5.23sec 0.12usr 5.10sys (99.73% cpu)
2...5.25sec 0.15usr 5.08sys (99.71% cpu)
3...6.08sec 0.13usr 5.09sys (85.86% cpu)
4...5.31sec 0.14usr 5.15sys (99.71% cpu)
5...14.02sec 0.18usr 6.28sys (46.11% cpu)
6...23.32sec 0.17usr 6.47sys (28.50% cpu)
7...31.14sec 0.21usr 6.74sys (22.32% cpu)
8...82.36sec 0.22usr 7.23sys (9.05% cpu)


Now, and for 3-4 minutes after this point, I see
7 kworker processes -- 6 of them on the even CPUs (2-node NUMA)
and 1 on an odd CPU.  The 7 consume about 12-18% cpu each,
and the odd one is "matched" by a "flush-254:odd" process that
runs on the same CPU as the odd kworker.

I added a "time sync" after the script finished (to show,
approximately, how long disk activity continues)
222.95sec 0.00usr 0.23sys (0.10% cpu)
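
While that sync runs, something like the following shows the
dirty/writeback backlog draining -- nothing fancy, just a watch loop:

    # Watch the page-cache writeback backlog drain (Ctrl-C to stop).
    while true; do
        grep -E '^(Dirty|Writeback):' /proc/meminfo
        sleep 2
    done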

So what are all the kworkers doing and does having 6 of them
do things at the same time really help disk-throughput?

Seems like they would conflict w/each other, cause
disk contention, and extra fragmentation as they
do things?  If they were all writing to separate
disks, that would make sense, but do that many kworker
threads need to be finishing out disk I/O on 1 disk?

FWIW, I remove and re-create a skeleton dir-structure before I
untar into the dirs.
The "rm -fr" on those dirs happens in parallel, with a "wait" at
the end for all of them.  That takes:
1.82sec 0.07usr 12.91sys (711.34% cpu)

Creating the dirnames + empty filenames (for ~1-2 dirs, and 5000+
filenames) takes:

6.85sec 0.38usr 11.68sys (176.03% cpu)

So is it efficient to use that many writers on 1 disk?

Note that the disk's max write speed for a large contiguous
multi-gigabyte file is about 1GB/s.  It is mounted as follows
(mount output):

 /dev/mapper/HnS-Home on /home type xfs \
  (rw,nodiratime,relatime,swalloc,attr2,largeio,inode64,allocsize=128k,\
  logbsize=256k,sunit=128,swidth=1536,noquota)
 





* Re: does having ~Ncore+1? kworkers flushing XFS to 1 disk improve throughput?
From: Stan Hoeppner @ 2013-08-24  4:07 UTC (permalink / raw)
  To: Linda Walsh; +Cc: xfs-oss

On 8/23/2013 9:33 PM, Linda Walsh wrote:

> So what are all the kworkers doing and does having 6 of them
> do things at the same time really help disk-throughput?
> 
> Seems like they would conflict w/each other, cause
> disk contention, and extra fragmentation as they
> do things?  If they were all writing to separate
> disks, that would make sense, but do that many kworker
> threads need to be finishing out disk I/O on 1 disk?

https://raw.github.com/torvalds/linux/master/Documentation/workqueue.txt

-- 
Stan



* Re: does having ~Ncore+1? kworkers flushing XFS to 1 disk improve throughput?
From: Linda Walsh @ 2013-08-24 17:18 UTC (permalink / raw)
  To: stan; +Cc: xfs-oss



Stan Hoeppner wrote:
> On 8/23/2013 9:33 PM, Linda Walsh wrote:
> 
>> So what are all the kworkers doing and does having 6 of them do
>> things at the same time really help disk-throughput?
>>
>> Seems like they would conflict w/each other, cause disk
>> contention, and extra fragmentation as they do things?  If they
>> were all writing to separate disks, that would make sense, but do
>> that many kworker threads need to be finishing out disk I/O on
>> 1 disk?
> 
> https://raw.github.com/torvalds/linux/master/Documentation/workqueue.txt
----

Thanks for the pointer.

I see ways to limit #workers/cpu if they were hogging too much cpu,
which isn't the problem.  My concern is that the work they are
doing is all writing back to the same physical disk -- and that
while >1 writer can generally improve throughput, it would be best
if the pending I/O was sorted in disk order and written out using
the elevator algorithm.  I.e. I can't imagine that it takes 6-8
processes (mostly limiting themselves to 1 NUMA node) to keep the
elevator filled.
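
For reference, the elevator in use and its queue depth are visible
per block device in sysfs; "sdb" below is just a stand-in for
whatever actually backs the LV:

    # Which I/O scheduler the backing device is using, and its depth.
    cat /sys/block/sdb/queue/scheduler      # e.g. "noop deadline [cfq]"
    cat /sys/block/sdb/queue/nr_requests    # elevator queue depth, often 128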

Shouldn't there be an additional way to limit the concurrency of
kworkers assigned to a single device -- especially if the blocking
factor on each of them is the device?  Together, they aren't using
more than, maybe, 2 cores' worth of cpu.  Rough estimates on my part
show that for this partition, given how the RAID is set up, 2 writers
are definitely beneficial, 3-4 often are, 5-6 starts to cause more
thrashing (disk seeking trying to keep up), and 7-8... well, that
usually just gets worse.  The fact that it takes as long or longer
to write out the data than it does for the program to execute makes
me think that it isn't being done very efficiently.


Already, BTW, I changed this "test setup" script (it's a setup
script for another test) from untarring the 8 copies in parallel
to 1 untar at a time.  It was considerably slower.

I can try some of the knobs on the wq, but the only knob I see
limits the number of workers per cpu -- and since I'm only seeing
1 worker/cpu, I don't see how that would help.  It's the per-device
workers that need to be limited.

Wasn't it the case that at some point in the past xfs had "per
device kernel-threads" to help with disk writing, before the advent
of kworkers?  In the case of writing to devices, it seems that the
file-system driver controlling the number of concurrent workers
makes more sense -- and even that needs the smarts to know how many
extra workers a "disk" can handle (i.e. a 12-spindle RAID can handle
a lot more concurrency than a 3-wide RAID-0 composed of 4-spindle
RAID-5's).  (I haven't forgotten about your recommendation to go
all RAID10, but I have to wait on budget allocations ;-)).



* Re: does having ~Ncore+1? kworkers flushing XFS to 1 disk improve throughput?
From: Stan Hoeppner @ 2013-08-24 19:09 UTC (permalink / raw)
  To: Linda Walsh; +Cc: xfs-oss

On 8/24/2013 12:18 PM, Linda Walsh wrote:
> 
> 
> Stan Hoeppner wrote:
>> On 8/23/2013 9:33 PM, Linda Walsh wrote:
>>
>>> So what are all the kworkers doing and does having 6 of them do
>>> things at the same time really help disk-throughput?
>>>
>>> Seems like they would conflict w/each other, cause disk
>>> contention, and extra fragmentation as they do things?  If they
>>> were all writing to separate disks, that would make sense, but do
>>> that many kworker threads need to be finishing out disk I/O on
>>> 1 disk?
>>
>> https://raw.github.com/torvalds/linux/master/Documentation/workqueue.txt
> ----
> 
> Thanks for the pointer.
> 
> I see ways to limit #workers/cpu if they were hogging too much cpu,
> which isn't the problem.  My concern is that the work they are
> doing is all writing back to the same physical disk -- and that
> while >1 writer can generally improve throughput, it would be best
> if the pending I/O was sorted in disk order and written out using
> the elevator algorithm.  I.e. I can't imagine that it takes 6-8
> processes (mostly limiting themselves to 1 NUMA node) to keep the
> elevator filled.

You're making a number of incorrect assumptions here.  The work queues
are generic, which is clearly spelled out in the document above.  The
kworker threads are just that, kernel threads, not processes as you
assume above.  XFS is not the only subsystem that uses them.  Any
subsystem or driver can use work queues.  You can't tell what's
executing within a kworker thread from top or ps output.  You must look
at the stack trace.
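
For example, something along these lines (needs root) samples each
kworker's current kernel stack, which tells you what it is really doing:

    # Dump the current kernel stack of every kworker thread.
    for pid in $(pgrep kworker); do
        echo "=== kworker pid $pid ==="
        cat /proc/"$pid"/stack
    done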

The work you are seeing in those 7 or 8 kworker threads is not all
parallel XFS work.  Your block device driver, whether libata, SCSI, or
proprietary RAID card driver, is placing work in these queues as well.
The work queues are not limited to filesystems and block device drivers.
 Any device driver or kernel subsystem can use work queues.

Nothing bypasses the elevator; sectors are still sorted.  But keep in
mind if you're using a hardware RAID controller -it- does the final
sorting of writeback anyway, so this is a non issue.  I can't recall if
you use md/RAID or an LSI RAID controller.  ISTR you stating LSI
sometime back, but my memory may be faulty here.

So in a nutshell, whatever performance issue you're having, if you
indeed have an issue, isn't caused by work queues or the number of
kworker threads on your system, per CPU, or otherwise.  You need to look
elsewhere for the bottleneck.  Given that it's lightning fast up to
the point where buffers start flushing to disk, it's pretty clear your
spindles simply can't keep up.
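
If you want to confirm that, watch the underlying device while the
flush is in progress -- something along these lines with sysstat's
iostat (the device name is whatever actually sits under your LV):

    # Per-device utilization, queue size and latency, once per second.
    iostat -x 1
    # %util pinned near 100 with await climbing on the backing device
    # means the spindles are the bottleneck, not the kworker count.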

-- 
Stan




* Re: does having ~Ncore+1? kworkers flushing XFS to 1 disk improve throughput?
From: Linda Walsh @ 2013-08-24 23:22 UTC (permalink / raw)
  To: stan; +Cc: xfs-oss



Stan Hoeppner wrote:
> On 8/24/2013 12:18 PM, Linda Walsh wrote:
>>
>> Stan Hoeppner wrote:
>>> On 8/23/2013 9:33 PM, Linda Walsh wrote:
>>>
>>>> So what are all the kworkers doing and does having 6 of them do
>>>> things at the same time really help disk-throughput?
>>>>
>>>> Seems like they would conflict w/each other, cause disk
>>>> contention, and extra fragmentation as they do things?  If they
>>>> were all writing to separate disks, that would make sense, but do
>>>> that many kworker threads need to be finishing out disk I/O on
>>>> 1 disk?
>>> https://raw.github.com/torvalds/linux/master/Documentation/workqueue.txt
>> ----
>>
>> Thanks for the pointer.
>>
>> I see ways to limit #workers/cpu if they were hogging too much cpu,
>> which isn't the problem.  My concern is that the work they are
>> doing is all writing back to the same physical disk -- and that
>> while >1 writer can generally improve throughput, it would be best
>> if the pending I/O was sorted in disk order and written out using
>> the elevator algorithm.  I.e. I can't imagine that it takes 6-8
>> processes (mostly limiting themselves to 1 NUMA node) to keep the
>> elevator filled.
> 
> You're making a number of incorrect assumptions here.  The work queues
> are generic, which is clearly spelled out in the document above.  The
> kworker threads are just that, kernel threads, not processes as you
> assume above.
----
	Sorry, terminology.  Linux threads are implemented as processes with
minor differences -- they are threads, though, as the kernel sees them.

>  XFS is not the only subsystem that uses them.  Any
> subsystem or driver can use work queues.  You can't tell what's
> executing within a kworker thread from top or ps output.  You must look
> at the stack trace.
> 
> The work you are seeing in those 7 or 8 kworker threads is not all
> parallel XFS work.  Your block device driver, whether libata, SCSI, or
> proprietary RAID card driver, is placing work in these queues as well.
---
Hmmm.... I hadn't thought of the driver doing that... I sort of
thought it just took blocks as fed by the kernel, and when it was done
with a DMA it told the kernel it was done and was ready for another.

I thought such drivers did direct I/O at that point -- i.e. that they
sit below the elevator algorithm?


> The work queues are not limited to filesystems and block device drivers.
>  Any device driver or kernel subsystem can use work queues.
---
	True, but when I see a specific number of them come up and work
constantly when I unpack a tar, I would see it as related to that
command.  What other things would use that much cpu?
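
One way I suppose I could check is the workqueue tracepoints --
roughly something like this, assuming debugfs is mounted in the usual
place, which should show which work functions the kworkers actually run:

    # Needs root and debugfs mounted at /sys/kernel/debug.
    cd /sys/kernel/debug/tracing
    echo 1 > events/workqueue/workqueue_execute_start/enable
    cat trace_pipe | head -50    # which work function each kworker ran
    echo 0 > events/workqueue/workqueue_execute_start/enable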

> 
> Nothing bypasses the elevator; sectors are still sorted.  But keep in
> mind if you're using a hardware RAID controller -it- does the final
> sorting of writeback anyway, so this is a non issue.
LSI RAID.

> 
> So in a nutshell, whatever performance issue you're having, if you
> indeed have an issue, isn't caused by work queues or the number of
> kworker threads on your system, per CPU, or otherwise.

Um... but it could be made worse by having an excessive number of
threads all contending for a limited resource.   The more contenders
for a limited resource, the more the scheduler has to sort out who
gets access to the resource next.

If you have 6 threads dumping sectors to different areas of the
disk, with seeks needed between each thread's output, then you pay
a seek penalty on every thread switch -- whereas if the I/O were
coalesced and sorted into 1 queue, 1 worker could do the work of
the 6 without the extra seeks between the different kworkers
emptying their queues.


> You need to look
> elsewhere for the bottleneck.  Given that it's lightning fast up to
> the point where buffers start flushing to disk, it's pretty clear your
> spindles simply can't keep up.
----
	That's not the point (though it is a given).  What I'm focusing on
is how the kernel handles a backlog.

	If I want throughput, I use 1 writer -- to an unfragmented file that
won't require seeks.  If I try to use 2 writers, each to an unfragmented
file, and run them at the same time, it's almost certain that the
throughput will drop -- since the disk will have to seek back and forth
between the two files to give "disk-write-resources" to each writer.

	It would be faster if I did both files sequentially rather than trying
to do them in parallel.  The disk is limited to ~1GB/s -- every seek that
needs to be done to get the files out reduces that.  So tar splats 5000
files into memory.  Then it takes time for those to be written.  If I
write 5000 files sequentially with 1 writer, I will get faster performance
than if I use 25 threads each dumping 200 files in parallel.  The disk
subsystem's responsiveness drops due to all the seeks between writes,
whereas if it was 1 big sorted write it could be written out in 1-2
elevator passes... I don't think it is being that efficient.  Thus my
question about whether or not it is really optimal for throughput to have
"too many writers" accessing a resource at the same time.

	I'm not saying there is a "problem" per se; I'm just asking/wondering
how so many writers won't have the disk seeking all over the place to
round-robin service their requests.

	FWIW, the disk could probably handle 2-3 writers and show improvement
over a single one -- but anything over that, and I have started to see an
overall drop in throughput.
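
That's the sort of thing I could measure directly with fio, sweeping
the number of concurrent writers -- a rough sketch, with the target
directory and sizes as placeholders:

    # Sequential-write throughput vs. number of concurrent writers.
    for jobs in 1 2 4 8; do
        fio --name=writers --directory=/home/test --rw=write --bs=1M \
            --size=2G --numjobs="$jobs" --group_reporting --end_fsync=1
    done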

