* Re: file streams allocator behavior
@ 2014-10-25 18:56 Richard Scobie
  2014-10-25 21:26 ` Stan Hoeppner
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Scobie @ 2014-10-25 18:56 UTC (permalink / raw)
  To: xfs

Stan Hoeppner said:

 > How can I disable or change the filestreams behavior so all files go
 > into the one AG for the single directory test?

Hi Stan,

Instead of mounting with -o filestreams, would using the chattr flag 
instead help?

See 
http://xfs.org/docs/xfsdocs-xml-dev/XFS_User_Guide/tmp/en-US/html/ch06s16.html
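
If I remember the tooling right, that flag is set through xfs_io's
chattr command on the directory; something like the following sketch
(the "S" flag letter is from memory, so double-check it against the
xfs_io man page before relying on it):

  # set the filestream flag on one of the test directories, then
  # list the flags to confirm ("S" flag letter assumed from memory)
  xfs_io -c "chattr +S" /mnt/VOL1/0
  xfs_io -c "lsattr" /mnt/VOL1/0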

Regards,

Richard


* Re: file streams allocator behavior
  2014-10-25 18:56 file streams allocator behavior Richard Scobie
@ 2014-10-25 21:26 ` Stan Hoeppner
  2014-10-26 14:26   ` Brian Foster
  0 siblings, 1 reply; 8+ messages in thread
From: Stan Hoeppner @ 2014-10-25 21:26 UTC (permalink / raw)
  To: Richard Scobie, xfs

On 10/25/2014 01:56 PM, Richard Scobie wrote:
> Stan Hoeppner said:
> 
>> How can I disable or change the filestreams behavior so all files go
>> into the one AG for the single directory test?
> 
> Hi Stan,
> 
> Instead of mounting with -o filestreams, would using the chattr flag
> instead help?
> 
> See
> http://xfs.org/docs/xfsdocs-xml-dev/XFS_User_Guide/tmp/en-US/html/ch06s16.html

That won't help.  That turns it on (if it's not enabled by default these
days).  I need to turn off the behavior I'm seeing, whether it's due to
the filestreams allocator or default inode64.  Then again it may not be
possible to turn it off...

Anyone have other ideas on how to accomplish my goal?  Parallel writes
to a single AG on the outer platter edge vs the same to all AGs across
the entire platter?  I'm simply trying to demonstrate the differences in
aggregate bandwidth due to the extra seek latency of the all-AGs case.

Thanks,
Stan


* Re: file streams allocator behavior
  2014-10-25 21:26 ` Stan Hoeppner
@ 2014-10-26 14:26   ` Brian Foster
  2014-10-26 17:26     ` Stan Hoeppner
  0 siblings, 1 reply; 8+ messages in thread
From: Brian Foster @ 2014-10-26 14:26 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: Richard Scobie, xfs

On Sat, Oct 25, 2014 at 04:26:54PM -0500, Stan Hoeppner wrote:
> On 10/25/2014 01:56 PM, Richard Scobie wrote:
> > Stan Hoeppner said:
> > 
> >> How can I disable or change the filestreams behavior so all files go
> >> into the one AG for the single directory test?
> > 
> > Hi Stan,
> > 
> > Instead of mounting with -o filestreams, would using the chattr flag
> > instead help?
> > 
> > See
> > http://xfs.org/docs/xfsdocs-xml-dev/XFS_User_Guide/tmp/en-US/html/ch06s16.html
> 
> That won't help.  That turns it on (if it's not enabled by default these
> days).  I need to turn off the behavior I'm seeing, whether it's due to
> the filestreams allocator or default inode64.  Then again it may not be
> possible to turn it off...
> 
> Anyone have other ideas on how to accomplish my goal?  Parallel writes
> to a single AG on the outer platter edge vs the same to all AGs across
> the entire platter?  I'm simply trying to demonstrate the differences in
> aggregate bandwidth due to the extra seek latency of the all-AGs case.
> 

What about just preallocating the files? Obviously this removes block
allocation contention from your experiment, but it's not clear if that's
relevant to your test. If I create a smaller, but analogous fs to yours,
I seem to get this behavior from just doing an fallocate of each file in
advance.

E.g., Create directory 0, fallocate 44 files all of which end up in AG
0. Create directory 1, fallocate 44 files which end up in AG 1, etc.
From there you can do direct I/O overwrites to 44 files across each AG
or 44 files in any single AG.
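
Roughly this sort of thing, just as a sketch; the 44x44 layout and the
1.6 GB file size are taken from your description, so adjust to taste:

  # preallocate 44 files in each of 44 directories; with no data being
  # written yet, each directory's files tend to land in a single AG
  for d in $(seq 0 43); do
      mkdir -p /mnt/VOL1/$d
      for f in $(seq 0 43); do
          fallocate -l 1600M /mnt/VOL1/$d/test-$f
      done
  done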

Brian

> Thanks,
> Stan
> 

* Re: file streams allocator behavior
  2014-10-26 14:26   ` Brian Foster
@ 2014-10-26 17:26     ` Stan Hoeppner
  2014-10-26 22:18       ` Brian Foster
  0 siblings, 1 reply; 8+ messages in thread
From: Stan Hoeppner @ 2014-10-26 17:26 UTC (permalink / raw)
  To: Brian Foster; +Cc: Richard Scobie, xfs



On 10/26/2014 09:26 AM, Brian Foster wrote:
> On Sat, Oct 25, 2014 at 04:26:54PM -0500, Stan Hoeppner wrote:
>> On 10/25/2014 01:56 PM, Richard Scobie wrote:
>>> Stan Hoeppner said:
>>>
>>>> How can I disable or change the filestreams behavior so all files go
>>>> into the one AG for the single directory test?
>>>
>>> Hi Stan,
>>>
>>> Instead of mounting with -o filestreams, would using the chattr flag
>>> instead help?
>>>
>>> See
>>> http://xfs.org/docs/xfsdocs-xml-dev/XFS_User_Guide/tmp/en-US/html/ch06s16.html
>>
>> That won't help.  That turns it on (if it's not enabled by default these
>> days).  I need to turn off the behavior I'm seeing, whether it's due to
>> the filestreams allocator or default inode64.  Then again it may not be
>> possible to turn it off...
>>
>> Anyone have other ideas on how to accomplish my goal?  Parallel writes
>> to a single AG on the outer platter edge vs the same to all AGs across
>> the entire platter?  I'm simply trying to demonstrate the differences in
>> aggregate bandwidth due to the extra seek latency of the all-AGs case.
>>
> 
> What about just preallocating the files? Obviously this removes block
> allocation contention from your experiment, but it's not clear if that's
> relevant to your test. If I create a smaller, but analogous fs to yours,
> I seem to get this behavior from just doing an fallocate of each file in
> advance.
> 
> E.g., Create directory 0, fallocate 44 files all of which end up in AG
> 0. Create directory 1, fallocate 44 files which end up in AG 1, etc.
> From there you can do direct I/O overwrites to 44 files across each AG
> or 44 files in any single AG.

I figured preallocating would get me what I want but I've never used
fallocate, nor dd into fallocated files.  Is there anything special
required here with dd, or can I simply specify the filename to dd, and
make sure bs + count doesn't go beyond EOF?

Thanks,
Stan


* Re: file streams allocator behavior
  2014-10-26 17:26     ` Stan Hoeppner
@ 2014-10-26 22:18       ` Brian Foster
  0 siblings, 0 replies; 8+ messages in thread
From: Brian Foster @ 2014-10-26 22:18 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: Richard Scobie, xfs

On Sun, Oct 26, 2014 at 12:26:57PM -0500, Stan Hoeppner wrote:
> 
> 
> On 10/26/2014 09:26 AM, Brian Foster wrote:
> > On Sat, Oct 25, 2014 at 04:26:54PM -0500, Stan Hoeppner wrote:
> >> On 10/25/2014 01:56 PM, Richard Scobie wrote:
> >>> Stan Hoeppner said:
> >>>
> >>>> How can I disable or change the filestreams behavior so all files go
> >>>> into the one AG for the single directory test?
> >>>
> >>> Hi Stan,
> >>>
> >>> Instead of mounting with -o filestreams, would using the chattr flag
> >>> instead help?
> >>>
> >>> See
> >>> http://xfs.org/docs/xfsdocs-xml-dev/XFS_User_Guide/tmp/en-US/html/ch06s16.html
> >>
> >> That won't help.  That turns it on (if it's not enabled by default these
> >> days).  I need to turn off the behavior I'm seeing, whether it's due to
> >> the filestreams allocator or default inode64.  Then again it may not be
> >> possible to turn it off...
> >>
> >> Anyone have other ideas on how to accomplish my goal?  Parallel writes
> >> to a single AG on the outer platter edge vs the same to all AGs across
> >> the entire platter?  I'm simply trying to demonstrate the differences in
> >> aggregate bandwidth due to the extra seek latency of the all-AGs case.
> >>
> > 
> > What about just preallocating the files? Obviously this removes block
> > allocation contention from your experiment, but it's not clear if that's
> > relevant to your test. If I create a smaller, but analogous fs to yours,
> > I seem to get this behavior from just doing an fallocate of each file in
> > advance.
> > 
> > E.g., Create directory 0, fallocate 44 files all of which end up in AG
> > 0. Create directory 1, fallocate 44 files which end up in AG 1, etc.
> > From there you can do direct I/O overwrites to 44 files across each AG
> > or 44 files in any single AG.
> 
> I figured preallocating would get me what I want but I've never used
> fallocate, nor dd into fallocated files.  Is there anything special
> required here with dd, or can I simply specify the filename to dd, and
> make sure bs + count doesn't go beyond EOF?
> 

Ah, yeah. dd will truncate the file by default iirc, which would free
the preallocated blocks and start from scratch. Specify 'conv=notrunc'
as part of the command line to get around that.
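
E.g., something like this (path and size are just placeholders for one
of your preallocated 1.6 GB files):

  # direct I/O overwrite of a preallocated file; conv=notrunc keeps dd
  # from truncating it and freeing the preallocated extents
  dd if=/dev/zero of=/mnt/VOL1/0/test-0 bs=1M count=1600 \
     oflag=direct conv=notrunc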

Brian

> Thanks,
> Stan
> 


* Re: file streams allocator behavior
  2014-10-26 23:56 ` Dave Chinner
@ 2014-10-27 23:24   ` Stan Hoeppner
  0 siblings, 0 replies; 8+ messages in thread
From: Stan Hoeppner @ 2014-10-27 23:24 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 10/26/2014 06:56 PM, Dave Chinner wrote:
> On Sat, Oct 25, 2014 at 01:12:48PM -0500, Stan Hoeppner wrote:
>> I recall reading a while back something about disabling the filestreams
>> allocator, or at least changing its behavior, but I'm unable to find that.
>>
>> What I'm trying to do is use parallel dd w/O_DIRECT to write 44 files in
>> parallel to 44 directories, thus all 44 AGs, in one test, then write 44
>> files to one dir, one AG, in another test.  The purpose of this
>> quick/dirty exercise is to demonstrate throughput differences due to
>> full platter seeking in the former case and localized seeking in the
>> latter case.
>>
>> But of course the problem I'm running into in the single directory case
>> is that the filestreams allocator starts writing all of the 44 files
>> into the appropriate AG, but then begins allocating extents for each
>> file in other AGs.  This is of course defeating the purpose of the tests.
> 
> That's caused by allocator contention. When you try to write 44
> files to the same dir in parallel, they'll all start with the same
> target AG, but then when one thread is allocating into AG 43 and has
> the AG locked, a second attempt to allocate to that AG will see the
> AG locked and so it will move to find the next AG that is not
> locked.

That's what I suspected given what I was seeing.

> Remember, AGs were not originally designed for confining physical
> locality - they are designed to allow allocator parallelism. Hence

Right.  But they sure do come in handy when used this way with
preallocated files.  I suspect we'll make heavy use of this when they
have me back to implement my recommendations.

> once the file has jumped to a new AG it will try to allocate
> sequentially from that point onwards in that same AG, until either
> ENOSPC or further contention.
> 
> Hence with a workload like this, if the writes continue for long
> enough each file will end up finding its own uncontended AG and
> hence mostly end up contiguous on disk and not getting blocked
> waiting for allocation on other files. When you have as many writers
> as there are AGs, however, such a steady state is generally not
> possible as there will always be files trying to write into the same
> AG.

Yep.  That's exactly what xfs_bmap was showing.  Some files had extents
in 3-4 AGs, some in 8 or more AGs.

> As it is, filestreams is not designed for this sort of parallel
> workload. filestreams is designed to separate single threaded
> streams of IO into different locations, not handle concurrent writes
> into multiple files in the same directory.

Right.  I incorrectly referred to the filestreams allocator, as I didn't
know about the inode64 allocator's contention behavior at that time.  I
had simply recalled one of your emails where you and Christoph were
discussing mods to the filestreams allocator, and the pattern of files
across the AGs looked familiar.

> As it is, the inode64 allocator will probably demonstrate exactly the same
> behaviour because it will start by trying to write all the files to
> the same AG and hence hit allocator contention, too.

I was using the inode64 allocator, and yes, it did.  Brian saved my
bacon, as I wanted to use preallocated files but didn't know exactly how.
I knew all their extents would be in the AG I wanted them in, given the
way I intended to do it.

And doing this demonstrated what I anticipated.  Writing 2 GB into each
of 132 files (264 GB total) using 132 parallel dd processes with
O_DIRECT, I achieved:

AG0 only	1377 MB/s
AGs 0-43	 767 MB/s
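
For reference, the single-AG run was driven roughly like this (exact
paths and sizes from memory, so treat the details as approximate):

  # 132 parallel direct-I/O overwrites of preallocated 2 GB files,
  # all in one directory and therefore one AG
  for f in $(seq 0 131); do
      dd if=/dev/zero of=/mnt/VOL1/0/test-$f bs=1M count=2048 \
         oflag=direct conv=notrunc &
  done
  wait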

Due to the nature of the application, we should be able to distribute
the files in such a manner that only 1 or 2 adjacent AGs are being
accessed concurrently.  This will greatly reduce head seeking,
increasing throughput, as demonstrated above.  All thanks to the
allocation group architecture of XFS.  We'd not be able to do this with
any other filesystem AFAIK.

Thanks,
Stan


* Re: file streams allocator behavior
  2014-10-25 18:12 Stan Hoeppner
@ 2014-10-26 23:56 ` Dave Chinner
  2014-10-27 23:24   ` Stan Hoeppner
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Chinner @ 2014-10-26 23:56 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: xfs

On Sat, Oct 25, 2014 at 01:12:48PM -0500, Stan Hoeppner wrote:
> I recall reading a while back something about disabling the filestreams
> allocator, or at least changing its behavior, but I'm unable to find that.
> 
> What I'm trying to do is use parallel dd w/O_DIRECT to write 44 files in
> parallel to 44 directories, thus all 44 AGs, in one test, then write 44
> files to one dir, one AG, in another test.  The purpose of this
> quick/dirty exercise is to demonstrate throughput differences due to
> full platter seeking in the former case and localized seeking in the
> latter case.
> 
> But of course the problem I'm running into in the single directory case
> is that the filestreams allocator starts writing all of the 44 files
> into the appropriate AG, but then begins allocating extents for each
> file in other AGs.  This is of course defeating the purpose of the tests.

That's caused by allocator contention. When you try to write 44
files to the same dir in parallel, they'll all start with the same
target AG, but then when one thread is allocating into AG 43 and has
the AG locked, a second attempt to allocate to that AG will see the
AG locked and so it will move to find the next AG that is not
locked.

Remember, AGs were not originally designed for confining physical
locality - they are designed to allow allocator parallelism. Hence
once the file has jumped to a new AG it will try to allocate
sequentially from that point onwards in that same AG, until either
ENOSPC or further contention.

Hence with a workload like this, if the writes continue for long
enough each file will end up finding its own uncontended AG and
hence mostly end up contiguous on disk and not getting blocked
waiting for allocation on other files. When you have as many writers
as there are AGs, however, such a steady state is generally not
possible as there will always be files trying to write into the same
AG.

As it is, filestreams is not designed for this sort of parallel
workload. filestreams is designed to separate single threaded
streams of IO into different locations, not handle concurrent writes
into multiple files in the same directory.

As it is, the inode64 allocator will probably demonstrate exactly the same
behaviour because it will start by trying to write all the files to
the same AG and hence hit allocator contention, too.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* file streams allocator behavior
@ 2014-10-25 18:12 Stan Hoeppner
  2014-10-26 23:56 ` Dave Chinner
  0 siblings, 1 reply; 8+ messages in thread
From: Stan Hoeppner @ 2014-10-25 18:12 UTC (permalink / raw)
  To: xfs

I recall reading a while back something about disabling the filestreams
allocator, or at least changing its behavior, but I'm unable to find that.

What I'm trying to do is use parallel dd w/O_DIRECT to write 44 files in
parallel to 44 directories, thus all 44 AGs, in one test, then write 44
files to one dir, one AG, in another test.  The purpose of this
quick/dirty exercise is to demonstrate throughput differences due to
full platter seeking in the former case and localized seeking in the
latter case.
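
Roughly, each all-AG run looks like the sketch below (sizes are
approximate); the single directory case is the same 44 dd processes
pointed at one directory.

  # 44 parallel direct-I/O writes, one file per directory and thus
  # one per AG
  for i in $(seq 0 43); do
      dd if=/dev/zero of=/mnt/VOL1/$i/test-$i bs=1M count=1600 \
         oflag=direct &
  done
  wait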

But of course the problem I'm running into in the single directory case
is that the filestreams allocator starts writing all of the 44 files
into the appropriate AG, but then begins allocating extents for each
file in other AGs.  This is of course defeating the purpose of the tests.

> /mnt/VOL1/43# for i in `seq 0 43`;do xfs_bmap -v test-$i;done
> test-0:
>  EXT: FILE-OFFSET         BLOCK-RANGE              AG AG-OFFSET            TOTAL FLAGS
>    0: [0..1535]:          92341791520..92341793055 43 (160..1695)           1536 01111
>    1: [1536..3071]:       92341794688..92341796223 43 (3328..4863)          1536 00011
...
>   88: [135168..136703]:   9972480..9974015          0 (9972480..9974015)    1536 00011
>   89: [136704..138239]:   9984768..9986303          0 (9984768..9986303)    1536 00011
...
>  146: [224256..225791]:   2158167552..2158169087    1 (10684032..10685567)  1536
>  147: [225792..227327]:   2158181376..2158182911    1 (10697856..10699391)  1536
...
>  160: [245760..254975]:   10744866688..10744875903  5 (7449088..7458303)    9216 00011
>  161: [254976..256511]:   10744877440..10744878975  5 (7459840..7461375)    1536 00011
...
...
> test-43:
>  EXT: FILE-OFFSET         BLOCK-RANGE              AG AG-OFFSET             TOTAL FLAGS
>    0: [0..1535]:          92341936000..92341937535 43 (144640..146175)       1536 00011
>    1: [1536..3071]:       92342003584..92342005119 43 (212224..213759)       1536 00011
...
>   69: [105984..107519]:   4303912064..4303913599    2 (8945024..8946559)     1536 00011
>   70: [107520..109055]:   4303922816..4303924351    2 (8955776..8957311)     1536 00011
...
...
>  180: [276480..278015]:   8598943744..8598945279    4 (9009664..9011199)     1536 00011
...
>  181: [278016..279551]:   10744961920..10744963455  5 (7544320..7545855)     1536 00011
>  182: [279552..281087]:   10744968064..10744969599  5 (7550464..7551999)     1536 00011
...
...

The files being created are 1.6 GB each.  The filesystem is 44 TB, with
44 AGs of 1 TB each, numbered 0-43.  The directories are /mnt/VOL1/0
through /mnt/VOL1/43.  The device is a single RAID5 LUN.

How can I disable or change the filestreams behavior so all files go
into the one AG for the single directory test?

Thanks,
Stan


