From: Mark Seger
To: Bernd Schubert
Cc: Laurence Oberman, Linux fs XFS
Date: Mon, 25 Jan 2016 14:00:52 -0500
Subject: Re: xfs and swift

hey bernd, long time no chat.  it turns out you don't have to know what
swift is, because I've been able to demonstrate this behavior with a very
simple python script that just creates files in a 3-tier hierarchy.  the
third-level directories each contain a single file, which for my testing
is always 1K.
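to give a feel for it, the script is roughly this shape (just a sketch;
the base path, per-tier count, and file name here are illustrative rather
than the real values):

    #!/usr/bin/env python
    # Sketch: build a 3-tier directory hierarchy and drop a single 1K file
    # into each third-level directory.  Paths and counts are placeholders.
    import os

    BASE = "/srv/node/disk0/test"   # illustrative base path
    PER_TIER = 10                   # directories per tier (illustrative)
    DATA = b"x" * 1024              # 1K payload

    for a in range(PER_TIER):
        for b in range(PER_TIER):
            for c in range(PER_TIER):
                d = os.path.join(BASE, "d%03d" % a, "d%03d" % b, "d%03d" % c)
                os.makedirs(d)
                with open(os.path.join(d, "obj"), "wb") as f:
                    f.write(DATA)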
I have played with cache_pressure and it doesn't seem to make a
difference, though that was a while ago and perhaps it is worth
revisiting.  one thing you may get a hoot out of, being a collectl user,
is that I have an xfs plugin that lets you look at a ton of xfs stats,
either in realtime or after the fact, just like any other collectl stat.
I just haven't added it to the kit yet.

-mark

On Mon, Jan 25, 2016 at 1:24 PM, Bernd Schubert wrote:

> Hi Mark!
>
> On 01/06/2016 04:15 PM, Mark Seger wrote:
> > I've recently found that the performance of our development swift
> > system is degrading over time as the number of objects/files
> > increases.  This is a relatively small system; each server has 3 400GB
> > disks.  The system I'm currently looking at has about 70GB tied up in
> > slabs alone, close to 55GB in xfs inodes and ili, and about 2GB free.
> > The kernel is 3.14.57-1-amd64-hlinux.
> >
> > Here's the way the filesystems are mounted:
> >
> > /dev/sdb1 on /srv/node/disk0 type xfs
> > (rw,noatime,nodiratime,attr2,nobarrier,inode64,logbufs=8,logbsize=256k,sunit=512,swidth=1536,noquota)
> >
> > I can do about 2000 1K file creates/sec when running 2-minute PUT
> > tests at 100 threads.  If I repeat that test for multiple hours, I see
> > the number of IOPS steadily decreasing to about 770, and the very next
> > run it drops to 260 and continues to fall from there.  This happens at
> > about 12M files.
> >
> > The directory structure is 2-tiered, with 1000 directories per tier,
> > so we can have about 1M of them, though they don't currently all
> > exist.
>
> This sounds pretty much like hash directories as used by some parallel
> file systems (Lustre and, in the past, BeeGFS).  For us the file-create
> slowdown was due to lookups in directories to check whether a file with
> the same name already exists.  At least for ext4 it was rather easy to
> demonstrate that simply caching directory blocks would eliminate that
> issue.  We then considered working on a better kernel cache, but in the
> end we found a way to get rid of that simple directory structure in
> BeeGFS and changed it to a more complex layout with less random access,
> which eliminated the main reason for the slowdown.
>
> Now I have no idea what a "swift system" is, in which order it creates
> and accesses those files, or whether it would be possible to change the
> access pattern.  One thing you might try, and which should work much
> better since kernel 3.11, is the vfs_cache_pressure setting.  The lower
> it is, the fewer dentries/inodes are dropped from the cache when pages
> are needed for file data.
>
> Cheers,
> Bernd
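fwiw, the knob bernd mentions lives at /proc/sys/vm/vfs_cache_pressure
(vm.vfs_cache_pressure via sysctl).  a minimal sketch of checking it and
lowering it, where 50 is only an example value, would be:

    # Sketch: read the current vfs_cache_pressure and lower it (needs root).
    # Equivalent to `sysctl -w vm.vfs_cache_pressure=50`; 50 is only an example.
    KNOB = "/proc/sys/vm/vfs_cache_pressure"

    with open(KNOB) as f:
        print("current vfs_cache_pressure: " + f.read().strip())

    with open(KNOB, "w") as f:
        f.write("50\n")   # lower values keep more dentries/inodes cached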