From: Eric Sandeen <sandeen@redhat.com>
To: David Casier <david.casier@aevoo.fr>
Cc: Dave Chinner <dchinner@redhat.com>,
	Ric Wheeler <rwheeler@redhat.com>, Sage Weil <sage@newdream.net>,
	Ceph Development <ceph-devel@vger.kernel.org>,
	Brian Foster <bfoster@redhat.com>
Subject: Re: Fwd: Fwd: [newstore (again)] how disable double write WAL
Date: Mon, 22 Feb 2016 10:16:43 -0600
Message-ID: <56CB346B.50200@redhat.com>
In-Reply-To: <CA+gn+z=7N+uaNXQ3DU65ZFiNhW-VPjwf+ppaMLxswgoep5EJ3g@mail.gmail.com>

On 2/22/16 10:12 AM, David Casier wrote:
> I have carried out tests very quickly and I have not had time to
> concentrate fully on XFS.
>  maxpct=0.2 => 0.2% of 4 TB = 8 GB,
> because my existing SSD partitions are small.
> 
> If I'm not mistaken, and going by what Dave says:
> by default, inode space is limited to 2^32 inodes of 256 bytes (= 1 TiB).
> With maxpct, you set the maximum space used by inodes as a
> percentage of the disk.

Yes, that's reasonable, I just wanted to be sure.  I hadn't seen
it stated that your SSD was that small.
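For reference, the arithmetic above works out like this (a rough sketch using decimal TB/GB, as in the quoted numbers):

```shell
# Sizing sketch for the numbers quoted above (illustrative only).
# 2^32 inode numbers x 256-byte inodes = 1 TiB of inode space:
inode_space=$(( (1 << 32) * 256 ))
echo "$inode_space"    # 1099511627776 bytes = 1 TiB

# maxpct=0.2 on a 4 TB (decimal) disk caps inode space at ~8 GB:
disk_bytes=4000000000000
maxpct_bytes=$(( disk_bytes * 2 / 1000 ))   # 0.2% of the disk
echo "$maxpct_bytes"   # 8000000000 bytes = 8 GB
```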

Thanks,
-Eric

> 2016-02-22 16:56 GMT+01:00 Eric Sandeen <sandeen@redhat.com>:
>> On 2/21/16 4:56 AM, David Casier wrote:
>>> I made a simple test with XFS
>>>
>>> dm-sdf6-sdg1 :
>>> --------------------------------------------------------------
>>> ||  sdf6 : SSD part  ||          sdg1 : HDD (4 TB)          ||
>>> --------------------------------------------------------------
>>
>> If this is in response to my concern about not working on small
>> filesystems, the above is sufficiently large that inode32
>> won't be ignored.
>>
>>> [root@aotest ~]# mkfs.xfs -f -i maxpct=0.2 /dev/mapper/dm-sdf6-sdg1
>>
>> Hm, why set maxpct?  This does affect how the inode32 allocator
>> works, but I'm wondering if that's why you set it.  How did you arrive
>> at 0.2%?  Just want to be sure you understand what you're tuning.
>>
>> Thanks,
>> -Eric
>>
>>> [root@aotest ~]# mount -o inode32 /dev/mapper/dm-sdf6-sdg1 /mnt
>>>
>>> 8 directories with 16, 32, ..., 128 subdirectories and 16, 32, ..., 128
>>> files (82 bytes each)
>>> 1 xattr per dir and 3 xattrs per file (user.cephosd...)
>>>
>>> 3,800,000 files and directories
>>> 16 GiB was written on SSD
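As an aside, the test tree above can be reproduced in miniature with something like the following sketch (a hypothetical scratch-dir version: xattrs omitted, 2 x 4 x 4 objects instead of the full 3.8M, all names illustrative):

```shell
# Miniature version of the test tree above: top-level dirs, subdirs,
# and small (82-byte) files. Paths and counts are illustrative only.
root=$(mktemp -d)
for d in 1 2; do
  for s in 1 2 3 4; do
    mkdir -p "$root/dir$d/sub$s"
    for f in 1 2 3 4; do
      head -c 82 /dev/zero > "$root/dir$d/sub$s/file$f"
    done
  done
done
# Equivalent of the "find | wc -l" pass in the tables below:
count=$(find "$root" -type f | wc -l)
echo "$count"    # 32 files (2 dirs x 4 subdirs x 4 files)
rm -r "$root"
```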
>>>
>>> --------------------------------------
>>> ||          find | wc -l           ||
>>> --------------------------------------
>>> || Objects per dir || % IOPS on SSD ||
>>> --------------------------------------
>>> ||        16       ||       99      ||
>>> ||        32       ||      100      ||
>>> ||        48       ||       93      ||
>>> ||        64       ||       88      ||
>>> ||        80       ||       88      ||
>>> ||        96       ||       86      ||
>>> ||       112       ||       87      ||
>>> ||       128       ||       88      ||
>>> --------------------------------------
>>>
>>> --------------------------------------
>>> ||   find -exec getfattr '{}' \;   ||
>>> --------------------------------------
>>> || Objects per dir || % IOPS on SSD ||
>>> --------------------------------------
>>> ||        16       ||       96      ||
>>> ||        32       ||       97      ||
>>> ||        48       ||       96      ||
>>> ||        64       ||       95      ||
>>> ||        80       ||       94      ||
>>> ||        96       ||       93      ||
>>> ||       112       ||       94      ||
>>> ||       128       ||       95      ||
>>> --------------------------------------
>>>
>>> It is true that filestore is not designed for big data, and the
>>> inode / xattr cache has to do the heavy lifting.
>>>
>>> I hope to see Bluestore in production quickly :)
>>>
>>> 2016-02-19 18:06 GMT+01:00 Eric Sandeen <esandeen@redhat.com>:
>>>>
>>>>
>>>> On 2/15/16 9:35 PM, Dave Chinner wrote:
>>>>> On Mon, Feb 15, 2016 at 04:18:28PM +0100, David Casier wrote:
>>>>>> Hi Dave,
>>>>>> 1 TB is very large for an SSD.
>>>>>
>>>>> It fills from the bottom, so you don't need 1TB to make it work
>>>>> in a similar manner to the ext4 hack being described.
>>>>
>>>> I'm not sure it will work for smaller filesystems, though - we essentially
>>>> ignore the inode32 mount option for sufficiently small filesystems.
>>>>
>>>> i.e. if inode numbers > 32 bits can't exist, we don't change the allocator,
>>>> at least not until the filesystem (possibly) gets grown later.
>>>>
>>>> So for inode32 to impact behavior, it needs to be on a filesystem
>>>> of sufficient size (at least 1 or 2T, depending on block size, inode
>>>> size, etc). Otherwise it will have no effect today.
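Back-of-envelope, that size threshold can be sketched as follows (simplified: real XFS inode numbers also encode AG and in-block offset bits, so treat this as an approximation, not the exact on-disk encoding):

```shell
# Approximate filesystem size at which XFS inode numbers can exceed
# 32 bits, assuming inode numbers scale with the block number:
blocksize=4096
inodesize=256
per_block=$(( blocksize / inodesize ))     # 16 inodes per block
max_blocks=$(( (1 << 32) / per_block ))    # blocks before inums need >32 bits
threshold=$(( max_blocks * blocksize ))
echo "$threshold"    # 1099511627776 bytes = 1 TiB
```

Larger inodes or a different block size shift this boundary, which is why the cutover lands at "1 or 2T, depending on block size, inode size, etc."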
>>>>
>>>> Dave, I wonder if we need another mount option to essentially mean
>>>> "invoke the inode32 allocator regardless of filesystem size?"
>>>>
>>>> -Eric
>>>>
>>>>>> Example with only 10 GiB:
>>>>>> https://www.aevoo.fr/2016/02/14/ceph-ext4-optimisation-for-filestore/
>>>>>
>>>>> It's a nice toy, but it's not something that is going to scale reliably
>>>>> in production.  That caveat at the end:
>>>>>
>>>>>       "With this model, filestore rearrange the tree very
>>>>>       frequently : + 40 I/O every 32 objects link/unlink."
>>>>>
>>>>> Indicates how bad the IO patterns will be when modifying the
>>>>> directory structure, and says to me that it's not a useful
>>>>> optimisation at all when you might be creating several thousand
>>>>> files/s on a filesystem. That will end up IO bound, SSD or not.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Dave.
>>>>>
>>>
>>>
>>>
>>
> 
> 
> 


