From: Ric Wheeler <rwheeler@redhat.com>
To: Sage Weil <sage@newdream.net>, David Casier <david.casier@aevoo.fr>
Cc: Ceph Development <ceph-devel@vger.kernel.org>,
	Dave Chinner <dchinner@redhat.com>,
	Brian Foster <bfoster@redhat.com>,
	Eric Sandeen <esandeen@redhat.com>
Subject: Re: Fwd: Fwd: [newstore (again)] how disable double write WAL
Date: Fri, 4 Dec 2015 15:12:25 -0500
Message-ID: <5661F3A9.8070703@redhat.com>
In-Reply-To: <alpine.DEB.2.00.1512011357340.19170@cobra.newdream.net>

On 12/01/2015 05:02 PM, Sage Weil wrote:
> Hi David,
>
> On Tue, 1 Dec 2015, David Casier wrote:
>> Hi Sage,
>> With a standard disk (4 to 6 TB), and a small flash drive, it's easy
>> to create an ext4 FS with metadata on flash
>>
>> Example with sdg1 on flash and sdb on hdd :
>>
>> size_of() {
>>    blockdev --getsize $1
>> }
>>
>> mkdmsetup() {
>>    _ssd=/dev/$1
>>    _hdd=/dev/$2
>>    _size_of_ssd=$(size_of $_ssd)
>>    echo """0 $_size_of_ssd linear $_ssd 0
>>    $_size_of_ssd $(size_of $_hdd) linear $_hdd 0" | dmsetup create dm-${1}-${2}
>> }
>>
>> mkdmsetup sdg1 sdb
>>
>> mkfs.ext4 -O ^has_journal,flex_bg,^uninit_bg,^sparse_super,sparse_super2,^extra_isize,^dir_nlink,^resize_inode \
>>    -E packed_meta_blocks=1,lazy_itable_init=0 -G 32768 -I 128 \
>>    -i $((1024*512)) /dev/mapper/dm-sdg1-sdb
>>
>> With that, all meta_blocks are on the SSD
>>
>> If the omap is on SSD too, there is almost no metadata on the HDD
>>
>> Consequence: Ceph performance (with a hack on filestore without journal
>> and directIO) is almost the same as the performance of the HDD.
>>
>> With cache-tier, it's very cool !
> Cool!  I know XFS lets you do that with the journal, but I'm not sure if
> you can push the fs metadata onto a different device too.. I'm guessing
> not?
>
>> That is why we are working on a hybrid approach HDD / Flash on ARM or Intel
>>
>> With newstore, it's much more difficult to control the I/O profile,
>> because RocksDB embeds its own intelligence
> This is coincidentally what I've been working on today.  So far I've just
> added the ability to put the rocksdb WAL on a second device, but it's
> super easy to push rocksdb data there as well (and have it spill over onto
> the larger, slower device if it fills up).  Or to put the rocksdb WAL on a
> third device (e.g., expensive NVMe or NVRAM).
>
> See this ticket for the ceph-disk tooling that's needed:
>
> 	http://tracker.ceph.com/issues/13942
>
> I expect this will be more flexible and perform better than the ext4
> metadata option, but we'll need to test on your hardware to confirm!
>
> sage

I think XFS "realtime" subvolumes are the thing that does this: the
second volume contains only the data (no metadata).

I seem to recall that it was historically popular with video appliances and
the like, but it is not commonly used.
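
A rough, untested sketch of what such a setup might look like, assuming the
flash device holds the main (metadata) section and the HDD is the realtime
device. The device names and mount point are placeholders reused from David's
example, the xfs_rt_setup helper is made up for illustration, and rtinherit=1
is only my assumption about how one would steer new files to the realtime
device. By default it only prints the commands:

```shell
#!/bin/sh
# Sketch only: XFS with a realtime subvolume.
# Metadata stays on the main device; file data flagged "realtime" lives on
# the rt device. Prints commands unless RUN=1 is set, since mkfs is destructive.

run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# xfs_rt_setup <main/metadata device> <realtime/data device> <mount point>
# (hypothetical helper name, not part of xfsprogs)
xfs_rt_setup() {
    _meta=$1
    _rt=$2
    _mnt=$3
    # -r rtdev= names the realtime device; -d rtinherit=1 asks mkfs to set the
    # realtime-inherit flag so newly created files default to the rt device.
    run mkfs.xfs -f -d rtinherit=1 -r rtdev="$_rt" "$_meta"
    # The realtime device has to be named again at mount time.
    run mount -t xfs -o rtdev="$_rt" "$_meta" "$_mnt"
}

xfs_rt_setup /dev/sdg1 /dev/sdb /mnt/osd
```

Run with RUN=1 (as root) only against devices whose contents can be lost.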

Some of the XFS crew cc'ed above would have more information on this.

Ric



