* Containers, Btrfs vs Btrfs + overlayfs
From: Chris Murphy @ 2017-07-13 20:49 UTC
  To: Btrfs BTRFS

Has anyone been working with Docker and Btrfs + overlayfs? Overlayfs
on top of Btrfs seems redundant, but the shared page cache and the
chance to avoid some of the problems with large numbers of Btrfs
snapshots might make it a useful combination. I'm not finding useful
information with searches, though; typically the comparison is Btrfs
alone vs ext4/XFS + overlayfs.

?

-- 
Chris Murphy


* Re: Containers, Btrfs vs Btrfs + overlayfs
From: Liu Bo @ 2017-07-13 22:32 UTC
  To: Chris Murphy; +Cc: Btrfs BTRFS

On Thu, Jul 13, 2017 at 02:49:27PM -0600, Chris Murphy wrote:
> Has anyone been working with Docker and Btrfs + overlayfs? Overlayfs
> on top of Btrfs seems redundant, but the shared page cache and the
> chance to avoid some of the problems with large numbers of Btrfs
> snapshots might make it a useful combination. I'm not finding useful
> information with searches, though; typically the comparison is Btrfs
> alone vs ext4/XFS + overlayfs.
> 
> ?

Is there a reproducer for problems with large numbers of btrfs
snapshots?

Btrfs + overlayfs?  The copy-up operation in overlayfs can take
advantage of btrfs's clone, but this benefit applies to xfs, too.
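
(For reference, the clone here is the FICLONE ioctl, the same thing
cp --reflink uses.  A minimal sketch of a copy-up done as a clone --
paths are made up:)

/* reflink "copy": dst shares src's extents instead of copying data */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/fs.h>	/* FICLONE, kernel headers >= 4.5 */

int main(void)
{
	int src = open("/lower/layer/file", O_RDONLY);
	int dst = open("/upper/layer/file",
		       O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (src < 0 || dst < 0) {
		perror("open");
		return 1;
	}
	/* CoW happens later, only for blocks that actually get written */
	if (ioctl(dst, FICLONE, src) < 0) {
		perror("FICLONE");
		return 1;
	}
	return 0;
}

(On xfs this needs a filesystem created with reflink support, which is
still experimental.)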

thanks,

-liubo


* Re: Containers, Btrfs vs Btrfs + overlayfs
From: Chris Murphy @ 2017-07-13 23:26 UTC
  To: bo.li.liu; +Cc: Chris Murphy, Btrfs BTRFS

On Thu, Jul 13, 2017 at 4:32 PM, Liu Bo <bo.li.liu@oracle.com> wrote:
> On Thu, Jul 13, 2017 at 02:49:27PM -0600, Chris Murphy wrote:
>> Has anyone been working with Docker and Btrfs + overlayfs? Overlayfs
>> on top of Btrfs seems redundant, but the shared page cache and the
>> chance to avoid some of the problems with large numbers of Btrfs
>> snapshots might make it a useful combination. I'm not finding useful
>> information with searches, though; typically the comparison is Btrfs
>> alone vs ext4/XFS + overlayfs.
>>
>> ?
>
> Is there a reproducer for problems with large numbers of btrfs
> snapshots?

No benchmarking comparison, but it's known that deletion of snapshots
gets more expensive when there are many snapshots, due to backref
search and metadata updates. I have no idea how it compares to
overlayfs. But for some use cases I'd guess there's a non-trivial
benefit to leveraging a shared page cache.

> Btrfs + overlayfs?  The copy-up operation in overlayfs can take
> advantage of btrfs's clone, but this benefit applies to xfs, too.

Btrfs supports fs shrink and also multiple device add/remove, so it's
pretty nice for managing storage in the cloud. And the seed device
might have uses too. Some of this is doable with LVM, but it's much
simpler, faster and safer with Btrfs.

And that's why I'm kinda curious about the combination of Btrfs and
overlayfs: overlayfs managed by Docker, and Btrfs for simpler and more
flexible storage management.


-- 
Chris Murphy


* Re: Containers, Btrfs vs Btrfs + overlayfs
From: Qu Wenruo @ 2017-07-14  2:01 UTC
  To: Chris Murphy, bo.li.liu; +Cc: Btrfs BTRFS



On 2017-07-14 07:26, Chris Murphy wrote:
> On Thu, Jul 13, 2017 at 4:32 PM, Liu Bo <bo.li.liu@oracle.com> wrote:
>> On Thu, Jul 13, 2017 at 02:49:27PM -0600, Chris Murphy wrote:
>>> Has anyone been working with Docker and Btrfs + overlayfs? Overlayfs
>>> on top of Btrfs seems redundant, but the shared page cache and the
>>> chance to avoid some of the problems with large numbers of Btrfs
>>> snapshots might make it a useful combination. I'm not finding useful
>>> information with searches, though; typically the comparison is Btrfs
>>> alone vs ext4/XFS + overlayfs.
>>>
>>> ?
>>
>> Is there a reproducer for problems with large numbers of btrfs
>> snapshots?
> 
> No benchmarking comparison, but it's known that deletion of snapshots
> gets more expensive when there are many snapshots, due to backref
> search and metadata updates. I have no idea how it compares to
> overlayfs. But for some use cases I'd guess there's a non-trivial
> benefit to leveraging a shared page cache.

In fact, except for balance and quota, I can't see much extra
performance impact from backref walks.

And if it's not snapshots but subvolumes, then more subvolumes mean
smaller subvolume trees and less contention on subvolume tree locks.
So more (evenly distributed) subvolumes should in fact lead to higher
performance.

> 
>> Btrfs + overlayfs?  The copy-up operation in overlayfs can take
>> advantage of btrfs's clone, but this benefit applies to xfs, too.
> 
>> Btrfs supports fs shrink and also multiple device add/remove, so it's
>> pretty nice for managing storage in the cloud. And the seed device
>> might have uses too. Some of this is doable with LVM, but it's much
>> simpler, faster and safer with Btrfs.

Faster? Not really.
For metadata operations, btrfs is slower than traditional FSes.

Due to metadata CoW, any metadata update leads to a superblock update.
The extra FUA writes for the superblock are especially noticeable for
fsync-heavy but low-concurrency workloads.
Not to mention that the default data CoW leads to metadata CoW, making
things even slower.
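
(For illustration, a minimal sketch of such an fsync-heavy,
low-concurrency workload -- the path is made up; time it on btrfs vs
ext4/XFS:)

/* many small appends, each followed by fsync */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	char buf[256] = { 0 };
	int fd = open("/mnt/test/fsync.dat",
		      O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	for (int i = 0; i < 10000; i++) {
		if (write(fd, buf, sizeof(buf)) != sizeof(buf) ||
		    fsync(fd) < 0) {
			perror("write/fsync");
			return 1;
		}
	}
	close(fd);
	return 0;
}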

And contention on fs/subvolume tree locks makes metadata operations
even slower, especially for multi-threaded IO.
Unlike other FSes, which use one tree per inode, btrfs uses one tree
per subvolume, which makes the contention much hotter.

The extent tree used to have the same problem, but delayed refs
(whether you like them or not) did reduce contention and improve
performance.

IIRC, some postgresql benchmarks show that XFS/Ext4 with LVM-thin
provide much better performance than Btrfs; even ZFS-on-Linux
out-performs btrfs.

> 
> And that's why I'm kinda curious about the combination of Btrfs and
> overlayfs: overlayfs managed by Docker, and Btrfs for simpler and more
> flexible storage management.

Despite the performance problems, (working) btrfs does provide
flexible and unified management.

So implementing a shared page cache in btrfs would eliminate the need
for overlayfs. :)
Just kidding; such support needs quite a lot of VFS and MM
modification, and I don't know if we will be able to implement it at
all.

Thanks,
Qu


* Re: Containers, Btrfs vs Btrfs + overlayfs
From: Sargun Dhillon @ 2017-07-14  2:24 UTC
  To: Qu Wenruo; +Cc: Chris Murphy, bo.li.liu, Btrfs BTRFS

On Thu, Jul 13, 2017 at 7:01 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
> On 2017-07-14 07:26, Chris Murphy wrote:
>>
>> On Thu, Jul 13, 2017 at 4:32 PM, Liu Bo <bo.li.liu@oracle.com> wrote:
>>>
>>> On Thu, Jul 13, 2017 at 02:49:27PM -0600, Chris Murphy wrote:
>>>>
>>>> Has anyone been working with Docker and Btrfs + overlayfs? Overlayfs
>>>> on top of Btrfs seems redundant, but the shared page cache and the
>>>> chance to avoid some of the problems with large numbers of Btrfs
>>>> snapshots might make it a useful combination. I'm not finding useful
>>>> information with searches, though; typically the comparison is Btrfs
>>>> alone vs ext4/XFS + overlayfs.
>>>>
>>>> ?

We've been running Btrfs with Docker at appreciable scale for a few
months now (100-200k containers / day). We originally looked at the
overlayfs route, but it turns out that one of the downsides of the
shared page cache is that it breaks cgroup accounting. If you want to
properly allow people to ensure their container never touches disk, it
can get complicated.
>>>
>>>
>>> Is there a reproducer for problems with large numbers of btrfs
>>> snapshots?
>>
>>
>> No benchmarking comparison, but it's known that deletion of snapshots
>> gets more expensive when there are many snapshots, due to backref
>> search and metadata updates. I have no idea how it compares to
>> overlayfs. But for some use cases I'd guess there's a non-trivial
>> benefit to leveraging a shared page cache.

We churn through ~80 containers per instance (over a day or so), and
each container's image has 20 layers. The deletion is very expensive,
and it would be nice to be able to throttle it, but ~100GB subvolumes
(on SSD) with 10000+ files are typically removed in <5s. Qgroups turn
out to have a lot of overhead here -- even with a single level. At
least in our testing, even with qgroups, there's lower latency for I/O
and metadata during build jobs (Java or C compilation) than with
overlayfs on Btrfs or AUFS on ZFS (on Linux). Without qgroups, it's
almost certainly "faster". YMMV though, because we're already paying
the network storage latency cost.

We've been investigating using the blkio controller to isolate I/O
per container to avoid I/O stalls, and to restrict I/O during
snapshot cleanup, but that's been unsuccessful.
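
(Concretely, the knob in question is the cgroup-v1 blkio throttle
interface; a sketch, with a made-up cgroup name and device numbers:)

/* cap one cgroup's writes to 10 MB/s on block device 8:16 */
#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/sys/fs/cgroup/blkio/ctr1/"
			"blkio.throttle.write_bps_device", "w");

	if (!f) {
		perror("fopen");
		return 1;
	}
	fprintf(f, "8:16 10485760\n");	/* "major:minor bytes-per-second" */
	return fclose(f) ? 1 : 0;
}

Presumably it doesn't help because btrfs transaction commits and
snapshot cleanup run in kernel worker threads outside the container's
cgroup, so the throttle never sees that I/O.
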
>
>
> In fact, except for balance and quota, I can't see much extra
> performance impact from backref walks.
>
> And if it's not snapshots but subvolumes, then more subvolumes mean
> smaller subvolume trees and less contention on subvolume tree locks.
> So more (evenly distributed) subvolumes should in fact lead to higher
> performance.
>
>>
>>> Btrfs + overlayfs?  The copy-up operation in overlayfs can take
>>> advantage of btrfs's clone, but this benefit applies to xfs, too.
>>
>>
>> Btrfs supports fs shrink and also multiple device add/remove, so it's
>> pretty nice for managing storage in the cloud. And the seed device
>> might have uses too. Some of this is doable with LVM, but it's much
>> simpler, faster and safer with Btrfs.
>
>
> Faster? Not really.
> For metadata operations, btrfs is slower than traditional FSes.
>
> Due to metadata CoW, any metadata update leads to a superblock update.
> The extra FUA writes for the superblock are especially noticeable for
> fsync-heavy but low-concurrency workloads.
> Not to mention that the default data CoW leads to metadata CoW, making
> things even slower.

Since containers are ephemeral, they really shouldn't fsync. One of
the biggest (recent) problems has been workloads that use O_SYNC, or
sync after a large number of operations -- this stalls out all of the
containers (subvolumes) on the machine because the transaction lock is
held. This, in turn, manifests itself in soft lockups and
operational trouble. Our plan to work around it is to patch the VFS
layer and stub out sync for certain cgroups.
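
(For illustration only -- a userspace cousin of that idea, not the VFS
patch itself: a seccomp filter that makes the sync family return
success without entering the kernel.  A sketch assuming libseccomp;
build with -lseccomp:)

/* run a workload with sync/syncfs/fsync/fdatasync stubbed out */
#include <seccomp.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_ALLOW);

	if (argc < 2 || !ctx)
		return 1;
	/* SCMP_ACT_ERRNO(0): pretend success, skip the syscall entirely */
	seccomp_rule_add(ctx, SCMP_ACT_ERRNO(0), SCMP_SYS(sync), 0);
	seccomp_rule_add(ctx, SCMP_ACT_ERRNO(0), SCMP_SYS(syncfs), 0);
	seccomp_rule_add(ctx, SCMP_ACT_ERRNO(0), SCMP_SYS(fsync), 0);
	seccomp_rule_add(ctx, SCMP_ACT_ERRNO(0), SCMP_SYS(fdatasync), 0);
	if (seccomp_load(ctx) < 0)
		return 1;
	execvp(argv[1], argv + 1);	/* e.g. ./nosync <workload> */
	perror("execvp");
	return 1;
}

This is per-process rather than per-cgroup, which is why a VFS-level
hook is still attractive.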

>
> And contention on fs/subvolume tree locks makes metadata operations
> even slower, especially for multi-threaded IO.
> Unlike other FSes, which use one tree per inode, btrfs uses one tree
> per subvolume, which makes the contention much hotter.
>
> The extent tree used to have the same problem, but delayed refs
> (whether you like them or not) did reduce contention and improve
> performance.
>
> IIRC, some postgresql benchmarks show that XFS/Ext4 with LVM-thin
> provide much better performance than Btrfs; even ZFS-on-Linux
> out-performs btrfs.
>
At least in our testing, AUFS + ZFS-on-Linux did not have lower
latency than Btrfs. Stability is decent, bar the occasional soft
lockup or hung transaction. One of the experiments I've been wanting
to run is a custom graph driver which keeps XFS images in
snapshots / subvolumes on Btrfs and mounts them over loopback -- this
makes things like limiting threads, and short-circuiting sync logic
per container, easier.
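
(The mechanics of that experiment, sketched with made-up paths:
/dev/loop-control hands out a free device, LOOP_SET_FD binds the
image, then it's a plain mount(2):)

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mount.h>
#include <linux/loop.h>

int main(void)
{
	char dev[32];
	int ctl = open("/dev/loop-control", O_RDWR);
	int n = ctl < 0 ? -1 : ioctl(ctl, LOOP_CTL_GET_FREE);
	/* per-container XFS image kept in a btrfs snapshot/subvolume */
	int img = open("/vol/ctr1/root.xfs", O_RDWR);

	if (n < 0 || img < 0) {
		perror("setup");
		return 1;
	}
	snprintf(dev, sizeof(dev), "/dev/loop%d", n);
	int lfd = open(dev, O_RDWR);
	if (lfd < 0 || ioctl(lfd, LOOP_SET_FD, img) < 0) {
		perror("LOOP_SET_FD");
		return 1;
	}
	if (mount(dev, "/run/ctr1/rootfs", "xfs", 0, NULL) < 0) {
		perror("mount");
		return 1;
	}
	return 0;
}

Each container would then get its own XFS journal, so in principle a
sync inside one container stays scoped to its own image.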

>>
>> And that's why I'm kinda curious about the combination of Btrfs and
>> overlayfs: overlayfs managed by Docker, and Btrfs for simpler and more
>> flexible storage management.
>
> Despite the performance problems, (working) btrfs does provide
> flexible and unified management.
>
> So implementing a shared page cache in btrfs would eliminate the need
> for overlayfs. :)
> Just kidding; such support needs quite a lot of VFS and MM
> modification, and I don't know if we will be able to implement it at
> all.
>
> Thanks,
> Qu
>
>


* Re: Containers, Btrfs vs Btrfs + overlayfs
From: Chris Murphy @ 2017-07-14  2:33 UTC
  To: Qu Wenruo; +Cc: Chris Murphy, bo.li.liu, Btrfs BTRFS

On Thu, Jul 13, 2017 at 8:01 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
> On 2017-07-14 07:26, Chris Murphy wrote:

>> No benchmarking comparison, but it's known that deletion of snapshots
>> gets more expensive when there are many snapshots, due to backref
>> search and metadata updates. I have no idea how it compares to
>> overlayfs. But for some use cases I'd guess there's a non-trivial
>> benefit to leveraging a shared page cache.
>
>
> In fact, except for balance and quota, I can't see much extra
> performance impact from backref walks.
>
> And if it's not snapshots but subvolumes, then more subvolumes mean
> smaller subvolume trees and less contention on subvolume tree locks.
> So more (evenly distributed) subvolumes should in fact lead to higher
> performance.

Interesting.


>
>>
>>> Btrfs + overlayfs?  The copy-up operation in overlayfs can take
>>> advantage of btrfs's clone, but this benefit applies to xfs, too.
>>
>>
>> Btrfs supports fs shrink and also multiple device add/remove, so it's
>> pretty nice for managing storage in the cloud. And the seed device
>> might have uses too. Some of this is doable with LVM, but it's much
>> simpler, faster and safer with Btrfs.
>
>
> Faster? Not really.
> For metadata operations, btrfs is slower than traditional FSes.

The LVM equivalent of 'btrfs dev delete' -- removing a device and
migrating its block groups to the remaining devices -- is really slow
via pvmove. Plus you are allowed to remove devices that haven't had
pvmove applied, so data loss is possible (user-induced data loss).


> Due to metadata CoW, any metadata update leads to a superblock update.
> The extra FUA writes for the superblock are especially noticeable for
> fsync-heavy but low-concurrency workloads.
> Not to mention that the default data CoW leads to metadata CoW, making
> things even slower.
>
> And contention on fs/subvolume tree locks makes metadata operations
> even slower, especially for multi-threaded IO.
> Unlike other FSes, which use one tree per inode, btrfs uses one tree
> per subvolume, which makes the contention much hotter.

OK, so this possibly means overlayfs might make things slower, since
all I/O ends up getting dumped into one Btrfs fstree; whereas with
Docker using Btrfs rw snapshots, each container's I/O goes into its
own subvolume.


>
> The extent tree used to have the same problem, but delayed refs
> (whether you like them or not) did reduce contention and improve
> performance.
>
> IIRC, some postgresql benchmarks show that XFS/Ext4 with LVM-thin
> provide much better performance than Btrfs; even ZFS-on-Linux
> out-performs btrfs.

OK.


>> And that's why I'm kinda curious about the combination of Btrfs and
>> overlayfs: overlayfs managed by Docker, and Btrfs for simpler and more
>> flexible storage management.
>
> Despite the performance problems, (working) btrfs does provide
> flexible and unified management.
>
> So implementing a shared page cache in btrfs would eliminate the need
> for overlayfs. :)
> Just kidding; such support needs quite a lot of VFS and MM
> modification, and I don't know if we will be able to implement it at
> all.

Yeah, I've read it's complicated for everyone; even the overlayfs
folks have had growing pains.


-- 
Chris Murphy


* Re: Containers, Btrfs vs Btrfs + overlayfs
From: Qu Wenruo @ 2017-07-14  2:52 UTC
  To: Sargun Dhillon; +Cc: Chris Murphy, bo.li.liu, Btrfs BTRFS



On 2017-07-14 10:24, Sargun Dhillon wrote:
> On Thu, Jul 13, 2017 at 7:01 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>
>> On 2017-07-14 07:26, Chris Murphy wrote:
>>>
>>> On Thu, Jul 13, 2017 at 4:32 PM, Liu Bo <bo.li.liu@oracle.com> wrote:
>>>>
>>>> On Thu, Jul 13, 2017 at 02:49:27PM -0600, Chris Murphy wrote:
>>>>>
>>>>> Has anyone been working with Docker and Btrfs + overlayfs? Overlayfs
>>>>> on top of Btrfs seems redundant, but the shared page cache and the
>>>>> chance to avoid some of the problems with large numbers of Btrfs
>>>>> snapshots might make it a useful combination. I'm not finding useful
>>>>> information with searches, though; typically the comparison is Btrfs
>>>>> alone vs ext4/XFS + overlayfs.
>>>>>
>>>>> ?
> We've been running Btrfs with Docker at appreciable scale for a few
> months now (100-200k containers / day). We originally looked at the
> overlayfs route, but it turns out that one of the downsides of the
> shared page cache is that it breaks cgroup accounting. If you want to
> properly allow people to ensure their container never touches disk, it
> can get complicated.
>>>>
>>>>
>>>> Is there a reproducer for problems with large numbers of btrfs
>>>> snapshots?
>>>
>>>
>>> No benchmarking comparison, but it's known that deletion of snapshots
>>> gets more expensive when there are many snapshots, due to backref
>>> search and metadata updates. I have no idea how it compares to
>>> overlayfs. But for some use cases I'd guess there's a non-trivial
>>> benefit to leveraging a shared page cache.
> We churn through ~80 containers per instance (over a day or so), and
> each container's image has 20 layers. The deletion is very expensive,

Well, the subvolume deletion itself has already been optimized.

Instead of deleting items one by one and triggering a tree re-balance
(not btrfs balance) every time, it skips the tree re-balance and
deletes leaf by leaf, which speeds up the whole thing.

> and it would be nice to be able to throttle it, but ~100GB subvolumes
> (on SSD) with 10000+ files are typically removed in <5s. Qgroups turn

Thanks for mentioning the underlying storage.
SSD makes the FUA overhead smaller, so with SSD the metadata CoW cost
is less obvious.

Anyway, such a benefit is less obvious as concurrency rises.

> out to have a lot of overhead here -- even with a single level. At
> least in our testing, even with qgroups, there's lower latency for I/O
> and metadata during build jobs (Java or C compilation) than with
> overlayfs on Btrfs or AUFS on ZFS (on Linux). Without qgroups, it's

I didn't realize overlayfs could cause extra latency when doing IO.
That's an interesting result.

> almost certainly "faster". YMMV though, because we're already paying
> the network storage latency cost.
> 
> We've been investigating using the blkio controller to isolate I/O
> per container to avoid I/O stalls, and to restrict I/O during
> snapshot cleanup, but that's been unsuccessful.
>>
>>
>> In fact, except for balance and quota, I can't see much extra
>> performance impact from backref walks.
>>
>> And if it's not snapshots but subvolumes, then more subvolumes mean
>> smaller subvolume trees and less contention on subvolume tree locks.
>> So more (evenly distributed) subvolumes should in fact lead to higher
>> performance.
>>
>>>
>>>> Btrfs + overlayfs?  The copy-up operation in overlayfs can take
>>>> advantage of btrfs's clone, but this benefit applies to xfs, too.
>>>
>>>
>>> Btrfs supports fs shrink and also multiple device add/remove, so it's
>>> pretty nice for managing storage in the cloud. And the seed device
>>> might have uses too. Some of this is doable with LVM, but it's much
>>> simpler, faster and safer with Btrfs.
>>
>>
>> Faster? Not really.
>> For metadata operations, btrfs is slower than traditional FSes.
>>
>> Due to metadata CoW, any metadata update leads to a superblock update.
>> The extra FUA writes for the superblock are especially noticeable for
>> fsync-heavy but low-concurrency workloads.
>> Not to mention that the default data CoW leads to metadata CoW, making
>> things even slower.
> Since containers are ephemeral, they really shouldn't fsync. One of
> the biggest (recent) problems has been workloads that use O_SYNC, or
> sync after a large number of operations -- this stalls out all of the
> containers (subvolumes) on the machine because the transaction lock is
> held. This, in turn, manifests itself in soft lockups and
> operational trouble. Our plan to work around it is to patch the VFS
> layer and stub out sync for certain cgroups.

This makes sense.

Like all filesystems, even btrfs has one superblock and journal per
fs, shared by everything on it.

So fsync/sync breaks that resource sharing and causes a performance
drop.

> 
>>
>> And contention on fs/subvolume tree locks makes metadata operations
>> even slower, especially for multi-threaded IO.
>> Unlike other FSes, which use one tree per inode, btrfs uses one tree
>> per subvolume, which makes the contention much hotter.
>>
>> The extent tree used to have the same problem, but delayed refs
>> (whether you like them or not) did reduce contention and improve
>> performance.
>>
>> IIRC, some postgresql benchmarks show that XFS/Ext4 with LVM-thin
>> provide much better performance than Btrfs; even ZFS-on-Linux
>> out-performs btrfs.
>>
> At least in our testing, AUFS + ZFS-on-Linux did not have lower
> latency than Btrfs. Stability is decent, bar the occasional soft
> lockup or hung transaction. One of the experiments I've been wanting
> to run is a custom graph driver which keeps XFS images in
> snapshots / subvolumes on Btrfs and mounts them over loopback -- this
> makes things like limiting threads, and short-circuiting sync logic
> per container, easier.

Latency-wise, AUFS/overlayfs seems to be the problem.

BTW, why not just ZFS-on-Linux? As ZFS also supports snapshots, maybe
it would have similar latency to btrfs.

Thanks,
Qu

> 
>>>
>>> And that's why I'm kinda curious about the combination of Btrfs and
>>> overlayfs: overlayfs managed by Docker, and Btrfs for simpler and more
>>> flexible storage management.
>>
>> Despite the performance problems, (working) btrfs does provide
>> flexible and unified management.
>>
>> So implementing a shared page cache in btrfs would eliminate the need
>> for overlayfs. :)
>> Just kidding; such support needs quite a lot of VFS and MM
>> modification, and I don't know if we will be able to implement it at
>> all.
>>
>> Thanks,
>> Qu
>>
>>


* Re: Containers, Btrfs vs Btrfs + overlayfs
From: Chris Murphy @ 2017-07-14  3:18 UTC
  To: Qu Wenruo; +Cc: Sargun Dhillon, Chris Murphy, bo.li.liu, Btrfs BTRFS

On Thu, Jul 13, 2017 at 8:52 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:

>
> Thanks for mentioning the underlying storage.
> SSD makes the FUA overhead smaller, so with SSD the metadata CoW cost
> is less obvious.

Typically there are 2 or 3 superblocks. The ssd mount option causes
rotation of superblock updates. I wonder what effect this rotation
would have on HDD performance, as opposed to FUA causing all supers to
be updated.



> Latency-wise, AUFS/overlayfs seems to be the problem.
>
> BTW, why not just ZFS-on-Linux? As ZFS also supports snapshots, maybe
> it would have similar latency to btrfs.

Maybe. But its volume management is not as flexible as Btrfs's: no
shrink, no device removal, no migration to mixed block device sizes,
so consolidating backing devices in the cloud is a bigger hassle if
you want to stay online during the migration. Not a big deal, but
they're useful features.



-- 
Chris Murphy


* Re: Containers, Btrfs vs Btrfs + overlayfs
From: Chris Murphy @ 2017-07-24 20:43 UTC
  To: Sargun Dhillon; +Cc: Qu Wenruo, Chris Murphy, bo.li.liu, Btrfs BTRFS

On Thu, Jul 13, 2017 at 8:24 PM, Sargun Dhillon <sargun@sargun.me> wrote:

> We've been running Btrfs with Docker at appreciable scale for a few
> months now (100-200k containers / day).

Is this on a single Btrfs file system? Or is it distributed among
multiple Btrfs file systems?

I'm curious how many containers, or more specifically how many
snapshots, you've typically accumulated before doing cleanups
(deleting containers and their snapshots).


> Since containers are ephemeral, they really shouldn't fsync. One of
> the biggest (recent) problems has been workloads that use O_SYNC, or
> sync after a large number of operations -- this stalls out all of the
> containers (subvolumes) on the machine because the transaction lock is
> held. This, in turn, manifests itself in soft lockups and
> operational trouble. Our plan to work around it is to patch the VFS
> layer and stub out sync for certain cgroups.

This could even be useful for out-of-band OS updates. That use case is
at a much smaller scale, so it's not such a big problem. But I see
heavy fsyncing for OS/application updates in the RPM world, and it's
just not necessary if the update is happening in its own tree. If
there's a problem with the update, just blow away the partially
updated snapshot and start over. The fsyncing throughout gains nothing
but slowdowns.



-- 
Chris Murphy

