linux-embedded.vger.kernel.org archive mirror
* Re: File system robustness
       [not found] <20230717075035.GA9549@tomerius.de>
@ 2023-07-17  9:08 ` Geert Uytterhoeven
       [not found] ` <CAG4Y6eTU=WsTaSowjkKT-snuvZwqWqnH3cdgGoCkToH02qEkgg@mail.gmail.com>
  1 sibling, 0 replies; 12+ messages in thread
From: Geert Uytterhoeven @ 2023-07-17  9:08 UTC (permalink / raw)
  To: Kai Tomerius; +Cc: linux-embedded, Ext4 Developers List, dm-devel

CC linux-ext4, dm-devel

On Mon, Jul 17, 2023 at 10:13 AM Kai Tomerius <kai@tomerius.de> wrote:
>
> Hi,
>
> let's suppose an embedded system with a read-only squashfs root file
> system, and a writable ext4 data partition with data=journal.
> Furthermore, the data partition shall be protected with dm-integrity.
>
> Normally, I'd umount the data partition while shutting down the
> system. There might be cases though where power is cut. In such a
> case, there'll be ext4 recoveries, which is ok.
>
> How robust would such a setup be? Are there chances that the ext4
> requires a fsck? What might happen if fsck is not run, ever? Is there
> a chance that the data partition can't be mounted at all? How often
> might that happen?
>
> Thx
> regards
> Kai
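
For concreteness, the data-partition half of the setup described above
might be mounted along these lines (a minimal sketch, not taken from
the original mail: the mapping name /dev/mapper/data-int and the mount
point /data are assumptions, and the dm-integrity mapping itself is
assumed to have been opened earlier, e.g. by integritysetup from an
init script):

#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
        /* Mount the integrity-protected device with full data
         * journalling, as described in the quoted setup. */
        if (mount("/dev/mapper/data-int", "/data", "ext4", 0,
                  "data=journal") != 0) {
                perror("mount /data");
                return 1;
        }
        return 0;
}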

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: File system robustness
       [not found]   ` <20230718053017.GB6042@tomerius.de>
@ 2023-07-18 12:56     ` Alan C. Assis
       [not found]     ` <CAEYzJUGC8Yj1dQGsLADT+pB-mkac0TAC-typAORtX7SQ1kVt+g@mail.gmail.com>
  1 sibling, 0 replies; 12+ messages in thread
From: Alan C. Assis @ 2023-07-18 12:56 UTC (permalink / raw)
  To: Kai Tomerius; +Cc: linux-embedded, Ext4 Developers List, dm-devel

Hi Kai,

I never used that, but please take a look at F2FS too.

BR,

Alan

On 7/18/23, Kai Tomerius <kai@tomerius.de> wrote:
> Hi Alan,
>
> thx a lot.
>
> I should have mentioned that I'll have a large NAND flash, so ext4
> might still be the file system of choice. The other ones you mentioned
> are interesting to consider, but seem to be more fitting for a smaller
> NOR flash.
>
> Regards
> Kai
>
>
>
> On Mon, Jul 17, 2023 at 10:50:50AM -0300, Alan C. Assis wrote:
>> Hi Kai,
>>
>> On 7/17/23, Kai Tomerius <kai@tomerius.de> wrote:
>> > Hi,
>> >
>> > let's suppose an embedded system with a read-only squashfs root file
>> > system, and a writable ext4 data partition with data=journal.
>> > Furthermore, the data partition shall be protected with dm-integrity.
>> >
>> > Normally, I'd umount the data partition while shutting down the
>> > system. There might be cases though where power is cut. In such a
>> > case, there'll be ext4 recoveries, which is ok.
>> >
>> > How robust would such a setup be? Are there chances that the ext4
>> > requires a fsck? What might happen if fsck is not run, ever? Is there
>> > a chance that the data partition can't be mounted at all? How often
>> > might that happen?
>> >
>>
>> Please take a look at this document:
>>
>> https://elinux.org/images/0/02/Filesystem_Considerations_for_Embedded_Devices.pdf
>>
>> In general EXT4 is fine, but it has some limitations; more info here:
>> https://opensource.com/article/18/4/ext4-filesystem
>>
>> I think Linux users suffer from the same problem we have with NuttX (a
>> Linux-like RTOS): which FS to use?
>>
>> So for deep embedded systems running NuttX I follow this logic:
>>
>> I need better performance and wear leveling, but I don't need to worry
>> about power loss: I choose SmartFS
>>
>> I need good performance, wear leveling and some power loss protection:
>> SPIFFS
>>
>> I need good performance, wear leveling and good protection for
>> frequent power loss: LittleFS
>>
>> In a NuttShell: There is no FS that 100% meets all user needs, select
>> the FS that meets your core needs and do lots of field testing to
>> confirm it works as expected.
>>
>> BR,
>>
>> Alan
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: File system robustness
       [not found]     ` <CAEYzJUGC8Yj1dQGsLADT+pB-mkac0TAC-typAORtX7SQ1kVt+g@mail.gmail.com>
@ 2023-07-18 13:04       ` Alan C. Assis
  2023-07-18 14:47         ` Chris
  2023-07-18 21:32         ` Theodore Ts'o
  0 siblings, 2 replies; 12+ messages in thread
From: Alan C. Assis @ 2023-07-18 13:04 UTC (permalink / raw)
  To: Bjørn Forsman
  Cc: Kai Tomerius, linux-embedded, Ext4 Developers List, dm-devel

Hi Bjørn,

On 7/18/23, Bjørn Forsman <bjorn.forsman@gmail.com> wrote:
> On Tue, 18 Jul 2023 at 08:03, Kai Tomerius <kai@tomerius.de> wrote:
>> I should have mentioned that I'll have a large NAND flash, so ext4
>> might still be the file system of choice. The other ones you mentioned
>> are interesting to consider, but seem to be more fitting for a smaller
>> NOR flash.
>
> If you mean raw NAND flash I would think UBIFS is still the way to go?
> (It's been several years since I was into embedded Linux systems.)
>
> https://elinux.org/images/0/02/Filesystem_Considerations_for_Embedded_Devices.pdf
> is focused on eMMC/SD Cards, which have built-in controllers that
> enable them to present a block device interface, which is very unlike
> what raw NAND devices have.
>
> Please see https://www.kernel.org/doc/html/latest/filesystems/ubifs.html
> for more info.
>

You are right, for NAND there is an old (but gold) presentation here:

https://elinux.org/images/7/7e/ELC2009-FlashFS-Toshiba.pdf

UBIFS and YAFFS2 are the way to go.

But please note that YAFFS2 requires a license payment for commercial
applications (something I only discovered recently when Xiaomi
integrated it into NuttX mainline; a bad surprise).

BR,

Alan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: File system robustness
  2023-07-18 13:04       ` Alan C. Assis
@ 2023-07-18 14:47         ` Chris
  2023-07-18 21:32         ` Theodore Ts'o
  1 sibling, 0 replies; 12+ messages in thread
From: Chris @ 2023-07-18 14:47 UTC (permalink / raw)
  To: Alan C. Assis, Bjørn Forsman
  Cc: dm-devel, Ext4 Developers List, linux-embedded, Kai Tomerius



Hi Bjørn,

If I may summarize: for Linux with raw NAND flash, your main option is
UBIFS. You can also use UBI + squashfs if you really want to save
space.

For Linux with managed flash (e.g. eMMC or UFS), most people go with
EXT4 or F2FS.

HTH,
Chris

On 18 July 2023 14:04:55 BST, "Alan C. Assis" <acassis@gmail.com> wrote:
>Hi Bjørn,
>
>On 7/18/23, Bjørn Forsman <bjorn.forsman@gmail.com> wrote:
>> On Tue, 18 Jul 2023 at 08:03, Kai Tomerius <kai@tomerius.de> wrote:
>>> I should have mentioned that I'll have a large NAND flash, so ext4
>>> might still be the file system of choice. The other ones you mentioned
>>> are interesting to consider, but seem to be more fitting for a smaller
>>> NOR flash.
>>
>> If you mean raw NAND flash I would think UBIFS is still the way to go?
>> (It's been several years since I was into embedded Linux systems.)
>>
>> https://elinux.org/images/0/02/Filesystem_Considerations_for_Embedded_Devices.pdf
>> is focused on eMMC/SD Cards, which have built-in controllers that
>> enable them to present a block device interface, which is very unlike
>> what raw NAND devices have.
>>
>> Please see https://www.kernel.org/doc/html/latest/filesystems/ubifs.html
>> for more info.
>>
>
>You are right, for NAND there is an old (but gold) presentation here:
>
>https://elinux.org/images/7/7e/ELC2009-FlashFS-Toshiba.pdf
>
>UBIFS and YAFFS2 are the way to go.
>
>But please note that YAFFS2 requires a license payment for commercial
>applications (something I only discovered recently when Xiaomi
>integrated it into NuttX mainline; a bad surprise).
>
>BR,
>
>Alan




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: File system robustness
  2023-07-18 13:04       ` Alan C. Assis
  2023-07-18 14:47         ` Chris
@ 2023-07-18 21:32         ` Theodore Ts'o
  2023-07-19  6:22           ` Martin Steigerwald
  2023-07-19 10:51           ` File system robustness Kai Tomerius
  1 sibling, 2 replies; 12+ messages in thread
From: Theodore Ts'o @ 2023-07-18 21:32 UTC (permalink / raw)
  To: Alan C. Assis
  Cc: Bjørn Forsman, Kai Tomerius, linux-embedded,
	Ext4 Developers List, dm-devel

On Tue, Jul 18, 2023 at 10:04:55AM -0300, Alan C. Assis wrote:
> 
> You are right, for NAND there is an old (but gold) presentation here:
> 
> https://elinux.org/images/7/7e/ELC2009-FlashFS-Toshiba.pdf
> 
> UBIFS and YAFFS2 are the way to go.

This presentation is specifically talking about flash devices that do
not have a flash translation layer (that is, they are using the MTD
interface).

There are multiple kinds of flash devices that can be exported via
different interfaces: MTD, USB Storage, eMMC, UFS, SATA, SCSI, NVMe,
etc.  There are also differences in the sophistication of the Flash
Translation Layer: how powerful the microcontroller is, how much
memory and persistent storage for flash metadata is available to the
FTL, etc.

F2FS is a good choice for "low end flash", especially those flash
devices that use a very simplistic mapping between LBA (block/sector
numbers) and the physical flash to be used, and may have a very
limited number of flash blocks that can be open for modification at a
time.  For more sophisticated flash storage devices (e.g., SSDs and
higher end flash devices), this consideration won't matter, and then
the best file system to use will be very dependent on your workload.

In answer to Kai's original question, the setup that was described
should be fine --- assuming high quality hardware.  There are some
flash devices that are not designed to handle power failures
correctly; which is to say, if power is cut suddenly, the data used by
the Flash Translation Layer can be corrupted, in which case data
written months or years ago (not just recent data) could be lost.
There have been horror stories about wedding photographers who dropped
their camera, and the SD Card came shooting out, and *all* of the data
that was shot on some couple's special day was completely *gone*.

Assuming that you have valid, power drop safe hardware, running fsck
after a power cut is not necessary, at least as far as file system
consistency is concerned.  If you have badly written userspace
application code, then all bets can be off.  For example, consider the
following sequence of events:

1)  An application like Tuxracer truncates the top-ten score file
2)  It then writes a new top-ten score file
3)  <It fails to call fsync, and also fails to write the file to a
    foo.new and then rename it on top of the old version of the file;
    see the sketch below>
4)  It then closes the Open GL library, triggering a bug in the cruddy
    proprietary binary-only kernel module video driver,
    leading to an immediate system crash.
5)  Users complain to the file system developers that their top-ten
    score file was lost, and ask what the file system developers are
    going to do about it.
6)  File system developers start creating T-shirts saying that what
    userspace applications really are asking for is a new open(2)
    flag, O_PONIES[1]

[1] https://blahg.josefsipek.net/?p=364
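
For illustration, here is a minimal sketch of the "write to foo.new,
fsync, then rename" pattern that step 3 alludes to (the paths, the
helper name and the sample data are made up for the example, and error
handling is abbreviated):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write a new copy, force it to stable storage, atomically replace
 * the old file, then make the rename itself durable. */
static int save_top_ten(const char *data)
{
        int fd = open("/data/top-ten.new",
                      O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
                return -1;
        if (write(fd, data, strlen(data)) != (ssize_t) strlen(data) ||
            fsync(fd) != 0) {
                close(fd);
                return -1;
        }
        close(fd);
        if (rename("/data/top-ten.new", "/data/top-ten") != 0)
                return -1;
        int dfd = open("/data", O_RDONLY | O_DIRECTORY);
        if (dfd < 0)
                return -1;
        int ret = fsync(dfd);   /* persist the directory entry too */
        close(dfd);
        return ret;
}

int main(void)
{
        if (save_top_ten("alice 9000\nbob 8500\n") != 0) {
                perror("save_top_ten");
                return 1;
        }
        return 0;
}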

So when you talk about overall system robustness, you need robust
hardware, you need a robust file system, you need to use the file
system correctly, and you need robust userspace applications.

If you get it all right, you'll be fine.  On the other hand, if you
have crappy hardware (such as might be found for cheap in the checkout
counter of the local Micro Center, or in a back alley vendor in
Shenzhen, China), or if you do something like misconfigure the file
system such as using the "nobarrier" mount option "to speed things
up", or if you have applications that update files in an unsafe
manner, then you will have problems.

Welcome to systems engineering.  :-)

						- Ted

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: File system robustness
  2023-07-18 21:32         ` Theodore Ts'o
@ 2023-07-19  6:22           ` Martin Steigerwald
  2023-07-20  4:20             ` Theodore Ts'o
  2023-07-19 10:51           ` File system robustness Kai Tomerius
  1 sibling, 1 reply; 12+ messages in thread
From: Martin Steigerwald @ 2023-07-19  6:22 UTC (permalink / raw)
  To: Alan C. Assis, Theodore Ts'o
  Cc: Bjørn Forsman, Kai Tomerius, linux-embedded,
	Ext4 Developers List, dm-devel

Theodore Ts'o - 18.07.23, 23:32:12 CEST:
> If you get it all right, you'll be fine.  On the other hand, if you
> have crappy hardware (such as might be found for cheap in the checkout
> counter of the local Micro Center, or in a back alley vendor in
> Shenzhen, China), or if you do something like misconfigure the file
> system such as using the "nobarrier" mount option "to speed things
> up", or if you have applications that update files in an unsafe
> manner, then you will have problems.

Is "nobarrier" mount option still a thing? I thought those mount options 
have been deprecated or even removed with the introduction of cache flush 
handling in kernel 2.6.37?

Hmm, the mount option has been removed from XFS in kernel 4.19
according to the manpage; however, there is no mention of any
deprecation or removal in the ext4 manpage. It also does not seem to
have been removed from BTRFS, at least according to the btrfs(5)
manpage.

-- 
Martin



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: File system robustness
  2023-07-18 21:32         ` Theodore Ts'o
  2023-07-19  6:22           ` Martin Steigerwald
@ 2023-07-19 10:51           ` Kai Tomerius
  2023-07-20  4:41             ` Theodore Ts'o
  1 sibling, 1 reply; 12+ messages in thread
From: Kai Tomerius @ 2023-07-19 10:51 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Alan C. Assis, Bjørn Forsman, linux-embedded,
	Ext4 Developers List, dm-devel

> In answer to Kai's original question, the setup that was described
> should be fine --- assuming high quality hardware.

I wonder how to judge that ... it's an eMMC supposedly complying to
some JEDEC standard, so it *should* be ok.

> ... if power is cut suddenly, the data used by the Flash
> Translation Layer can be corrupted, in which case data written months
> or years ago (not just recent data) could be lost.

At least I haven't observed anything like that up to now.

But on another aspect: how about the interaction between dm-integrity
and ext4? Sure, they each have their own journal, and they're
independent layers. Is there anything that could go wrong, say a block
that can't be recovered in the dm-integrity layer, causing ext4 to run
into trouble, e.g., an I/O error that prevents ext4 from mounting?

I assume the answer is "No", but can I be sure?

Thx
regards
Kai

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: File system robustness
  2023-07-19  6:22           ` Martin Steigerwald
@ 2023-07-20  4:20             ` Theodore Ts'o
  2023-07-20  7:55               ` Nobarrier mount option (was: Re: File system robustness) Martin Steigerwald
  0 siblings, 1 reply; 12+ messages in thread
From: Theodore Ts'o @ 2023-07-20  4:20 UTC (permalink / raw)
  To: Martin Steigerwald
  Cc: Alan C. Assis, Bjørn Forsman, Kai Tomerius, linux-embedded,
	Ext4 Developers List, dm-devel

On Wed, Jul 19, 2023 at 08:22:43AM +0200, Martin Steigerwald wrote:
> 
> Is "nobarrier" mount option still a thing? I thought those mount options 
> have been deprecated or even removed with the introduction of cache flush 
> handling in kernel 2.6.37?

Yes, it's a thing, and if your server has a UPS with a reliable power
failure / low battery feedback, it's *possible* to engineer a reliable
system.  Or, for example, if you have a phone with an integrated
battery, so when you drop it the battery compartment won't open and
the battery won't go flying out, *and* the baseboard management
controller (BMC) will halt the CPU before the battery completely
dies, and give the flash storage device a chance to commit everything
before shutdown, *and* the BMC arranges to make sure the same thing
happens when the user pushes and holds the power button for 30
seconds, then it could be safe.

We also use nobarrier for scratch file systems which by definition
go away when the borg/kubernetes job dies, and which will *never*
survive a reboot, let alone a power failure.  In such a situation,
there's no point sending the cache flush, because the partition will
be mkfs'ed on reboot.  The same goes if the iSCSI or Cloud Persistent
Disk will *always* go away when the VM dies, because any persistent
state is saved to some cluster or distributed file store (e.g., to the
MySQL server, or Big Table, or Spanner, etc.).  In these cases, you
don't *want* the Cache Flush operation, since skipping it reduces I/O
overhead.
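
As a minimal sketch of that scratch case (the device and mount point
are made up, and the file system is assumed to be recreated by mkfs on
every boot, so nothing on it has to survive a crash):

#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
        /* Scratch data only: skipping cache flushes loses nothing
         * that is expected to survive a reboot or power failure. */
        if (mount("/dev/vdb", "/scratch", "ext4", 0, "nobarrier") != 0) {
                perror("mount /scratch");
                return 1;
        }
        return 0;
}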

So if you know what you are doing, in certain specialized use cases,
nobarrier can make sense, and it is used today at my $WORK's data
center for production jobs *all* the time.  So we won't be making
ext4's nobarrier mount option go away; it has users.  :-)

Cheers,

					- Ted

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: File system robustness
  2023-07-19 10:51           ` File system robustness Kai Tomerius
@ 2023-07-20  4:41             ` Theodore Ts'o
  0 siblings, 0 replies; 12+ messages in thread
From: Theodore Ts'o @ 2023-07-20  4:41 UTC (permalink / raw)
  To: Kai Tomerius
  Cc: Alan C. Assis, Bjørn Forsman, linux-embedded,
	Ext4 Developers List, dm-devel

On Wed, Jul 19, 2023 at 12:51:39PM +0200, Kai Tomerius wrote:
> > In answer to Kai's original question, the setup that was described
> > should be fine --- assuming high quality hardware.
> 
> I wonder how to judge that ... it's an eMMC supposedly complying to
> some JEDEC standard, so it *should* be ok.

JEDEC promulgates the eMMC interface specification.  That's the
interface used to talk to the device, much like SATA and SCSI and
NVMe.  The JEDEC eMMC specification says nothing about the quality of
the implementation of the FTL, or whether it is safe from power drops,
or how many write cycles are supported before the eMMC soldered on the
$2000 MCU would expire.

If you're a cell phone manufacturer, the way you judge it is *before*
you buy a few million of the eMMC devices, you subject the samples to
a huge number of power drops and other torture tests (including
verifying the claimed number of write cycles in the spec sheet), before
the device is qualified for use in your product.
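
A minimal sketch of the kind of write workload such a torture test
might drive while power is cut externally (the log path and record
format are made up; a separate checker then verifies after reboot that
the log is intact and ends in a contiguous run of sequence numbers):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        int fd = open("/data/power-test.log",
                      O_WRONLY | O_CREAT | O_APPEND, 0644);
        if (fd < 0) {
                perror("open");
                return 1;
        }
        for (unsigned long seq = 0; ; seq++) {
                char rec[64];
                int len = snprintf(rec, sizeof(rec), "seq=%lu\n", seq);
                /* Only move on once the record is on stable storage,
                 * so every record the log claims must really be there. */
                if (write(fd, rec, len) != len || fsync(fd) != 0) {
                        perror("write/fsync");
                        return 1;
                }
        }
}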

> But on another aspect: how about the interaction between dm-integrity
> and ext4? Sure, they each have their own journal, and they're
> independent layers. Is there anything that could go wrong, say a block
> that can't be recovered in the dm-integrity layer, causing ext4 to run
> into trouble, e.g., an I/O error that prevents ext4 from mounting?
> 
> I assume tne answer is "No", but can I be sure?

If there are I/O errors, with or without dm-integrity, you can have
problems.  dm-integrity will turn bit-flips into hard I/O errors,
whereas without it a bit-flip might cause silent file system
corruption (at least at first), such that by the time you finally
notice that there's a problem, several days or weeks or months may
have passed and the data loss might be far worse.  So turning an
innocuous bit flip into a hard I/O error can be a feature, assuming
that you've allowed for it in your system architecture.

If you assume that the hardware doesn't introduce I/O errors or bit
flips, and if you assume you don't have any attackers trying to
corrupt the block device with bit flips, then sure, nothing will go
wrong.  You can buy perfect hardware from the same supply store where
high school physics teachers buy frictionless pulleys and massless
ropes.  :-)

Cheers,

						- Ted

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Nobarrier mount option (was: Re: File system robustness)
  2023-07-20  4:20             ` Theodore Ts'o
@ 2023-07-20  7:55               ` Martin Steigerwald
  2023-07-21 13:35                 ` Theodore Ts'o
  0 siblings, 1 reply; 12+ messages in thread
From: Martin Steigerwald @ 2023-07-20  7:55 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Alan C. Assis, Bjørn Forsman, Kai Tomerius, linux-embedded,
	Ext4 Developers List, dm-devel

Theodore Ts'o - 20.07.23, 06:20:34 CEST:
> On Wed, Jul 19, 2023 at 08:22:43AM +0200, Martin Steigerwald wrote:
> > Is "nobarrier" mount option still a thing? I thought those mount
> > options have been deprecated or even removed with the introduction
> > of cache flush handling in kernel 2.6.37?
> 
> Yes, it's a thing, and if your server has a UPS with a reliable power
> failure / low battery feedback, it's *possible* to engineer a reliable
> system.  Or, for example, if you have a phone with an integrated
> battery, so when you drop it the battery compartment won't open and
> the battery won't go flying out, *and* the baseboard management
> controller (BMC) will halt the CPU before the battery complete dies,
> and gives a chance for the flash storage device to commit everything
> before shutdown, *and* the BMC arranges to make sure the same thing
> happens when the user pushes and holds the power button for 30
> seconds, then it could be safe.

Thanks for the clarification. I am aware that something like this can
be done. But I did not think that it would be necessary to explicitly
disable barriers, or, to put it more accurately, cache flushes, in
such a case:

I thought that nowadays a cache flush would be (almost) a no-op in
case the storage receiving it is backed by such reliability measures,
i.e. that the hardware just says "I am ready" once it has the I/O
request in stable storage, whatever that may be, even if that is
battery-backed NVRAM and/or temporary flash.

At least that is what I thought was the background for not doing the 
"nobarrier" thing anymore: Let the storage below decide whether it is 
safe to basically ignore cache flushes by answering them (almost) 
immediately.

However, not sending the cache flushes in the first place would likely
still be more efficient, although as far as I am aware the block layer
no longer returns success/failure information for them to the upper
layers (since kernel 2.6.37).

Seems I have got to update my Linux Performance tuning slides about
this once again.

> We also use nobarrier for scratch file systems which by definition
> go away when the borg/kubernetes job dies, and which will *never*
> survive a reboot, let alone a power failure.  In such a situation,
> there's no point sending the cache flush, because the partition will
> be mkfs'ed on reboot.  The same goes if the iSCSI or Cloud Persistent
> Disk will *always* go away when the VM dies, because any persistent
> state is saved to some cluster or distributed file store (e.g., to the
> MySQL server, or Big Table, or Spanner, etc.).  In these cases, you
> don't *want* the Cache Flush operation, since skipping it reduces I/O
> overhead.

Hmm, right.

> So if you know what you are doing, in certain specialized use cases,
> nobarrier can make sense, and it is used today at my $WORK's data
> center for production jobs *all* the time.  So we won't be making
> ext4's nobarrier mount option go away; it has users.  :-)

I now wonder why the XFS people deprecated and even removed those
mount options. But maybe I had better ask them separately instead of
adding their list in CC, probably by forwarding this mail to the XFS
mailing list later on.

Best,
-- 
Martin



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Nobarrier mount option (was: Re: File system robustness)
  2023-07-20  7:55               ` Nobarrier mount option (was: Re: File system robustness) Martin Steigerwald
@ 2023-07-21 13:35                 ` Theodore Ts'o
  2023-07-21 14:51                   ` Martin Steigerwald
  0 siblings, 1 reply; 12+ messages in thread
From: Theodore Ts'o @ 2023-07-21 13:35 UTC (permalink / raw)
  To: Martin Steigerwald
  Cc: Alan C. Assis, Bjørn Forsman, Kai Tomerius, linux-embedded,
	Ext4 Developers List, dm-devel

On Thu, Jul 20, 2023 at 09:55:22AM +0200, Martin Steigerwald wrote:
> 
> I thought that nowadays a cache flush would be (almost) a no-op in
> case the storage receiving it is backed by such reliability measures,
> i.e. that the hardware just says "I am ready" once it has the I/O
> request in stable storage, whatever that may be, even if that is
> battery-backed NVRAM and/or temporary flash.

That *can* be true if the storage subsystem has the reliability
measures.  For example, if you have a $$$ EMC storage array, then sure, it
has an internal UPS backup and it will know that it can ignore that
CACHE FLUSH request.

However, if you are *building* a storage system, the storage device
might be an HDD which has no idea that it doesn't need to worry about
power drops.  Consider, if you will, a rack of servers, each with a
dozen or more HDDs.  There is a rack-level battery backup, and the
rack is located in a data center with diesel generators with enough
fuel supply to keep the entire data center, plus cooling, going for
days.  The rack of servers is part of a cluster file system.  So when
a file write to the cluster file system is performed, the cluster file
system will pick three servers, each in a different rack, and each
rack is in a different power distribution domain.  That way, even if
the entry-level switch on the rack dies, or the Power Distribution
Unit (PDU) servicing a group of racks blows up, the data will be
available on the other two servers.

> At least that is what I thought was the background for not doing the 
> "nobarrier" thing anymore: Let the storage below decide whether it is 
> safe to basically ignore cache flushes by answering them (almost) 
> immediately.

The problem is that the storage below (e.g., the HDD) has no idea that
all of this redundancy exists.  Only the system administrator who is
configuring the file system will know.  And if you are running a
hyper-scale cloud system, this kind of custom-made system will be
much, MUCH, cheaper than buying a huge number of $$$ EMC storage
arrays.

Cheers,

					- Ted

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Nobarrier mount option (was: Re: File system robustness)
  2023-07-21 13:35                 ` Theodore Ts'o
@ 2023-07-21 14:51                   ` Martin Steigerwald
  0 siblings, 0 replies; 12+ messages in thread
From: Martin Steigerwald @ 2023-07-21 14:51 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Alan C. Assis, Bjørn Forsman, Kai Tomerius, linux-embedded,
	Ext4 Developers List, dm-devel

Theodore Ts'o - 21.07.23, 15:35:26 CEST:
> > At least that is what I thought was the background for not doing the
> > "nobarrier" thing anymore: Let the storage below decide whether it
> > is safe to basically ignore cache flushes by answering them (almost)
> > immediately.
> 
> The problem is that the storage below (e.g., the HDD) has no idea that
> all of this redundancy exists.  Only the system administrator who is
> configuring the file system will know.  And if you are running a
> hyper-scale cloud system, this kind of custom-made system will be
> much, MUCH, cheaper than buying a huge number of $$$ EMC storage
> arrays.

Okay, that is reasonable.

Thanks for explaining.

-- 
Martin



^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-07-21 14:51 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20230717075035.GA9549@tomerius.de>
2023-07-17  9:08 ` File system robustness Geert Uytterhoeven
     [not found] ` <CAG4Y6eTU=WsTaSowjkKT-snuvZwqWqnH3cdgGoCkToH02qEkgg@mail.gmail.com>
     [not found]   ` <20230718053017.GB6042@tomerius.de>
2023-07-18 12:56     ` Alan C. Assis
     [not found]     ` <CAEYzJUGC8Yj1dQGsLADT+pB-mkac0TAC-typAORtX7SQ1kVt+g@mail.gmail.com>
2023-07-18 13:04       ` Alan C. Assis
2023-07-18 14:47         ` Chris
2023-07-18 21:32         ` Theodore Ts'o
2023-07-19  6:22           ` Martin Steigerwald
2023-07-20  4:20             ` Theodore Ts'o
2023-07-20  7:55               ` Nobarrier mount option (was: Re: File system robustness) Martin Steigerwald
2023-07-21 13:35                 ` Theodore Ts'o
2023-07-21 14:51                   ` Martin Steigerwald
2023-07-19 10:51           ` File system robustness Kai Tomerius
2023-07-20  4:41             ` Theodore Ts'o
