* [linux-lvm] about the lying nature of thin
@ 2016-04-28 22:37 Xen
  2016-04-29  8:44 ` Marek Podmaka
  2016-05-10 21:47 ` [linux-lvm] thin disk -- like overcomitted/virtual memory? (was Re: about the lying nature of thin) Linda A. Walsh
  0 siblings, 2 replies; 11+ messages in thread
From: Xen @ 2016-04-28 22:37 UTC (permalink / raw)
  To: Linux lvm

You know, Mr. Patton made the interesting allusion that thin provisioning 
is designed to lie and is meant to lie, and I beg to differ.

Under normal operating conditions any thin volume should be allowed to 
grow to its maximum V-size provided not everyone is doing that at the 
same time.


Nowhere in the thin contract is there something that says "this space I 
have available to you, you don't have to share it".

That is like saying the basement container room used as bike and motor 
space in my apartment complex is a lie, because if I were to fill it up, 
other people couldn't use it anymore.

The visuals clearly indicate available physical space, but I know that 
if I use it, others won't be able to. It's called sharing.

In practical matters a thin volume only starts to lie when "real space" 
< "virtual space" -- a condition you are normally trying to avoid.
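In code form, the distinction looks something like this (a toy sketch with invented numbers, nothing LVM-specific):

```python
# Toy model of the distinction drawn above: a pool is *overcommitted*
# when the sum of virtual sizes exceeds the physical size, but it only
# starts to "lie" once real available space drops below virtual
# available space.

def pool_is_overcommitted(virtual_sizes_gb, physical_gb):
    """True when the combined virtual sizes exceed the physical pool."""
    return sum(virtual_sizes_gb) > physical_gb

def pool_is_lying(used_gb, virtual_sizes_gb, physical_gb):
    """True when real available space < virtual available space."""
    real_available = physical_gb - sum(used_gb)
    virtual_available = sum(virtual_sizes_gb) - sum(used_gb)
    return real_available < virtual_available

volumes_gb = [50, 50, 50]   # three thin volumes, 50 GB virtual each
used_gb = [10, 20, 15]      # what each has actually allocated
pool_gb = 100

print(pool_is_overcommitted(volumes_gb, pool_gb))   # True: 150 GB promised
print(pool_is_lying(used_gb, volumes_gb, pool_gb))  # True: 55 GB real vs 105 GB promised
```

With a 200 GB pool the second check turns False: overcommitment is a design choice, the lie is a runtime condition.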

So I would not even say that by definition a thin volume or thin volume 
manager lies.

It only starts "lying" the moment real available space goes below 
virtual available space, something you would normally be trying to 
avoid.

Since your guarantee to your customers (for instance) is that this space 
IS going to be available, you're actually lying to them by not informing 
them that this guarantee can not actually be met at some point in 
time.

Thin pools do not lie by default. They lie when they cannot fulfill 
their obligations, and this is precisely the reason for the idea I 
suggested: to stop the lie, to be honest.

It was said (by Marek Podmaka) that you don't want customers / users to 
know about the reality behind the thin pool, in some or many use cases 
(liberally interpreted). That there are use cases where you don't want 
the client to know about the thin nature.

But if you don't do your job right and the thin pool does start to fill 
up, that starts to sound like lying to your client and saying 
"everything is all right" while behind the scenes everyone is in 
calamity mode.

"Is something wrong? No no, not at all".

You're usually aware that you're being lied to ;-) if you are talking to 
a real human.

So basically:
* either you do your job right and nothing is the matter,
* or you don't do your job right but you don't tell anyone,
* or you don't do your job right and you own up.

Saying that thin pools habitually lie is not right. The question is not 
what happens or what you do while the system is functioning as intended. 
The question is what you do when that is no longer the case:

* do you inform the guest system?
* do you keep silent until shit breaks loose?

IF you had an autoextend mechanism present, you could equally well 
decide not to "inform" clients for as long as that held. After all, 
if you have automatic extending configured and it is operational, then 
the "real size" is actually larger than what you currently have.

In that case "real available size < virtual available size" does not 
hold or does not happen, and there is no need to communicate anything. 
This is also a question about ethics, perhaps.
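For reference, the autoextend mechanism referred to here is what stock LVM configures in lvm.conf; a minimal fragment (the threshold and percent values below are just example choices):

```
# /etc/lvm/lvm.conf -- example values
activation {
    # When a thin pool crosses 70% data usage...
    thin_pool_autoextend_threshold = 70
    # ...grow it by 20% of its current size (requires free space
    # in the VG and the dmeventd monitoring daemon running).
    thin_pool_autoextend_percent = 20
}
```

With monitoring active, the pool is grown out of free VG space before it fills, so the "real size" indeed stays ahead of usage.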

Personally I like to be informed. I don't know what you do or want.

But I can think of any number of analogies or life situations where I 
would definitely choose to be informed instead of being lied to.

Thin LVM does not lie by default. It may only start to lie when 
conditions are no longer met.

Regards, Xen.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [linux-lvm] about the lying nature of thin
  2016-04-28 22:37 [linux-lvm] about the lying nature of thin Xen
@ 2016-04-29  8:44 ` Marek Podmaka
  2016-04-29 10:06   ` Gionatan Danti
                     ` (2 more replies)
  2016-05-10 21:47 ` [linux-lvm] thin disk -- like overcomitted/virtual memory? (was Re: about the lying nature of thin) Linda A. Walsh
  1 sibling, 3 replies; 11+ messages in thread
From: Marek Podmaka @ 2016-04-29  8:44 UTC (permalink / raw)
  To: Xen; +Cc: Linux lvm

Hello Xen,

Friday, April 29, 2016, 0:37:23, you wrote:

> In practical matters a thin volume only starts to lie when "real space"
> < "virtual space" -- a condition you are normally trying to avoid.

> Thin pools do not lie by default. They lie when they cannot fulfill
> their obligations, and this is precisely the reason for the idea I 
> suggested: to stop the lie, to be honest.

I would say that thin provisioning is designed to lie about the
available space. This is what it was invented for. As long as the used
space (not virtual space) is not greater than real space, everything
is ok. Your analogy with customers still applies and whole IT business
is based on it (over-provisioning home internet connection speed,
"guaranteed" webhosting disk space). It seems to me that disk space
was the last thing to get over- (or thin-) provisioned :)

Now I'm not sure what your use-case for thin pools is.

I don't see it much useful if the presented space is smaller than
available physical space. In that case I can just use plain LVM with
PV/VG/LV. For snapshots you don't care much, since if the snapshot
overfills it just becomes invalid, but won't influence the original
LV.

But the thin pools' use case is to simplify the complexity of adding
storage. Traditionally you need to add new physical disks to the
storage / server, add them to LVM as a new PV, add this PV to the VG,
extend the LV and finally extend the filesystem. Usually the storage
part and the server (LVM) part are done by different people / teams.
By using thinp, you create a big enough VG, LV and filesystem. Then,
as it is needed, you just add physical disks and you're done.

Another benefit is disk space saving. Traditionally you need to have
some reserve as free space in each filesystem for growth. With many
filesystems you just wasted a lot of space. With thinp, this free
space is "shared".
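That saving can be put in toy numbers (all figures invented for illustration):

```python
# Ten filesystems; each might someday need up to 50 GB of growth,
# but realistically only a couple of them grow at the same time.
n_fs = 10
base_gb = 80             # steady-state data per filesystem
max_growth_gb = 50       # worst-case growth any single filesystem needs
concurrent_growers = 2   # assumed number growing simultaneously

# Thick: every filesystem carries its own full reserve.
thick = n_fs * (base_gb + max_growth_gb)
# Thin: the reserve is pooled and shared.
thin = n_fs * base_gb + concurrent_growers * max_growth_gb

print(thick, thin)  # 1300 900 -- the shared reserve saves 400 GB
```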

And regarding your other mail about presenting parts / chunks of
blocks from block layer... This is what device mapper (and LVM built
on top of it) does - it takes many parts of many block devices and
creates a new linear block device out of them (whether it is a striped
LV, a mirrored LV, dm-crypt or just a concatenation of 2 disks).

-- 
  bYE, Marki


* Re: [linux-lvm] about the lying nature of thin
  2016-04-29  8:44 ` Marek Podmaka
@ 2016-04-29 10:06   ` Gionatan Danti
  2016-04-29 13:16     ` Xen
  2016-04-29 11:53   ` Xen
  2016-04-29 20:37   ` Chris Friesen
  2 siblings, 1 reply; 11+ messages in thread
From: Gionatan Danti @ 2016-04-29 10:06 UTC (permalink / raw)
  To: Marek Podmaka, LVM general discussion and development, Xen



On 29/04/2016 10:44, Marek Podmaka wrote:
> Hello Xen,
> Now I'm not sure what your use-case for thin pools is.
>
> I don't see it much useful if the presented space is smaller than
> available physical space. In that case I can just use plain LVM with
> PV/VG/LV. For snapshots you don't care much, since if the snapshot
> overfills it just becomes invalid, but won't influence the original
> LV.
>

Let me add one important use case: have fast, flexible snapshots.

In the past I used classic LVM to build our virtualization servers, but 
this meant I was basically forced to use a separate volume for each VM: 
using a single big volume and filesystem for all the VMs means that, 
while snapshotting it for backup purposes, I/O becomes VERY slow on ALL 
virtual machines.

On the other hand, thin pools provide much faster snapshots. On the 
latest builds, I began using a single large thin volume, on top of a 
single large thin pool, to host a single filesystem that can be 
snapshotted with no big slowdown on the I/O side.

I understand that it is a tradeoff - classic LVM mostly provides 
contiguous blocks, so fragmentation remains quite low, while thin 
pools/volumes are much more prone to fragment, but with large enough 
chunks it is not such a big problem.

Regards.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8


* Re: [linux-lvm] about the lying nature of thin
  2016-04-29  8:44 ` Marek Podmaka
  2016-04-29 10:06   ` Gionatan Danti
@ 2016-04-29 11:53   ` Xen
  2016-04-29 20:37   ` Chris Friesen
  2 siblings, 0 replies; 11+ messages in thread
From: Xen @ 2016-04-29 11:53 UTC (permalink / raw)
  To: Linux lvm

Marek Podmaka wrote on 29-04-2016 10:44:

> I would say that thin provisioning is designed to lie about the
> available space. This is what it was invented for. As long as the used
> space (not virtual space) is not greater than real space, everything
> is ok. Your analogy with customers still applies and whole IT business
> is based on it (over-provisioning home internet connection speed,
> "guaranteed" webhosting disk space). It seems to me that disk space
> was the last thing to get over- (or thin-) provisioned :)

But you see, if my landlord tells me I can use the entire container 
room, except that I have to share it with others, does he lie?

I *can* use the entire container room. I just have to ensure it is empty 
again by the end of the day (or even sooner).

Those ISPs do not say "Every client can use the full bandwidth all at 
the same time." They don't say that. They say "Fair use policies apply". 
That's what they say. And they mean that no, you can't do that stuff 
24/7/365.

So let's talk then about two things you can lie about:
* available space
* the thought that all of the space is available to everyone at all 
times.

In a normal use case, only the latter would be a lie. But that's not 
what companies tell their clients. Maybe implicitly, at times. But not 
explicitly at all (hence fair use policy).

The former is not a lie. If you have 1000 customers, and each has 50GB 
available in total, and the average use at this point is 25GB, and you 
have provisioned ~35GB each, meaning 35000GB is available and 25000GB is 
in use, then it is not a lie to say to any individual customer: you can 
use 50GB if you want.
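The arithmetic can be checked directly (same numbers as above):

```python
customers = 1000
quota_gb = 50        # promised to each customer
avg_use_gb = 25      # current average use
provisioned_gb = 35  # physical space provisioned per customer

total_physical = customers * provisioned_gb  # 35000 GB available
total_used = customers * avg_use_gb          # 25000 GB in use
headroom = total_physical - total_used       # 10000 GB to spare

# Any individual customer can indeed use their full 50 GB right now:
one_customer_extra = quota_gb - avg_use_gb
print(headroom >= one_customer_extra)          # True
# ...but the promise cannot hold for everyone simultaneously:
print(total_physical >= customers * quota_gb)  # False
```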

The guarantee that everyone can do it all at the same time, just doesn't 
hold, but that is never communicated.

As a customer you are not aware of how many other clients there are, or 
how many other thin volumes (ordinarily) or what the max capacity is 
across all the volumes. So you are not being lied to.

For it to be a lie, you would have to be concerned about the total 
picture. You would have to have an awareness of other clients and then 
you would need to make the assumption that all of these clients at the 
same time can use all of that bandwidth/data/space.

But your personal scenario doesn't extend that far.

Just as a funny example: nearby there was a supermarket that advertised 
with the (to my mind) stupid slogan "if there are more than 4 
customers in line, and you are the 5th, you get your groceries for 
free".

What did a local student house do? They went to the supermarket with 
about 20 people and got a lot of stuff for free.

I mean, in statistics you have queueing calculations too, but they get 
defeated if people start doing that stuff (thwarting the mechanism on 
purpose). For example, the traditional statistics example is that of 
customers at a hair salon: based on a certain arrival distribution and 
an average number of new arrivals, you can predict queue lengths and 
waiting times.

But those predictions are thwarted the moment customers start to pile 
up on purpose just to defeat them, you get what I mean?

Any /intentional/ attempt to thwart the average means it is no longer 
the average.

Normal people wanting a haircut do not show up at a salon to thwart the 
salon's calculations. Ordinary use cases do not apply to this.

If you can expect a common, normal amount of use, then there is no 
"intent" among those clients to do anything out of the ordinary.

Just like that hair salon can normally depend on those "calculations" 
(you could, you know) and provision for that (number of employees 
present), so too can a thin provisioning setup depend on expected 
averages (in a distribution, the "expected value" of a random variable 
is the expected average) (as a prediction in that sense).

There's no lying in that. If this hair salon now says "You can get a cut 
within 10 minutes without an appointment", then yes, people could thwart 
that by suddenly all showing up at the same time.

It doesn't work like that in reality, when people do not have such 
intentions.

We call that "innocence" ;-) not doing something on purpose.

That hair salon is not lying if it guarantees a 10 minute wait time in 
general. It just cannot guarantee it if people start to game it.

Statistics is all about averages and large numbers.

"A "law of large numbers" is one of several theorems expressing the idea 
that as the number of trials of a random process increases, the 
percentage difference between the expected and actual values goes to 
zero."

That means that if you have enough numbers (enough thin volumes), the 
difference between what you promise and what you can actually deliver 
goes to zero, and in effect you are always speaking the truth.
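A toy simulation makes the same point (the uniform usage model and all numbers here are assumptions, not real measurements):

```python
import random

def overshoot_probability(n_volumes, trials=2000, seed=1):
    """Estimate P(total usage > pool) when the pool is provisioned
    at 110% of the expected aggregate usage.

    Model: each volume independently uses a uniform random fraction
    of a 50 GB quota, so expected usage is 25 GB per volume.
    """
    rng = random.Random(seed)
    pool = n_volumes * 25 * 1.10  # 10% margin above the expected mean
    hits = 0
    for _ in range(trials):
        total = sum(rng.uniform(0, 50) for _ in range(n_volumes))
        if total > pool:
            hits += 1
    return hits / trials

# With few volumes, a 10% margin is often not enough; with many,
# the aggregate hugs the mean and the same margin almost always holds.
small = overshoot_probability(10)
large = overshoot_probability(1000)
print(small, large)
```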

Remember: you are speaking the truth given normal expected reality.
You are no longer speaking the truth if people start to mess with you on 
purpose.

If you have 10,000 clients and 5,000 of them are one person intending to 
bug you out, just like in the supermarket example, well, then you've 
lost. But that is an intentionally devious thing to do, just in order to 
make use of some monetary loophole in the system, so to speak.

And in general your terms of use could guard against that (and many 
companies do, I'm sure).


> Now I'm not sure what your use-case for thin pools is.

Presently maximizing space efficiency across a small number of volumes, 
as well as access to superior snapshotting ability.

> I don't see it much useful if the presented space is smaller than
> available physical space. In that case I can just use plain LVM with
> PV/VG/LV. For snapshots you don't care much, since if the snapshot
> overfills it just becomes invalid, but won't influence the original
> LV.

You mean there'd be no use for thin, right? I agree. The whole idea 
is to be more efficient with space.

If the presented space is smaller, then you HAVE room for those 
snapshots. But with thin, you don't need to care.

Space is always there.


> But their use case is to simplify the complexity of adding storage.
> Traditionally you need to add new physical disks to the storage /
> server, add it to LVM as new PV, add this PV to VG, extend LV and
> finally extend filesystem. Usually the storage part and server (LVM)
> part is done by different people / teams. By using thinp, you create
> big enough VG, LV and filesystem. Then as it is needed you just add
> physical disks and you're done.

True but let's call it "sharing" resources.

Sharing resources is the whole idea of any advanced society.

Our western mindset doesn't work in the sense of everyone needing to be 
able to possess everything.

The example was given that everyone owns a car that they may not use 
every day, a washing machine that they may use 5 hours a week, a vacuum 
cleaner that they may use 1 hour a week, and so on and so on. The 
example was given that a commercial airline could *never* do something 
like that.

Commercial airplanes are in operation pretty much 24/7. Disuse is way 
too costly. They cannot afford to not use their machines 24/7.

Our society cannot either, but the way we live and operate with each 
other currently ensures vast amounts of wasted materials, energy and so
on.

Resource sharing is an advanced concept in that sense. Let's just call 
thin pools an advanced concept :p.

And let's not call it a lie just like that :) :P.

> Another benefit is disk space saving. Traditionally you need to have
> some reserve as free space in each filesystem for growth. With many
> filesystems you just wasted a lot of space. With thinp, this free
> space is "shared".

My reason exactly.

> And regarding your other mail about presenting parts / chunks of
> blocks from block layer... This is what device mapper (and LVM built
> on top of it) does - it takes many parts of many block devices and
> creates a new linear block device out of them (whether it is a striped
> LV, a mirrored LV, dm-crypt or just a concatenation of 2 disks).

I know. But that is the reverse thing.

DM/LVM takes dispersed stuff and presents a whole.

In this case we were talking about presenting holes.

That's because, in this case, if you are that barber/haircutter and 
suddenly you get an influx of clients you cannot handle, are you going 
to put up a sign saying "sorry, too busy", or are you going to try to 
keep your "promise" to each and every one of them? I hope you didn't 
offer financial compensation in that sense ;-).

Personally I think that making use of such "financial promises" as a 
client is very intolerant and unforgiving and greedy and even 
avaricious ;-).

So what if your thin pool does fill up and you have no measure in place 
to handle it?

Are you going to be honest?

This question is not whether thin is currently lying. This is about 
whether you will continue to choose to let it lie.

It is not about the present. It is about the choice you are going to 
make.

Do you choose to lie or not?

Traditionally companies have always tried to keep up the pretense until 
all hell broke loose so badly that it spilled out like a tidal wave.

You can find any number of examples in the history of our world. I am 
currently thinking of the Exxon Valdez, and of Enron; I don't know if 
that is applicable. Also thinking of BP's platform in recent times, 
Deepwater Horizon, which was said to have been badly undermaintained.

I mean you can keep pretending everything is going just perfect, or you 
can own up a little sooner. That is a choice to make for each individual 
I guess.


* Re: [linux-lvm] about the lying nature of thin
  2016-04-29 10:06   ` Gionatan Danti
@ 2016-04-29 13:16     ` Xen
  2016-04-29 22:32       ` Xen
  0 siblings, 1 reply; 11+ messages in thread
From: Xen @ 2016-04-29 13:16 UTC (permalink / raw)
  To: LVM general discussion and development

Gionatan Danti wrote on 29-04-2016 12:06:

> Let me add one important use case: have fast, flexible snapshots.

One more huge reason for using it in a desktop system.

I didn't know about the performance benefits.

I just know that providing snapshot space in advance, by registering 
LVs in advance for that purpose, is not a good way of working (for me, 
or anyone).

Although the idea of using LVM thin to provide only a single thin volume 
might be rather odd ;-).

Still, the snapshotting is clearly superior to that of traditional LVM, 
right?

Regards.


* Re: [linux-lvm] about the lying nature of thin
  2016-04-29  8:44 ` Marek Podmaka
  2016-04-29 10:06   ` Gionatan Danti
  2016-04-29 11:53   ` Xen
@ 2016-04-29 20:37   ` Chris Friesen
  2 siblings, 0 replies; 11+ messages in thread
From: Chris Friesen @ 2016-04-29 20:37 UTC (permalink / raw)
  To: linux-lvm

On 04/29/2016 03:44 AM, Marek Podmaka wrote:

> Now I'm not sure what your use-case for thin pools is.
>
> I don't see it much useful if the presented space is smaller than
> available physical space. In that case I can just use plain LVM with
> PV/VG/LV. For snapshots you don't care much, since if the snapshot
> overfills it just becomes invalid, but won't influence the original
> LV.

One useful case for "presented space equal to physical space" with thin volumes 
is that it simplifies security issues.

With raw LVM volumes I generally need to zero out the whole volume prior to 
deleting it (to avoid leaking the contents to other users).  This takes time, 
and also seriously hammers the disks when you have multiple volumes being zeroed 
in parallel.

With thin, deletion is essentially instantaneous, and the zeroing penalty is 
paid when the disk block is actually written. Any disk blocks which have not 
been written are simply read as all-zeros.
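The same zero-on-read behavior can be demonstrated with an ordinary sparse file, which is a reasonable analogy for an unallocated thin block (this is plain file I/O, not LVM itself):

```python
import os
import tempfile

# A sparse file behaves like a thin volume in this one respect:
# regions never written are simply read back as zeros.
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        f.seek(1024 * 1024)  # leave a 1 MiB hole...
        f.write(b"data")     # ...then write 4 bytes past it
    with open(path, "rb") as f:
        hole = f.read(1024 * 1024)
        tail = f.read()
    print(hole == b"\x00" * (1024 * 1024))  # True: the hole reads as zeros
    print(tail)                             # b'data'
finally:
    os.unlink(path)
```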

Chris


* Re: [linux-lvm] about the lying nature of thin
  2016-04-29 13:16     ` Xen
@ 2016-04-29 22:32       ` Xen
  2016-04-30  4:46         ` Mark Mielke
  0 siblings, 1 reply; 11+ messages in thread
From: Xen @ 2016-04-29 22:32 UTC (permalink / raw)
  To: LVM general discussion and development

I guess this Patton guy knows everything about everything, but I'm not 
responding to him anymore.

As he sets up his business empire he leaves us all in the dust anyway.

So I guess I will just keep it to the thing I know something about, 
which is talking to real people.


* Re: [linux-lvm] about the lying nature of thin
  2016-04-29 22:32       ` Xen
@ 2016-04-30  4:46         ` Mark Mielke
  2016-05-03 13:03           ` Xen
  0 siblings, 1 reply; 11+ messages in thread
From: Mark Mielke @ 2016-04-30  4:46 UTC (permalink / raw)
  To: LVM general discussion and development


Lots of interesting ideas in this thread.

But the practical side of things is that there is a need for thin volumes that
are over-provisioned. Call it a lie if you must, but I want to have
multiple snapshots, and not be forced to have 10X the storage, just so that
I can *guarantee* that I will have the technical capability to fully
allocate every snapshot without running out of space. This is for my
requirements, where I am not being naive or irresponsible. I'm not
misrepresenting the situation to myself. I know exactly what to expect, and I
know that it isn't only important to monitor, but it is also important to
understand the usage patterns. For example, in some of our use cases, files
will only normally be extended or created as new, at which point the
overhead of a snapshot is close to zero.

If people find this model unacceptable, then I think they should not use
thin volumes. It's a technology choice.

We have many systems like this beyond LVM... For example, the NetApp FAS
devices we have are set up with this type of model, and IT normally
allocates 10% or more for "snapshots", and when we get this wrong, it does
hurt in various ways, usually requiring that the snapshots get dumped, and
that we figure out why the monitoring failed. Normally, IT adds to the
aggregate as it passes a threshold. In the particular case that is
important for me - we have a fixed size local SSD for maximum performance,
and we still want to take frequent snapshots (and prune them behind),
similar to what we do on NetApp, but all in the context of local storage. I
don't use the word "lie" with IT in these cases. It's a partnership, and an
attempt to make the most use of the storage and the technology.

There was some discussion about how data is presented to the higher layers.
I didn't follow the suggestion exactly (communicating layout information?),
but I did have these thoughts:

   1. When the storage runs out, it clearly communicates layout information
   to the caller in the form of a boolean "does it work or not?"
   2. There are other ways that information does get communicated, such as
   if a device becomes read only. For example, an iSCSI LUN.

I didn't follow communication of specific layout information as this didn't
really make sense to me when it comes to dynamic allocation. But, if the
intent is to provide early warning of the likelihood of failure, compared
to waiting to the very last minute where it has already failed, it seems
like early warning would be useful. I did have a question about the
performance of this type of communication, however, as I wouldn't want the
host to be constantly polling the storage to recalculate the up-to-date
storage space available.



* Re: [linux-lvm] about the lying nature of thin
  2016-04-30  4:46         ` Mark Mielke
@ 2016-05-03 13:03           ` Xen
  0 siblings, 0 replies; 11+ messages in thread
From: Xen @ 2016-05-03 13:03 UTC (permalink / raw)
  To: LVM general discussion and development

Mark Mielke wrote on 30-04-2016 6:46:

> Lots of interesting ideas in this thread.

Thank you for your sane response.

> There was some discussion about how data is presented to the higher
> layers. I didn't follow the suggestion exactly (communicating layout
> information?), but I did have these thoughts:
> 
> 	* When the storage runs out, it clearly communicates layout
> information to the caller in the form of a boolean "does it work or
> not?"
> 	* There are other ways that information does get communicated, such
> as if a device becomes read only. For example, an iSCSI LUN.
> 
> I didn't follow communication of specific layout information as this
> didn't really make sense to me when it comes to dynamic allocation.
> But, if the intent is to provide early warning of the likelihood of
> failure, compared to waiting to the very last minute where it has
> already failed, it seems like early warning would be useful. I did
> have a question about the performance of this type of communication,
> however, as I wouldn't want the host to be constantly polling the
> storage to recalculate the up-to-date storage space available.

Zdenek alluded to the idea that this continuous polling would either 
be required or would be hugely expensive for the hardware. Of course I 
do not know everything about a system before I start thinking. If I 
have an idea, it is usually possible to implement it, but I only find 
out later down the road whether this is actually so and whether it 
needs amending. I could not progress with life if every idea needed to 
be 100% sure before I could commence with it, because in that sense the 
commencing and the learning would never happen.

I didn't know thin (or LVM) doesn't maintain maps of used blocks.

Of course for regular LVM it makes no sense if the usage of the blocks 
you have allocated to a system is none of your concern at all.

The recent DISCARD improvements apparently just signal some special case 
(?) but SSDs DO maintain maps or it wouldn't even work (?).

I don't know, it would seem that having a map of used extents in a thin 
pool is in some way deeply important in being able to allocate unused 
ones?

I would have to dig into it of course but I am sure I would be able to 
find some information (and not lies ;-))).

I guess continuous polling would be deeply disrespectful of the hardware 
and software resources.

In the theoretical system I proposed it would be a constant 
communication between systems bogging down resources. But we must agree 
we are typically talking about 4MB blocks here (and mutations to them). 
In a sense you could easily increase that to 16MB, or 32MB, or whatever.

You could even update a filesystem when mutations of a thousand 
gigabytes have happened.

We are talking about a map of regions and these regions can be as large 
as you want.

It would say to a filesystem: these regions are currently unavailable.

You would even get more flags:

- this region is entirely unavailable
- this region is now more expensive to allocate to
- this region is the preferred place

When you allocate memory in the kernel (like with kmalloc) you specify 
what kind of requirements you have.

This is more of the same kind, I guess.
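A sketch of what such hints might look like (every name below is invented for illustration; nothing like this exists in LVM today):

```python
from enum import Enum

class RegionHint(Enum):
    """Hypothetical per-region hints a thin layer could publish."""
    PREFERRED = 0    # allocate here first
    EXPENSIVE = 1    # usable, but costlier for the pool
    UNAVAILABLE = 2  # do not allocate here at all

def pick_region(regions):
    """Choose the best region for a new allocation, or None.

    `regions` maps region index -> RegionHint; a lower enum value
    means a more desirable region.
    """
    usable = {i: h for i, h in regions.items()
              if h is not RegionHint.UNAVAILABLE}
    if not usable:
        return None
    return min(usable, key=lambda i: usable[i].value)

hints = {0: RegionHint.EXPENSIVE,
         1: RegionHint.UNAVAILABLE,
         2: RegionHint.PREFERRED}
print(pick_region(hints))  # 2
```

The point is only that a filesystem receiving such hints could rank candidate regions instead of treating the whole virtual space as uniformly available.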

Typically a thin system is a system of extent allocation, the way we 
have it.

It is the thin volume that allocates this space, but the filesystem that 
causes it.

The thin volume would be able to say "don't use these parts".

Or "all parts are equal, but don't use more than X currently".

Actually the latter is a false statement, you need real information.

I know in ext filesystems the inodes are scattered everywhere (and the 
tables) so the blocks are already getting used, in that sense. And if 
you had very large blocks that you would want to make totally 
unavailable, you would get weird issues. "That's funny, I'm already 
using it".

So in order to make sense they would have to be contiguous regions (in 
the virtual space) that are really not used yet.

I don't know, it seems fun to make something like that. Maybe I'll do it 
some day.


* [linux-lvm] thin disk -- like overcomitted/virtual memory? (was Re: about the lying nature of thin)
  2016-04-28 22:37 [linux-lvm] about the lying nature of thin Xen
  2016-04-29  8:44 ` Marek Podmaka
@ 2016-05-10 21:47 ` Linda A. Walsh
  2016-05-10 23:58   ` Xen
  1 sibling, 1 reply; 11+ messages in thread
From: Linda A. Walsh @ 2016-05-10 21:47 UTC (permalink / raw)
  To: LVM general discussion and development

Xen wrote:
> You know, Mr. Patton made the interesting allusion that thin 
> provisioning is designed to lie and is meant to lie, and I beg to differ.
----
    Isn't using a thin pool for disk space similar to using
a virtual memory/swap space that is smaller than the combined sizes of all
processes?

    I.e.  Administrators can choose whether to over-allocate
swap or paging file space or to have it be a hard limit -- and forgive me
if I'm wrong, but isn't this configurable in /proc/sys/vm with the
over-commit parameters (among others)?
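This is indeed configurable: under `vm.overcommit_memory = 2` Linux enforces a hard commit limit computed from RAM and swap. A toy version of that accounting (example sizes, ignoring hugepage reservations):

```python
def commit_limit_mb(ram_mb, swap_mb, overcommit_ratio=50):
    """CommitLimit under vm.overcommit_memory = 2 (strict accounting):
    a fixed percentage of RAM plus all of swap; commits beyond this
    are refused instead of being over-committed.
    """
    return ram_mb * overcommit_ratio // 100 + swap_mb

# 16 GB RAM, 4 GB swap, the default vm.overcommit_ratio of 50%:
print(commit_limit_mb(16384, 4096))  # 12288
```

The other modes (0 heuristic, 1 always allow) are the moral equivalents of an over-provisioned thin pool.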

    Doesn't over-commit in the LVM space have similar checks and balances
as over-commit in the VM space?  Whether it does or doesn't, shouldn't
the reasoning be similar in how they can be controlled?

    In regards to LVM overcommit -- does it matter (at least in the short
term) if that over-committed space is filled with "SPARSE" data files?
I mean, suppose I allocate space for astronomical bodies -- in some 
areas/directions I might have very sparse usage, vs. towards the core 
of a galaxy I might expect less sparse usage.

    If a file system can be successfully closed with 'no errors' -- 
doesn't that still mean it is "integrous" -- even if its sparse files 
don't all have enough room to be expanded?

    Does it make sense to think about an OOTS (OutOfThinSpace) daemon that
can be set up with priorities to reclaim space?

    I see 2 types of "quota" here, and I can see the metaphor of these 
types being extended into disk space: direct space, that is physically 
present, and "indirect or *temporary* space" -- which you might try to 
reserve at the beginning of a job. Your job could be configured to wait 
until the indirect space is available, or die immediately. But 
conceivably indirect space is space on a robot-cartridge retrieval 
system that has a huge amount of virtual space, but at the cost of 
needing to be loaded before your job can run.

    Extending that idea -- the indirect space could be configured as 
"high priority space" -- meaning once it is allocated, it stays 
allocated *until* the job completes (in other words the job would have a 
low chance of being "evicted" by an OOTS daemon), vs. most "extended 
space" having the priority of "temporary space" -- with processes 
using large amounts of such indirect space, and having a low expectation 
of quick completion, being high on the OOTS daemon's list?

    Processes could also be willing to "give up memory and suspend" -- 
where, when called, a handler could give back giga- or terabytes of memory
and save its state as needing to restart the last pass.

    Lots of possibilities -- if LVM thin space is managed like 
virtual memory space. That means some outfits might choose to never 
over-allocate, while others might allow a fraction.

    From how it sounds -- when you run out of thin space, what happens
now is that the OS keeps allocating more Virtual space that has no 
backing store (in memory or on disk)... with a notification buried in a 
system log somewhere. 

    On my own machine, I've seen >50% of memory returned after
sending a '3' to /proc/sys/vm/drop_caches -- maybe similar emergency 
measures could help in the short term, with long term handling being as
similarly flexible as VM policies.

    Does any of this sound sensible or desirable?   How much effort is 
needed for how much 'bang'?


* Re: [linux-lvm] thin disk -- like overcomitted/virtual memory? (was Re: about the lying nature of thin)
  2016-05-10 21:47 ` [linux-lvm] thin disk -- like overcomitted/virtual memory? (was Re: about the lying nature of thin) Linda A. Walsh
@ 2016-05-10 23:58   ` Xen
  0 siblings, 0 replies; 11+ messages in thread
From: Xen @ 2016-05-10 23:58 UTC (permalink / raw)
  To: LVM general discussion and development

Hey sweet Linda,

this is beyond me at the moment. You go very far with this.

Linda A. Walsh schreef op 10-05-2016 23:47:

>    Isn't using a thin memory pool for disk space similar to using
> a virtual memory/swap space that is smaller than the combined sizes of 
> all
> processes?

I think there is a point to that, but for me the correspondence is in the 
idea that filesystems should perhaps have different modes of requesting 
memory (space), as you detail below.

Virtual memory typically cannot be expanded automatically, although you 
could expand it manually (for instance by adding swap).

Even with virtual memory there is normally a hard limit, and unless you 
include shared memory, there is not really any relation with 
overprovisioned space, unless you started talking about prior allotment, 
and promises being given to processes (programs) that a certain amount 
of (disk) space is going to be available when it is needed.

So what you are talking about here I think is expectation and 
reservation.

A process or application claims a certain amount of space in advance. 
The system agrees to it. Maybe the total amount of claimed space is 
greater than what is available.

Now processes (through the filesystem) are notified whether the space 
they have reserved is actually going to be there, or whether they need to 
wait for that "robot cartridge retrieval system" -- and whether they want 
to wait or will quit.

They knew they needed space and they reserved it in advance. The system 
had a way of knowing whether the promises could be met and the requests 
could be met.

So the concept that keeps recurring here seems to be reservation of 
space in advance.

That seems to be the holy grail now.

Now I don't know but I assume you could develop a good model for this 
like you are trying here.

Sparse files are difficult for me, I have never used them.

I assume they could be considered sparse by nature and not likely to 
fill up.

Filling up is of the same nature as expanding.

The space they require is virtual space, their real space is the 
condensed space they actually take up.

It is a different concept. You really need two measures for reporting on 
these files: real and virtual.

So your filesystem might have 20G real space.
Your sparse file is the only file. It uses 10G actual space.
Its virtual file size is 2T.

Free space is reported as 10G.

Used space is given two measures: actual used space, and virtual used 
space.
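
The two measures above can be sketched as a toy accounting model, using the
same hypothetical numbers as the example (20G real capacity, one sparse file
of 10G real and 2T virtual size):

```python
G = 1024**3
T = 1024**4

# Toy model: every file carries a virtual (apparent) size and a real
# (allocated) size; free space is computed from real usage only.
real_capacity = 20 * G
files = [{"name": "sparse.img", "virtual": 2 * T, "real": 10 * G}]  # hypothetical

real_used = sum(f["real"] for f in files)
virtual_used = sum(f["virtual"] for f in files)
free = real_capacity - real_used   # only real blocks count against free space

print(f"free: {free // G}G, real used: {real_used // G}G, "
      f"virtual used: {virtual_used // T}T")
```

This reports free space as 10G, exactly as in the example, while the virtual
used measure is 2T.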

The question is how you store these. I think you should store them 
condensed.

As such only the condensed blocks are given to the underlying block 
layer / LVM.

I doubt you would want to create a virtual space from LVM such that your 
sparse files can use a huge filesystem in a non-condensed state sitting 
on that virtual space?

But you can?

Then the filesystem doesn't need to maintain blocklists or whatever, but 
keep in mind that normally a filesystem will take up a lot of space in 
inode structures and the like when the filesystem is huge but the actual 
volume is not.

If you create one thin pool, and a bunch of filesystems (thin volumes) 
of the same size, with default parameters, your entire thin pool will 
quickly fill up with just metadata structures.

I don't know. I feel that sparse files are weird anyway, but if you use 
them, you'd want them to be condensed in the first place and existing in 
a sort of mapped state where virtual blocks are mapped to actual blocks. 
That doesn't need to be LVM and would feel odd there. That's not its 
purpose, right?

So for sparse you need a mapping at some point but I wouldn't abuse LVM 
for that primarily. I would say that is 80% filesystem and 20% LVM, or 
maybe even 60% custom system, 20% filesystem and 20% LVM.

Many games pack their own filesystems, like we talked about earlier 
(when you discussed inefficiency of many small files in relation to 4k 
block sizes).

If I really wanted sparse personally, as an application data storage 
model, I would first develop this model myself. I would probably want to 
map it myself. Maybe I'd want a custom filesystem for that. Maybe a 
loopback mounted custom filesystem, provided that its actual block file 
could grow.

I would imagine allocating containers for it, and I would want the 
"real" filesystem to expand my containers or to create new instances of 
them. So instead of mapping my sectors directly, I would want to map 
them myself first, in a tiered system, and the filesystem to map the 
higher hierarchy level for me. E.g. I might have containers of 10G each 
allocated in advance, and when I need more, the filesystem allocates 
another one. So I map the virtual sectors to another virtual space, such 
that my containers are addressed as:

container virtual space / container size = outer container addressing
container virtual space % container size = inner container addressing

outer container addressing goes to filesystem structure telling me (or 
it) where to write my data to.

inner container addressing follows normal procedure, and writes "within 
a file".

so you would have an overflow where the most significant bits cause 
container change.
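
The two formulas above are just integer division and remainder; a minimal
sketch, with the hypothetical 10G container size from the text:

```python
CONTAINER_SIZE = 10 * 1024**3  # 10G containers, as in the text

def map_offset(virtual_offset):
    # The most significant part selects the (outer) container;
    # the remainder addresses within it (inner).
    outer, inner = divmod(virtual_offset, CONTAINER_SIZE)
    return outer, inner

# A 25G virtual offset lands 5G into container index 2.
outer, inner = map_offset(25 * 1024**3)
print(outer, inner // 1024**3)  # 2 5
```

An overflow of the inner part is exactly a carry into the outer part, which
is the "most significant bits cause container change" behaviour.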

At that point I've already mapped my "real" sparse space to container 
space, its just that the filesystem allows me to address it without 
breaking a beat.

What's the difference with a regular file that grows? You can attribute 
even more significant bits to filesystem change as well. You can have as 
many tiers as you want. You would get "falls outside of my jurisdiction" 
behaviour, "passing it on to someone else".

LVM thin? Hardly relates to it.

You could have addressing bits that reach to another planet ;-) :).


>    If a file system can be successfully closed with 'no errors' --
> doesn't that still mean it is "integrous" -- even if its sparse files
> don't all have enough room to be expanded?

Well that makes sense. But that's the same as saying that a thin pool is 
still "integrous" even though it is over-allocated. You are saying the 
same thing here, almost.

You are basically saying: v-space > r-space == ok?

Which is the basic premise of overprovisioning to begin with.

With the added distinction of "assumed possible intent to go and fill up 
that space".

Which comes down to:

"I have a total real space of 2GB, but my filesystem is already 8GB. 
It's a bit deceitful, but I expect to be able to add more real space 
when required."

There are two distinct cases:

- total allotment > real space, but individual allotments < real space
- total allotment > real space, AND individual allotments > real space

I consider the first acceptable. The second is spending money you don't 
have.
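
The two cases can be told apart mechanically; a small sketch (volume sizes
and pool size are made-up numbers):

```python
def classify(allotments, real_space):
    # allotments: per-volume V-sizes; real_space: the pool's physical size
    total = sum(allotments)
    if total <= real_space:
        return "no overprovisioning"
    if max(allotments) <= real_space:
        return "case 1: total over real space, each individual under"
    return "case 2: an individual allotment alone exceeds real space"

print(classify([8, 8, 8], 20))  # total 24 > 20, but each 8 < 20
print(classify([25, 5], 20))    # one volume alone exceeds the pool
```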

I would consider not ever creating an individual filesystem (volume) that 
is actually bigger (ON ITS OWN) than all the space that exists.

I would never consider that. I think it is like living on debt.

You borrow money to buy a house. It is that system.

You borrow future time.

You get something today but you will have to work for it for a long 
time, paying for something you bought years ago.

So how do we deal with future time? That is the question. Is it 
acceptable to borrow money from the future?

Is it acceptable to use space now, that you will only have tomorrow?

>    If a file system can be successfully closed with 'no errors' --
> doesn't that still mean it is "integrous" -- even if its sparse files
> don't all have enough room to be expanded?

If your sparse file has no intent to become non-sparse, then it is no 
issue.

If your sparse file already tells you it is going to get you in trouble, 
it is different.

This system is integrous depending on planned actions.

Same is true for LVM now. The system is safe until some program decides 
to allocate the entire filesystem.

And there are no checks and balances, the system will just crash.

The peculiar condition is that you have built a floor. You have a floor, 
like a circular area of a certain surface area. But 1/3 of the floor is 
not actually there.

You keep telling yourself not to go there.

The entire circle appears to be there. But you know some parts are 
missing.

That is the current nature of LVM thin.

You know that if you step on certain zones, you will fall through and 
crash to the ground below.

(I have had that happen as a kid. We were in the attic and we had 
covered the ladder gap with cardboard. Then, we (or at least I) forgot 
that the floor was not actually real and I walked on it, instantly 
falling through and ending on a step on the ladder below.)

[ People here keep saying that a real admin would not walk on that 
ladder gap. A real admin would know where the gap was at all times. He 
would not step on it, and not fall through.

But I've had it happen that I forgot where the gap was and I stepped on 
it anyway. ]



>    Does it make sense to think about a OOTS (OutOfThinSpace) daemon 
> that
> can be setup with priorities to reclaim space?

Does make some sense, certainly, to me at least -- no matter that I 
understand little and am of no real importance here -- but I don't really 
understand the implications at this point.


>    Processes could also be willing to "give up memory and suspend" --
> where, when called, a handler could give back Giga-or Tera bytes of
> memory
> and save it's state as needing to restart the last pass.

That is almost a calamity mode. I need to use the shower, but I was 
actually just painting its walls. I need to stop painting that shower so I 
can use it for something else.

I think it makes sense to lay a claim to some uncovered land, but when 
someone else also claims it, you discuss who needs it most, whether you 
feel like letting the other one have it, whose turn it is now, will it 
hurt you to let go of that.

It is almost the same as reserving classrooms.

So like I said, reservation. And like you say, only temporary space that 
you need for jobs. In a normal user system that is not computationally 
heavy, these things do not really arise, except maybe for video editing 
and the like.

If you have large data jobs like you are talking about, I think you 
would need a different kind of scheduling system anyway. But not so much 
automatic. Running out of space is not a serious issue if the 
administrator system allots space to jobs. Doesn't have to be a 
filesystem doing that.

But I guess your proposed daemon is just a layer above that, knowing 
about space constraints, and then allotting space to jobs based on 
priority queues. Again doesn't really have much to do with thin, unless 
every "job" would have its own "thin volume". And the "thin pool-volume 
system" would get used to "allot space" (the V-size of the volume) but 
if too much space was allotted, the system would get in trouble 
(overprovisioning) if all jobs run. Again, borrowing money from the 
future.

The premise of LVM is not that every volume is going to be able to use 
all its space. It's not that it should, has to, or is going to fill up 
as a matter of course, as an expected and normal thing.

You see thin LVM only works if the volumes are independent.

In that job system they are not independent: the expected growth happens 
on purpose. Thin provisioning instead involves a probability 
distribution in which the average expected space usage is less than the 
maximum.

LVM thin is really a statistical thing, basing itself on the law of 
large numbers, averaging, and the expectation that if ONE volume is 
going to be max, another one won't be.

If you are going to allot jobs that are expected to completely fill up 
the reserved space, you are talking about an entirely different thing.

You should provision based on average, but if average is max, it makes 
no sense anymore and you should just proportion according to available 
real space. You do not need thin volumes or a thin pool to do that sort 
of thing: just regular fixed-size filesystems with jobs and space 
requests.

In other words, the amount of sane overprovisioning you can do is 
related to the difference between max and average.

The difference (max - average) is the amount you can safely overprovision 
given normal circumstances.

You do not "on purpose" and willfully provision less than the average 
you expect. Average is your criterion. Max is the individual max size. 
Overprovisioning is the ability of an individual volume to grow beyond 
average towards max. If the calculations hold, some other volume will be 
below average.
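
As arithmetic, the rule above looks like this (per-volume max and average
figures are invented for illustration):

```python
# Hypothetical volumes: per-volume maximum (V-size) and expected average use.
volumes = [{"max": 100, "avg": 40},
           {"max": 100, "avg": 60},
           {"max": 100, "avg": 50}]

total_avg = sum(v["avg"] for v in volumes)  # what you should provision for
total_max = sum(v["max"] for v in volumes)  # the summed V-sizes
headroom = total_max - total_avg            # the sane overprovisioning margin

print(f"provision at least {total_avg}; "
      f"V-size {total_max} overprovisions by {headroom}")
```

Real space should cover at least `total_avg` (150 here); the headroom (also
150 here) is what overprovisioning "borrows" on the assumption that not every
volume hits max at once.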

However if your numbers are smaller (not 1000s of volumes, but just a 
few) the variance grows enormously. And with the growth in variance you 
can no longer predict what is going to happen. But the real question is 
whether there is going to be any covariance, and in a real thin system, 
there should be none (independent).
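
The growth of variance with small numbers can be seen in a quick simulation
(volume usage modeled as independent and uniform, an assumption made purely
for illustration):

```python
import random
import statistics

random.seed(1)

def rel_spread(n_volumes, trials=2000):
    # Simulate total usage of n independent volumes, each uniform on [0, 100),
    # and return the relative spread (stdev / mean) of the pool total.
    totals = [sum(random.uniform(0, 100) for _ in range(n_volumes))
              for _ in range(trials)]
    return statistics.stdev(totals) / statistics.mean(totals)

# With 3 volumes the total swings by roughly a third of its mean;
# with 300 volumes the relative spread is about ten times smaller.
print(rel_spread(3), rel_spread(300))
```

With only a few volumes, "some other volume will be below average" is a much
weaker bet, which is the point being made here.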

For instance, if there is some hype and all your clients suddenly start 
downloading the next big 200G movie, you already have covariance.

Social unrest always indicates covariance. People stop making their own 
choices, and your predictions and business-as-usual assumptions no longer 
hold true. Not because your values weren't sane, but because people 
don't act naturally in those circumstances.

Covariance indicates that there is a tertiary factor, causing (for 
instance) growth in (volumes) across the line.

John buys a car, and Mary buys a house, but actually it is because they 
are getting married.

Or, John buys a car, and Mary buys a house, but the common element is 
that they have both been brainwashed by contemporary economists working 
at the World Bank.

All in all the insanity happens when you start to borrow from the 
future, which causes you to have to work your ass off to meet the 
demands you placed on yourself earlier, always having to rush, panic, 
and be under pressure.

Better not overprovision beyond your average, in the sense of not even 
having enough for what you expect to happen.


>    From how it sounds -- when you run out of thin space, what happens
> now is that the OS keeps allocating more Virtual space that has no
> backing store (in memory or on disk)...with a notification buried in a
> system log
> somewhere.

Sounds like abandoning the gold standard: having money that has no gold 
behind it, or anything else of value.


end of thread, other threads:[~2016-05-10 23:58 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-28 22:37 [linux-lvm] about the lying nature of thin Xen
2016-04-29  8:44 ` Marek Podmaka
2016-04-29 10:06   ` Gionatan Danti
2016-04-29 13:16     ` Xen
2016-04-29 22:32       ` Xen
2016-04-30  4:46         ` Mark Mielke
2016-05-03 13:03           ` Xen
2016-04-29 11:53   ` Xen
2016-04-29 20:37   ` Chris Friesen
2016-05-10 21:47 ` [linux-lvm] thin disk -- like overcomitted/virtual memory? (was Re: about the lying nature of thin) Linda A. Walsh
2016-05-10 23:58   ` Xen
