* [linux-lvm] about the lying nature of thin
From: Xen @ 2016-04-28 22:37 UTC (permalink / raw)
  To: Linux lvm

You know, Mr. Patton made the interesting claim that thin provisioning 
is designed to lie and is meant to lie, and I beg to differ.

Under normal operating conditions any thin volume should be allowed to 
grow to its maximum V-size provided not everyone is doing that at the 
same time.


Nowhere does the thin contract say "this space I have made available 
to you, you don't have to share it".

That is like saying the basement container room used as bike and motor 
space in my apartment complex is a lie, because if I were to fill it up, 
other people couldn't use it anymore.

The visuals clearly indicate available physical space, but I know that 
if I use it, others won't be able to. It's called sharing.

In practical terms a thin volume only starts to lie when "real space" 
< "virtual space" -- a condition you are normally trying to avoid.

So I would not even say that by definition a thin volume or thin volume 
manager lies.

It only starts "lying" the moment real available space goes below 
virtual available space, something you would normally be trying to 
avoid.

Since your guarantee to your customers (for instance) is that this space 
IS going to be available, you're actually lying to them by not informing 
them that this guarantee cannot actually be met at some point in time.

Thin pools do not lie by default. They lie when they cannot fulfill 
their obligations, and this is precisely the reason for the idea I 
suggested: to stop the lie, to be honest.

It was said (by Marek Podmaka) that in some or many use cases (liberally 
interpreted) you don't want customers / users to know about the reality 
behind the thin pool: that there are use cases where you don't want 
the client to know about the thin nature.

But if you don't do your job right and the thin pool does start to fill 
up, that starts to sound like lying to your client and saying 
"everything is all right" while behind the scenes everyone is in 
calamity mode.

"Is something wrong? No no, not@all".

You're usually aware that you're being lied to ;-) if you are talking to 
a real human.

So basically:
* either you do your job right and nothing is the matter,
* or you don't do your job right but you don't tell anyone,
* or you don't do your job right and you own up.

Saying that thin pools habitually lie is not right. The question is not 
what happens or what you do while the system is functioning as intended. 
The question is what you do when that is no longer the case:

* do you inform the guest system?
* do you keep silent until shit breaks loose?

If you had an autoextend mechanism present, you could equally well 
decide not to "inform" clients as long as it kept working. After all, 
if you have automatic extending configured and operational, then 
the "real size" is effectively larger than what you currently have.

In that case "real size < virtual size" does not hold or does not 
happen, and there is no need to communicate anything. This is also a 
question about ethics, perhaps.
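
For illustration, a sketch of how lvm2 does this today (assuming 
dmeventd-based monitoring is enabled; the option names live in 
lvm.conf and the defaults vary by distribution):

    # lvm.conf, activation section:
    #   thin_pool_autoextend_threshold = 70   # extend once the pool is 70% full
    #   thin_pool_autoextend_percent   = 20   # grow it by 20% of its size
    # and watch the pool stay below the threshold with:
    lvs -o lv_name,size,data_percent,metadata_percent vg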

Personally I like to be informed. I don't know what you do or want.

But I can think of any number of analogies or life situations where I 
would definitely choose to be informed instead of being lied to.

Thin LVM does not lie by default. It may only start to lie when 
conditions are no longer met.

Regards, Xen.


* Re: [linux-lvm] about the lying nature of thin
From: Marek Podmaka @ 2016-04-29  8:44 UTC (permalink / raw)
  To: Xen; +Cc: Linux lvm

Hello Xen,

Friday, April 29, 2016, 0:37:23, you wrote:

> In practical terms a thin volume only starts to lie when "real space"
> < "virtual space" -- a condition you are normally trying to avoid.

> Thin pools do not lie by default. They lie when they cannot fulfill
> their obligations, and this is precisely the reason for the idea I 
> suggested: to stop the lie, to be honest.

I would say that thin provisioning is designed to lie about the
available space. This is what it was invented for. As long as the used
space (not the virtual space) is not greater than the real space,
everything is ok. Your analogy with customers still applies, and the
whole IT business is based on it (over-provisioning home internet
connection speed, "guaranteed" webhosting disk space). It seems to me
that disk space was the last thing to get over- (or thin-) provisioned :)

Now I'm not sure what your use-case for thin pools is.

I don't see it as very useful if the presented space is smaller than
the available physical space. In that case I can just use plain LVM with
PV/VG/LV. For snapshots you don't care much, as if a snapshot
overfills, it just becomes invalid, but it won't influence the original
LV.

But their main use case is to simplify the complexity of adding storage.
Traditionally you need to add new physical disks to the storage /
server, add them to LVM as a new PV, add this PV to the VG, extend the
LV and finally extend the filesystem. Usually the storage part and the
server (LVM) part are done by different people / teams. By using thinp,
you create a big enough VG, LV and filesystem. Then, as it is needed,
you just add physical disks and you're done.
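
A sketch of that workflow (volume and device names are placeholders):

    lvcreate -L 1T -T vg/pool             # a big enough thin pool
    lvcreate -V 4T -T vg/pool -n data     # thin LV, overprovisioned on purpose
    mkfs.ext4 /dev/vg/data
    # later, when new physical disks arrive, only the pool has to grow:
    vgextend vg /dev/sdc
    lvextend -L +1T vg/pool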

Another benefit is disk space saving. Traditionally you need to keep
some reserve of free space in each filesystem for growth. With many
filesystems you just waste a lot of space. With thinp, this free
space is "shared".

And regarding your other mail about presenting parts / chunks of
blocks from the block layer... This is what device mapper (and LVM built
on top of it) does - it takes many parts of many block devices and
creates a new linear block device out of them (whether it is a striped
LV, a mirrored LV, dm-crypt or just a concatenation of 2 disks).
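
A minimal sketch of that concatenation at the device-mapper level
(sector counts and device names are made up; each table line is
"start length linear device offset" in 512-byte sectors):

    dmsetup create joined <<'EOF'
    0       2097152 linear /dev/sdb1 0
    2097152 2097152 linear /dev/sdc1 0
    EOF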

-- 
  bYE, Marki


* Re: [linux-lvm] about the lying nature of thin
From: Gionatan Danti @ 2016-04-29 10:06 UTC (permalink / raw)
  To: Marek Podmaka, LVM general discussion and development, Xen



On 29/04/2016 10:44, Marek Podmaka wrote:
> Hello Xen,
> Now I'm not sure what your use-case for thin pools is.
>
> I don't see it as very useful if the presented space is smaller than
> the available physical space. In that case I can just use plain LVM with
> PV/VG/LV. For snapshots you don't care much, as if a snapshot
> overfills, it just becomes invalid, but it won't influence the original
> LV.
>

Let me add one important use case: have fast, flexible snapshots.

In the past I used classic LVM to build our virtualization servers, but 
this meant I was basically forced to use a separate volume for each VM: 
using a single big volume and filesystem for all the VMs means that, 
while snapshotting it for backup purposes, I/O becomes VERY slow on ALL 
virtual machines.

On the other hand, thin pools provide much faster snapshots. On the 
latest builds, I have begun using a single large thin volume, on top of a 
single large thin pool, to host a single filesystem that can be 
snapshotted with no big slowdown on the I/O side.

I understand that it is a tradeoff - classic LVM mostly provides 
contiguous blocks, so fragmentation remains quite low, while thin 
pools/volumes are much more prone to fragment, but with large enough 
chunks it is not such a big problem.
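
A sketch of that setup (names are placeholders; thin snapshots carry
the "activation skip" flag by default, hence the -K):

    lvcreate -L 500G -T vg/pool --chunksize 1m   # large chunks, less fragmentation
    lvcreate -V 500G -T vg/pool -n vms           # one big thin volume
    lvcreate -s --name vms-backup vg/vms         # metadata-only, near-instant
    lvchange -ay -K vg/vms-backup                # activate it for the backup run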

Regards.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8


* Re: [linux-lvm] about the lying nature of thin
From: Xen @ 2016-04-29 11:53 UTC (permalink / raw)
  To: Linux lvm

Marek Podmaka wrote on 29-04-2016 10:44:

> I would say that thin provisioning is designed to lie about the
> available space. This is what it was invented for. As long as the used
> space (not the virtual space) is not greater than the real space,
> everything is ok. Your analogy with customers still applies, and the
> whole IT business is based on it (over-provisioning home internet
> connection speed, "guaranteed" webhosting disk space). It seems to me
> that disk space was the last thing to get over- (or thin-) provisioned :)

But you see, if my landlord tells me I can use the entire container 
room, except that I have to share it with others, does he lie?

I *can* use the entire container room. I just have to ensure it is empty 
again by the end of the day (or even sooner).

Those ISPs do not say "Every client can use the full bandwidth all at 
the same time." They don't say that. They say "Fair use policies apply". 
That's what they say. And they mean that no, you can't do that stuff 
24/7/365.

So let's talk then about two things you can lie about:
* available space
* the thought that all of the space is available to everyone at all 
times.

In a normal use case, only the latter would be a lie. But that's not 
what companies tell their clients. Maybe implicitly, at times. But not 
explicitly at all (hence fair use policy).

The former is not a lie. If you have 1000 customers, each with 50GB 
available in total, average use at this point of 25GB, and you have 
provisioned for ~35GB each, meaning 35000 GB is available and 25000 GB 
is in use, then it is not a lie to say to any individual customer: you 
can use 50GB if you want.

The guarantee that everyone can do it all at the same time, just doesn't 
hold, but that is never communicated.

As a customer you are not aware of how many other clients there are, or 
how many other thin volumes (ordinarily) or what the max capacity is 
across all the volumes. So you are not being lied to.

For it to be a lie, you would have to be concerned about the total 
picture. You would have to have an awareness of other clients and then 
you would need to make the assumption that all of these clients at the 
same time can use all of that bandwidth/data/space.

But your personal scenario doesn't extend that far.

Just as a funny example: nearby there was a supermarket that advertised 
with the (to my mind) stupid slogan "if there are more than 4 
customers in line, and you are the 5th, you get your groceries for 
free".

What did a local students' house do? They went to the supermarket with 
about 20 people and got a lot of stuff for free.

I mean, in statistics you have queue calculations too, but they get 
defeated if people start doing that stuff (thwarting the mechanism on 
purpose). For example, the traditional statistics example is that of 
customers at a hair salon. Based on a certain distribution and an average 
number of new arrivals, a conclusion is reached and certain figures are 
found.

But those figures are thwarted the moment customers start to pile up 
on purpose just to thwart them, you get what I mean?

Any /intentional/ attempt to thwart the average means it is no longer 
the average.

Normal people wanting a haircut do not show up at a salon to thwart the 
salon's calculations. Ordinary use cases do not work like that.

If you can expect a common, normal amount of use, then there is no 
"intent" among those clients to do anything out of the ordinary.

Just like that hair salon can normally depend on those "calculations" 
(you could, you know) and provision for that (number of employees 
present), so too can a thin provisioning setup depend on expected 
averages (in a distribution, the "expected" value of a random variable 
is the expected average) (as a prediction, in that sense).

There's no lying in that. If this hair salon now says "You can get a 
haircut within 10 minutes without an appointment" then yes, people could 
thwart that by suddenly all showing up at the same time.

It doesn't work like that in reality, when people do not have such 
intentions.

We call that "innocence" ;-) not doing something on purpose.

That hair salon is not lying if it guarantees a 10 minute wait time in 
general. It just cannot guarantee it if people start to mess with it.

Statistics is all about averages and large numbers.

"A "law of large numbers" is one of several theorems expressing the idea 
that as the number of trials of a random process increases, the 
percentage difference between the expected and actual values goes to 
zero."

That means that if you have enough numbers (enough thin volumes), the 
difference between what you promise and what you can actually deliver 
goes to zero, and in effect you are always speaking the truth.

Remember: you are speaking the truth given normal expected reality.
You are no longer speaking the truth if people start to mess with you on 
purpose.

If you have 10,000 clients and 5,000 of them are one person intending to 
bug you out, just like in the supermarket example, well, then you've 
lost. But that is an intentionally devious thing to do, just in order to 
make use of some monetary loophole in the system, so to speak.

And in general your terms of use could guard against that (and many 
companies do, I'm sure).


> Now I'm not sure what your use-case for thin pools is.

Presently: maximizing space efficiency across a small number of volumes, 
as well as getting access to superior snapshotting.

> I don't see it as very useful if the presented space is smaller than
> the available physical space. In that case I can just use plain LVM with
> PV/VG/LV. For snapshots you don't care much, as if a snapshot
> overfills, it just becomes invalid, but it won't influence the original
> LV.

You mean there'd be no use for thin then, right? I agree. The whole idea 
is to be more efficient with space.

If the presented space is smaller, you DO have room for those 
snapshots. But with thin, you don't need to care.

Space is always there.


> But their main use case is to simplify the complexity of adding storage.
> Traditionally you need to add new physical disks to the storage /
> server, add them to LVM as a new PV, add this PV to the VG, extend the
> LV and finally extend the filesystem. Usually the storage part and the
> server (LVM) part are done by different people / teams. By using thinp,
> you create a big enough VG, LV and filesystem. Then, as it is needed,
> you just add physical disks and you're done.

True but let's call it "sharing" resources.

Sharing resources is the whole idea of any advanced society.

Our western mindset, in which everyone needs to be able to possess 
everything, doesn't really work.

The example was given that everyone owns a car that they may not use 
every day, a washing machine that they may use 5 hours a week, a vacuum 
cleaner that they may use 1 hour a week, and so on and so on. The 
example was given that a commercial airline could *never* operate 
like that.

Commercial airplanes are in operation pretty much 24/7. Disuse is way 
too costly. Airlines cannot afford to not use their machines 24/7.

Our society cannot either, but the way we live and operate with each 
other currently produces vast amounts of wasted materials, energy and so 
on.

Resource sharing is an advanced concept in that sense. Let's just call 
thin pools an advanced concept :p.

And let's not call it a lie just like that :) :P.

> Another benefit is disk space saving. Traditionally you need to have
> some reserve as free space in each filesystem for growth. With many
> filesystems you just wasted a lot of space. With thinp, this free
> space is "shared".

My reason exactly.

> And regarding your other mail about presenting parts / chunks of
> blocks from block layer... This is what device mapper (and LVM built
> on top of it) does - it takes many parts of many block devices and
> creates new linear block device out of them (whether it is stripped
> LV, mirrored LV, dm-crypt or just concatenation of 2 disks).

I know. But that is the reverse thing.

DM/LVM takes dispersed stuff and presents a whole.

In this case we were talking about presenting holes.

That's because, in this case, you are that barber/haircutter who 
suddenly gets an influx of clients he cannot handle.

Are you going to put up a sign saying "sorry, too busy" or are you going 
to try to keep your "promise" to each and every one of them? I hope you 
didn't offer financial compensation in that sense ;-).

Personally I think that, as a client, making use of such "financial 
promises" is very intolerant and unforgiving and greedy and even 
avaricious ;-).

So what if your thin pool does fill up and you have no measure in place 
to handle it?

Are you going to be honest?

This question is not whether thin is currently lying. This is about 
whether you will continue to choose for it to lie.

It is not about the present. It is about the choice you are going to 
make.

Do you choose to lie or not?

Traditionally companies have always tried to keep up the pretense until 
all hell broke loose so badly that it spilled out like a tidal wave.

You can find any number of examples in the history of our world. I am 
currently thinking of the Exxon Valdez, and Enron; I don't know if those 
are applicable. Also thinking of BP's platform in recent times, 
Deepwater Horizon, which was said to have been deeply undermaintained.

I mean you can keep pretending everything is going just perfect, or you 
can own up a little sooner. That is a choice to make for each individual 
I guess.


* Re: [linux-lvm] about the lying nature of thin
From: Xen @ 2016-04-29 13:16 UTC (permalink / raw)
  To: LVM general discussion and development

Gionatan Danti wrote on 29-04-2016 12:06:

> Let me add one important use case: have fast, flexible snapshots.

One more huge reason for using it in a desktop system.

I didn't know about the performance benefits.

I just know that providing snapshot space in advance, by registering 
LVs in advance for that purpose, is not a good way of working (for me, 
or anyone).

Although the idea of using LVM thin to provide only a single thin volume 
might be rather odd ;-).

Still, the snapshotting is clearly superior to that of traditional LVM, 
right?

Regards.


* Re: [linux-lvm] about the lying nature of thin
From: Chris Friesen @ 2016-04-29 20:37 UTC (permalink / raw)
  To: linux-lvm

On 04/29/2016 03:44 AM, Marek Podmaka wrote:

> Now I'm not sure what your use-case for thin pools is.
>
> I don't see it as very useful if the presented space is smaller than
> the available physical space. In that case I can just use plain LVM with
> PV/VG/LV. For snapshots you don't care much, as if a snapshot
> overfills, it just becomes invalid, but it won't influence the original
> LV.

One useful case for "presented space equal to physical space" with thin volumes 
is that it simplifies security issues.

With raw LVM volumes I generally need to zero out the whole volume prior to 
deleting it (to avoid leaking the contents to other users).  This takes time, 
and also seriously hammers the disks when you have multiple volumes being zeroed 
in parallel.

With thin, deletion is essentially instantaneous, and the zeroing penalty is 
paid when the disk block is actually written. Any disk blocks which have not 
been written are simply read as all-zeros.
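
A quick sketch of that behaviour (placeholder names; assumes the pool's
default zeroing settings):

    lvcreate -V 10G -T vg/pool -n scratch
    dd if=/dev/vg/scratch bs=4K count=1 2>/dev/null | od -An -tx1
    # prints only zeros: the block was never provisioned, so nothing can leak
    lvremove -y vg/scratch               # deletion is near-instant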

Chris


* Re: [linux-lvm] about the lying nature of thin
From: Xen @ 2016-04-29 22:32 UTC (permalink / raw)
  To: LVM general discussion and development

I guess this Patton guy knows everything about everything, but I'm not 
responding to him anymore.

As he sets up his business empire he leaves us all in the dust anyway.

So I guess I will just keep it to the thing I know something about, 
which is talking to real people.


* Re: [linux-lvm] about the lying nature of thin
From: Mark Mielke @ 2016-04-30  4:46 UTC (permalink / raw)
  To: LVM general discussion and development


Lots of interesting ideas in this thread.

But the practical side of things is that there is a need for thin volumes
that are over provisioned. Call it a lie if you must, but I want to have
multiple snapshots, and not be forced to have 10X the storage, just so that
I can *guarantee* that I will have the technical capability to fully
allocate every snapshot without running out of space. This is for my
requirements, where I am not being naive or irresponsible. I'm not
misrepresenting the situation to myself. I know exactly what to expect,
and I know that it isn't only important to monitor, but it is also
important to understand the usage patterns. For example, in some of our
use cases, files will only normally be extended or created as new, at
which point the overhead of a snapshot is close to zero.

If people find this model unacceptable, then I think they should not use
thin volumes. It's a technology choice.

We have many systems like this beyond LVM... For example, the NetApp FAS
devices we have are set up with this type of model, and IT normally
allocates 10% or more for "snapshots"; when we get this wrong, it does
hurt in various ways, usually requiring that the snapshots get dumped and
that we figure out why the monitoring failed. Normally, IT adds to the
aggregate as it passes a threshold. In the particular case that is
important to me, we have a fixed size local SSD for maximum performance,
and we still want to take frequent snapshots (and prune them behind),
similar to what we do on NetApp, but all in the context of local storage.
I don't use the word "lie" with IT in these cases. It's a partnership,
and an attempt to make the most of the storage and the technology.

There was some discussion about how data is presented to the higher layers.
I didn't follow the suggestion exactly (communicating layout information?),
but I did have these thoughts:

   1. When the storage runs out, it clearly communicates layout information
   to the caller in the form of a boolean "does it work or not?"
   2. There are other ways that information does get communicated, such as
   if a device becomes read only. For example, an iSCSI LUN.

I didn't follow the communication of specific layout information, as
this didn't really make sense to me when it comes to dynamic allocation.
But, if the intent is to provide early warning of the likelihood of
failure, compared to waiting until the very last minute when it has
already failed, it seems like early warning would be useful. I did have a
question about the performance of this type of communication, however, as
I wouldn't want the host to be constantly polling the storage to
recalculate the up-to-date storage space available.



* Re: [linux-lvm] about the lying nature of thin
From: Xen @ 2016-05-03 13:03 UTC (permalink / raw)
  To: LVM general discussion and development

Mark Mielke wrote on 30-04-2016 6:46:

> Lots of interesting ideas in this thread.

Thank you for your sane response.

> There was some discussion about how data is presented to the higher
> layers. I didn't follow the suggestion exactly (communicating layout
> information?), but I did have these thoughts:
> 
> 	* When the storage runs out, it clearly communicates layout
> information to the caller in the form of a boolean "does it work or
> not?"
> 	* There are other ways that information does get communicated, such
> as if a device becomes read only. For example, an iSCSI LUN.
> 
> I didn't follow the communication of specific layout information, as
> this didn't really make sense to me when it comes to dynamic
> allocation. But, if the intent is to provide early warning of the
> likelihood of failure, compared to waiting until the very last minute
> when it has already failed, it seems like early warning would be
> useful. I did have a question about the performance of this type of
> communication, however, as I wouldn't want the host to be constantly
> polling the storage to recalculate the up-to-date storage space
> available.

Zdenek alluded to the idea that this continuous polling would either be 
required or would be hugely expensive for the hardware. Of course I do 
not know everything about a system before I start thinking. If I have an 
idea, it is usually possible to implement it, but I only find out later 
down the road whether this is actually so and whether it needs amending. 
I could not progress with life if every idea needed to be 100% sure 
before I could commence with it, because in that sense the commencing 
and the learning would never happen.

I didn't know thin (or LVM) doesn't maintain maps of used blocks.

Of course for regular LVM it makes no sense, as the usage of the blocks 
you have allocated to a filesystem is none of your concern at all.

The recent DISCARD improvements apparently just signal some special case 
(?) but SSDs DO maintain maps or it wouldn't even work (?).

I don't know, it would seem that having a map of used extents in a thin 
pool is in some way deeply important in being able to allocate unused 
ones?

I would have to dig into it of course but I am sure I would be able to 
find some information (and not lies ;-))).

I guess continuous polling would be deeply disrespectful of the hardware 
and software resources.

In the theoretical system I proposed there would be constant 
communication between systems, bogging down resources. But we must agree 
we are typically talking about 4MB extents here (and mutations to them). 
In a sense you could easily increase that to 16MB, or 32MB, or whatever.

You could even update a filesystem only after mutations of a thousand 
gigabytes have happened.

We are talking about a map of regions and these regions can be as large 
as you want.

It would say to a filesystem: these regions are currently unavailable.

You would even get more flags:

- this region is entirely unavailable
- this region is now more expensive to allocate to
- this region is the preferred place

When you allocate memory in the kernel (like with kmalloc) you specify 
what kind of requirements you have.

This is more of the same kind, I guess.

Typically a thin system is a system of extent allocation, the way we 
have it.

It is the thin volume that allocates this space, but the filesystem that 
causes it.

The thin volume would be able to say "don't use these parts".

Or "all parts are equal, but don't use more than X currently".

Actually the latter is a false statement, you need real information.

I know that in ext filesystems the inodes (and the tables) are scattered 
everywhere, so those blocks are already in use, in that sense. And if 
you had very large blocks that you wanted to make totally unavailable, 
you would get weird issues: "That's funny, I'm already using it".

So in order to make sense they would have to be contiguous regions (in 
the virtual space) that are really not used yet.

I don't know, it seems fun to make something like that. Maybe I'll do it 
some day.


* [linux-lvm] thin disk -- like overcomitted/virtual memory? (was Re: about the lying nature of thin)
From: Linda A. Walsh @ 2016-05-10 21:47 UTC (permalink / raw)
  To: LVM general discussion and development

Xen wrote:
> You know, Mr. Patton made the interesting claim that thin
> provisioning is designed to lie and is meant to lie, and I beg to differ.
----
    Isn't using a thin pool for disk space similar to using
a virtual memory/swap space that is smaller than the combined sizes of all
processes?

    I.e. administrators can choose whether to over-allocate
swap or paging file space or to have it be a hard limit -- and forgive me
if I'm wrong, but isn't this configurable in /proc/sys/vm with the
over-commit parms (among others)?
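
    For reference, those knobs and the kernel's documented modes:

    cat /proc/sys/vm/overcommit_memory   # 0=heuristic, 1=always overcommit, 2=never
    cat /proc/sys/vm/overcommit_ratio    # % of RAM allocatable in mode 2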

    Doesn't over-commit in the LVM space have similar checks and balances
as over-commit in the VM space?  Whether it does or doesn't, shouldn't
the reasoning be similar in how they can be controlled?

    In regards to LVM overcommit -- does it matter (at least in the short
term) if that over-committed space is filled with "SPARSE" data files?
I mean, suppose I allocate space for astronomical bodies -- in some 
areas/directions I might have very sparse usage, vs. towards the core 
of a galaxy, where I might expect less sparse usage.

    If a file system can be successfully closed with 'no errors' -- 
doesn't that still mean it is "integrous" -- even if its sparse files 
don't all have enough room to be expanded?

    Does it make sense to think about an OOTS (OutOfThinSpace) daemon that
can be set up with priorities to reclaim space?

    I see 2 types of "quota" here, and I can see the metaphor of these 
types being extended into disk space: direct space, that which is 
physically present, and "indirect or *temporary* space" -- which you 
might try to reserve at the beginning of a job. Your job could be 
configured to wait until the indirect space is available, or die 
immediately. But conceivably indirect space is space on a robot-cartridge 
retrieval system that has a huge amount of virtual space, at the cost of 
needing to be loaded before your job can run.

    Extending that idea -- the indirect space could be configured as 
"high priority space" -- meaning once it is allocated, it stays 
allocated *until* the job completes (in other words the job would have a 
low chance of being "evicted" by an OOTS daemon), vs. most "extended 
space" having the priority of "temporary space" -- with processes 
using large amounts of such indirect space and having a low expectation 
of quick completion being high on the OOTS daemon's list?

    Processes could also be willing to "give up memory and suspend" -- 
where, when called, a handler could give back giga- or terabytes of memory
and save its state as needing to restart the last pass.

    Lots of possibilities -- if LVM thin space is managed like 
virtual memory space. That means some outfits might choose to never 
over-allocate, while others might allow a fraction.

    From how it sounds -- when you run out of thin space, what happens
now is that the OS keeps allocating more virtual space that has no 
backing store (in memory or on disk)... with a notification buried in a 
system log somewhere.

    On my own machine, I've seen >50% of memory returned after
sending a '3' to /proc/sys/vm/drop_caches -- maybe similar emergency 
measures could help in the short term, with long term handling being 
as flexible as VM policies.
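
    That measure, for reference (root only; it drops clean caches, it
does not free dirty data):

    sync
    echo 3 > /proc/sys/vm/drop_caches   # 1=pagecache, 2=dentries+inodes, 3=both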

    Does any of this sound sensible or desirable?   How much effort is 
needed for how much 'bang'?


* Re: [linux-lvm] thin disk -- like overcomitted/virtual memory? (was Re: about the lying nature of thin)
From: Xen @ 2016-05-10 23:58 UTC (permalink / raw)
  To: LVM general discussion and development

Hey sweet Linda,

this is beyond me at the moment. You go very far with this.

Linda A. Walsh wrote on 10-05-2016 23:47:

>    Isn't using a thin pool for disk space similar to using
> a virtual memory/swap space that is smaller than the combined sizes of
> all processes?

I think there is a point to that, but for me the concordance is in the 
idea that filesystems should perhaps have different modes of requesting 
memory (space) as you detail below.

Virtual memory typically cannot be expanded (automatically), although it 
could be.

Even with virtual memory there is normally a hard limit, and unless you 
include shared memory, there is not really any relation with 
overprovisioned space, unless you start talking about prior allotment, 
and promises being given to processes (programs) that a certain amount 
of (disk) space is going to be available when it is needed.

So what you are talking about here I think is expectation and 
reservation.

A process or application claims a certain amount of space in advance. 
The system agrees to it. Maybe the total amount of claimed space is 
greater than what is available.

Now processes (through the filesystem) are notified whether the space 
they have reserved is actually going to be there, or whether they need 
to wait for that "robot cartridge retrieval system" and whether they 
want to wait or will quit.

They knew they needed space and they reserved it in advance. The system 
had a way of knowing whether the promises could be met and the requests 
could be met.

So the concept that keeps recurring here seems to be reservation of 
space in advance.

That seems to be the holy grail now.

Now I don't know but I assume you could develop a good model for this 
like you are trying here.

Sparse files are difficult for me, I have never used them.

I assume they could be considered sparse by nature and not likely to 
fill up.

Filling up is of the same nature as expanding.

The space they require is virtual space, their real space is the 
condensed space they actually take up.

It is a different concept. You really need two measures for reporting on 
these files: real and virtual.

So your filesystem might have 20G real space.
Your sparse file is the only file. It uses 10G actual space.
Its virtual file size is 2T.

Free space is reported as 10G.

Used space is given two measures: actual used space, and virtual used 
space.
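
The two measures can be seen directly on a sparse file (a sketch; the
filename is made up):

    truncate -s 2T sparse.img   # 2T virtual size, ~0 bytes allocated
    ls -lh sparse.img           # shows the virtual (apparent) size
    du -h  sparse.img           # shows the real, allocated size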

The question is how you store these. I think you should store them 
condensed.

As such only the condensed blocks are given to the underlying block 
layer / LVM.

I doubt you would want to create a virtual space from LVM such that your 
sparse files can use a huge filesystem in a non-condensed state sitting 
on that virtual space?

But you can?

Then the filesystem doesn't need to maintain blocklists or whatever, but 
keep in mind that normally a filesystem will take up a lot of space in 
inode structures and the like when the filesystem is huge but the actual 
volume is not.

If you create one thin pool, and a bunch of filesystems (thin volumes) 
of the same size, with default parameters, your entire thin pool will 
quickly fill up with just metadata structures.

I don't know. I feel that sparse files are weird anyway, but if you use 
them, you'd want them to be condensed in the first place and existing in 
a sort of mapped state where virtual blocks are mapped to actual blocks. 
That doesn't need to be LVM and would feel odd there. That's not its 
purpose right.

So for sparse you need a mapping at some point but I wouldn't abuse LVM 
for that primarily. I would say that is 80% filesystem and 20% LVM, or 
maybe even 60% custom system, 20% filesystem and 20% LVM.

Many games pack their own filesystems, like we talked about earlier 
(when you discussed the inefficiency of many small files in relation to 
4k block sizes).

If I really wanted sparse personally, as an application data storage 
model, I would first develop this model myself. I would probably want to 
map it myself. Maybe I'd want a custom filesystem for that. Maybe a 
loopback mounted custom filesystem, provided that its actual block file 
could grow.

I would imagine allocating containers for it, and I would want the 
"real" filesystem to expand my containers or to create new instances of 
them. So instead of mapping my sectors directly, I would want to map 
them myself first, in a tiered system, and have the filesystem map the 
higher hierarchy level for me. E.g. I might have containers of 10G each 
allocated in advance, and when I need more, the filesystem allocates 
another one. So I map the virtual sectors to another virtual space, such 
that:

container virtual space / container size = outer container addressing
container virtual space % container size = inner container addressing

The outer container addressing goes to a filesystem structure telling me 
(or it) where to write my data.

The inner container addressing follows normal procedure, and writes 
"within a file".

So you would have an overflow where the most significant bits cause a 
container change.

At that point I've already mapped my "real" sparse space to container 
space, its just that the filesystem allows me to address it without 
breaking a beat.

What's the difference with a regular file that grows? You can attribute 
even more significant bits to filesystem change as well. You can have as 
many tiers as you want. You would get "falls outside of my jurisdiction" 
behaviour, "passing it on to someone else".

LVM thin? Hardly relates to it.

You could have addressing bits that reach to another planet ;-) :).


>    If a file system can be successfully closed with 'no errors' --
> doesn't that still mean it is "integrous" -- even if its sparse files
> don't all have enough room to be expanded?

Well that makes sense. But that's the same as saying that a thin pool is 
still "integrous" even though it is over-allocated. You are saying the 
same thing here, almost.

You are basically saying: v-space > r-space == ok?

Which is the basic premise of overprovisioning to begin with.

With the added distinction of "assumed possible intent to go and fill up 
that space".

Which comes down to:

"I have a total real space of 2GB, but my filesystem is already 8GB. 
It's a bit deceitful, but I expect to be able to add more real space 
when required."

There are two distinct cases:

- total allotment > real space, but individual allotments < real space
- total allotment > real space, AND individual allotments > real space

I consider the first acceptable. The second is spending money you don't 
have.

I would consider never creating an individual filesystem (volume) that 
is actually bigger (ON ITS OWN) than all the space that exists.

I would never consider that. I think it is like living on debt.

You borrow money to buy a house. It is that system.

You borrow future time.

You get something today but you will have to work for it for a long 
time, paying for something you bought years ago.

So how do we deal with future time? That is the question. Is it 
acceptable to borrow money from the future?

Is it acceptable to use space now, that you will only have tomorrow?

>    If a file system can be successfully closed with 'no errors' --
> doesn't that still mean it is "integrous" -- even if its sparse files
> don't all have enough room to be expanded?

If your sparse file has no intent to become non-sparse, then there is no 
issue.

If your sparse file already tells you it is going to get you in trouble, 
it is different.

This system is integrous depending on planned actions.

Same is true for LVM now. The system is safe until some program decides 
to allocate the entire filesystem.

And there are no checks and balances, the system will just crash.

The peculiar condition is that you have built a floor. You have a floor, 
like a circular area of a certain surface area. But 1/3 of the floor is 
not actually there.

You keep telling yourself not to go there.

The entire circle appears to be there. But you know some parts are 
missing.

That is the current nature of LVM thin.

You know that if you step on certain zones, you will fall through and 
crash to the ground below.

(I have had that happen as a kid. We were in the attic and we had 
covered the ladder gap with cardboard. Then, we (or at least I) forgot 
that the floor was not actually real and I walked on it, instantly 
falling through and ending on a step on the ladder below.)

[ People here keep saying that a real admin would not walk on that 
ladder gap. A real admin would know where the gap was at all times. He 
would not step on it, and not fall through.

But I've had it happen that I forgot where the gap was and I stepped on 
it anyway. ]



>    Does it make sense to think about an OOTS (OutOfThinSpace) daemon
> that can be set up with priorities to reclaim space?

It does make some sense, certainly, to me at least (no matter that I 
understand little or am of no real importance here), but I don't really 
understand the implications at this point.


>    Processes could also be willing to "give up memory and suspend" --
> where, when called, a handler could give back Giga-or Tera bytes of
> memory
> and save it's state as needing to restart the last pass.

That is almost a calamity mode. I need to shower, but I was actually 
just painting the walls. Need to stop painting that shower, so I can use 
it for something else.

I think it makes sense to lay a claim to some uncovered land, but when 
someone else also claims it, you discuss who needs it most, whether you 
feel like letting the other one have it, whose turn it is now, and 
whether it will hurt you to let go of it.

It is almost the same as reserving classrooms.

So like I said, reservation. And like you say, only temporary space that 
you need for jobs. In a normal user system that is not computationally 
heavy, these things do not really arise, except maybe for video editing 
and the like.

If you have large data jobs like you are talking about, I think you 
would need a different kind of scheduling system anyway, and not so much 
an automatic one. Running out of space is not a serious issue if the 
administrative system allots space to jobs. It doesn't have to be a 
filesystem doing that.

But I guess your proposed daemon is just a layer above that, knowing 
about space constraints and then allotting space to jobs based on 
priority queues. Again, that doesn't really have much to do with thin, 
unless every "job" had its own "thin volume" and the "thin pool-volume 
system" were used to "allot space" (the V-size of the volume); but if 
too much space was allotted, the system would get in trouble 
(overprovisioning) if all jobs ran. Again, borrowing money from the 
future.

The premise of LVM is not that every volume is going to be able to use 
all its space. It's not that it should, has to, or is going to fill up 
as a matter of course, as an expected and normal thing.

You see, thin LVM only works if the volumes are independent.

In that job system they are not independent: the expected growth happens 
on purpose. Thin provisioning involves a probability distribution in 
which the average of expected space usage is less than the maximum.

LVM thin is really a statistical thing, basing itself on the law of 
large numbers, averaging, and the expectation that if ONE volume is 
going to be at max, another one won't.

If you are going to allot jobs that are expected to completely fill up 
the reserved space, you are talking about an entirely different thing.

You should provision based on the average, but if the average is the 
max, it makes no sense anymore and you should just apportion according 
to the available real space. You do not need thin volumes or a thin pool 
to do that sort of thing: just regular fixed-size filesystems with jobs 
and space requests.

In other words, the amount of sane overprovisioning you can do is 
related to the difference between max and average.

The difference (max - average) is the amount you can safely overprovision 
given normal circumstances.

You do not willfully provision less than the average you expect. The 
average is your criterion. Max is the individual max size. 
Overprovisioning is the ability of an individual volume to grow beyond 
the average towards the max. If the calculations hold, some other volume 
will be below average.
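
A small worked instance of that rule, with made-up numbers:

    avg=25; max=50; n=100                              # GB per volume, 100 volumes
    echo "real space to provision:  $(( n * avg ))G"   # 2500G
    echo "total virtual size:       $(( n * max ))G"   # 5000G
    echo "safe headroom per volume: $(( max - avg ))G" # 25G beyond the average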

However, if your numbers are smaller (not 1000s of volumes, but just a 
few), the variance grows enormously. And with the growth in variance you 
can no longer predict what is going to happen. But the real question is 
whether there is going to be any covariance, and in a real thin system 
there should be none (the volumes are independent).

For instance, if there is some hype and all your clients suddenly start 
downloading the next best movie from 200G television, you already have 
covariance.

Social unrest always indicates covariance. People stop making their own 
choices, and your predictions of business as usual no longer hold true. 
Not because your values weren't sane; more likely because people don't 
act naturally in those circumstances.

Covariance indicates that there is a tertiary factor, causing (for 
instance) growth in (volumes) across the line.

John buys a car, and Mary buys a house, but actually it is because they 
are getting married.

Or, John buys a car, and Mary buys a house, but the common element is 
that they have both been brainwashed by contemporary economists working 
at the World Bank.

All in all the insanity happens when you start to borrow from the 
future, which causes you to have to work your ass off to meet the 
demands you placed on yourself earlier, always having to rush, panic, 
and be under pressure.

Better not overprovision beyond your average, in the sense of not even 
having enough for what you expect to happen.


>    From how it sounds -- when you run out of thin space, what happens
> now is that the OS keeps allocating more virtual space that has no
> backing store (in memory or on disk)... with a notification buried in a
> system log somewhere.

Sounds like leaving the gold standard and having money that has no gold 
behind it, or anything else of value.


* Re: [linux-lvm] about the lying nature of thin
From: Xen @ 2016-05-03 17:42 UTC (permalink / raw)
  To: LVM general discussion and development

matthew patton wrote on 03-05-2016 15:43:

> Xen wrote:
> 
>> I didn't know thin (or LVM) doesn't maintain maps of used blocks.
> 
> Right, so you're ignorant of basics like how the various subsystems
> work. Like I said, go find a text on OS and filesystem design. Hell,
> read the EXT and LVM code or even just the design docs.

Why don't you do it for me and then report back? I could use a slave, 
like the one you are trying to make of me.

>> The recent DISCARD improvements apparently just signal some special 
>> case
>> (?) but SSDs DO maintain maps or it wouldn't even work (?).
> 
> Again, read up on the inner workings of SSDs. To over-simplify, SSDs
> have their own "LVM". No different really than a hardware RAID
> controller does - admittedly most raid controllers don't do anything
> particularly advanced.

It almost seems like you want me to succeed.

> clearly you are in need of much more studying. LVM knows exactly out
> of all of it's defined extents which ones are free and which ones have
> been assigned to an LV - aka written to. What individual blocks (aka
> range of bytes) inside those extents have FS-managed data in them it
> knows not nor does it care.

Then what is the issue here? That means my assumptions were all entirely 
correct, and what Zdenek said must have been false.

But what you are saying now concerns extent assignments to LVs; do you 
imply this is also true of assignment to thin volumes?

Yes, when you say "written to" you clearly mean thin pools.

I never alluded that it needed to know or care about the actual usage of 
its blocks (extents).

If a filesystem DISCARDs blocks, then with enough blocks it could 
discard an extent.

I don't even know what will happen if a filesystem stops using the data 
that's on it, but I will test that now. And of course it should just 
free those blocks. It didn't work with mkswap just now, but creating a 
new filesystem causes lvs to report a lower thin pool usage.
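
For reference, the explicit way to hand freed blocks back to the pool,
assuming discard support end-to-end (fstrim is part of util-linux):

    fstrim -v /mnt/thinvol               # trim the mounted filesystem
    lvs -o lv_name,data_percent vg       # pool Data% should drop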

Of course, common and commonsensical. So these extents are being 
liberated, right? And it knows exactly how many are in use?

Then what was this about:

> Thin pool is not constructing  'free-maps'  for each LV all the time - 
> that's why tools like 'thin_ls'  are meant to be used from the 
> user-space.
> It IS very EXPENSIVE operation.

It is saying that e.g. lvs creates this free-map.

But LVM needs to know at every moment in time which extents are 
available. It also needs to liberate them at runtime.

So it needs to be able to at least search for free ones and, if none is 
found, to report that or do something with it. Of course that is 
different from having a map.

But in-the-moment update operations to filesystems would not require a 
map. They would require mutations being communicated, mutations that LVM 
already knows about.

So it is nothing special. You don't need those "maps". You need to 
communicate (to other thin volumes) which extents have become 
unavailable, and which have become available once more.

Then the thin volume translates this (possibly) to whatever block system 
the underlying filesystem uses.

Logical blocks, physical blocks.

The main organisation principle is the extent. It is not the LVM that 
needs to maintain a map. It is the filesystem.

It needs to know about its potential for further allocation of the block 
space.




>> I guess continuous polling would be deeply disrespectful of the 
>> hardware
>> and software resources.
> 
> Not to mention instantaneously invalid. So you poll LVM, "what is your
> allocation map and do you have any free extents?" You get the results.
> Then the FS having been assured there is free space issues writes. But
> oh no, in the round-trip some other LV has grabbed the extent you had
> intended to use! IO=FAIL.

You know, those contention issues are everywhere, in the kernel too, and 
they are always taken care of.

Don't confront me with a situation that has already been solved by 
numerous other people.

You forget, for once, that real software systems running on the 
filesystem would be aware of the lack of space to begin with. You are 
now approaching a corner case where the last free extent is being 
contended for. I am sure there would be an elegant solution to that.

This corner case is not what it's all about. What it's about is that the 
filesystem has the means to predict what is going to happen, or at least 
the software running on it does.

If the situation you are describing is really an issue, you could simply 
reserve a last block (extent) for this scenario that is only written to 
if all other blocks are taken, and each filesystem (volume) has this 
free block of its own.

PROBLEM SOLVED.

You sound like Einstein when he tried to disprove Bohr's theory at that 
convention. In the end Bohr refuted everything and Einstein had to 
accept that Bohr was right.

A filesystem will simply reserve the equivalent of an extent. More 
importantly, the thin volume (logical volume) will. The thin LV will 
reserve one last extent in advance from the thin pool, one that is only 
really given to the filesystem under the condition that the entire thin 
pool is already taken and the filesystem is still issuing a write to a 
new block because a race condition prevented it from knowing about the 
space issue.

These are not difficult engineering problems.


> The ONLY way for a FS to "reserve" a set of blocks (aka extent) to
> itself is to write to it - but mind the FS has NO IDEA if it needs to do
> a reservation in the first place, nor if this IO just so happens to
> fit inside the allocated range but the next IO at offset +1 will
> require a new extent to be allocated from the THINP.

If you write to a full extent, you are guaranteed to get a new one. It's 
not more difficult than that. Don't make everything so difficult.

I have not talked about reservations myself (prior to this). As we just 
said, if it is only about the very last block of the entire thin pool, 
reserve it in advance and don't let the FS do it.

If the race condition is such that larger amounts are needed for safety, 
do that: reserve 200MB in advance if you need to.

You could configure a thin pool / volume to reserve a certain amount of 
free space that is only going to be used if the thin pool is 100% filled 
and it wasn't possible to inform the file systems fast enough.

Proportional to the size of the volume (LV). Who cares if you reserve 1% 
in each volume for this. Or less. A 2TB volume with 1GB of reserved 
space is not so bad, is it?

That's just 0.05% give or take.

If free space is then reported to the filesystem, it can:

1) simply inform programs by way of its normal operation
2) stop writing when the space known to it is gone
3) not have to worry about anything else because race conditions are 
taken care of.

Suppose a filesystem starts randomly writing a single byte to every 
possible block in order to defeat this system.

The filesystem can redirect these writes to other blocks when the LVM 
starts reporting "no block for you" and the filesystem still has space 
in the blocks it has.

It will just have to invalidate some of its own blocks (extents). IT 
needs to maintain a map, not LVM.

It can deduce its own free space from its own map.

It would be like allocating a thin (sparse) file but then writing to 
every possible address along its range. Yes, the system is going to bug 
out, but you can take care of it. Some writes will just fail when out of 
blocks, but the filesystem can redirect them, or in the end just fail 
the write / allocation.

Any block being invalidated would instantly update its free space 
calculations.

You don't need to communicate full maps unless you were creating a new 
filesystem or trying to recover from corruption. You would query "is 
this block available" for instance. That would require a new command. It 
would take a while but that way the filesystem could reconstruct the 
blockmap.

Or it could query about ranges of blocks.

This querying is the first thing you'd introduce. Blocks N to M, are 
they available? Yes or no. Or a list of the ones that are and the ones 
that aren't (a bitmap).

To query 2000 extents you only need 2000 bits. That's 250 bytes, not a 
whole lot. A 2TB volume would have a free map of 64K bytes.
Do you imagine how small this is?

How would maintaining free maps be an expensive operation, really?

You need a fucking 64k field with a xor operation. That fits inside a 
16-bit 8086 segment.

I mean don't bullshit me here. There is no way it could be hard to 
maintain free maps.

I'm a programmer too, you know. And I have been doing it since 1989 too.

I have programmed in Pascal and assembler and I have studied Java's 
BitSet class, for instance. It can be done very elegantly.

Any free map the thin LV would conjure up would be a lie in that sense, 
a choice, because you would arbitrarily invalidate blocks at the end of 
the space.

At the end of the virtual space.

The pool communicates to the volume the acquisition and release of new 
and old extents.

The volume at that point doesn't care which they are. It only needs to 
know the number.

With every mutation it randomly invalidates a single block if it needs 
to (or enables it again).

It sets a bit flag in a 64k field. So let's assume we have a 1PB 
volume, a petabyte. That's 2^50 / 2^22 = 2^28 extents, which is 2^28 
bits, which is 2^25 bytes: 2^5 megabytes, or 32MB worth of data.

A volume of 1125899906842624 bytes needs just 33554432 bytes to 
maintain a map, if done in 4MB extents.

If done in 4KB blocks the extent communication stays the same, but the 
map could amount to 1024x that number of bytes: 32GB for a 1PB volume.

That is still only 1/32768 of the available address space, so to speak.
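The sizing arithmetic in one helper, reproducing the figures above (my 
code, assuming power-of-two sizes):

    #include <stdint.h>

    /* one bit per allocation unit: bytes of map needed for a volume */
    static uint64_t map_bytes(uint64_t vol_bytes, uint64_t unit_bytes)
    {
        return vol_bytes / unit_bytes / 8;
    }
    /* map_bytes(1ULL << 50, 4ULL << 20) -> 32MB  (1PB in 4MB extents) */
    /* map_bytes(1ULL << 50, 4ULL << 10) -> 32GB  (1PB in 4KB blocks)  */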

But the filesystem could maintain maps of extents and not individual 
'blocks'.

Maybe 32GB is hard to communicate, but 32MB is not. And there are 
systems that have a terabyte of RAM.



> I haven't checked, but it's perfectly possible for LVM THINP to
> respond to FS issued DISCARD notices and thus build an allocation map
> of an extent. And should an extent be fully empty to return the extent
> to the thin pool.

I don't know how it is done currently, but clearly the system knows, 
right?

As you say, this is perfectly possible.


> Only to have to allocate a new extent if any IO hits
> the same block range in the future. This kind of extent churn is
> probably not very useful unless your workload is in the habit of
> writing tons of data, freeing it and waiting a reasonable amount of
> time and potentially doing it again. SSDs resort to it because they
> must - it's the nature of the silicon device itself.

Unused blocks need to be made available anyway. A filesystem on which 
80% of the data has been deleted, still holding all those blocks in the 
thin pool? Please tell me this isn't reality (I know it isn't).
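Schematically I would expect something like this: a per-extent count of 
live blocks, and the whole extent handed back when it empties. My 
sketch, not the actual dm-thin code:

    #include <stdint.h>
    #include <stdio.h>

    struct extent_info { uint32_t used_blocks; };

    static void return_extent_to_pool(uint32_t extent_no)  /* stand-in */
    {
        printf("extent %u returned to pool\n", extent_no);
    }

    /* on DISCARD of a block: if its extent goes fully empty, free it */
    static void on_discard(struct extent_info *e, uint32_t extent_no)
    {
        if (e->used_blocks > 0 && --e->used_blocks == 0)
            return_extent_to_pool(extent_no);
    }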


So I ran this test, just curious what would happen:

1. Create a 400MB thin pool on another hard disk.
2. Create 3 thin volumes totalling 600MB.
3. Create filesystems (ext3) and mount them.
4. Copy 90MB files to them. After 4 files, 360MB of the pool is used.
5. Copy a 5th file. Nothing happens. No errors, nothing.
6. Copy a 6th file. Nothing happens. No errors, nothing.
7. I check the volumes. Nothing seems the matter; lvdisplay shows 
nothing unusual.

df works and everything appears normal. All volumes are now 97% filled 
and the pool is 100% filled.

That can't last, right? I see kernel block device page errors scroll by.

I go to one of the files that should have been written successfully 
(the 4th file) and try to copy it to my main disk.

cp hangs. Terminal (tty) switching still works at first. Vim (I had vim 
open in 2 or 3 ttys) stops responding. Alt-7 (should bring up KDE): 
nothing happens. Then I cannot switch TTYs anymore. The system hangs 
completely.

Mind you, this was on a hard disk with no volumes otherwise in use. No 
other volumes were mounted besides those 3, although of course they 
were loaded in LVM.

There are no dropped volumes. There are no frozen volumes. The system 
just crashes. Very graceful, I must say.

I mean if this is the best you can do?

No wonder you are suggesting every admin needs to hire a drill 
instructor to get him through the day.







>> It would say to a filesystem: these regions are currently unavailable.
>> 
>> You would even get more flags:
>> 
>> - this region is entirely unavailable
>> - this region is now more expensive to allocate to
>> - this region is the preferred place
> 
> All of this "inside knowledge" and "coordination" you so desperately
> seem to want is called integration. And again spelled BTRFS and ZFS.
> et. al.

BTRFS is spelled "monopoly" and "wants to be all" and "I'm friends with 
SystemD" ;-).

ZFS I don't know, I haven't cared about it. All I see on IRC is people 
talking about it like some new toy they desperately can't live without 
even though it doesn't serve them any real purpose.

A bit like a toy drone worth 4k dollars.

The only thing that changes is that filesystems maintain bitmaps of 
available sectors/blocks or of extents, and become capable of 
intelligently allocating from the ones they have that are still 
available.

That's it!
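On the filesystem side that is, schematically, a single allocator loop. 
My sketch, with invented names:

    #include <stdint.h>

    /* allocate only inside extents the lower layer reports as available;
     * avail_map comes from the volume, fs_free is the FS's own free map */
    static int64_t alloc_block(const uint8_t *avail_map, uint32_t extents,
                               uint32_t blocks_per_extent,
                               const uint8_t *fs_free)
    {
        for (uint32_t e = 0; e < extents; e++) {
            if (!(avail_map[e >> 3] >> (e & 7) & 1))
                continue;                      /* extent not available */
            for (uint32_t b = 0; b < blocks_per_extent; b++) {
                uint64_t blk = (uint64_t)e * blocks_per_extent + b;
                if (fs_free[blk >> 3] >> (blk & 7) & 1)
                    return (int64_t)blk;       /* free block, usable extent */
            }
        }
        return -1;                             /* nothing usable left */
    }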

You can still choose what filesystem to use. You could even choose what 
volume manager to use.

We have seen how little data it costs if the extent size is at least 
4MB. We have seen how easy it would be to query the underlying layer 
again in case you're not sure.

If you want a block to have more bits, that is easy too! With only 4 
possible states, you can put it in 2 bits.

That would probably be enough for any plausible use case. A 2TB volume 
costs 128k bytes for this bitmap with 4 states. That's something you 
could achieve on a 286 if you were crazy enough.
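A sketch of that 2-bit-per-extent state field; the encoding is my own 
invention:

    #include <stdint.h>

    /* 2 bits per extent: 0=free, 1=used, 2=expensive, 3=unavailable.
     * 2TB / 4MB = 524288 extents at 2 bits each = 128KiB. */
    #define EXTENTS (512u * 1024u)
    static uint8_t state_map[EXTENTS / 4];

    static unsigned get_state(uint32_t e)
    {
        return (state_map[e >> 2] >> ((e & 3u) * 2)) & 3u;
    }

    static void set_state(uint32_t e, unsigned s)
    {
        unsigned shift = (e & 3u) * 2;
        state_map[e >> 2] = (uint8_t)((state_map[e >> 2] & ~(3u << shift))
                                      | ((s & 3u) << shift));
    }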



> yeah, have fun with that theoretical system.

Why won't you?


> Xen, dude seriously. Go do a LOT more reading.

I am being called by name :O! I think she likes me.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [linux-lvm] about the lying nature of thin
       [not found] <1093625508.5537728.1462283037119.JavaMail.yahoo.ref@mail.yahoo.com>
@ 2016-05-03 13:43 ` matthew patton
  2016-05-03 17:42   ` Xen
  0 siblings, 1 reply; 16+ messages in thread
From: matthew patton @ 2016-05-03 13:43 UTC (permalink / raw)
  To: LVM general discussion and development

Xen wrote:

> I didn't know thin (or LVM) doesn't maintain maps of used blocks.

Right, so you're ignorant of basics like how the various subsystems work. Like I said, go find a text on OS and filesystem design. Hell, read the EXT and LVM code or even just the design docs.

> The recent DISCARD improvements apparently just signal some special case 
> (?) but SSDs DO maintain maps or it wouldn't even work (?).

Again, read up on the inner workings of SSDs. To over-simplify, SSDs have their own "LVM". No different really than a hardware RAID controller does - admittedly most raid controllers don't do anything particularly advanced.

> I don't know, it would seem that having a map of used extents in a thin 
> pool is in some way deeply important in being able to allocate unused 
> ones?

Clearly you are in need of much more studying. LVM knows exactly, out of all of its defined extents, which ones are free and which ones have been assigned to an LV - aka written to. What individual blocks (aka ranges of bytes) inside those extents hold FS-managed data it knows not, nor does it care.
 
> I guess continuous polling would be deeply disrespectful of the hardware 
> and software resources.

Not to mention instantaneously invalid. So you poll LVM, "what is your allocation map and do you have any free extents?" You get the results. Then the FS having been assured there is free space issues writes. But oh no, in the round-trip some other LV has grabbed the extent you had intended to use! IO=FAIL.

The ONLY way for a FS to "reserve" a set of blocks (aka extent) to itself is to write to it - but mind, the FS has NO IDEA if it needs to do a reservation in the first place, nor whether this IO just so happens to fit inside the allocated range while the next IO at offset +1 will require a new extent to be allocated from the THINP.

I haven't checked, but it's perfectly possible for LVM THINP to respond to FS issued DISCARD notices and thus build an allocation map of an extent. And should an extent be fully empty to return the extent to the thin pool. Only to have to allocate a new extent if any IO hits the same block range in the future. This kind of extent churn is probably not very useful unless your workload is in the habit of writing tons of data, freeing it and waiting a reasonable amount of time and potentially doing it again. SSDs resort to it because they must - it's the nature of the silicon device itself.

> It would say to a filesystem: these regions are currently unavailable.
>
> You would even get more flags:
> 
> - this region is entirely unavailable
> - this region is now more expensive to allocate to
> - this region is the preferred place

All of this "inside knowledge" and "coordination" you so desperately seem to want is called integration. And again spelled BTRFS and ZFS. et. al. 
 
> In the theoretical system I proposed it would be a constant

yeah, have fun with that theoretical system.

...

Xen, dude seriously. Go do a LOT more reading.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [linux-lvm] about the lying nature of thin
  2016-05-02 13:18   ` Mark H. Wood
@ 2016-05-03 11:57     ` Xen
  0 siblings, 0 replies; 16+ messages in thread
From: Xen @ 2016-05-03 11:57 UTC (permalink / raw)
  To: linux-lvm; +Cc: Mark H. Wood

Mark H. Wood wrote on 02-05-2016 15:18:

> Failure to adequately manage resources to redeem contracted promises
> is the provider's lie, not LVM's.  Failure to plan is planning to
> fail.

Exactly. And it starts being a lie when resources don't outlast use, 
and in some way the provider doesn't own up to that but lets it happen.

That is separate, however, from the observation that whether or not you 
communicate any part of it when it does happen (or would happen) is a 
choice you can make, and one that doesn't take away from thin 
provisioning at all.

If you feel you can always meet your expectations and those of your 
clients and work hard to achieve that, you may never run into the 
situation. However if you do run into the situation the choice becomes 
how to deal with that.

You can also make a proactive choice in advance either to be open about 
it, or to stick your head in the sand, as the saying goes.

I bet many contingency plans used in business everywhere have choices 
surrounding this made in advance. When do we alert the public? When do 
we open up? When has it gone so far that we cannot hide it anymore?

In Dutch we call this "keeping the dirty laundry in" -- you only hang 
the clean laundry out to dry (on a line). It is quite customary for a 
human being not to want to give insight into private matters that might 
only confuse the other person.

At the same time there is also the question of when to own up to stuff 
that is actually important to another person and I think this is a 
question of ethics.

Sometimes people are not harmed by not knowing things, but you would be 
harmed by them knowing it.
Sometimes people are harmed by not knowing things, and you are not 
harmed by them knowing it.

I think that if we are talking about a business setting where you have 
promised a certain thing to people who are now depending on it, the 
situation shifts in the direction of the second statement.

If you have a contractual responsibility to deliver, you also have a 
contractual responsibility to inform. That is my opinion on the subject, 
at least.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [linux-lvm] about the lying nature of thin
  2016-04-29 15:45 ` [linux-lvm] about the lying nature of thin matthew patton
@ 2016-05-02 13:18   ` Mark H. Wood
  2016-05-03 11:57     ` Xen
  0 siblings, 1 reply; 16+ messages in thread
From: Mark H. Wood @ 2016-05-02 13:18 UTC (permalink / raw)
  To: linux-lvm

On Fri, Apr 29, 2016 at 03:45:31PM +0000, matthew patton wrote:
> > ~35GB each, meaning 35000 GB is available and 25000 is 
> > in use, then it is not a lie to say to any individual customer: you can 
> > use 50GB if you want.
> 
> If enough of your so-called customers decide to use the space you promised them AND THAT THEY PAID FOR and instead they get massive data loss and outages, you can bet your hiney they'll sue you silly.

Executive summary:  you shouldn't just take a wild guess and then turn
your back on a thin-provisioned setup; you must understand your
consumers and monitor your resources.

It's reasonable in certain circumstances for a service provider to
over-subscribe his hardware.  He would be well advised to monitor
actual allocation closely, to keep some cash or ready credit on hand
for quick expansion of his real hardware, and to respond promptly by
adding capacity when usage nears real hardware limits.  He is taking a
risk, betting that most customers won't max out their promised
storage, and should manage that risk.  Indeed, he should first gather
statistics to understand the behavior of typical customers and
determine whether he would be taking a *foolish* risk.

Failure to adequately manage resources to redeem contracted promises
is the provider's lie, not LVM's.  Failure to plan is planning to
fail.

If that's too scary, don't use thin provisioning.

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [linux-lvm] about the lying nature of thin
       [not found] <1714078834.3820492.1461944731537.JavaMail.yahoo.ref@mail.yahoo.com>
@ 2016-04-29 15:45 ` matthew patton
  2016-05-02 13:18   ` Mark H. Wood
  0 siblings, 1 reply; 16+ messages in thread
From: matthew patton @ 2016-04-29 15:45 UTC (permalink / raw)
  To: LVM general discussion and development

> ~35GB each, meaning 35000 GB is available and 25000 is 
> in use, then it is not a lie to say to any individual customer: you can 
> use 50GB if you want.

If enough of your so-called customers decide to use the space you promised them AND THAT THEY PAID FOR and instead they get massive data loss and outages, you can bet your hiney they'll sue you silly.

If you want to play fast and loose in your basement that's one thing -  Thin-away. If you try to pull a similar stunt in a commercial setting you either do your homework and put all necessary safeguards in place to prevent customer demand from overwhelming your cheap-sh*t corner cutting, or better have an attorney on retainer and budgeted for breach of contract settlements.

> hold, but that is never communicated.

Then you, sir, will no doubt find yourself in front of a magistrate for no less than false representation, if the storage capacity you SOLD is not also explained in the terms of service as not really existing, and if it is not explained that should they (or anyone they are unlucky enough to be co-located with) just so happen to write too fast to their storage, they may well lose their data.

> As a customer you are not aware of how many other clients there are, or
>  how many other thin volumes (ordinarily) or what the max capacity is  across all the
>  volumes. So you are not being lied to.

I strongly suggest you go take a class on contract law (since OS basics is apparently beyond your grasp) and familiarize yourself with your country's prison conditions. At the very least, go talk to an attorney and pay him a consultation fee.
 
As to the rest of your message, perhaps you'd get more insight and traction by doing your own blog to wax philosophical over cheating paying customers, engaging in data-losing computing practices, and being too cheap, lazy, or opinionated to run a responsible service. And for trotting out example after example of non-computer social conditions as if they had any relevance to the matter at hand.

Now if you have something useful to say/ask about LVM, please continue.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2016-05-10 23:58 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-28 22:37 [linux-lvm] about the lying nature of thin Xen
2016-04-29  8:44 ` Marek Podmaka
2016-04-29 10:06   ` Gionatan Danti
2016-04-29 13:16     ` Xen
2016-04-29 22:32       ` Xen
2016-04-30  4:46         ` Mark Mielke
2016-05-03 13:03           ` Xen
2016-04-29 11:53   ` Xen
2016-04-29 20:37   ` Chris Friesen
2016-05-10 21:47 ` [linux-lvm] thin disk -- like overcomitted/virtual memory? (was Re: about the lying nature of thin) Linda A. Walsh
2016-05-10 23:58   ` Xen
     [not found] <1714078834.3820492.1461944731537.JavaMail.yahoo.ref@mail.yahoo.com>
2016-04-29 15:45 ` [linux-lvm] about the lying nature of thin matthew patton
2016-05-02 13:18   ` Mark H. Wood
2016-05-03 11:57     ` Xen
     [not found] <1093625508.5537728.1462283037119.JavaMail.yahoo.ref@mail.yahoo.com>
2016-05-03 13:43 ` matthew patton
2016-05-03 17:42   ` Xen
