From mboxrd@z Thu Jan  1 00:00:00 1970
From: Zdenek Kabelac
Message-ID: <14ec0303-5e4e-3100-7d0b-251532717ecc@gmail.com>
Date: Mon, 11 Sep 2017 19:34:18 +0200
Subject: Re: [linux-lvm] Reserve space for specific thin logical volumes
Reply-To: LVM general discussion and development
To: LVM general discussion and development, Xen

On 11.9.2017 at 16:00, Xen wrote:
> Just responding to second part of your email.
>
>>> Only manual intervention this one... and last resort only to prevent crash
>>> so not really useful in general situation?
>>
>> Let's simplify it for the case:
>>
>> You have  1G thin-pool
>> You use 10G of thinLV on top of 1G thin-pool
>>
>> And you ask for 'sane' behavior ??
>
> Why not? Really.

Because all filesystems put on top of a thinLV believe that all blocks on the
device actually exist...
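To make that mismatch concrete, here is a hedged sketch that computes how far a pool is overcommitted. The volume names and sizes are hypothetical; on a real system the input would come from something like `lvs --noheadings --units g --nosuffix -o lv_name,lv_size,pool_lv vg0`:

```shell
#!/bin/sh
# Hedged sketch: sum the virtual sizes of thin LVs backed by a pool and
# compare against the pool's real size. Sample data is made up.
sample='
pool0 1.00
thin1 10.00 pool0
thin2 4.00 pool0
'
result=$(echo "$sample" | awk '
  $1 == "pool0" && NF == 2 { pool = $2 }   # the pool itself
  $3 == "pool0"            { virt += $2 }  # thin LVs backed by the pool
  END { printf "virtual %.2fG on a %.2fG pool (ratio %.1f)", virt, pool, virt/pool }
')
echo "$result"
```

A ratio above 1.0 means the filesystems on those thin LVs collectively believe they have more blocks than the pool can ever deliver.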
>> Any idea of having 'reserved' space for 'prioritized' applications and
>> other crazy ideas leads to nowhere.
>
> It already exists in Linux filesystems since long time (root user).

Did I say you can't compare a filesystem problem with a block-level problem?
If not ;) let me repeat: running out of space in a single filesystem is a
completely different fairy tale from an out-of-space thin-pool.

>> Actually there is very good link to read about:
>>
>> https://lwn.net/Articles/104185/
>
> That was cute.
>
> But we're not asking aeroplane to keep flying.

IMHO you just don't see the parallel yet...

>> And we believe it's fine to solve exceptional case by reboot.
>
> Well it's hard to disagree with that but for me it might take weeks before I
> discover the system is offline.

IMHO it's a problem of proper monitoring. Still the same song here - you
should be actively trying to avoid the car collision, since trying to
resurrect a seriously injured or even dead passenger from a demolished car is
usually a very complex job with an unpredictable result...

We do put in a number of 'car-protection' safety mechanisms - so the newer
the tools and kernel, the better - but when you hit the wall at top speed you
can't expect to just 'walk out' easily... and it's far cheaper to solve the
problem in a way where you do NOT crash at all.

> Otherwise most services would probably continue.
>
> So now I need to install remote monitoring that checks the system is still
> up and running etc.

Of course you do. A thin-pool needs attention/care :)

> If all solutions require more and more and more and more monitoring, that's
> not good.

It's the best we can provide...

>> So don't expect lvm2 team will be solving this - there are more prio work....
>
> Sure, whatever.
>
> Safety is never prio right ;-).

We are safe enough (IMHO) to NOT lose committed data. We cannot guarantee a
stable system though - it's too complex. lvm2/dm can't fix extX/btrfs/XFS and
other kernel-related issues...
Bold men can step in - and fix those...

>> If the system volume IS that important - don't use it with over-provisioning!
>
> System-volume is not overprovisioned.

If you have enough blocks in the thin-pool to cover all the blocks needed by
all the thinLVs attached to it - you are not overprovisioning.

> Just something else running in the system....

Use different pools ;)

(i.e. a 10G system + 3 snapshots needs 40G of data size & an appropriate
metadata size to be safe from overprovisioning)

> That will crash the ENTIRE SYSTEM when it fills up.
>
> Even if it was not used by ANY APPLICATION WHATSOEVER!!!

A full thin-pool on a recent kernel certainly does NOT randomly crash the
entire system :)

If you think that is the case - provide a full trace of the crashed kernel
and open a BZ - just be sure you are using an upstream Linux kernel...

> My system LV is not even ON a thin pool.

Again - if you can reproduce it on kernel 4.13 - open a BZ and provide a
reproducer. If you use an older kernel - take a recent one and reproduce. If
you can't reproduce it - the problem has already been fixed. It's then up to
your kernel provider to either back-port the fix or give you a fixed newer
kernel - nothing really for lvm2...

> It's way more practical solution the trying to fix  OOM problem :)
>
> Aye but in that case no one can tell you to ensure you have auto-expandable
> memory ;-) ;-) ;-) :p :p :p.

I'd probably recommend reading some books about how memory is mapped onto a
block device and what all the constraints and related problems are...

>>> Yes email monitoring would be most important I think for most people.
>>
>> Put mail messaging into plugin script then.
>> Or use any monitoring software for messages in syslog - this worked
>> pretty well 20 years back - and hopefully still works well :)
>
> Yeah I guess but I do not have all this knowledge myself about all these
> different kinds of softwares and how they work, I hoped that thin LVM would
> work for me without excessive need for knowledge of many different kinds.
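The sizing rule mentioned above (a 10G system plus 3 snapshots needs 40G of pool data space) assumes the worst case, where every snapshot eventually diverges completely from its origin. A minimal sketch of that arithmetic, with the numbers taken from the example:

```shell
#!/bin/sh
# Worst-case pool sizing: the origin plus every snapshot fully rewritten.
# Real usage is normally far lower; this is the bound that makes the pool
# safe from overprovisioning.
origin_gib=10
snapshots=3
required_gib=$(( origin_gib * (1 + snapshots) ))
echo "pool data size to be safe from overprovisioning: ${required_gib}G"
```

Metadata space has to be sized separately on top of this.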
We do provide some 'generic' scripts - unfortunately, every use-case comes
with a basically different set of rules and constraints. So the best we have
is 'auto-extension'.

We used to try to umount - but this possibly added more problems than it
actually solved...

>>> I am just asking whether or not there is a clear design limitation that
>>> would ever prevent safety in operation when 100% full (by accident).
>>
>> Don't use over-provisioning in case you don't want to see failure.
>
> That's no answer to that question.

There is a lot of technical complexity behind it...

I'd say the main part is that the 'fs' would need to understand that it is
living on a provisioned device (something we actually do not want, as you can
change the 'state' at runtime - so the 'fs' would have to be aware & unaware
at the same time ;)). Checking with every request that thin-provisioning is
in place would impact performance, and doing it at mount time is also bad.

Then you need to deal with the fact that writes to a filesystem are
'process'-aware, while writes to a block device are anonymous page writes
from your page cache.

Have I said yet that the level of problems for a single filesystem is a
totally different story?

So in a simple statement - thin-p has its limits - if you are unhappy with
them, you probably need to look for some other solution - or start sending
patches and improving things...

>> It's the same as you should not overcommit your RAM in case you do not
>> want to see OOM....
>
> But with RAM I'm sure you can typically see how much you have and can thus
> take account of that, filesystem will report wrong figure ;-).

Unfortunately you cannot... The amount of free RAM you see is a very
fictional number ;) and you run into much bigger problems if you start
overcommitting memory in the kernel...

You can't compare a failing malloc() in user-space with the OOM killer
crashing Firefox... A block device runs in-kernel - and as root...
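For reference, the 'auto-extension' mentioned earlier is driven by dmeventd and configured in lvm.conf; a minimal fragment (the threshold and percent values here are only examples, not recommendations):

```
activation {
    # Extend the thin pool once its data usage crosses 70%...
    thin_pool_autoextend_threshold = 70
    # ...growing it by 20% of its current size each time.
    thin_pool_autoextend_percent = 20
}
```

This only helps while the volume group still has free extents to grow into.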
There are no reserves; all you know is that you need to write block XY - you
have no idea what the block is about...

(That's where ZFS/btrfs were supposed to excel - they KNOW... :)

Regards,

Zdenek