From mboxrd@z Thu Jan 1 00:00:00 1970
References: <47346a29-e6c7-6e22-4360-2d07e2ec7be3@redhat.com> <7839ff52-18e5-6a95-9a2a-12ea73457700@redhat.com> <62151b2e-c21a-177e-f66b-e2e08857be17@redhat.com>
From: Zdenek Kabelac
Message-ID: <6514b2be-67d7-3dfe-38b9-95e3bb39f55a@redhat.com>
Date: Thu, 11 Apr 2019 15:13:47 +0200
MIME-Version: 1.0
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Subject: Re: [linux-lvm] Aborting. LV mythinpool_tmeta is now incomplete
Reply-To: LVM general discussion and development
List-Id: LVM general discussion and development
Content-Type: text/plain; charset="us-ascii"; format="flowed"
To: Eric Ren
Cc: LVM general discussion and development, thornber@redhat.com, LVM2 development

On 11. 04. 19 at 15:09, Eric Ren wrote:
> Hi,
>
>> So do you get 'partial' error on thin-pool activation on your physical server ?
>
> Yes, the VG of the thin pool only has one simple physical disk, at
> beginning, I also suspected the disk may disconnect at that moment.
> But, I start to think maybe it is caused by some reason hidden in the
> interaction between lvm and dm driver in kernel.
>
> It can not be reproduced easily, but happens randomly for several
> times. The behavior model of lvm abstracted from the upper service is
> like:
> there are many (64) control flow in parallel, in each one it loops to
> randomly create/activate/delete thin LVs.
>
> The error happened two place:
> 1. activate the thin LV: _lv_activate -> _tree_action ->
> dev_manager_activate -> _add_new_lv_to_dtree -> add_areas_line ->
> striped_add_target_line on **metadata LV**,
> I don't what .add_target_line() does for?
>
> 2. fail to suspend the origin LV when created.
>
> I'm trying to reproduce it in a simple way, will report once succeed :-)
>
>

Hi

Well, if your setup 'sits' on multipath, and there are 'moments' where none
of the paths are available, and this happens right during the activation,
then lvm2 can consider the device to be missing.

It could be that there is a missing 'feature' here: certain device types may
need some 'threshold' of retries before the device is considered gone.
I don't know... It depends on how common such a use case is.

You should also collect the kernel logs from the moment you observe such
behavior; maybe multipath is not set up properly?

Anyway, a proper reproducer with a full -vvvv log would really be the most
explanatory thing, and it is needed to move on here.

Regards

Zdenek
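
For illustration only, the workload described in the quoted message (64 parallel control flows, each looping over create/activate/delete of thin LVs) could be turned into a reproducer roughly like the sketch below, with every lvm2 command run with -vvvv and its output kept per worker. This is a minimal sketch under assumptions, not the setup from the thread: the VG name `testvg`, the LV names `thinN`, the 1G virtual size, the iteration count and the log file names are all hypothetical, and only the pool name `mythinpool` is taken from the subject line. It assumes the thin pool already exists in the VG.

```python
import random
import subprocess
from concurrent.futures import ThreadPoolExecutor

VG = "testvg"        # hypothetical VG name, use a scratch VG
POOL = "mythinpool"  # pool name as it appears in the subject line
WORKERS = 64         # parallel control flows, as described in the thread
ITERATIONS = 100     # hypothetical number of loop iterations per worker


def build_cmd(action, lv):
    """Build the lvm2 command line (with -vvvv) for one action on thin LV `lv`."""
    if action == "create":
        return ["lvcreate", "-vvvv", "-V", "1G", "--thinpool", POOL, "-n", lv, VG]
    if action == "activate":
        return ["lvchange", "-vvvv", "-ay", f"{VG}/{lv}"]
    if action == "delete":
        return ["lvremove", "-vvvv", "-f", f"{VG}/{lv}"]
    raise ValueError(f"unknown action: {action}")


def worker(idx, iterations=ITERATIONS, dry_run=True, seed=None):
    """Loop over random create/activate/delete on one thin LV owned by this worker.

    Returns the list of commands issued. With dry_run=True nothing is executed.
    """
    rng = random.Random(seed)
    lv = f"thin{idx}"
    exists = False
    issued = []
    for _ in range(iterations):
        # Create first if the LV is not there, otherwise pick randomly.
        action = "create" if not exists else rng.choice(["activate", "delete"])
        cmd = build_cmd(action, lv)
        issued.append(cmd)
        if not dry_run:
            # Keep all output of each command in a per-worker log file.
            with open(f"lvm-worker{idx}.log", "ab") as log:
                subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT, check=False)
        # Tracked state assumes the command succeeded; a failure may desync it.
        exists = action != "delete"
    return issued


def run_all(dry_run=True):
    """Run all workers in parallel and return the commands each one issued."""
    with ThreadPoolExecutor(max_workers=WORKERS) as pool:
        return list(pool.map(lambda i: worker(i, dry_run=dry_run), range(WORKERS)))
```

Calling `run_all(dry_run=False)` as root against a scratch VG would execute the commands; the default dry run only returns the command lines. Kernel logs would still have to be collected separately for the same time window.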