* [linux-lvm] intermittent transaction ID mismatch with rapid use of thin snapshots
From: Michael McCracken @ 2023-05-23 19:44 UTC
  To: linux-lvm

Hi, I am seeing occasional, hard-to-reproduce failures when running lvcreate
for a thin LV, with transaction ID mismatch errors.

The system is a self-managed compute node that uses thin LVs for base system
software and containers - each service has a separate thin LV as its rootfs,
and the system takes a fresh thin snapshot of the installed contents at every
boot.

During system bring-up we have two concurrent processes adding, deleting, and
renaming thin LVs in a single thin pool:

1 - a login script that creates a thin snapshot of a minimal rootfs for each
user, then launches an LXC container with that rootfs, and leaves the user's
bash running in that container. If anything fails at any point in that process,
it will lvremove the snapshot and retry several times. Although it creates a
container, the script itself runs as sudo root, not inside any container.

2 - a software install service that takes system services packaged as
and creates thin LVs based on the container image layer set, and then takes an
additional thin snapshot to be mounted for the current boot. This last step
is multi-step, with an initial lvcreate of a temp name and a final lvrename.

Neither of these processes runs in a container or in a VM with a shared
volume, so they should be seeing the same LVM lock files, as far as I can tell.
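
One way to test whether the overlap itself is the trigger would be to force
both processes through a single external lock around every LVM command. The
following is only a minimal sketch under that assumption, not a known fix:
the lock path and the helper names (exclusive_lock, run_serialized) are
hypothetical, and LVM's own locking is supposed to make this unnecessary.

```python
import fcntl
import os
import subprocess
from contextlib import contextmanager

# Hypothetical path; any file that both processes can open would do.
LOCK_PATH = "/run/lock/thinpool-ops.lock"


@contextmanager
def exclusive_lock(path=LOCK_PATH):
    """Hold an exclusive advisory lock on `path` for the duration of the block."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks while another process holds it
        yield
    finally:
        os.close(fd)  # closing the descriptor also releases the lock


def run_serialized(argv, lock_path=LOCK_PATH):
    """Run one command (for example an lvm invocation) while holding the lock."""
    with exclusive_lock(lock_path):
        return subprocess.run(argv, capture_output=True, text=True)
```

Both the login script and the install service would have to call their
lvcreate/lvremove/lvrename steps through the same wrapper for this to tell us
anything. If the mismatch disappears, that points at concurrent access; if
not, the cause is elsewhere.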

This overall approach has been stable for a long time, but a recent change
caused these to overlap more frequently, and we are now seeing failures in
lvcreate with a transaction ID mismatch when the install service tries to
create its temporary LV - here's a snippet from one such log:

Error: lvm lvcreate --activate=y --setactivationskip=y --ignoreactivationskip
--name=tmp-extract-414e5ec83c02133eae2984ee425b22589bca058d --snapshot
exit status 5:   /dev/sdh: open failed: No medium found
/dev/sdi: open failed: No medium found
/dev/sdk: open failed: No medium found
/dev/sdh: open failed: No medium found
/dev/sdi: open failed: No medium found
/dev/sdj: open failed: No medium found
/dev/sdk: open failed: No medium fou
ThinDataLV-tpool (251:3) transaction_id is 147, while expected 148.
Failed to suspend vg_ifc0/ThinDataLV with queued messages.
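
To correlate these failures across the many retry logs, the mismatch line can
be pulled out mechanically. This is a small sketch that assumes the message
always has the shape shown above; the function name and the dictionary keys
are my own, not anything LVM defines.

```python
import re

# Matches lines shaped like:
#   ThinDataLV-tpool (251:3) transaction_id is 147, while expected 148.
MISMATCH_RE = re.compile(
    r"(?P<device>\S+) \((?P<major>\d+):(?P<minor>\d+)\) "
    r"transaction_id is (?P<reported>\d+), while expected (?P<expected>\d+)\."
)


def find_mismatch(log_text):
    """Return the first transaction_id mismatch found in log_text, or None."""
    match = MISMATCH_RE.search(log_text)
    if match is None:
        return None
    return {
        "device": match.group("device"),
        "reported": int(match.group("reported")),
        "expected": int(match.group("expected")),
    }
```

Running this over each saved log would show whether the reported value is
always exactly one behind the expected value, as it is in this snippet.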

Due to some failure recovery loops, these services are running
lvcreate/lvremove/lvrename (on the same VG but different LVs) as often as 5
times per second, which seems fast but doesn't seem like it should be a problem.
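
If the retry rate turns out to matter, the recovery loops could back off
between attempts instead of retrying immediately. A minimal sketch, assuming
the operation is wrapped in a callable that returns True on success; the
function name, parameters, and default delays are illustrative and not taken
from our actual services.

```python
import random
import time


def retry_with_backoff(op, attempts=5, base_delay=0.2, max_delay=5.0,
                       sleep=time.sleep):
    """Call op() until it returns True or the attempts are used up.

    Returns the number of attempts used on success and raises RuntimeError
    if every attempt fails. The delay doubles after each failure, capped at
    max_delay, with random jitter so two retrying processes drift apart.
    """
    for attempt in range(attempts):
        if op():
            return attempt + 1
        if attempt < attempts - 1:
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError("operation failed after %d attempts" % attempts)
```

With the defaults above, a command that keeps failing would be tried five
times over a few seconds rather than five times per second.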

Looking through past messages to this list, it looks like previous cases were
due to sharing volumes between containers/VMs without a common lock dir, which
we are not doing.

Any thoughts on how to further debug or avoid this issue?

I can provide the LVM metadata backup files if that would help - there are a
lot of them, as once it starts failing, the system retries frequently.

This is on Ubuntu 20.04, with lvm 2.03.07(2)
(Ubuntu package version 2.03.07-1ubuntu1)
and a custom kernel built from 5.15.68.

