From: Zdenek Kabelac <email@example.com>
To: "Zhangyanfei (YF)" <firstname.lastname@example.org>,
Cc: "email@example.com" <firstname.lastname@example.org>,
Subject: Re: [linux-lvm] 答复: [dm-devel] dmsetup hangs forever
Date: Fri, 27 Oct 2017 14:28:08 +0200
Message-ID: <email@example.com>
On 27.10.2017 at 10:00, Zhangyanfei (YF) wrote:
> If the udevd daemon never timed out, I think making dmsetup always wait for udev to finish would be a good idea.
> But udevd times out after 180 seconds and kills the event process (systemd-udevd: timeout: killing). In this situation, I think
The point here is - udev should not be 'randomly' killing workers - and if it
really needs to - it should provide a 'recovery' path mechanism which defines
what is supported after a worker has been killed.
If this mechanism is missing - you end up with a device which is NOT present
in the udev DB - yet there is some 'esoteric' dm device which does exist
and you don't know whether it's usable or not.
> a mandatory wait for udev to finish is useless, because the udev worker has been killed and
> can never coordinate with dmsetup again. So I think it's better to return an error to whoever
> called dmsetup than to let the process wait forever.
> If a timeout mechanism is not added to dmsetup, which strategy would solve this issue better?
You can always use e.g. a cron job and run every 15 minutes:
'dmsetup udevcomplete_all 15'
to complete any dmsetup/lvm command blocked on a cookie...
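If you would rather drive this from a small script than a bare crontab line, a minimal sketch could look like the one below. The helper names (build_udevcomplete_cmd, release_stale_cookies) are made up for illustration, the command needs root, and '-y' is passed on the assumption that your dmsetup asks for confirmation before destroying cookies, which would otherwise block when there is no terminal.

```python
import subprocess


def build_udevcomplete_cmd(age_minutes=15):
    """Return the argv for 'dmsetup udevcomplete_all <age_minutes>'.

    '-y' is an assumption: it answers the confirmation prompt that
    dmsetup may show, which matters when no terminal is attached.
    """
    if age_minutes < 0:
        raise ValueError("age_minutes must not be negative")
    return ["dmsetup", "udevcomplete_all", "-y", str(age_minutes)]


def release_stale_cookies(age_minutes=15, runner=subprocess.run):
    """Run the command and return its exit status.

    'runner' is injectable so the logic can be exercised without root
    and without touching real cookies.
    """
    result = runner(build_udevcomplete_cmd(age_minutes), check=False)
    return result.returncode
```

Calling release_stale_cookies(15) from a script that cron starts every 15 minutes would mirror the command above. It only releases waiters on old cookies; it does not repair whatever made the udev worker die.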
The key point here is - you have a system with a broken device list - so any
subsequent command, e.g. from lvm2, may 'freeze' when it tries to read a device
that blocks the reading task - this is certainly a bad case.
> 1. Guarantee that udev never times out (but I think it is difficult to make sure every udev event finishes within 180 seconds in any abnormal situation).
> 2. Modify the udev daemon so that, if a udev event times out, it also notifies dmsetup that it is done.
> 3. Whoever calls the dmsetup command has to time out on its own.
In principle timeouts are BAD when we talk about storage.
There needs to be some clear 'state-machine' mechanism.
It doesn't make much sense to kill a worker - which might not even be
possible if the worker is frozen inside the kernel.
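To make 'state-machine' a bit more concrete, here is a purely hypothetical sketch. None of these state names or transitions exist in udev or dmsetup; they only illustrate the idea that a killed worker should lead to a defined recovery state instead of leaving the device in limbo.

```python
from enum import Enum, auto


class EventState(Enum):
    """Hypothetical states for one udev event on a dm device."""
    QUEUED = auto()
    PROCESSING = auto()
    COMPLETED = auto()      # device is in the udev DB, waiters released
    WORKER_KILLED = auto()  # worker was killed, device state unknown
    RECOVERING = auto()     # a defined recovery path is running
    FAILED = auto()         # recovery gave up, device flagged unusable


# Every allowed transition is listed explicitly; anything else is rejected.
ALLOWED = {
    EventState.QUEUED: {EventState.PROCESSING},
    EventState.PROCESSING: {EventState.COMPLETED, EventState.WORKER_KILLED},
    EventState.WORKER_KILLED: {EventState.RECOVERING},
    EventState.RECOVERING: {EventState.COMPLETED, EventState.FAILED},
    EventState.COMPLETED: set(),
    EventState.FAILED: set(),
}


def advance(current, target):
    """Return target if the transition is allowed, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```

In this sketch WORKER_KILLED can only move on to RECOVERING, so whoever waits on the event always ends up seeing COMPLETED or FAILED and never has to guess.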
And then there is even a 2nd category of worker 'kills' - this happens on
overloaded machines running a gazillion 'virtual guest' instances without any
resource check mechanism, on the assumption that the "OOM" killer is a beautiful
resource manager in the kernel - however such systems are destined for a
near-term reboot - anyone who tries to 'recover' from an OOM doesn't realize how
complex that would be - so this case is not worth a dmsetup timeout either.