From: Andreas Gruenbacher
Date: Wed, 10 Feb 2016 02:33:49 +0100
Subject: [Cluster-devel] DLM Shutdown
To: cluster-devel.redhat.com

Hi Dave and Chrissie,

I recently started looking into how DLM works, with the help of Chrissie's "Programming Locking Applications" handbook (http://people.redhat.com/ccaulfie/docs/rhdlmbook.pdf). I didn't find a simple way of testing DLM in a minimal setup: DLM requires dlm_controld, which depends on corosync. I understand that dlm_controld needs some sort of membership management service and uses corosync for that, but from a testing perspective, having something simpler would still be nice.

So I started writing FakeDLM, a toy dlm_controld substitute (https://github.com/andreas-gruenbacher/fakedlm).

This turned up a problem when shutting down DLM: dlm_controld only shuts itself down on SIGTERM when no lockspaces exist anymore; it never actively releases existing lockspaces. This means that as soon as any application creates the default lockspace (via libdlm), or if an application fails to release a lockspace it created, dlm_controld will never shut down.

It would make more sense, at least for testing purposes, to try removing existing lockspaces and to perform a proper cleanup. The only way I could find to make that happen is to do what dlm_release_lockspace() in libdlm does: to use DLM_USER_REMOVE_LOCKSPACE requests. An added difficulty is that lockspaces can be "created" with DLM_USER_CREATE_LOCKSPACE multiple times (they are only actually created the first time), and only an equal number of DLM_USER_REMOVE_LOCKSPACE requests will eventually remove the lockspace.
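
To make the counting behaviour concrete, here is a minimal toy model in Python. It is only a sketch of the semantics described above, not the kernel's or libdlm's code: the LockspaceTable class and its methods are made-up names, and it ignores the errors a real removal can return.

```python
# Toy model of the create/remove counting described above.
# NOT kernel or libdlm code; all names here are hypothetical.


class LockspaceTable:
    """Tracks how many times each lockspace name has been "created"."""

    def __init__(self):
        self._refs = {}  # lockspace name -> number of outstanding creates

    def create(self, name):
        """Model of a DLM_USER_CREATE_LOCKSPACE request.

        Returns True only when this call actually creates the lockspace.
        """
        count = self._refs.get(name, 0)
        self._refs[name] = count + 1
        return count == 0

    def remove(self, name):
        """Model of a DLM_USER_REMOVE_LOCKSPACE request.

        Returns True only when this call drops the last reference and the
        lockspace goes away.
        """
        count = self._refs.get(name, 0)
        if count == 0:
            raise KeyError(f"no such lockspace: {name}")
        if count == 1:
            del self._refs[name]
            return True
        self._refs[name] = count - 1
        return False

    def exists(self, name):
        """Return True while at least one create is still outstanding."""
        return name in self._refs

    def release_all(self):
        """Issue removes until every lockspace is gone.

        Returns the number of remove requests that were needed.
        """
        requests = 0
        for name in list(self._refs):
            while self.exists(name):
                self.remove(name)
                requests += 1
        return requests
```

In this model, a lockspace that was created twice survives the first remove and only disappears on the second, so a cleanup pass has to keep issuing removes until the lockspace is actually gone.
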
In addition, DLM_USER_REMOVE_LOCKSPACE requests are blocking, so they cannot be written to /dev/misc/dlm-control synchronously from the process that handles the offline@/kernel/dlm/ uevents, which the removal of a lockspace triggers so that the lockspace configuration can be cleaned up. Using aio_write instead has led to lockdep warnings and a deadlock in the kernel; I haven't found the reason for this problem yet.

Any ideas would be welcome.

Thanks,
Andreas