From mboxrd@z Thu Jan 1 00:00:00 1970
From: Bob Peterson
Date: Wed, 26 Apr 2017 16:28:40 -0400 (EDT)
Subject: [Cluster-devel] [GFS2 PATCH] GFS2: Change recovery to use mutex instead of spin_lock
In-Reply-To: 
References: <1556412740.17222239.1492627959810.JavaMail.zimbra@redhat.com>
Message-ID: <1613629309.1468188.1493238520377.JavaMail.zimbra@redhat.com>
List-Id: 
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

----- Original Message -----
| If this is a spin lock, then it should not be covering anything that
| blocks. Looking down lock_dlm.c, I don't see anything that obviously is
| going to take a long time, so I wonder what the real problem here is. Is
| there something under the spinlock taking a long time, or is one of the
| bits of code it protects being called a very large number of times?
|
| Mostly it seems to be covering the recovery state, and that shouldn't
| take very long to update, particularly compared with longer-running
| operations such as journal recovery, so I think we need to look a bit
| harder at what is going on here,
|
| Steve.

Based on your email, I studied how the spin_lock was used. It turns out
that it's not called that often, but there are some places where recovery
functions take the spin_lock and then call queue_delayed_work, which can
block on its own spin_lock. In my case, I've got 60 GFS2 mount points and
16 CPUs, and therefore 60 of everything. So maybe that's the better thing
to address here. I'll see if I can rework it to queue the work after the
spin_lock is released, without harm.

Regards,

Bob Peterson
Red Hat File Systems
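P.S. A minimal sketch of the rework described above (the struct, flag
bit, and function names here are hypothetical, not the actual lock_dlm.c
code): record the decision while holding the spin_lock, then issue
queue_delayed_work() only after the lock is dropped, so the workqueue's
internal pool lock is never taken with our spin_lock held:

#include <linux/spinlock.h>
#include <linux/workqueue.h>

struct recovery_state {
	spinlock_t lock;
	unsigned long flags;		/* recovery state bits */
	struct delayed_work work;	/* deferred recovery work */
};

static void update_recovery(struct recovery_state *rs, unsigned long newbits)
{
	bool queue = false;

	/* Update the recovery state under the lock, but only note
	   whether work needs queuing; don't queue it yet. */
	spin_lock(&rs->lock);
	rs->flags |= newbits;
	if (rs->flags & 1)		/* hypothetical "work needed" bit */
		queue = true;
	spin_unlock(&rs->lock);

	/* Queue outside the lock, so 60 mounts' worth of recovery
	   state updates never spin on the workqueue's lock. */
	if (queue)
		queue_delayed_work(system_wq, &rs->work, 0);
}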