From mboxrd@z Thu Jan 1 00:00:00 1970
From: Bob Peterson
Date: Wed, 26 Apr 2017 16:28:40 -0400 (EDT)
Subject: [Cluster-devel] [GFS2 PATCH] GFS2: Change recovery to use mutex instead of spin_lock
In-Reply-To: 
References: <1556412740.17222239.1492627959810.JavaMail.zimbra@redhat.com>
Message-ID: <1613629309.1468188.1493238520377.JavaMail.zimbra@redhat.com>
List-Id: 
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

----- Original Message -----
| If this is a spin lock, then it should not be covering anything that
| blocks. Looking down lock_dlm.c, I don't see anything that obviously is
| going to take a long time, so I wonder what the real problem here is. Is
| there something under the spinlock taking a long time, or is one of the
| bits of code it protects being called a very large number of times?
|
| Mostly it seems to be covering the recovery state, and that shouldn't
| take very long to update, particularly compared with longer-running
| operations such as journal recovery, so I think we need to look a bit
| harder at what is going on here,
|
| Steve.

Based on your email, I studied how the spin_lock was used. It turns out
that it's not called that often, but there are some places where recovery
functions take the spin_lock and then call queue_delayed_work, which can
block on its own spin_lock. In my case, I've got 60 GFS2 mount points and
16 CPUs, and therefore 60 of everything. So maybe that's the better thing
to address here. I'll see if I can rework it to queue the work after the
spin_lock is released, without harm.

Regards,

Bob Peterson
Red Hat File Systems
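P.S. A minimal sketch of the rework described above (the struct, flag
bit, and function names here are hypothetical, not the actual lock_dlm.c
code): record the decision while holding the spin_lock, then issue
queue_delayed_work() only after the lock is dropped, so the workqueue's
internal pool lock is never taken with our spin_lock held:

#include <linux/spinlock.h>
#include <linux/workqueue.h>

struct recovery_state {
	spinlock_t lock;
	unsigned long flags;		/* recovery state bits */
	struct delayed_work work;	/* deferred recovery work */
};

static void update_recovery(struct recovery_state *rs, unsigned long newbits)
{
	bool queue = false;

	/* Update the recovery state under the lock, but only note
	   whether work needs queuing; don't queue it yet. */
	spin_lock(&rs->lock);
	rs->flags |= newbits;
	if (rs->flags & 1)		/* hypothetical "work needed" bit */
		queue = true;
	spin_unlock(&rs->lock);

	/* Queue outside the lock, so 60 mounts' worth of recovery
	   state updates never spin on the workqueue's lock. */
	if (queue)
		queue_delayed_work(system_wq, &rs->work, 0);
}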