Re: [PATCH] loop: replace loop_ctl_mutex with loop_idr_spinlock

From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
To: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>, Hillf Danton <hdanton@sina.com>,
	Pavel Tatashin <pasha.tatashin@soleen.com>,
	Tyler Hicks <tyhicks@linux.microsoft.com>,
	linux-block <linux-block@vger.kernel.org>
Subject: Re: [PATCH] loop: replace loop_ctl_mutex with loop_idr_spinlock
Date: Sat, 28 Aug 2021 10:10:36 +0900	[thread overview]
Message-ID: <73c53177-be1b-cff1-a09e-ef7979a95200@i-love.sakura.ne.jp> (raw)
In-Reply-To: <20210827184302.GA29967@lst.de>

On 2021/08/28 3:43, Christoph Hellwig wrote:
> On Fri, Aug 27, 2021 at 01:03:45AM +0900, Tetsuo Handa wrote:
>>
>> loop_unregister_transfer() which is called from cleanup_cryptoloop()
>> currently lacks serialization between kfree() from loop_remove() from
>> loop_control_remove() and mutex_lock() from unregister_transfer_cb().
>> We can use refcount and loop_idr_spinlock for serialization between
>> these functions.
> 
> 
> So before we start complicating things for loop_release_xfer - how
> do you actually reproduce loop_unregister_transfer finding a loop
> device with a transfer set?  AFAICS loop_unregister_transfer is only
> called form exit_cryptoloop, which can only be called when
> cryptoloop has a zero reference count.  But as long as a transfer
> is registered an extra refcount is held on its owner.

Indeed, lo->lo_encryption is set to non-NULL by loop_init_xfer() after a refcount
is taken and lo->lo_encryption is reset to NULL by loop_release_xfer() before
that refount is dropped, and these operations are serialized by lo->lo_mutex. Then,
lo->lo_encryption == xfer can't happen unless forced module unload is requested.

That is, it seems that unregister_transfer_cb() is there in case forced module
unload of cryptoloop module was requested. And in that case, there is no point
with crashing the kernel via panic_on_warn == 1 && WARN_ON_ONCE(). Simple printk()
will be sufficient.

Removing unregister_transfer_cb() (if we ignore forced module unload) will be
a separate patch.

> 
>> @@ -2313,20 +2320,20 @@ static int loop_add(int i)
>>  		goto out;
>>  	lo->lo_state = Lo_unbound;
>>  
>> -	err = mutex_lock_killable(&loop_ctl_mutex);
>> -	if (err)
>> -		goto out_free_dev;
>> -
>>  	/* allocate id, if @id >= 0, we're requesting that specific id */
>> +	idr_preload(GFP_KERNEL);
>> +	spin_lock(&loop_idr_spinlock);
>>  	if (i >= 0) {
>> -		err = idr_alloc(&loop_index_idr, lo, i, i + 1, GFP_KERNEL);
>> +		err = idr_alloc(&loop_index_idr, lo, i, i + 1, GFP_ATOMIC);
>>  		if (err == -ENOSPC)
>>  			err = -EEXIST;
>>  	} else {
>> -		err = idr_alloc(&loop_index_idr, lo, 0, 0, GFP_KERNEL);
>> +		err = idr_alloc(&loop_index_idr, lo, 0, 0, GFP_ATOMIC);
>>  	}
>> +	spin_unlock(&loop_idr_spinlock);
>> +	idr_preload_end();
> 
> Can you explain why the mutex is switched to a spinlock?  I could not
> find any caller that can't block, so there doesn't seem to be a real
> need for a spinlock, while a spinlock requires extra work and GFP_ATOMIC
> allocations here.  Dropping the _killable probably makes some sense,
> but seems like a separate cleanup.

In order to annotate that extra operations that might sleep should not be
added inside this section. Use of sleepable locks tends to get extra
operations (e.g. wait for a different mutex / completion) and makes it unclear
what the lock is protecting. I can imagine a future that someone adds an
unwanted dependency inside this section if we use mutex here.

Technically, we can add preempt_diable() after mutex_lock() and
preempt_enable() before mutex_unlock() in order to annotate that
extra operations that might sleep should be avoided.
But idr_alloc(GFP_ATOMIC)/idr_find()/idr_for_each_entry() etc. will be
fast enough.

> 
>> +	if (!lo || !refcount_inc_not_zero(&lo->idr_visible)) {
>> +		spin_unlock(&loop_idr_spinlock);
>> +		return -ENODEV;
>>  	}
>> +	spin_unlock(&loop_idr_spinlock);
> 
>> +	refcount_dec(&lo->idr_visible);
>> +	/*
>> +	 * Try to wait for concurrent callers (they should complete shortly due to
>> +	 * lo->lo_state == Lo_deleting) operating on this loop device, in order to
>> +	 * help that subsequent loop_add() will not to fail with -EEXIST.
>> +	 * Note that this is best effort.
>> +	 */
>> +	for (ret = 0; refcount_read(&lo->idr_visible) != 1 && ret < HZ; ret++)
>> +		schedule_timeout_killable(1);
>> +	ret = 0;
> 
> This dance looks pretty strange to me.  I think just making idr_visible
> an atomic_t and using atomic_cmpxchg with just 0 and 1 as valid versions
> will make this much simpler, as it avoids the need to deal with a > 1
> count, and it also serializes multiple removal calls.

Yes if we ignore forced module unload (which needs to synchronously check
lo->lo_encryption) of cryptoloop module.

If we don't ignore forced module unload, we could update my patch to keep only
mutex_destroy() and kfree() deferred by a refcount, for only lo->lo_state,
lo->lo_refcnt and lo->lo_encryption would be accessed under lo->lo_mutex
serialization. There is no need to defer "del_gendisk() + idr_remove()"
sequence for concurrent callers.