RE: [PATCH] lightnvm: pblk: fix bio leak on large sized io

From: Chansol Kim <chansol.kim@samsung.com>
To: "Javier González" <javier@javigon.com>,
	"Matias Bjørling" <mb@lightnvm.io>
Cc: "linux-block@vger.kernel.org" <linux-block@vger.kernel.org>
Subject: RE: [PATCH] lightnvm: pblk: fix bio leak on large sized io
Date: Mon, 11 Feb 2019 20:14:27 +0900
Message-ID: <20190211111427epcms2p7e2f59a094414ec713dd98edf6ccfe9a0@epcms2p7>
In-Reply-To: <0205AB4D-7055-45E4-935D-E858EA2CA9AD@javigon.com>

On 02/05/19 11:20 AM, Javier González wrote:
>
>> On 5 Feb 2019, at 10.23, Matias Bjørling <mb@lightnvm.io> wrote:
>> 
>> On 2/1/19 9:22 AM, Chansol Kim wrote:
>>> On 01/31/19 22:14 PM, Matias Bjørling wrote:
>>>> On 1/30/19 2:53 AM, 김찬솔 wrote:
>>>>> Changes:
>>>>>   1. pblk_rw_io now takes the bio by reference (struct bio **)
>>>>>   2. The bio_put call in the read path of pblk_rw_io is removed
>>>>> 
>>>>> This fixes the following issue:
>>>>>   1. pblk_make_rq calls pblk_rw_io, passing the bio* pointer by value
>>>>>      (say 0xA)
>>>>>   2. pblk_rw_io calls blk_queue_split, passing the bio* pointer by
>>>>>      reference
>>>>>   3. In blk_queue_split, when there is a split, the original bio* (0xA)
>>>>>      is passed to generic_make_request, and the newly allocated bio is
>>>>>      returned
>>>>>   4. If NVM_IO_DONE is returned, pblk_make_rq calls bio_endio on its own
>>>>>      bio* (0xA), which is not the one returned by blk_queue_split
>>>>>   5. As a result, bio_endio is never called on the newly allocated bio.
>>>>> 
>>>>> Signed-off-by: chansol.kim <chansol.kim@samsung.com>
>>>>> ---
>>>>>   drivers/lightnvm/pblk-init.c | 22 ++++++++--------------
>>>>>   1 file changed, 8 insertions(+), 14 deletions(-)
>>>>> 
>>>>> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
>>>>> index b57f764d..4efc929 100644
>>>>> --- a/drivers/lightnvm/pblk-init.c
>>>>> +++ b/drivers/lightnvm/pblk-init.c
>>>>> @@ -31,30 +31,24 @@ static DECLARE_RWSEM(pblk_lock);
>>>>>   struct bio_set pblk_bio_set;
>>>>>   
>>>>>   static int pblk_rw_io(struct request_queue *q, struct pblk *pblk,
>>>>> -			  struct bio *bio)
>>>>> +			  struct bio **bio)
>>>>>   {
>>>>> -	int ret;
>>>>> -
>>>>>   	/* Read requests must be <= 256kb due to NVMe's 64 bit completion bitmap
>>>>>   	 * constraint. Writes can be of arbitrary size.
>>>>>   	 */
>>>>> -	if (bio_data_dir(bio) == READ) {
>>>>> -		blk_queue_split(q, &bio);
>>>>> -		ret = pblk_submit_read(pblk, bio);
>>>>> -		if (ret == NVM_IO_DONE && bio_flagged(bio, BIO_CLONED))
>>>>> -			bio_put(bio);
>>>> 
>>>> Could we kill the NVM_IO_DONE check in pblk_rw_io? That should
>>>> achieve the same.
>>> I think it is possible to remove the NVM_IO_DONE check here. In that
>>> case, the bio_endio call probably has to move out of pblk_make_rq;
>>> otherwise bio_endio would not be called on the new bio*.
>>>
>>> Assuming pblk_rw_io keeps taking a plain bio*, I think there are
>>> three cases that need consideration: the NVM_IO_ERR return case, the
>>> read case and the write case.
>>>
>>> In the NVM_IO_ERR case, for both reads and writes, pblk_make_rq
>>> receives NVM_IO_ERR and calls bio_io_error on its bio. But the bio*
>>> that pblk_submit_read or pblk_write_to_cache tried and failed on might
>>> be a new one, so the bio_io_error call would have to be made inside
>>> pblk_rw_io.
>>>
>>> The read case has three sub-cases. First, all data is available in
>>> the ring buffer and NVM_IO_DONE is returned. Second, all data has to
>>> be read from the device; currently NVM_IO_OK is returned and bio_endio
>>> is called once the device read completes. Third, a partial read, where
>>> the data that has to come from the device is read synchronously and
>>> pblk_rw_io returns NVM_IO_DONE.
>>>
>>> The write case has two sub-cases. Without REQ_PREFLUSH,
>>> pblk_write_to_cache will return either NVM_IO_DONE or NVM_IO_ERR, and
>>> a bio_endio call is required wherever NVM_IO_DONE is decided. With
>>> REQ_PREFLUSH, the bio (the new bio* if split) is added to w_ctx.bios
>>> and pblk_write_to_cache will return either NVM_IO_OK or NVM_IO_ERR.
>>> bio_endio is called on the bios in w_ctx.bios on write completion to
>>> the disk, so that is already taken care of.
>>>
>>> In summary, my feeling is that having pblk_rw_io take the bio* by
>>> reference and removing the bio_put in pblk_rw_io would be the minimal
>>> change. Please share your insight; I will try experimenting with
>>> alternatives.
>> 
>> What rubs me the wrong way is that this pattern isn't used in the rest
>> of the kernel. I would rather move the calls to bio_io_error and
>> bio_endio into the pblk_rw_io() function. As it is, the implementation
>> of pblk_rw_io() leaks out to pblk_make_rq(): the code is a mix of some
>> bio_endio calls inside pblk_rw_io() and others outside. It's not
>> coherent.
>
> I agree that NVM_IO_DONE is now more confusing than anything - this
> comes from the rrpc days... Removing it here will require some
> refactoring on the partial read path, but nothing too dramatic.
>
> I'm also OK with unfolding pblk_rw_io() into pblk_make_rq().
>
> Chansol: do you want to give it a go?
>
> Javier
>

Matias, as you mentioned and Javier suggested, unfolding pblk_rw_io
into pblk_make_rq would make the bio_endio call sites more coherent,
including the REQ_OP_DISCARD case with REQ_PREFLUSH unset. pblk_make_rq
would then be the single place that calls bio_io_error on NVM_IO_ERR
and bio_endio on NVM_IO_DONE.
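
As a rough illustration of that flow, here is a small userspace model
(not kernel code, and not the eventual patch). Every mock_* name, the
reduced struct mock_bio, the 64-sector limit and the "empty bio fails"
rule are assumptions made up for this sketch. It only models the one
property of blk_queue_split that matters here: on a split, the original
bio is resubmitted and the caller's pointer is replaced by a newly
allocated bio. Because the split and the completion now act on the same
local pointer, every bio gets completed.

```c
#include <assert.h>
#include <stdlib.h>

/* Return codes; the values are illustrative stand-ins for NVM_IO_*. */
enum { NVM_IO_OK = 0, NVM_IO_DONE = 2, NVM_IO_ERR = 3 };

/* Hypothetical, heavily reduced stand-in for struct bio. */
struct mock_bio {
	int sectors;
	int is_split;	/* set when allocated by mock_queue_split() */
};

static int nr_ended;			/* completions seen so far */
static int nr_errors;			/* completions that carried an error */
static struct mock_bio *resubmitted;	/* remainder handed back for resubmission */

static void mock_bio_endio(struct mock_bio *bio)
{
	nr_ended++;
	if (bio->is_split)
		free(bio);
}

static void mock_bio_io_error(struct mock_bio *bio)
{
	nr_errors++;
	mock_bio_endio(bio);
}

/*
 * On a split, the original bio is resubmitted and *bio is replaced by a
 * newly allocated bio covering the front part of the request.
 */
static void mock_queue_split(struct mock_bio **bio, int max_sectors)
{
	struct mock_bio *split;

	if ((*bio)->sectors <= max_sectors)
		return;

	split = calloc(1, sizeof(*split));
	assert(split != NULL);
	split->sectors = max_sectors;
	split->is_split = 1;

	(*bio)->sectors -= max_sectors;
	resubmitted = *bio;	/* stands in for generic_make_request(*bio) */
	*bio = split;
}

/* Pretend every read is a cache hit; empty bios fail (arbitrary rule). */
static int mock_submit_read(struct mock_bio *bio)
{
	return bio->sectors > 0 ? NVM_IO_DONE : NVM_IO_ERR;
}

/* Unfolded make_rq: split and completion use the same local pointer. */
static void mock_make_rq(struct mock_bio *bio)
{
	mock_queue_split(&bio, 64);

	switch (mock_submit_read(bio)) {
	case NVM_IO_ERR:
		mock_bio_io_error(bio);
		break;
	case NVM_IO_DONE:
		mock_bio_endio(bio);
		break;
	default:	/* NVM_IO_OK: completed later by the I/O path */
		break;
	}
}

/* Submit one read of `sectors` sectors; return how many bios completed. */
int mock_run(int sectors)
{
	struct mock_bio orig = { .sectors = sectors, .is_split = 0 };
	struct mock_bio *bio = &orig;

	nr_ended = 0;
	nr_errors = 0;
	while (bio) {
		resubmitted = NULL;
		mock_make_rq(bio);
		bio = resubmitted;	/* the remainder re-enters make_rq */
	}
	return nr_ended;
}
```

With the 64-sector limit assumed above, mock_run(200) completes four
bios: three split bios plus the remainder. The model leaves out
bio_chain and the asynchronous NVM_IO_OK completion path, so it says
nothing about those.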

Javier: I am very much up for it. I will unfold pblk_rw_io into
pblk_make_rq, test the change, and submit the patch (with a better
comment this time).

Thank you.

Chansol Kim