From: Chansol Kim <chansol.kim@samsung.com>
To: "Javier González" <javier@javigon.com>,
"Matias Bjørling" <mb@lightnvm.io>
Cc: "linux-block@vger.kernel.org" <linux-block@vger.kernel.org>
Subject: RE: [PATCH] lightnvm: pblk: fix bio leak on large sized io
Date: Mon, 11 Feb 2019 20:14:27 +0900
Message-ID: <20190211111427epcms2p7e2f59a094414ec713dd98edf6ccfe9a0@epcms2p7>
In-Reply-To: <0205AB4D-7055-45E4-935D-E858EA2CA9AD@javigon.com>
On 02/05/19 11:20 AM, Javier González wrote:
>
>> On 5 Feb 2019, at 10.23, Matias Bjørling <mb@lightnvm.io> wrote:
>>
>> On 2/1/19 9:22 AM, Chansol Kim wrote:
>>> On 01/31/19 22:14 PM, Matias Bjørling wrote:
>>>> On 1/30/19 2:53 AM, 김찬솔 wrote:
>>>>> Changes:
>>>>> 1. pblk_rw_io now takes the bio* by reference (struct bio **)
>>>>> 2. The bio_put call in the read case of pblk_rw_io is removed
>>>>>
>>>>> This fixes an issue where
>>>>> 1. pblk_make_rq calls pblk_rw_io, passing the bio* pointer by value (0xA)
>>>>> 2. pblk_rw_io calls blk_queue_split, passing the bio* pointer by reference
>>>>> 3. In blk_queue_split, when there is a split, the original bio* (0xA)
>>>>> is passed to generic_make_request, and the newly allocated bio is
>>>>> returned
>>>>> 4. If NVM_IO_DONE is returned, pblk_make_rq calls bio_endio on its bio*,
>>>>> which is not the one returned by blk_queue_split
>>>>> 5. As a result, bio_endio is never called on the newly allocated bio.
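
To make the sequence above concrete, here is a minimal user-space sketch in C. It is not kernel code: struct mock_bio, mock_queue_split, mock_endio and the rw_io_* helpers are hypothetical stand-ins, and mock_queue_split models only the pointer swap described in step 3, under the assumption that a split always happens.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bio; it only records completion. */
struct mock_bio {
	bool ended;
};

/* Storage for the bio that a split would allocate. */
static struct mock_bio split_bio;

/*
 * Models only the pointer swap on a split: the original bio is
 * resubmitted elsewhere and *bio is redirected to the new one.
 */
static void mock_queue_split(struct mock_bio **bio)
{
	split_bio.ended = false;
	*bio = &split_bio;
}

/* Stand-in for bio_endio(): marks the given bio as completed. */
static void mock_endio(struct mock_bio *bio)
{
	assert(bio != NULL);
	bio->ended = true;
}

/* Buggy shape: bio is a by-value copy, so the caller never sees the swap. */
static void rw_io_by_value(struct mock_bio *bio)
{
	mock_queue_split(&bio);
}

/* Shape proposed in the patch: the caller's pointer is updated. */
static void rw_io_by_ref(struct mock_bio **bio)
{
	mock_queue_split(bio);
}

/* Returns whether the split bio got completed by the caller. */
bool split_completed_by_value(void)
{
	struct mock_bio orig = { .ended = false };
	struct mock_bio *bio = &orig;

	rw_io_by_value(bio);
	mock_endio(bio);	/* completes orig, not the split bio */
	return split_bio.ended;
}

bool split_completed_by_ref(void)
{
	struct mock_bio orig = { .ended = false };
	struct mock_bio *bio = &orig;

	rw_io_by_ref(&bio);
	mock_endio(bio);	/* bio now points at the split bio */
	return split_bio.ended;
}
```

In this model split_completed_by_value() returns false, which corresponds to step 5, while split_completed_by_ref() returns true.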
>>>>>
>>>>> Signed-off-by: chansol.kim <chansol.kim@samsung.com>
>>>>> ---
>>>>> drivers/lightnvm/pblk-init.c | 22 ++++++++--------------
>>>>> 1 file changed, 8 insertions(+), 14 deletions(-)
>>>>>
>>>>> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
>>>>> index b57f764d..4efc929 100644
>>>>> --- a/drivers/lightnvm/pblk-init.c
>>>>> +++ b/drivers/lightnvm/pblk-init.c
>>>>> @@ -31,30 +31,24 @@ static DECLARE_RWSEM(pblk_lock);
>>>>> struct bio_set pblk_bio_set;
>>>>> static int pblk_rw_io(struct request_queue *q, struct pblk *pblk,
>>>>> - struct bio *bio)
>>>>> + struct bio **bio)
>>>>> {
>>>>> - int ret;
>>>>> -
>>>>> /* Read requests must be <= 256kb due to NVMe's 64 bit completion bitmap
>>>>> * constraint. Writes can be of arbitrary size.
>>>>> */
>>>>> - if (bio_data_dir(bio) == READ) {
>>>>> - blk_queue_split(q, &bio);
>>>>> - ret = pblk_submit_read(pblk, bio);
>>>>> - if (ret == NVM_IO_DONE && bio_flagged(bio, BIO_CLONED))
>>>>> - bio_put(bio);
>>>>
>>>> Could we kill the NVM_IO_DONE check in pblk_rw_io? That should
>>>> achieve the same.
>>> I think it is possible to remove the NVM_IO_DONE check here. In that
>>> case the bio_endio call probably has to move somewhere other than
>>> pblk_make_rq; otherwise bio_endio would not be called on the new
>>> bio*.
>>> Assuming pblk_rw_io's second parameter remains a bio*, I think three
>>> cases need consideration: the NVM_IO_ERR return case, the read case
>>> and the write case.
>>> In the NVM_IO_ERR case, for both reads and writes, pblk_make_rq
>>> receives NVM_IO_ERR and calls bio_io_error on its bio. Since the bio*
>>> that pblk_submit_read or pblk_write_to_cache tried and failed on
>>> might be a new one, the bio_io_error call needs to be made inside
>>> pblk_rw_io.
>>> The read case has three sub-cases. In the first, all data is available
>>> in the ring buffer and NVM_IO_DONE is returned. In the second, all data
>>> is read from the device; currently NVM_IO_OK is returned and bio_endio
>>> is called after the read from the device completes. The third is a
>>> partial read, where the data that must come from the device is read
>>> synchronously and pblk_rw_io returns NVM_IO_DONE.
>>> The write case has two sub-cases. In the non-REQ_PREFLUSH case,
>>> pblk_write_to_cache will return either NVM_IO_DONE or NVM_IO_ERR. A
>>> bio_endio call is required wherever NVM_IO_DONE is decided.
>>> In the REQ_PREFLUSH case the bio (the new bio* if split) is added to
>>> w_ctx.bios, and pblk_write_to_cache will return either NVM_IO_OK or
>>> NVM_IO_ERR. bio_endio is called on the bios in w_ctx.bios on write
>>> completion to the disk, so that is already taken care of.
>>> In summary, my feeling is that having pblk_rw_io receive the bio* by
>>> reference and removing the bio_put in pblk_rw_io would be the minimal
>>> change. Please share your insight; I will try experimenting with
>>> alternatives.
>>
>> What rubs me the wrong way is that this pattern isn't used in the rest
>> of the kernel. I would rather move the calls to bio_io_error and bio_endio
>> into the pblk_rw_io() function. The implementation of pblk_rw_io()
>> leaks out to pblk_make_rq(). The code is a mix of some bio_endio
>> calls inside pblk_rw_io and others outside. It's not coherent.
>
> I agree that NVM_IO_DONE is now more confusing than anything - this
> comes from the rrpc days... Removing it here will require some
> refactoring on the partial read path, but nothing too dramatic.
>
> I'm also OK with unfolding pblk_rw_io() into pblk_make_rq().
>
> Chansol: do you want to give it a go?
>
> Javier
>
Matias, as you mentioned and Javier suggested, unfolding pblk_rw_io
would make the bio_endio call sites more coherent, including the
REQ_OP_DISCARD case with REQ_PREFLUSH unset. pblk_make_rq would then be
the place to call bio_io_error on NVM_IO_ERR and bio_endio on
NVM_IO_DONE.
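
As a rough illustration of what I have in mind, here is a user-space sketch in C of completion being decided in one place. It is not the actual patch: struct mock_bio, mock_submit, mock_bio_endio, mock_bio_io_error and the MOCK_IO_* codes are hypothetical stand-ins for the bio, the pblk submit paths, bio_endio, bio_io_error and the NVM_IO_* return codes, and the enum values are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the NVM_IO_* return codes (values arbitrary). */
enum mock_io_ret {
	MOCK_IO_OK,	/* completion happens later, on device completion */
	MOCK_IO_DONE,	/* request fully served, caller must complete it */
	MOCK_IO_ERR	/* submission failed, caller must fail the bio */
};

/* Hypothetical stand-in for struct bio. */
struct mock_bio {
	bool ended;
	bool error;
};

/* Stand-in for bio_endio(). */
static void mock_bio_endio(struct mock_bio *bio)
{
	assert(bio != NULL);
	bio->ended = true;
}

/* Stand-in for bio_io_error(): flag the error, then complete the bio. */
static void mock_bio_io_error(struct mock_bio *bio)
{
	assert(bio != NULL);
	bio->error = true;
	mock_bio_endio(bio);
}

/* Hypothetical submit path that simply reports the requested outcome. */
static enum mock_io_ret mock_submit(struct mock_bio *bio,
				    enum mock_io_ret outcome)
{
	(void)bio;
	return outcome;
}

/* Unfolded make_rq shape: every completion decision lives in this switch. */
enum mock_io_ret mock_make_rq(struct mock_bio *bio, enum mock_io_ret outcome)
{
	enum mock_io_ret ret = mock_submit(bio, outcome);

	switch (ret) {
	case MOCK_IO_ERR:
		mock_bio_io_error(bio);
		break;
	case MOCK_IO_DONE:
		mock_bio_endio(bio);
		break;
	case MOCK_IO_OK:
		/* Nothing to do here; the completion path ends the bio. */
		break;
	}
	return ret;
}
```

With this shape, the bio that gets completed or failed is always the one that was actually submitted, because the submit call and the completion call sit in the same function.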
Javier: I am very much up for it. I will unfold pblk_rw_io into
pblk_make_rq, test the change, and submit the patch (with a better
comment this time).
Thank you.
Chansol Kim