linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* problem with bio handling on raid5 and pblk
@ 2018-03-22 14:34 Javier González
  2018-03-22 17:00 ` Matias Bjørling
  0 siblings, 1 reply; 4+ messages in thread
From: Javier González @ 2018-03-22 14:34 UTC (permalink / raw)
  To: Jens Axboe, shli; +Cc: linux-raid, linux-block, LKML, Huaicheng Li

[-- Attachment #1: Type: text/plain, Size: 1846 bytes --]

Hi,

I have been looking into a bug report when using pblk and raid5 on top
and I am having problems understanding if the problem is in pblk's bio
handling or on raid5's bio assumptions on the completion path.

The problem occurs on the read path. In pblk, we take a reference to
every read bio as it enters, and release it after completing the bio.

   generic_make_request()
   pblk_submit_read()
     bio_get()
     ...
     bio_endio()
     bio_put()

The problem seems to be that on raid5's bi_end_io completion path,
raid5_end_read_request(), bio_reset() is called. When put together
with pblk's bio handling:

   generic_make_request()
   pblk_submit_read()
     bio_get()
     ...
     bio_endio()
     raid5_end_read_request()
       bio_reset()
     bio_put()

it results in the newly reset bio being put immediately, thus freed.
When the bio is reused then, we have an invalid pointer. In the report
we received things crash at BUG_ON(bio->bi_next) at
generic_make_request().

As far as I understand, it is part of the bio normal operation for
drivers under generic_make_request() to be able to take references and
release them after bio completion. Thus, in this case, the assumption
made by raid5, that it can issue a bio_reset() is incorrect. But I might
be missing an implicit cross layer rule that we are violating in pblk.
Any ideas?

This said, after analyzing the problem from pblk's perspective, I see
not reason to use bio_get()/bio_put() in the read path as it is at the
pblk level that we are submitting bio_endio(), thus we cannot risk the
bio being freed underneath us. Is this reasoning correct? I remember I
introduced these at the time there was a bug on the aio path, which was
not cleaning up correctly and could trigger an early bio free, but
revisiting it now, it seems unnecessary.

Thanks for the help!

Javier


[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2018-04-03 14:28 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-22 14:34 problem with bio handling on raid5 and pblk Javier González
2018-03-22 17:00 ` Matias Bjørling
2018-03-23 12:52   ` Javier González
2018-04-03 14:28     ` Javier González

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).