From: Boaz Harrosh <bharrosh@panasas.com>
To: Jens Axboe <jens.axboe@oracle.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>,
Tejun Heo <htejun@gmail.com>, Mike Galbraith <efault@gmx.de>,
James.Bottomley@HansenPartnership.com, tomof@acm.org,
akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
linux-ide@vger.kernel.org, linux-scsi@vger.kernel.org,
jgarzik@pobox.com, bzolnier@gmail.com
Subject: Re: [PATCH] blk: missing add of padded bytes to io completion byte count
Date: Wed, 05 Mar 2008 14:46:44 +0200
Message-ID: <47CE9634.6040501@panasas.com>
In-Reply-To: <20080305123317.GI6704@kernel.dk>

On Wed, Mar 05 2008 at 14:33 +0200, Jens Axboe <jens.axboe@oracle.com> wrote:
> On Wed, Mar 05 2008, Boaz Harrosh wrote:
>> On Wed, Mar 05 2008 at 2:26 +0200, FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> wrote:
>>> On Wed, 05 Mar 2008 08:33:05 +0900
>>> Tejun Heo <htejun@gmail.com> wrote:
>>>
>>>> FUJITA Tomonori wrote:
>>>>> Hmm, does SCSI mid-layer need to care about how many bytes the block
>>>>> layer allocates? I think that extra_len is NOT good_bytes.
>>>>>
>>>>> I think that the block layer had better take care about it (fix
>>>>> __end_that_request_first?).
>>>> Yeah, probably calling completion functions w/o bytes count is the right
>>>> thing to do but what I was talking about was what could break when the
>>>> semantics of rq->data_len changed. If we keep rq->data_len() ==
>>>> sum(sg), we keep it business as usual for all the rest except for the
>>>> device application layer if we don't we do the reverse and SCSI midlayer
>>>> completion was a good example, I think.
>>> sglist is a low-level I/O representation for device drivers. SCSI
>>> midlayer should not care about sglist. We should not fix SCSI midlayer
>>> for rq->data_len != sum(sg) change (so I can't agree with your
>>> diagrams in another mail).
>>>
>>> When we change a rule, we need to fix something.
>>>
>>> If we keep rq->data_len == sum(sg), we need to fix the device
>>> application layer. If we keep rq->data_len == the true data length, we
>>> need to fix the low-level drivers.
>>>
>>> Now I'm fine with the commit e97a294ef6938512b655b1abf17656cf2b26f709
>>> since we are in -rc stages. But I plan to send a patch to revert it
>>> and fix this issue in the block layer. I'd like to test it in -mm for
>>> a while.
>> No, this commit is a serious bug, and the only fix is the one you suggested,
>> in __end_that_request_first. The commit breaks the scsi-ml loop
>> where scsi_bufflen() can be less than blk_rq_bytes(). In that case this
>> commit causes data corruption.
>>
>>> Only sglist stuff in SCSI midlayer is scsi_req_map_sg now. As you
>>> know, we really want to remove it.
>>>
>>>
>>>> Things going the other way is fine with me but I at least want to hear a
>>>> valid rationale. Till now all I got is "because that's the true size"
>>>> which doesn't really make much sense to me.
>>> Most of users of request structure care about only the real data
>>> length, don't care about padding and drain length. Why do they bother
>>> to use a helper function to get the real data length?
>>> --
>> Below is the right fix for this problem, as pointed out by TOMO.
>> Please test that it solves the CD burning problem.
>> (The patch includes the revert of commit e97a294e)
>> ---
>> From: Boaz Harrosh <bharrosh@panasas.com>
>> Date: Wed, 5 Mar 2008 12:07:12 +0200
>> Subject: [PATCH] blk: missing add of padded bytes to io completion byte count
>>
>> The commit e97a294ef6938512b655b1abf17656cf2b26f709 was very wrong. This is
>> because scsi-ml supports the ability to split a request into smaller chunks,
>> in which case scsi_bufflen() is smaller than the request length. Then at completion
>> time the remainder can be issued as a new scsi command. In that case the above
>> commit causes data corruption.
>
> We needed something for -rc4, so it had to be rushed a bit...
>
>> Also in this fix all users of block layer are taken care of, and not only
>> scsi devices.
>>
>> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
>> Signed-off-by: Benny Halevy <bhalevy@panasas.com>
>> ---
>> block/blk-core.c | 4 ++++
>> drivers/scsi/scsi.c | 2 +-
>> 2 files changed, 5 insertions(+), 1 deletions(-)
>>
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index 2a438a9..37fcccc 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -1549,6 +1549,9 @@ static int __end_that_request_first(struct request *req, int error,
>> nr_bytes >> 9, req->sector);
>> }
>>
>> + if (nr_bytes >= blk_rq_bytes(req))
>> + nr_bytes += req->extra_len;
>> +
>> total_bytes = bio_nbytes = 0;
>> while ((bio = req->bio) != NULL) {
>> int nbytes;
>> @@ -1616,6 +1619,7 @@ static int __end_that_request_first(struct request *req, int error,
>> if (!req->bio)
>> return 0;
>>
>> + BUG_ON(total_bytes >= blk_rq_bytes(req));
>
> Make that a WARN_ON() first please. It's indeed a bug, but it won't be
> critical and it's not fair killing everything since this padding stuff
> is so fresh and may still need a tweak or two.
>
> I'd be fine with making it a BUG_ON() post 2.6.25.
>
Updated. You are absolutely right, thanks.

Will you commit the patch below for 2.6.25? I have seen this scsi-ml loop
in action on a SATA drive here in the lab, on an x86_64 machine. The
current solution will silently corrupt data, and that kind of corruption
is very hard to track down.

Boaz
---
From: Boaz Harrosh <bharrosh@panasas.com>
Date: Wed, 5 Mar 2008 12:07:12 +0200
Subject: [PATCH] blk: missing add of padded bytes to io completion byte count

The commit e97a294ef6938512b655b1abf17656cf2b26f709 was very wrong. This is
because scsi-ml supports the ability to split a request into smaller chunks,
in which case scsi_bufflen() is smaller than the request length. Then at completion
time the remainder can be issued as a new scsi command. In that case the above
commit causes data corruption.

This fix also takes care of all users of the block layer, not only
scsi devices.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
---
 block/blk-core.c    |    4 ++++
 drivers/scsi/scsi.c |    2 +-
 2 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 2a438a9..c82e68a 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1549,6 +1549,9 @@ static int __end_that_request_first(struct request *req, int error,
 				nr_bytes >> 9, req->sector);
 	}
 
+	if (nr_bytes >= blk_rq_bytes(req))
+		nr_bytes += req->extra_len;
+
 	total_bytes = bio_nbytes = 0;
 	while ((bio = req->bio) != NULL) {
 		int nbytes;
@@ -1616,6 +1619,7 @@ static int __end_that_request_first(struct request *req, int error,
 	if (!req->bio)
 		return 0;
 
+	WARN_ON(total_bytes >= blk_rq_bytes(req));
 	/*
 	 * if the request wasn't completed, update state
 	 */
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index e5c6f6a..fecba05 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -757,7 +757,7 @@ void scsi_finish_command(struct scsi_cmnd *cmd)
 				"Notifying upper driver of completion "
 				"(result %x)\n", cmd->result));
 
-	good_bytes = scsi_bufflen(cmd) + cmd->request->extra_len;
+	good_bytes = scsi_bufflen(cmd);
 	if (cmd->request->cmd_type != REQ_TYPE_BLOCK_PC) {
 		drv = scsi_cmd_to_driver(cmd);
 		if (drv->done)
--
1.5.3.3
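
To make the accounting argument in the changelog concrete, here is a small user-space C model of the two behaviours. It is only a sketch and not kernel code: struct model_request and all model_* functions are hypothetical names invented for illustration. The model assumes the bios carry data_len + extra_len bytes and that data is consumed before padding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the request fields that matter here. */
struct model_request {
	unsigned int data_len;   /* true data bytes still outstanding */
	unsigned int extra_len;  /* padding/drain bytes appended for the device */
};

/* Bytes the bios would hold in this model: data plus padding. */
static unsigned int model_bio_bytes(const struct model_request *rq)
{
	return rq->data_len + rq->extra_len;
}

/* Consume nr_bytes from the request; returns true if bytes are left over. */
static bool model_complete(struct model_request *rq, unsigned int nr_bytes)
{
	assert(rq != NULL);

	if (nr_bytes >= model_bio_bytes(rq)) {
		rq->data_len = 0;
		rq->extra_len = 0;
		return false;
	}
	if (nr_bytes >= rq->data_len) {
		/* all data done, only part of the padding consumed */
		rq->extra_len -= nr_bytes - rq->data_len;
		rq->data_len = 0;
	} else {
		rq->data_len -= nr_bytes;
	}
	return true;
}

/* Models the patch: padding is added only once all real data is done. */
static bool model_end_request_patched(struct model_request *rq,
				      unsigned int good_bytes)
{
	unsigned int nr_bytes = good_bytes;

	if (nr_bytes >= rq->data_len)
		nr_bytes += rq->extra_len;
	return model_complete(rq, nr_bytes);
}

/* Models the reverted commit: padding is added to every completion. */
static bool model_end_request_reverted(struct model_request *rq,
				       unsigned int good_bytes)
{
	return model_complete(rq, good_bytes + rq->extra_len);
}
```

With data_len = 4096 and extra_len = 4, completing a 2048-byte chunk through model_end_request_reverted() leaves 2044 data bytes outstanding, so 4 bytes that were never transferred are counted as done. The patched variant leaves 2048 bytes, and a second 2048-byte chunk then completes the data and the padding together.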