All of lore.kernel.org
 help / color / mirror / Atom feed
* [GIT PULL] Revert of the IO stat fix
@ 2010-10-24 20:09 Jens Axboe
  2010-10-24 20:35 ` Linus Torvalds
  0 siblings, 1 reply; 12+ messages in thread
From: Jens Axboe @ 2010-10-24 20:09 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

Hi Linus,

The fix for cross-partition merges screwing up disk stats turns out
to be problematic on various levels. Lets revert this one so we have
time to come up with a proper solution for this.


  git://git.kernel.dk/linux-2.6-block.git for-linus

Jens Axboe (1):
      Revert "block: fix accounting bug on cross partition merges"

 block/blk-core.c         |   24 ++++++++----------------
 block/blk-merge.c        |    2 +-
 block/blk.h              |    4 ++++
 block/genhd.c            |   14 --------------
 fs/partitions/check.c    |   12 ------------
 include/linux/blkdev.h   |    1 -
 include/linux/elevator.h |    2 --
 include/linux/genhd.h    |    1 -
 8 files changed, 13 insertions(+), 47 deletions(-)

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [GIT PULL] Revert of the IO stat fix
  2010-10-24 20:09 [GIT PULL] Revert of the IO stat fix Jens Axboe
@ 2010-10-24 20:35 ` Linus Torvalds
  2010-10-24 20:42   ` Jens Axboe
  2010-10-25 19:24   ` Vivek Goyal
  0 siblings, 2 replies; 12+ messages in thread
From: Linus Torvalds @ 2010-10-24 20:35 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel

On Sun, Oct 24, 2010 at 1:09 PM, Jens Axboe <jaxboe@fusionio.com> wrote:
>
> The fix for cross-partition merges screwing up disk stats turns out
> to be problematic on various levels. Lets revert this one so we have
> time to come up with a proper solution for this.

Hmm.. I think the reverted patch looks like it really is the right
thing to do, so I hate reverting it this early. What were the problems
with it?

Btw, one thing that seems to be missing in the original commit (which
is not necessarily the reason for the trouble, of course), is that
elv_rq_merge_ok() seems to not check the partition. As far as I can
tell, we should have a

    if (req->part != bio->bi_bdev->bd_part)
       return 0;

there, no? And you should _not_ set rq->part in "drive_stat_acct()",
you should set it from bio->bi_bdev->bd_part when you create the
request.

(And if it is NULL, just don't do partition accounting at all)

Hmm? What am I missing? What were the bugs?

                      Linus

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [GIT PULL] Revert of the IO stat fix
  2010-10-24 20:35 ` Linus Torvalds
@ 2010-10-24 20:42   ` Jens Axboe
  2010-10-24 20:44     ` Jens Axboe
  2010-10-24 22:03     ` Maxim Levitsky
  2010-10-25 19:24   ` Vivek Goyal
  1 sibling, 2 replies; 12+ messages in thread
From: Jens Axboe @ 2010-10-24 20:42 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

On 2010-10-24 22:35, Linus Torvalds wrote:
> On Sun, Oct 24, 2010 at 1:09 PM, Jens Axboe <jaxboe@fusionio.com> wrote:
>>
>> The fix for cross-partition merges screwing up disk stats turns out
>> to be problematic on various levels. Lets revert this one so we have
>> time to come up with a proper solution for this.
> 
> Hmm.. I think the reverted patch looks like it really is the right
> thing to do, so I hate reverting it this early. What were the problems
> with it?
> 
> Btw, one thing that seems to be missing in the original commit (which
> is not necessarily the reason for the trouble, of course), is that
> elv_rq_merge_ok() seems to not check the partition. As far as I can
> tell, we should have a
> 
>     if (req->part != bio->bi_bdev->bd_part)
>        return 0;
> 
> there, no? And you should _not_ set rq->part in "drive_stat_acct()",
> you should set it from bio->bi_bdev->bd_part when you create the
> request.
> 
> (And if it is NULL, just don't do partition accounting at all)
> 
> Hmm? What am I missing? What were the bugs?

The patch itself is sound, the problems are around the area of it not
really liking non-elevator devices with the elv_quiesce_start/end()
parts. I had the below patch for that, but then I could not decide
whether we were fully safe on queue free after talking to Vivek about
it.

So that shows up as an oops on removal of mspro for instance, and loop
throws a fit as well. Coverage of the bug is too large just to let side
idle for a few days.

And since I'll be travelling the next few days, I would rather just
revert this one and I have time in-flight to fix it for real.

OK?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [GIT PULL] Revert of the IO stat fix
  2010-10-24 20:42   ` Jens Axboe
@ 2010-10-24 20:44     ` Jens Axboe
  2010-10-24 20:53       ` Linus Torvalds
                         ` (2 more replies)
  2010-10-24 22:03     ` Maxim Levitsky
  1 sibling, 3 replies; 12+ messages in thread
From: Jens Axboe @ 2010-10-24 20:44 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

On 2010-10-24 22:42, Jens Axboe wrote:
> On 2010-10-24 22:35, Linus Torvalds wrote:
>> On Sun, Oct 24, 2010 at 1:09 PM, Jens Axboe <jaxboe@fusionio.com> wrote:
>>>
>>> The fix for cross-partition merges screwing up disk stats turns out
>>> to be problematic on various levels. Lets revert this one so we have
>>> time to come up with a proper solution for this.
>>
>> Hmm.. I think the reverted patch looks like it really is the right
>> thing to do, so I hate reverting it this early. What were the problems
>> with it?
>>
>> Btw, one thing that seems to be missing in the original commit (which
>> is not necessarily the reason for the trouble, of course), is that
>> elv_rq_merge_ok() seems to not check the partition. As far as I can
>> tell, we should have a
>>
>>     if (req->part != bio->bi_bdev->bd_part)
>>        return 0;
>>
>> there, no? And you should _not_ set rq->part in "drive_stat_acct()",
>> you should set it from bio->bi_bdev->bd_part when you create the
>> request.
>>
>> (And if it is NULL, just don't do partition accounting at all)
>>
>> Hmm? What am I missing? What were the bugs?
> 
> The patch itself is sound, the problems are around the area of it not
> really liking non-elevator devices with the elv_quiesce_start/end()
> parts. I had the below patch for that, but then I could not decide
> whether we were fully safe on queue free after talking to Vivek about
> it.

Forgot to include it, here it is. I'll be offline from now and 1-2 days
forward.


>From 96059cec039b666c26d300c2132e24bfd6edacdc Mon Sep 17 00:00:00 2001
From: Jens Axboe <jaxboe@fusionio.com>
Date: Sun, 24 Oct 2010 08:46:41 +0200
Subject: [PATCH] block: fix partition reload bug with non-elevator devices

The partition reload code was changed to quiesce the block
queue so that partition IO stats could safely hold a reference
to the partition table. elv_quiesce_{start,end} do not
properly work on non-elevator devices. Improve the helper
functions so that they don't care, this way we can use
the generic interface on partition reload without having
to check for queue structures or types.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
---
 block/elevator.c         |   31 +++++++++++++++++++++++--------
 block/genhd.c            |   10 +---------
 fs/partitions/check.c    |    5 -----
 include/linux/elevator.h |    2 ++
 4 files changed, 26 insertions(+), 22 deletions(-)

diff --git a/block/elevator.c b/block/elevator.c
index 282e830..5461075 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -590,11 +590,8 @@ void elv_drain_elevator(struct request_queue *q)
 /*
  * Call with queue lock held, interrupts disabled
  */
-void elv_quiesce_start(struct request_queue *q)
+void __elv_quiesce_start(struct request_queue *q)
 {
-	if (!q->elevator)
-		return;
-
 	queue_flag_set(QUEUE_FLAG_ELVSWITCH, q);
 
 	/*
@@ -610,11 +607,31 @@ void elv_quiesce_start(struct request_queue *q)
 	}
 }
 
-void elv_quiesce_end(struct request_queue *q)
+void elv_quiesce_start(struct request_queue *q)
+{
+	if (q->elevator) {
+		spin_lock_irq(q->queue_lock);
+		__elv_quiesce_start(q);
+		spin_unlock_irq(q->queue_lock);
+	}
+}
+
+void __elv_quiesce_end(struct request_queue *q)
 {
 	queue_flag_clear(QUEUE_FLAG_ELVSWITCH, q);
 }
 
+void elv_quiesce_end(struct request_queue *q)
+{
+	if (q->elevator) {
+		unsigned long flags;
+
+		spin_lock_irqsave(q->queue_lock, flags);
+		__elv_quiesce_end(q);
+		spin_unlock_irqrestore(q->queue_lock, flags);
+	}
+}
+
 void elv_insert(struct request_queue *q, struct request *rq, int where)
 {
 	int unplug_it = 1;
@@ -969,7 +986,7 @@ static int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
 	 * Turn on BYPASS and drain all requests w/ elevator private data
 	 */
 	spin_lock_irq(q->queue_lock);
-	elv_quiesce_start(q);
+	__elv_quiesce_start(q);
 
 	/*
 	 * Remember old elevator.
@@ -995,9 +1012,7 @@ static int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
 	 * finally exit old elevator and turn off BYPASS.
 	 */
 	elevator_exit(old_elevator);
-	spin_lock_irq(q->queue_lock);
 	elv_quiesce_end(q);
-	spin_unlock_irq(q->queue_lock);
 
 	blk_add_trace_msg(q, "elv switch: %s", e->elevator_type->elevator_name);
 
diff --git a/block/genhd.c b/block/genhd.c
index a8adf96..7d4d860 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -930,14 +930,9 @@ static void disk_free_ptbl_rcu_cb(struct rcu_head *head)
 	struct disk_part_tbl *ptbl =
 		container_of(head, struct disk_part_tbl, rcu_head);
 	struct gendisk *disk = ptbl->disk;
-	struct request_queue *q = disk->queue;
-	unsigned long flags;
 
 	kfree(ptbl);
-
-	spin_lock_irqsave(q->queue_lock, flags);
-	elv_quiesce_end(q);
-	spin_unlock_irqrestore(q->queue_lock, flags);
+	elv_quiesce_end(disk->queue);
 }
 
 /**
@@ -962,10 +957,7 @@ static void disk_replace_part_tbl(struct gendisk *disk,
 	if (old_ptbl) {
 		rcu_assign_pointer(old_ptbl->last_lookup, NULL);
 
-		spin_lock_irq(q->queue_lock);
 		elv_quiesce_start(q);
-		spin_unlock_irq(q->queue_lock);
-
 		call_rcu(&old_ptbl->rcu_head, disk_free_ptbl_rcu_cb);
 	}
 }
diff --git a/fs/partitions/check.c b/fs/partitions/check.c
index b81bfc0..cf4d1ee 100644
--- a/fs/partitions/check.c
+++ b/fs/partitions/check.c
@@ -367,16 +367,13 @@ static void delete_partition_rcu_cb(struct rcu_head *head)
 	struct hd_struct *part = container_of(head, struct hd_struct, rcu_head);
 	struct gendisk *disk = part_to_disk(part);
 	struct request_queue *q = disk->queue;
-	unsigned long flags;
 
 	part->start_sect = 0;
 	part->nr_sects = 0;
 	part_stat_set_all(part, 0);
 	put_device(part_to_dev(part));
 
-	spin_lock_irqsave(q->queue_lock, flags);
 	elv_quiesce_end(q);
-	spin_unlock_irqrestore(q->queue_lock, flags);
 }
 
 void delete_partition(struct gendisk *disk, int partno)
@@ -398,9 +395,7 @@ void delete_partition(struct gendisk *disk, int partno)
 	kobject_put(part->holder_dir);
 	device_del(part_to_dev(part));
 
-	spin_lock_irq(q->queue_lock);
 	elv_quiesce_start(q);
-	spin_unlock_irq(q->queue_lock);
 
 	call_rcu(&part->rcu_head, delete_partition_rcu_cb);
 }
diff --git a/include/linux/elevator.h b/include/linux/elevator.h
index 80a0ece..2d30300 100644
--- a/include/linux/elevator.h
+++ b/include/linux/elevator.h
@@ -122,7 +122,9 @@ extern void elv_completed_request(struct request_queue *, struct request *);
 extern int elv_set_request(struct request_queue *, struct request *, gfp_t);
 extern void elv_put_request(struct request_queue *, struct request *);
 extern void elv_drain_elevator(struct request_queue *);
+extern void __elv_quiesce_start(struct request_queue *);
 extern void elv_quiesce_start(struct request_queue *);
+extern void __elv_quiesce_end(struct request_queue *);
 extern void elv_quiesce_end(struct request_queue *);
 
 /*
-- 
1.7.3


-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [GIT PULL] Revert of the IO stat fix
  2010-10-24 20:44     ` Jens Axboe
@ 2010-10-24 20:53       ` Linus Torvalds
  2010-11-23 10:01       ` Jerome Marchand
  2010-11-30 17:01       ` Jerome Marchand
  2 siblings, 0 replies; 12+ messages in thread
From: Linus Torvalds @ 2010-10-24 20:53 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel, Vivek Goyal

On Sun, Oct 24, 2010 at 1:44 PM, Jens Axboe <jaxboe@fusionio.com> wrote:
>>
>> The patch itself is sound, the problems are around the area of it not
>> really liking non-elevator devices with the elv_quiesce_start/end()
>> parts. I had the below patch for that, but then I could not decide
>> whether we were fully safe on queue free after talking to Vivek about
>> it.
>
> Forgot to include it, here it is. I'll be offline from now and 1-2 days
> forward.

Hmm. I actually like this patch more than the revert. Except that I
think the __elv_quiesce_start/end() functions should be static, and
not exposed in the header file. There don't seem to be any other files
that would want to use them anyway, and the whole change looks very
sane since all the outside users used to have to take the lock
explicitly (and thus moving it inside the elv_quiesce_start/end
functions seems to be a nice cleanup).

So if this patch is supposed to fix things and doesn't have any
*known* problems, but you're just not entirely sure about it and we're
talking just a few days of you being gone, I'd almost rather take the
risk. If worst comes to worst, I can revert both this patch and the
original one. We _are_ in the middle of the merge window after all, so
I'm much more open to risks than I would be later in the -rc series..

                                 Linus

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [GIT PULL] Revert of the IO stat fix
  2010-10-24 20:42   ` Jens Axboe
  2010-10-24 20:44     ` Jens Axboe
@ 2010-10-24 22:03     ` Maxim Levitsky
  2010-10-25  0:07       ` Linus Torvalds
  1 sibling, 1 reply; 12+ messages in thread
From: Maxim Levitsky @ 2010-10-24 22:03 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Linus Torvalds, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1760 bytes --]

On Sun, 2010-10-24 at 22:42 +0200, Jens Axboe wrote:
> On 2010-10-24 22:35, Linus Torvalds wrote:
> > On Sun, Oct 24, 2010 at 1:09 PM, Jens Axboe <jaxboe@fusionio.com> wrote:
> >>
> >> The fix for cross-partition merges screwing up disk stats turns out
> >> to be problematic on various levels. Lets revert this one so we have
> >> time to come up with a proper solution for this.
> > 
> > Hmm.. I think the reverted patch looks like it really is the right
> > thing to do, so I hate reverting it this early. What were the problems
> > with it?
> > 
> > Btw, one thing that seems to be missing in the original commit (which
> > is not necessarily the reason for the trouble, of course), is that
> > elv_rq_merge_ok() seems to not check the partition. As far as I can
> > tell, we should have a
> > 
> >     if (req->part != bio->bi_bdev->bd_part)
> >        return 0;
> > 
> > there, no? And you should _not_ set rq->part in "drive_stat_acct()",
> > you should set it from bio->bi_bdev->bd_part when you create the
> > request.
> > 
> > (And if it is NULL, just don't do partition accounting at all)
> > 
> > Hmm? What am I missing? What were the bugs?
> 
> The patch itself is sound, the problems are around the area of it not
> really liking non-elevator devices with the elv_quiesce_start/end()
> parts. I had the below patch for that, but then I could not decide
> whether we were fully safe on queue free after talking to Vivek about
> it.
> 
> So that shows up as an oops on removal of mspro for instance, and loop
> throws a fit as well. Coverage of the bug is too large just to let side
> idle for a few days.
Btw I had that bug reproduced 100% for all removeable devices I have.
(xD, ms, usb).


Attached logs for each.

Best regards,
	Maxim Levitsky

[-- Attachment #2: logs.tar.bz2 --]
[-- Type: application/x-bzip-compressed-tar, Size: 38268 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [GIT PULL] Revert of the IO stat fix
  2010-10-24 22:03     ` Maxim Levitsky
@ 2010-10-25  0:07       ` Linus Torvalds
  2010-10-25  0:55         ` Maxim Levitsky
  0 siblings, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2010-10-25  0:07 UTC (permalink / raw)
  To: Maxim Levitsky; +Cc: Jens Axboe, linux-kernel

On Sun, Oct 24, 2010 at 3:03 PM, Maxim Levitsky <maximlevitsky@gmail.com> wrote:
>
> Btw I had that bug reproduced 100% for all removeable devices I have.
> (xD, ms, usb).

Does Jens' suggested patch (the non-revert one) fix your oops? Or only
the revert?

IOW, I would want to know whether I could try just merging the fix
instead of backing out the patch that looks like a real improvement..

                    Linus

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [GIT PULL] Revert of the IO stat fix
  2010-10-25  0:07       ` Linus Torvalds
@ 2010-10-25  0:55         ` Maxim Levitsky
  2010-10-25  1:30           ` Maxim Levitsky
  0 siblings, 1 reply; 12+ messages in thread
From: Maxim Levitsky @ 2010-10-25  0:55 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jens Axboe, linux-kernel

On Sun, 2010-10-24 at 17:07 -0700, Linus Torvalds wrote:
> On Sun, Oct 24, 2010 at 3:03 PM, Maxim Levitsky <maximlevitsky@gmail.com> wrote:
> >
> > Btw I had that bug reproduced 100% for all removeable devices I have.
> > (xD, ms, usb).
> 
> Does Jens' suggested patch (the non-revert one) fix your oops? Or only
> the revert?
The suggested patch doesn't fix the oops.
Testing revert now.


Best regards,
	Maxim Levitsky

> 
> IOW, I would want to know whether I could try just merging the fix
> instead of backing out the patch that looks like a real improvement..
> 
>                     Linus



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [GIT PULL] Revert of the IO stat fix
  2010-10-25  0:55         ` Maxim Levitsky
@ 2010-10-25  1:30           ` Maxim Levitsky
  0 siblings, 0 replies; 12+ messages in thread
From: Maxim Levitsky @ 2010-10-25  1:30 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jens Axboe, linux-kernel

On Mon, 2010-10-25 at 02:55 +0200, Maxim Levitsky wrote:
> On Sun, 2010-10-24 at 17:07 -0700, Linus Torvalds wrote:
> > On Sun, Oct 24, 2010 at 3:03 PM, Maxim Levitsky <maximlevitsky@gmail.com> wrote:
> > >
> > > Btw I had that bug reproduced 100% for all removeable devices I have.
> > > (xD, ms, usb).
> > 
> > Does Jens' suggested patch (the non-revert one) fix your oops? Or only
> > the revert?
> The suggested patch doesn't fix the oops.
> Testing revert now.

Yes, revert works.
Tested xD, MS, USB.


Best regards,
	Maxim Levitsky

> 
> > 
> > IOW, I would want to know whether I could try just merging the fix
> > instead of backing out the patch that looks like a real improvement..
> > 
> >                     Linus
> 
> 



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [GIT PULL] Revert of the IO stat fix
  2010-10-24 20:35 ` Linus Torvalds
  2010-10-24 20:42   ` Jens Axboe
@ 2010-10-25 19:24   ` Vivek Goyal
  1 sibling, 0 replies; 12+ messages in thread
From: Vivek Goyal @ 2010-10-25 19:24 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jens Axboe, linux-kernel, Maxim Levitsky

On Sun, Oct 24, 2010 at 01:35:33PM -0700, Linus Torvalds wrote:
> On Sun, Oct 24, 2010 at 1:09 PM, Jens Axboe <jaxboe@fusionio.com> wrote:
> >
> > The fix for cross-partition merges screwing up disk stats turns out
> > to be problematic on various levels. Lets revert this one so we have
> > time to come up with a proper solution for this.
> 
> Hmm.. I think the reverted patch looks like it really is the right
> thing to do, so I hate reverting it this early. What were the problems
> with it?
> 
> Btw, one thing that seems to be missing in the original commit (which
> is not necessarily the reason for the trouble, of course), is that
> elv_rq_merge_ok() seems to not check the partition. As far as I can
> tell, we should have a
> 
>     if (req->part != bio->bi_bdev->bd_part)
>        return 0;

Ok, not allowing requests of different partitions to merge will atleast
solve part of the problem for Yasuaki Ishimatsu. (in-flight accounting
going out of sync due to request merging across partitions).

> 
> there, no? And you should _not_ set rq->part in "drive_stat_acct()",
> you should set it from bio->bi_bdev->bd_part when you create the
> request.

Not allowing request merging across partitions, I guess should remove
the need of storing this info in request and allow us to remove req->part
as introduced by commit 7681bfeeccff5efa9eb29bf09249a3c400b15327.

> 
> (And if it is NULL, just don't do partition accounting at all)
> 
> Hmm? What am I missing? What were the bugs?

I think atleast one of the bugs was that we don't know whether request
queue pointer obtained from disk->queue is still valid or not.

Looking at the backtrace of crash reported by Maxim Levitsky here.

http://lkml.org/lkml/2010/10/23/165

mspro_block_remove() calls blk_cleanup_queue() to put its reference on the
queue and which in turn would call kmem_cache_free(blk_requestq_cachep, q)
and free up the queue.

Now driver calls mspro_block_disk_release() and which in turn ends up
accessing request queue and trying to make sure there are no inflight
requests. But request queue might be gone by now and we might end up
accessing freed object.

So one of the quick solutions probably is to set disk->queue = NULL after
del_gendisk() and before blk_cleanup_queue(). IIUC, blk_cleanup_queue()
driver's reference to the request queue and driver should in-turn cleanup
the disk->queue, atleast in theory.

Later disk_replace_part_tbl() can do

q = disk->queue;

if (q)
	quiesce_elevator;

I am not sure if it is completely race free or not from the other call
path of disk_expand_part_tbl(). What kind of locking is there to make
sure once disk->queue is fetched, some other cpu does not do
blk_cleanup_queue().

Thanks
Vivek 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [GIT PULL] Revert of the IO stat fix
  2010-10-24 20:44     ` Jens Axboe
  2010-10-24 20:53       ` Linus Torvalds
@ 2010-11-23 10:01       ` Jerome Marchand
  2010-11-30 17:01       ` Jerome Marchand
  2 siblings, 0 replies; 12+ messages in thread
From: Jerome Marchand @ 2010-11-23 10:01 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Linus Torvalds, linux-kernel, Vivek Goyal

On 10/24/2010 10:44 PM, Jens Axboe wrote:
> On 2010-10-24 22:42, Jens Axboe wrote:
>> On 2010-10-24 22:35, Linus Torvalds wrote:
>>> On Sun, Oct 24, 2010 at 1:09 PM, Jens Axboe <jaxboe@fusionio.com> wrote:
>>>>
>>>> The fix for cross-partition merges screwing up disk stats turns out
>>>> to be problematic on various levels. Lets revert this one so we have
>>>> time to come up with a proper solution for this.
>>>
>>> Hmm.. I think the reverted patch looks like it really is the right
>>> thing to do, so I hate reverting it this early. What were the problems
>>> with it?
>>>
>>> Btw, one thing that seems to be missing in the original commit (which
>>> is not necessarily the reason for the trouble, of course), is that
>>> elv_rq_merge_ok() seems to not check the partition. As far as I can
>>> tell, we should have a
>>>
>>>     if (req->part != bio->bi_bdev->bd_part)
>>>        return 0;
>>>
>>> there, no? And you should _not_ set rq->part in "drive_stat_acct()",
>>> you should set it from bio->bi_bdev->bd_part when you create the
>>> request.
>>>
>>> (And if it is NULL, just don't do partition accounting at all)
>>>
>>> Hmm? What am I missing? What were the bugs?
>>
>> The patch itself is sound, the problems are around the area of it not
>> really liking non-elevator devices with the elv_quiesce_start/end()
>> parts. I had the below patch for that, but then I could not decide
>> whether we were fully safe on queue free after talking to Vivek about
>> it.

Hi Jens,

What's the status of this ? Has there been any progress on the issue ?

Thanks,
Jerome

> 
> Forgot to include it, here it is. I'll be offline from now and 1-2 days
> forward.
> 
> 
> From 96059cec039b666c26d300c2132e24bfd6edacdc Mon Sep 17 00:00:00 2001
> From: Jens Axboe <jaxboe@fusionio.com>
> Date: Sun, 24 Oct 2010 08:46:41 +0200
> Subject: [PATCH] block: fix partition reload bug with non-elevator devices
> 
> The partition reload code was changed to quiesce the block
> queue so that partition IO stats could safely hold a reference
> to the partition table. elv_quiesce_{start,end} do not
> properly work on non-elevator devices. Improve the helper
> functions so that they don't care, this way we can use
> the generic interface on partition reload without having
> to check for queue structures or types.
> 
> Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
> ---
>  block/elevator.c         |   31 +++++++++++++++++++++++--------
>  block/genhd.c            |   10 +---------
>  fs/partitions/check.c    |    5 -----
>  include/linux/elevator.h |    2 ++
>  4 files changed, 26 insertions(+), 22 deletions(-)
> 
> diff --git a/block/elevator.c b/block/elevator.c
> index 282e830..5461075 100644
> --- a/block/elevator.c
> +++ b/block/elevator.c
> @@ -590,11 +590,8 @@ void elv_drain_elevator(struct request_queue *q)
>  /*
>   * Call with queue lock held, interrupts disabled
>   */
> -void elv_quiesce_start(struct request_queue *q)
> +void __elv_quiesce_start(struct request_queue *q)
>  {
> -	if (!q->elevator)
> -		return;
> -
>  	queue_flag_set(QUEUE_FLAG_ELVSWITCH, q);
>  
>  	/*
> @@ -610,11 +607,31 @@ void elv_quiesce_start(struct request_queue *q)
>  	}
>  }
>  
> -void elv_quiesce_end(struct request_queue *q)
> +void elv_quiesce_start(struct request_queue *q)
> +{
> +	if (q->elevator) {
> +		spin_lock_irq(q->queue_lock);
> +		__elv_quiesce_start(q);
> +		spin_unlock_irq(q->queue_lock);
> +	}
> +}
> +
> +void __elv_quiesce_end(struct request_queue *q)
>  {
>  	queue_flag_clear(QUEUE_FLAG_ELVSWITCH, q);
>  }
>  
> +void elv_quiesce_end(struct request_queue *q)
> +{
> +	if (q->elevator) {
> +		unsigned long flags;
> +
> +		spin_lock_irqsave(q->queue_lock, flags);
> +		__elv_quiesce_end(q);
> +		spin_unlock_irqrestore(q->queue_lock, flags);
> +	}
> +}
> +
>  void elv_insert(struct request_queue *q, struct request *rq, int where)
>  {
>  	int unplug_it = 1;
> @@ -969,7 +986,7 @@ static int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
>  	 * Turn on BYPASS and drain all requests w/ elevator private data
>  	 */
>  	spin_lock_irq(q->queue_lock);
> -	elv_quiesce_start(q);
> +	__elv_quiesce_start(q);
>  
>  	/*
>  	 * Remember old elevator.
> @@ -995,9 +1012,7 @@ static int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
>  	 * finally exit old elevator and turn off BYPASS.
>  	 */
>  	elevator_exit(old_elevator);
> -	spin_lock_irq(q->queue_lock);
>  	elv_quiesce_end(q);
> -	spin_unlock_irq(q->queue_lock);
>  
>  	blk_add_trace_msg(q, "elv switch: %s", e->elevator_type->elevator_name);
>  
> diff --git a/block/genhd.c b/block/genhd.c
> index a8adf96..7d4d860 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -930,14 +930,9 @@ static void disk_free_ptbl_rcu_cb(struct rcu_head *head)
>  	struct disk_part_tbl *ptbl =
>  		container_of(head, struct disk_part_tbl, rcu_head);
>  	struct gendisk *disk = ptbl->disk;
> -	struct request_queue *q = disk->queue;
> -	unsigned long flags;
>  
>  	kfree(ptbl);
> -
> -	spin_lock_irqsave(q->queue_lock, flags);
> -	elv_quiesce_end(q);
> -	spin_unlock_irqrestore(q->queue_lock, flags);
> +	elv_quiesce_end(disk->queue);
>  }
>  
>  /**
> @@ -962,10 +957,7 @@ static void disk_replace_part_tbl(struct gendisk *disk,
>  	if (old_ptbl) {
>  		rcu_assign_pointer(old_ptbl->last_lookup, NULL);
>  
> -		spin_lock_irq(q->queue_lock);
>  		elv_quiesce_start(q);
> -		spin_unlock_irq(q->queue_lock);
> -
>  		call_rcu(&old_ptbl->rcu_head, disk_free_ptbl_rcu_cb);
>  	}
>  }
> diff --git a/fs/partitions/check.c b/fs/partitions/check.c
> index b81bfc0..cf4d1ee 100644
> --- a/fs/partitions/check.c
> +++ b/fs/partitions/check.c
> @@ -367,16 +367,13 @@ static void delete_partition_rcu_cb(struct rcu_head *head)
>  	struct hd_struct *part = container_of(head, struct hd_struct, rcu_head);
>  	struct gendisk *disk = part_to_disk(part);
>  	struct request_queue *q = disk->queue;
> -	unsigned long flags;
>  
>  	part->start_sect = 0;
>  	part->nr_sects = 0;
>  	part_stat_set_all(part, 0);
>  	put_device(part_to_dev(part));
>  
> -	spin_lock_irqsave(q->queue_lock, flags);
>  	elv_quiesce_end(q);
> -	spin_unlock_irqrestore(q->queue_lock, flags);
>  }
>  
>  void delete_partition(struct gendisk *disk, int partno)
> @@ -398,9 +395,7 @@ void delete_partition(struct gendisk *disk, int partno)
>  	kobject_put(part->holder_dir);
>  	device_del(part_to_dev(part));
>  
> -	spin_lock_irq(q->queue_lock);
>  	elv_quiesce_start(q);
> -	spin_unlock_irq(q->queue_lock);
>  
>  	call_rcu(&part->rcu_head, delete_partition_rcu_cb);
>  }
> diff --git a/include/linux/elevator.h b/include/linux/elevator.h
> index 80a0ece..2d30300 100644
> --- a/include/linux/elevator.h
> +++ b/include/linux/elevator.h
> @@ -122,7 +122,9 @@ extern void elv_completed_request(struct request_queue *, struct request *);
>  extern int elv_set_request(struct request_queue *, struct request *, gfp_t);
>  extern void elv_put_request(struct request_queue *, struct request *);
>  extern void elv_drain_elevator(struct request_queue *);
> +extern void __elv_quiesce_start(struct request_queue *);
>  extern void elv_quiesce_start(struct request_queue *);
> +extern void __elv_quiesce_end(struct request_queue *);
>  extern void elv_quiesce_end(struct request_queue *);
>  
>  /*


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [GIT PULL] Revert of the IO stat fix
  2010-10-24 20:44     ` Jens Axboe
  2010-10-24 20:53       ` Linus Torvalds
  2010-11-23 10:01       ` Jerome Marchand
@ 2010-11-30 17:01       ` Jerome Marchand
  2 siblings, 0 replies; 12+ messages in thread
From: Jerome Marchand @ 2010-11-30 17:01 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Linus Torvalds, linux-kernel, Maxim Levitsky, Yasuaki Ishimatsu,
	Vivek Goyal

On 10/24/2010 10:44 PM, Jens Axboe wrote:
> diff --git a/block/genhd.c b/block/genhd.c
> index a8adf96..7d4d860 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -930,14 +930,9 @@ static void disk_free_ptbl_rcu_cb(struct rcu_head *head)
>  	struct disk_part_tbl *ptbl =
>  		container_of(head, struct disk_part_tbl, rcu_head);
>  	struct gendisk *disk = ptbl->disk;
> -	struct request_queue *q = disk->queue;
> -	unsigned long flags;
>  
>  	kfree(ptbl);
> -
> -	spin_lock_irqsave(q->queue_lock, flags);
> -	elv_quiesce_end(q);
> -	spin_unlock_irqrestore(q->queue_lock, flags);
> +	elv_quiesce_end(disk->queue);

Here, the queue may be already gone. We should check it is still here and alive:

static void disk_free_ptbl_rcu_cb(struct rcu_head *head)
{
	struct disk_part_tbl *ptbl =
		container_of(head, struct disk_part_tbl, rcu_head);
	struct gendisk *disk = ptbl->disk;
	struct request_queue *q = disk->queue;

	kfree(ptbl);
	if (q && !test_bit(QUEUE_FLAG_DEAD, &q->queue_flags))
		elv_quiesce_end(q);
}


>  }
>  
>  /**
> @@ -962,10 +957,7 @@ static void disk_replace_part_tbl(struct gendisk *disk,
>  	if (old_ptbl) {
>  		rcu_assign_pointer(old_ptbl->last_lookup, NULL);
>  
> -		spin_lock_irq(q->queue_lock);
>  		elv_quiesce_start(q);

Same as above.

I'm not sure that is enough, but on my test machine these changes fix the crash
at USB key device removal.

Regards,
Jerome

> -		spin_unlock_irq(q->queue_lock);
> -
>  		call_rcu(&old_ptbl->rcu_head, disk_free_ptbl_rcu_cb);
>  	}
>  }

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2010-11-30 17:01 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-10-24 20:09 [GIT PULL] Revert of the IO stat fix Jens Axboe
2010-10-24 20:35 ` Linus Torvalds
2010-10-24 20:42   ` Jens Axboe
2010-10-24 20:44     ` Jens Axboe
2010-10-24 20:53       ` Linus Torvalds
2010-11-23 10:01       ` Jerome Marchand
2010-11-30 17:01       ` Jerome Marchand
2010-10-24 22:03     ` Maxim Levitsky
2010-10-25  0:07       ` Linus Torvalds
2010-10-25  0:55         ` Maxim Levitsky
2010-10-25  1:30           ` Maxim Levitsky
2010-10-25 19:24   ` Vivek Goyal

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.