linux-kernel.vger.kernel.org archive mirror
* DAC960 crash dequeueing request
@ 2003-06-04 22:04 Dave Olien
  2003-06-04 22:08 ` Andrew Morton
  2003-06-06 16:51 ` Dave Olien
  0 siblings, 2 replies; 3+ messages in thread
From: Dave Olien @ 2003-06-04 22:04 UTC
  To: axboe; +Cc: linux-kernel


In linux 2.5.70, with no patches applied, we've had one BUG of the form:

kernel BUG at include/linux/blkdev.h:407!

This is running a database workload on an 8-way x86 machine with 4 GB
of memory.  We've seen this only once, after 6 hours of run time.
Ironically, the BUG occurs during a part of the test that is not
particularly disk I/O intensive. The disk I/O that IS going on at this
time is predominantly sequential writes to a logging device.

No file systems are involved.

------------[ cut here ]------------
kernel BUG at include/linux/blkdev.h:407!
invalid operand: 0000 [#1]
CPU:    4
EIP:    0060:[<c01f3565>]    Not tainted
EFLAGS: 00010046
EIP is at DAC960_ProcessRequest+0xb5/0x170
eax: 00000080   ebx: f78f3360   ecx: f59eeb98   edx: f59eeb98
esi: f7fa8174   edi: f05fbc58   ebp: f7fa8000   esp: f05fbc0c
ds: 007b   es: 007b   ss: 0068
Process kernel (pid: 8281, threadinfo=f05fa000 task=f0737940)
Stack: 01000082 00000001 f7fa8174 f05fbc58 00000002 c01f36d6 f7fa8000 00000001
       f7fa8174 c032c8a0 c01e10c2 f7fa8174 00000040 00000000 00209008 f7656e00
       f05fbc58 c01e1252 f7fa8174 f05fbc58 f05fbc58 f7656e00 f05fa000 00000000
Call Trace:
 [<c01f36d6>] DAC960_RequestFunction+0x26/0x30
 [<c01e10c2>] generic_unplug_device+0x52/0x70
 [<c01e1252>] blk_run_queues+0x72/0xa0
 [<c016a676>] dio_await_one+0x56/0xa0
 [<c016a7ad>] dio_await_completion+0x1d/0x40
 [<c016b2ee>] direct_io_worker+0x2ce/0x340
 [<c016b499>] blockdev_direct_IO+0x139/0x14c
 [<c0152a90>] blkdev_get_blocks+0x0/0x90
 [<c0152b61>] blkdev_direct_IO+0x41/0x50
 [<c0152a90>] blkdev_get_blocks+0x0/0x90
 [<c013575c>] generic_file_direct_IO+0x5c/0x80
 [<c0134f04>] generic_file_aio_write_nolock+0x3f4/0x9a0
 [<c01a9f24>] sys_semtimedop+0x564/0x5b0
 [<c0152b61>] blkdev_direct_IO+0x41/0x50
 [<c0152a90>] blkdev_get_blocks+0x0/0x90
 [<c013575c>] generic_file_direct_IO+0x5c/0x80
 [<c01631cd>] update_atime+0x6d/0xc0
 [<c0133e71>] __generic_file_aio_read+0x111/0x1e0
 [<c013551f>] generic_file_write_nolock+0x6f/0x90
 [<c0121edb>] do_softirq+0x6b/0xd0
 [<c01169e9>] smp_apic_timer_interrupt+0x149/0x160
 [<c011181d>] restore_i387_fxsave+0x5d/0x70
 [<c01cdf47>] raw_file_write+0x27/0x30
 [<c014c07a>] vfs_write+0xaa/0xe0
 [<c014bb50>] default_llseek+0x0/0xd0
 [<c014bd65>] sys_llseek+0xb5/0xe0
 [<c014c12f>] sys_write+0x2f/0x50
 [<c010a953>] syscall_call+0x7/0xb

Code: 0f 0b 97 01 04 93 2a c0 8b 41 04 89 42 04 89 10 89 09 89 49 

--------------------------------------------------------------------
 
The BUG is occurring in DAC960_ProcessRequest, which essentially does:

DAC960_ProcessRequest()
{
	elv_next_request()

	blkdev_dequeue_request()
}

It looks as though there was a request on the request queue
when elv_next_request was called, but it was removed before
blkdev_dequeue_request was called.  The only ways I can imagine
this happening are a flaw in the locking around accesses to the
request queue, or some other kind of memory corruption.
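
A minimal userspace sketch of the failure mode described above. It
assumes the check at blkdev.h:407 tests whether the request is still
linked on a list; the thread does not quote that line. The list helpers
are simplified stand-ins for the kernel's, and next_request() and
dequeue_request() are hypothetical names, not the kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Unlink an entry and leave it pointing at itself (an "empty" node). */
static void list_del_init(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	init_list_head(entry);
}

/* Hypothetical, stripped-down request: only the list linkage matters. */
struct request {
	struct list_head queuelist;
};

/* Peek at the first request on the queue without unlinking it. */
static struct request *next_request(struct list_head *queue_head)
{
	if (list_empty(queue_head))
		return NULL;
	return (struct request *)((char *)queue_head->next -
				  offsetof(struct request, queuelist));
}

/* Returns false where the assumed kernel check would fire. */
static bool dequeue_request(struct request *rq)
{
	if (list_empty(&rq->queuelist))
		return false;	/* already unlinked by someone else */
	list_del_init(&rq->queuelist);
	return true;
}
```

If anything unlinks the request between the peek and the dequeue,
dequeue_request() reports the condition that the BUG would flag.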

Regarding locking, DAC960_ProcessRequest is called from two places:

DAC960_RequestFunction, which is entered and exited
with queue_lock held.

DAC960_BA_InterruptHandler(), with queue_lock held.

I'm looking for some hints on how to track down what's happening, so that
when it happens again, we can get more information.

I'm thinking of starting by stashing a copy of the request structure found
on the request queue, and dumping it out at BUG() time, to see if
examining the data there gives any hint what happened to it.
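
One possible shape for that stash, as a userspace sketch only. The
struct dbg_request fields, stash_request() and format_stash() are all
hypothetical; the real struct request has many more fields, and a kernel
version would use printk rather than snprintf.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of the fields worth keeping from a request. */
struct dbg_request {
	unsigned long sector;
	unsigned long nr_sectors;
	unsigned long flags;
	void *next, *prev;	/* list linkage at the time of the stash */
};

static struct dbg_request stashed_copy;		/* last request seen */
static const struct dbg_request *stashed_from;	/* where it was copied from */

/* Call right after the request is found on the queue. */
static void stash_request(const struct dbg_request *rq)
{
	stashed_from = rq;
	memcpy(&stashed_copy, rq, sizeof(stashed_copy));
}

/* Call on the failure path: formats the stashed state into buf. */
static int format_stash(char *buf, size_t len)
{
	return snprintf(buf, len,
			"rq %p: sector=%lu nr_sectors=%lu flags=%#lx next=%p prev=%p",
			(const void *)stashed_from,
			stashed_copy.sector, stashed_copy.nr_sectors,
			stashed_copy.flags,
			stashed_copy.next, stashed_copy.prev);
}
```

Comparing the stashed linkage with the live request at failure time
would show whether the request was unlinked or overwritten in between.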


* Re: DAC960 crash dequeueing request
  2003-06-04 22:04 DAC960 crash dequeueing request Dave Olien
@ 2003-06-04 22:08 ` Andrew Morton
  2003-06-06 16:51 ` Dave Olien
  1 sibling, 0 replies; 3+ messages in thread
From: Andrew Morton @ 2003-06-04 22:08 UTC
  To: Dave Olien; +Cc: axboe, linux-kernel

Dave Olien wrote:
> 
> In linux 2.5.70, with no patches applied, we've had one BUG of the form:
> 
> kernel BUG at include/linux/blkdev.h:407!
> 


The below should fix it.


diff -puN drivers/block/deadline-iosched.c~deadline-hash-removal-fix drivers/block/deadline-iosched.c
--- 25/drivers/block/deadline-iosched.c~deadline-hash-removal-fix	2003-06-04 00:50:36.000000000 -0700
+++ 25-akpm/drivers/block/deadline-iosched.c	2003-06-04 00:50:36.000000000 -0700
@@ -121,6 +121,15 @@ static inline void deadline_del_drq_hash
 		__deadline_del_drq_hash(drq);
 }
 
+static void
+deadline_remove_merge_hints(request_queue_t *q, struct deadline_rq *drq)
+{
+	deadline_del_drq_hash(drq);
+
+	if (q->last_merge == &drq->request->queuelist)
+		q->last_merge = NULL;
+}
+
 static inline void
 deadline_add_drq_hash(struct deadline_data *dd, struct deadline_rq *drq)
 {
@@ -310,7 +319,7 @@ static void deadline_remove_request(requ
 		struct deadline_data *dd = q->elevator.elevator_data;
 
 		list_del_init(&drq->fifo);
-		deadline_del_drq_hash(drq);
+		deadline_remove_merge_hints(q, drq);
 		deadline_del_drq_rb(dd, drq);
 	}
 }

_
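
The idea behind the patch can be illustrated with a toy userspace
model: when a request leaves the queue, any cached merge hint that still
points at it is cleared. The struct layouts and remove_request() below
are simplified assumptions, not the kernel's definitions, and the thread
does not spell out the exact path from a stale hint to the BUG.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Toy request: only the queue linkage is modelled. */
struct request {
	struct list_head queuelist;
};

/* Toy queue: a list head plus a cached "last merge" hint. */
struct request_queue {
	struct list_head queue_head;
	struct list_head *last_merge;
};

static void queue_init(struct request_queue *q)
{
	q->queue_head.next = &q->queue_head;
	q->queue_head.prev = &q->queue_head;
	q->last_merge = NULL;
}

static void add_request(struct request_queue *q, struct request *rq)
{
	struct list_head *head = &q->queue_head;

	rq->queuelist.prev = head->prev;
	rq->queuelist.next = head;
	head->prev->next = &rq->queuelist;
	head->prev = &rq->queuelist;
}

/* Mirrors the patch's check: drop the hint if it names this request. */
static void remove_merge_hint(struct request_queue *q, struct request *rq)
{
	if (q->last_merge == &rq->queuelist)
		q->last_merge = NULL;
}

static void remove_request(struct request_queue *q, struct request *rq)
{
	rq->queuelist.prev->next = rq->queuelist.next;
	rq->queuelist.next->prev = rq->queuelist.prev;
	rq->queuelist.next = &rq->queuelist;
	rq->queuelist.prev = &rq->queuelist;

	/* Without this call the hint could outlive the request. */
	remove_merge_hint(q, rq);
}
```

In this model a hint that names some other request is left alone, so
only the stale pointer is dropped.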


* Re: DAC960 crash dequeueing request
  2003-06-04 22:04 DAC960 crash dequeueing request Dave Olien
  2003-06-04 22:08 ` Andrew Morton
@ 2003-06-06 16:51 ` Dave Olien
  1 sibling, 0 replies; 3+ messages in thread
From: Dave Olien @ 2003-06-06 16:51 UTC
  To: axboe; +Cc: linux-kernel


Mary,

How's this for a response to Andrew and lkml?

------------------------------------------------------------------------

Andrew,

After running several iterations of the database workload, it seems
the patch below did indeed eliminate the BUG() we were hitting.

In addition, the patched kernel seems to perform somewhat better.
The workload performance is improved by about 2%, and the standard
deviation in performance between iterations of the workload was reduced
from 137 to 48.

A performance results comparison can be found at

http://www.osdl.org/projects/dbt2dev/results/8way/70AKPM/results.html

Thanks to Mary Meredith  <maryedie@osdl.org> for the performance
measurements and comparisons, and for hitting the BUG() in the first place.

Andrew Morton wrote:
> 
> Dave Olien wrote:
> > 
> > In linux 2.5.70, with no patches applied, we've had one BUG of the form:
> > 
> > kernel BUG at include/linux/blkdev.h:407!
> 
> 
> 
> The below should fix it.
> 
> 
> diff -puN drivers/block/deadline-iosched.c~deadline-hash-removal-fix drivers/block/deadline-iosched.c
> --- 25/drivers/block/deadline-iosched.c~deadline-hash-removal-fix	2003-06-04 00:50:36.000000000 -0700
> +++ 25-akpm/drivers/block/deadline-iosched.c	2003-06-04 00:50:36.000000000 -0700
> @@ -121,6 +121,15 @@ static inline void deadline_del_drq_hash
>  		__deadline_del_drq_hash(drq);
>  }
>  
> +static void
> +deadline_remove_merge_hints(request_queue_t *q, struct deadline_rq *drq)
> +{
> +	deadline_del_drq_hash(drq);
> +
> +	if (q->last_merge == &drq->request->queuelist)
> +		q->last_merge = NULL;
> +}
> +
>  static inline void
>  deadline_add_drq_hash(struct deadline_data *dd, struct deadline_rq *drq)
>  {
> @@ -310,7 +319,7 @@ static void deadline_remove_request(requ
>  		struct deadline_data *dd = q->elevator.elevator_data;
>  
>  		list_del_init(&drq->fifo);
> -		deadline_del_drq_hash(drq);
> +		deadline_remove_merge_hints(q, drq);
>  		deadline_del_drq_rb(dd, drq);
>  	}
>  }
> 
> _
> 
> ----- End forwarded message -----
> 


