On Mar 1, 2018, at 9:04 AM, Theodore Ts'o wrote: > This doesn't seem to make sense; the PC is where we are currently > executing, and LR is the "Link Register" where the flow of control > will be returning after the current function returns, right? Well, > dx_probe should *not* be returning to __wait_on_bit(). So this just > seems.... weird. > > Ignoring the LR register, this stack trace looks sane... I can't see > which pointer could be NULL and getting dereferenced, though. How > easily can you reproduce the problem? Can you either (a) translate > the PC into a line number, or better yet, if you can reproduce, add a > series of BUG_ON's so we can see what's going on? > > + BUG_ON(frame); I think you mean: BUG_ON(frame == NULL); or BUG_ON(!frame); > memset(frame_in, 0, EXT4_HTREE_LEVEL * sizeof(frame_in[0])); > frame->bh = ext4_read_dirblock(dir, 0, INDEX); > if (IS_ERR(frame->bh)) > return (struct dx_frame *) frame->bh; > > + BUG_ON(frame->bh); > + BUG_ON(frame->bh->b_data); Same here. BUG_ON(frame->bh == NULL); BUG_ON(frame->bh->b_data == NULL); This is why I don't like implicit "is NULL" or "is non-zero" usage. Lustre used to require "== NULL" or "!= NULL" to avoid bugs like this, but had to abandon that because of upstream code style. > root = (struct dx_root *) frame->bh->b_data; > if (root->info.hash_version != DX_HASH_TEA && > root->info.hash_version != DX_HASH_HALF_MD4 && > root->info.hash_version != DX_HASH_LEGACY) { > > These are "could never" happen scenarios from looking at the code, but > that will help explain what is going on. > > If this is reliably only happening with mq, the only way I could see > that if is something is returning an error when it previously wasn't. > This isn't a problem we're seeing with any of our testing, though. > > Cheers, > > - Ted > Cheers, Andreas