* btrfs panic in 3.5.0
@ 2012-08-07 17:40 Marc MERLIN
  2012-08-07 18:14 ` Arne Jansen
  2012-08-08  2:38 ` Jérôme Poulin
  0 siblings, 2 replies; 8+ messages in thread
From: Marc MERLIN @ 2012-08-07 17:40 UTC (permalink / raw)
  To: linux-btrfs

Unfortunately I only have a screenshot.

Apparently the panic was in 
btrfs_set_lock_blocking_rw
with a RIP in btrfs_cow_block

Screenshot here:
http://marc.merlins.org/tmp/btrfs_oops.jpg

Because the display looks a bit messed up, I can't tell whether the ATA error
happened before or after the oops.

System rebooted ok.

Was there a better way to get this oops if I didn't have a serial console?

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


* Re: btrfs panic in 3.5.0
  2012-08-07 17:40 btrfs panic in 3.5.0 Marc MERLIN
@ 2012-08-07 18:14 ` Arne Jansen
  2012-08-07 18:47   ` Marc MERLIN
  2012-08-08  2:38 ` Jérôme Poulin
  1 sibling, 1 reply; 8+ messages in thread
From: Arne Jansen @ 2012-08-07 18:14 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: linux-btrfs

On 08/07/2012 07:40 PM, Marc MERLIN wrote:
> Unfortunately I only have a screenshot.
> 
> Apparently the panic was in 
> btrfs_set_lock_blocking_rw
> with a RIP in btrfs_cow_block
> 

Can you please resolve btrfs_cow_block+0x3b to a line number?

gdb btrfs.ko
(gdb) info line *btrfs_cow_block+0x3b

Thanks,
Arne

> Screenshot here:
> http://marc.merlins.org/tmp/btrfs_oops.jpg
> 
> Because the display looks a bit messed up, I can't tell if the ata error
> happened before or after the oops.
> 
> System rebooted ok.
> 
> Was there a better way to get this oops if I didn't have a serial console?
> 
> Marc
> 



* Re: btrfs panic in 3.5.0
  2012-08-07 18:14 ` Arne Jansen
@ 2012-08-07 18:47   ` Marc MERLIN
  2012-08-09  2:52     ` Marc MERLIN
  0 siblings, 1 reply; 8+ messages in thread
From: Marc MERLIN @ 2012-08-07 18:47 UTC (permalink / raw)
  To: Arne Jansen; +Cc: linux-btrfs

On Tue, Aug 07, 2012 at 08:14:23PM +0200, Arne Jansen wrote:
> On 08/07/2012 07:40 PM, Marc MERLIN wrote:
> > Unfortunately I only have a screenshot.
> > 
> > Apparently the panic was in 
> > btrfs_set_lock_blocking_rw
> > with a RIP in btrfs_cow_block
> 
> Can you please resolve btrfs_cow_block+0x3b to a line number?
> 
> gdb btrfs.ko
> (gdb) info line *btrfs_cow_block+0x3b

So, I'm not very good at this, sorry if I'm doing it wrong:
gandalfthegreat:~# gdb /lib/modules/3.5.0-amd64-preempt-noide-20120410/kernel/fs/btrfs/btrfs.ko
Reading symbols from /lib/modules/3.5.0-amd64-preempt-noide-20120410/kernel/fs/btrfs/btrfs.ko...(no debugging symbols found)...done.
(gdb) info line *btrfs_cow_block+0x3b
No line number information available for address 0x9a6e

Mmmh, it seems that I'm missing a kernel option that adds symbols in modules?

I can add it for my next kernel compile. Do you have the config option name
off hand?

I put my module here if that helps:
http://marc.merlins.org/tmp/btrfs.ko

Marc


* Re: btrfs panic in 3.5.0
  2012-08-07 17:40 btrfs panic in 3.5.0 Marc MERLIN
  2012-08-07 18:14 ` Arne Jansen
@ 2012-08-08  2:38 ` Jérôme Poulin
  2012-08-08  2:47   ` Marc MERLIN
  1 sibling, 1 reply; 8+ messages in thread
From: Jérôme Poulin @ 2012-08-08  2:38 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: linux-btrfs

On Tue, Aug 7, 2012 at 1:40 PM, Marc MERLIN <marc@merlins.org> wrote:
>
> System rebooted ok.

I just want to be sure that you are aware that your hard drive is
currently killing itself. Those READ FPDMA QUEUED errors mean that your hard
disk is reallocating bad sectors and has problems reading them.


* Re: btrfs panic in 3.5.0
  2012-08-08  2:38 ` Jérôme Poulin
@ 2012-08-08  2:47   ` Marc MERLIN
  0 siblings, 0 replies; 8+ messages in thread
From: Marc MERLIN @ 2012-08-08  2:47 UTC (permalink / raw)
  To: Jérôme Poulin; +Cc: linux-btrfs

On Tue, Aug 07, 2012 at 10:38:28PM -0400, Jérôme Poulin wrote:
> On Tue, Aug 7, 2012 at 1:40 PM, Marc MERLIN <marc@merlins.org> wrote:
> >
> > System rebooted ok.
> 
> I just want to be sure that you are aware that your hard drive is
> currently killing itself. Those READ FPDMA QUEUED errors mean that your hard
> disk is reallocating bad sectors and has problems reading them.

Yeah, I saw that. It's actually an SSD (the wretched Samsung one I've
been posting about), and I'm just about to return it.

What's interesting is that SMART shows no such error:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       132
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       19
177 Wear_Leveling_Count     0x0013   099   099   000    Pre-fail  Always       -       29
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   051   040   000    Old_age   Always       -       49
195 Hardware_ECC_Recovered  0x001a   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   253   253   000    Old_age   Always       -       2
235 Unknown_Attribute       0x0012   099   099   000    Old_age   Always       -       6
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       627681656

I'm not saying nothing is wrong with the drive, but that it's not a magnetic
bad sector.

Either way, I'm going to get a different SSD soon, although I guess this failure mode
was useful in finding a bug in the btrfs code in the meantime :)

Thanks for the heads up
Marc


* Re: btrfs panic in 3.5.0
  2012-08-07 18:47   ` Marc MERLIN
@ 2012-08-09  2:52     ` Marc MERLIN
  2012-08-09  6:42       ` Arne Jansen
  0 siblings, 1 reply; 8+ messages in thread
From: Marc MERLIN @ 2012-08-09  2:52 UTC (permalink / raw)
  To: Arne Jansen; +Cc: linux-btrfs

On Tue, Aug 07, 2012 at 11:47:36AM -0700, Marc MERLIN wrote:
> On Tue, Aug 07, 2012 at 08:14:23PM +0200, Arne Jansen wrote:
> > On 08/07/2012 07:40 PM, Marc MERLIN wrote:
> > > Unfortunately I only have a screenshot.
> > > 
> > > Apparently the panic was in 
> > > btrfs_set_lock_blocking_rw
> > > with a RIP in btrfs_cow_block
> > 
> > Can you please resolve btrfs_cow_block+0x3b to a line number?
> > 
> > gdb btrfs.ko
> > (gdb) info line *btrfs_cow_block+0x3b
> 
> So, I'm not very good at this, sorry if I'm doing it wrong:
> gandalfthegreat:~# gdb /lib/modules/3.5.0-amd64-preempt-noide-20120410/kernel/fs/btrfs/btrfs.ko
> Reading symbols from /lib/modules/3.5.0-amd64-preempt-noide-20120410/kernel/fs/btrfs/btrfs.ko...(no debugging symbols found)...done.
> (gdb) info line *btrfs_cow_block+0x3b
> No line number information available for address 0x9a6e
> 
> Mmmh, it seems that I'm missing a kernel option that adds symbols in modules?
> 
> I can add it for my next kernel compile. Do you have the config option name
> off hand?
> 
> I put my module here if that helps:
> http://marc.merlins.org/tmp/btrfs.ko

I felt bad for apparently having a kernel without debug symbols, so I looked
at my kernel config, and I do have:
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_DEBUG_INFO=y
# CONFIG_DEBUG_INFO_REDUCED is not set

Any idea what else I'm missing to provide better debug info if I have a
problem again?

And is it reasonably easy to take a .ko that apparently lacks line numbers,
like the one I gave you, and infer the line of code for a function offset?

Thanks,
Marc


* Re: btrfs panic in 3.5.0
  2012-08-09  2:52     ` Marc MERLIN
@ 2012-08-09  6:42       ` Arne Jansen
  2012-08-09  6:43         ` Jan Schmidt
  0 siblings, 1 reply; 8+ messages in thread
From: Arne Jansen @ 2012-08-09  6:42 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: linux-btrfs

On 09.08.2012 04:52, Marc MERLIN wrote:
> On Tue, Aug 07, 2012 at 11:47:36AM -0700, Marc MERLIN wrote:
>> On Tue, Aug 07, 2012 at 08:14:23PM +0200, Arne Jansen wrote:
>>> On 08/07/2012 07:40 PM, Marc MERLIN wrote:
>>>> Unfortunately I only have a screenshot.
>>>>
>>>> Apparently the panic was in 
>>>> btrfs_set_lock_blocking_rw
>>>> with a RIP in btrfs_cow_block
>>>
>>> Can you please resolve btrfs_cow_block+0x3b to a line number?
>>>
>>> gdb btrfs.ko
>>> (gdb) info line *btrfs_cow_block+0x3b
>>
>> So, I'm not very good at this, sorry if I'm doing it wrong:
>> gandalfthegreat:~# gdb /lib/modules/3.5.0-amd64-preempt-noide-20120410/kernel/fs/btrfs/btrfs.ko
>> Reading symbols from /lib/modules/3.5.0-amd64-preempt-noide-20120410/kernel/fs/btrfs/btrfs.ko...(no debugging symbols found)...done.
>> (gdb) info line *btrfs_cow_block+0x3b
>> No line number information available for address 0x9a6e
>>
>> Mmmh, it seems that I'm missing a kernel option that adds symbols in modules?
>>
>> I can add it for my next kernel compile. Do you have the config option name
>> off hand?
>>
>> I put my module here if that helps:
>> http://marc.merlins.org/tmp/btrfs.ko
> 
> I felt bad for having a kernel without debug symbols it seems, so I looked
> at my kernel config and I do have:
> CONFIG_DEBUG_BUGVERBOSE=y
> CONFIG_DEBUG_INFO=y
> # CONFIG_DEBUG_INFO_REDUCED is not set
> 
> Any idea what else I'm missing to provide better debug info if I have a
> problem again?
> 
> And is it reasonably easy to take the .ko apparently without line numbers,
> like the one I gave you, and infer the line of code for a function offset?

The .ko is fine. It crashes here:

noinline int btrfs_cow_block(struct btrfs_trans_handle *trans,
                    struct btrfs_root *root, struct extent_buffer *buf,
                    struct extent_buffer *parent, int parent_slot,
                    struct extent_buffer **cow_ret)
{
        u64 search_start;
        int ret;

        if (trans->transaction != root->fs_info->running_transaction) {
                printk(KERN_CRIT "trans %llu running %llu\n",
                       (unsigned long long)trans->transid,
                       (unsigned long long)
                       root->fs_info->running_transaction->transid);
                                                          ^^

                WARN_ON(1);
        }

fs_info->running_transaction is probably NULL.


> 
> Thanks,
> Marc



* Re: btrfs panic in 3.5.0
  2012-08-09  6:42       ` Arne Jansen
@ 2012-08-09  6:43         ` Jan Schmidt
  0 siblings, 0 replies; 8+ messages in thread
From: Jan Schmidt @ 2012-08-09  6:43 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: Arne Jansen, linux-btrfs

On Thu, August 09, 2012 at 08:42 (+0200), Arne Jansen wrote:
> On 09.08.2012 04:52, Marc MERLIN wrote:
>> On Tue, Aug 07, 2012 at 11:47:36AM -0700, Marc MERLIN wrote:
>>> On Tue, Aug 07, 2012 at 08:14:23PM +0200, Arne Jansen wrote:
>>>> On 08/07/2012 07:40 PM, Marc MERLIN wrote:
>>>>> Unfortunately I only have a screenshot.
>>>>>
>>>>> Apparently the panic was in 
>>>>> btrfs_set_lock_blocking_rw
>>>>> with a RIP in btrfs_cow_block
>>>>
>>>> Can you please resolve btrfs_cow_block+0x3b to a line number?
>>>>
>>>> gdb btrfs.ko
>>>> (gdb) info line *btrfs_cow_block+0x3b
>>>
>>> So, I'm not very good at this, sorry if I'm doing it wrong:
>>> gandalfthegreat:~# gdb /lib/modules/3.5.0-amd64-preempt-noide-20120410/kernel/fs/btrfs/btrfs.ko
>>> Reading symbols from /lib/modules/3.5.0-amd64-preempt-noide-20120410/kernel/fs/btrfs/btrfs.ko...(no debugging symbols found)...done.
>>> (gdb) info line *btrfs_cow_block+0x3b
>>> No line number information available for address 0x9a6e
>>>
>>> Mmmh, it seems that I'm missing a kernel option that adds symbols in modules?
>>>
>>> I can add it for my next kernel compile. Do you have the config option name
>>> off hand?
>>>
>>> I put my module here if that helps:
>>> http://marc.merlins.org/tmp/btrfs.ko
>>
>> I felt bad for having a kernel without debug symbols it seems, so I looked
>> at my kernel config and I do have:
>> CONFIG_DEBUG_BUGVERBOSE=y
>> CONFIG_DEBUG_INFO=y
>> # CONFIG_DEBUG_INFO_REDUCED is not set
>>
>> Any idea what else I'm missing to provide better debug info if I have a
>> problem again?
>>
>> And is it reasonably easy to take the .ko apparently without line numbers,
>> like the one I gave you, and infer the line of code for a function offset?
> 
> The .ko is fine. It crashes here:
> 
> noinline int btrfs_cow_block(struct btrfs_trans_handle *trans,
>                     struct btrfs_root *root, struct extent_buffer *buf,
>                     struct extent_buffer *parent, int parent_slot,
>                     struct extent_buffer **cow_ret)
> {
>         u64 search_start;
>         int ret;
> 
>         if (trans->transaction != root->fs_info->running_transaction) {
>                 printk(KERN_CRIT "trans %llu running %llu\n",
>                        (unsigned long long)trans->transid,
>                        (unsigned long long)
>                        root->fs_info->running_transaction->transid);
>                                                           ^^
> 
>                 WARN_ON(1);
>         }
> 
> fs_info->running_transaction is probably NULL.

Agreed. That means we probably came through btrfs_cleanup_transaction,
which explicitly sets it to NULL.

-Jan

