* unable to handle kernel paging request - btrfs
@ 2016-09-22 12:18 Rich Freeman
  2016-09-22 12:44 ` Holger Hoffstätte
                   ` (3 more replies)
  0 siblings, 4 replies; 20+ messages in thread
From: Rich Freeman @ 2016-09-22 12:18 UTC (permalink / raw)
  To: Btrfs BTRFS

I have been getting panics consistently after doing a btrfs replace
operation on a raid1 and rebooting.  I linked a photo of the panic; I
haven't been able to get a text capture of it.

https://ibin.co/2vx0HhDeViu3.jpg

I'm getting this error on the latest 4.4, 4.1, and even on an old
3.18.26 kernel I had lying around.

I tried the "remove root_log_ctx from ctx list before btrfs_sync_log
returns" patch on 4.1, and that did not solve my problem either.

I'm able to boot into single-user mode and if I don't start any
processes the system seems fairly stable.  I am also able to start a
btrfs balance and run that for several hours without issue.  If I
start launching services the system will tend to panic, though how
many processes I can launch will vary.  I don't think that it is a
particular file being accessed that is triggering the issue since the
point where it fails varies.  I suspect it may be load-related.

Mounting with compress=no doesn't seem to help either.  Granted, I see
lzo_decompress in the backtrace and that is probably a read operation.

Any suggestions?  Google hasn't been helpful on this one...

Rich


* Re: unable to handle kernel paging request - btrfs
  2016-09-22 12:18 unable to handle kernel paging request - btrfs Rich Freeman
@ 2016-09-22 12:44 ` Holger Hoffstätte
  2016-09-22 16:23   ` David Sterba
  2016-09-22 16:46 ` Rich Freeman
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 20+ messages in thread
From: Holger Hoffstätte @ 2016-09-22 12:44 UTC (permalink / raw)
  To: Rich Freeman, Btrfs BTRFS

On 09/22/16 14:18, Rich Freeman wrote:
> I have been getting panics consistently after doing a btrfs replace
> operation on a raid1 and rebooting.  I linked a photo of the panic; I
> haven't been able to get a text capture of it.
> 
> https://ibin.co/2vx0HhDeViu3.jpg
> 
> I'm getting this error on the latest 4.4, 4.1, and even on an old
> 3.18.26 kernel I had lying around.

What happens when you try to boot e.g. SystemRescueCD?
If it is what I think it is (see below) then that should start bombing
as well.

> I'm able to boot into single-user mode and if I don't start any
> processes the system seems fairly stable.  I am also able to start a
> btrfs balance and run that for several hours without issue.  If I
> start launching services the system will tend to panic, though how
> many processes I can launch will vary.  I don't think that it is a
> particular file being accessed that is triggering the issue since the
> point where it fails varies.  I suspect it may be load-related.

If the SystemRescue method does not work then you have either
an overheating/dying CPU or - more likely - bad memory.

Another - probably unlikely, but not impossible - option would be to
delete the swap file, if you have one. I've seen some super-strange
things with corrupt or incorrectly created swap *even if it isn't
heavily used*, right after boot. E.g. if your swapfile was fallocated
instead of dd'ed and lives on ext4 or XFS, you *must* use the -z option
to pre-touch all extents.

Or maybe it's the ghost of ZFS, if that's patched in as well.. :-)

-h



* Re: unable to handle kernel paging request - btrfs
  2016-09-22 12:44 ` Holger Hoffstätte
@ 2016-09-22 16:23   ` David Sterba
  0 siblings, 0 replies; 20+ messages in thread
From: David Sterba @ 2016-09-22 16:23 UTC (permalink / raw)
  To: Holger Hoffstätte; +Cc: Rich Freeman, Btrfs BTRFS

On Thu, Sep 22, 2016 at 02:44:44PM +0200, Holger Hoffstätte wrote:
> On 09/22/16 14:18, Rich Freeman wrote:
> > I have been getting panics consistently after doing a btrfs replace
> > operation on a raid1 and rebooting.  I linked a photo of the panic; I
> > haven't been able to get a text capture of it.
> > 
> > https://ibin.co/2vx0HhDeViu3.jpg
> > 
> > I'm getting this error on the latest 4.4, 4.1, and even on an old
> > 3.18.26 kernel I had lying around.
> 
> What happens when you try to boot e.g. SystemRescueCD?
> If it is what I think it is (see below) then that should start bombing
> as well.
> 
> > I'm able to boot into single-user mode and if I don't start any
> > processes the system seems fairly stable.  I am also able to start a
> > btrfs balance and run that for several hours without issue.  If I
> > start launching services the system will tend to panic, though how
> > many processes I can launch will vary.  I don't think that it is a
> > particular file being accessed that is triggering the issue since the
> > point where it fails varies.  I suspect it may be load-related.
> 
> If the SystemRescue method does not work then you have either
> an overheating/dying CPU or - more likely - bad memory.

Maybe bad memory; I'd suspect layers other than the filesystem. The
faulty address is 0xfff...ffd8, which is -40 == -ELOOP. This error code
is not used in btrfs. It looks like an error code ended up being returned
as a pointer, so any access to it explodes. The last line in the list is

"Fixing recursive fault but reboot is needed"

so it's recursive or looping somewhere deep in the memory management.
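
For readers who want to check the arithmetic, here is a minimal Python
sketch. It assumes the elided digits in 0xfff...ffd8 are all f (which is
what the -40 reading implies), hard-codes the Linux value of ELOOP (40),
and mirrors the range test used by the kernel's IS_ERR_VALUE() macro
(MAX_ERRNO is 4095). The function names are made up for illustration.

```python
MAX_ERRNO = 4095  # largest errno the kernel encodes in an error pointer
ELOOP = 40        # Linux value of ELOOP, hard-coded because it differs on other OSes


def as_signed64(addr):
    """Interpret a 64-bit unsigned value as a two's-complement signed integer."""
    return addr - (1 << 64) if addr >= (1 << 63) else addr


def looks_like_err_ptr(addr):
    """True if addr falls in the top MAX_ERRNO values of a 64-bit address space."""
    return addr >= (1 << 64) - MAX_ERRNO


# Assumed full form of the faulting address quoted as 0xfff...ffd8.
fault_addr = 0xFFFFFFFFFFFFFFD8

print(as_signed64(fault_addr))            # signed view of the address
print(as_signed64(fault_addr) == -ELOOP)  # does it match -ELOOP?
print(looks_like_err_ptr(fault_addr))     # is it in the error-pointer range?
```

An address in that range is never a valid mapping, so dereferencing it
faults, which fits the "error code returned as a pointer" reading.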


* Re: unable to handle kernel paging request - btrfs
  2016-09-22 12:18 unable to handle kernel paging request - btrfs Rich Freeman
  2016-09-22 12:44 ` Holger Hoffstätte
@ 2016-09-22 16:46 ` Rich Freeman
  2016-09-22 17:29   ` Chris Murphy
  2016-09-22 17:41 ` Jeff Mahoney
  2016-09-23  4:58 ` Duncan
  3 siblings, 1 reply; 20+ messages in thread
From: Rich Freeman @ 2016-09-22 16:46 UTC (permalink / raw)
  To: Btrfs BTRFS

On Sep 22, 2016 8:18 AM, "Rich Freeman" <r-btrfs@thefreemanclan.net> wrote:
>
> I have been getting panics consistently after doing a btrfs replace
> operation on a raid1 and rebooting.  I linked a photo of the panic; I
> haven't been able to get a text capture of it.
>
> https://ibin.co/2vx0HhDeViu3.jpg
>
> I'm getting this error on the latest 4.4, 4.1, and even on an old
> 3.18.26 kernel I had lying around.


Apologies for replying out of order, but I lost the reply (the
downside to running btrfs on my mail server).

I was also suspicious of memory, but memtest86 hasn't found anything
(though I haven't run it longer than 10 min). Sysrescuecd is my
normal rescue image and I haven't gotten that to panic, but I haven't
tried to start any services from it. Unfortunately it is openrc based
so getting systemd running on my btrfs image inside a container will
be painful. Maybe I can find a systemd based rescue image that
includes nspawn.

I suspect I'll have the same issue though as several kernels have failed.

I'll do more extensive RAM tests just in case.  However, the
consistency of the error messages suggests a bug to me.

Rich


* Re: unable to handle kernel paging request - btrfs
  2016-09-22 16:46 ` Rich Freeman
@ 2016-09-22 17:29   ` Chris Murphy
  0 siblings, 0 replies; 20+ messages in thread
From: Chris Murphy @ 2016-09-22 17:29 UTC (permalink / raw)
  To: Rich Freeman; +Cc: Btrfs BTRFS

On Thu, Sep 22, 2016 at 10:46 AM, Rich Freeman
<r-btrfs@thefreemanclan.net> wrote:
> On Sep 22, 2016 8:18 AM, "Rich Freeman" <r-btrfs@thefreemanclan.net> wrote:
>>
>> I have been getting panics consistently after doing a btrfs replace
>> operation on a raid1 and rebooting.  I linked a photo of the panic; I
>> haven't been able to get a text capture of it.
>>
>> https://ibin.co/2vx0HhDeViu3.jpg
>>
>> I'm getting this error on the latest 4.4, 4.1, and even on an old
>> 3.18.26 kernel I had lying around.
>
>
> Apologies for replying out of order, but I lost the reply (the
> downside to running btrfs on my mail server).
>
> I was also suspicious of memory but memtestx86 hasn't found anything
> in (though I haven't run it longer than 10min).

It can take hours or days. Only the most obvious memory problems are
found in minutes with memtest86+. And it's possible it won't find the
rarer problems.

I mentioned this in a related thread:

As for hardware induced corruptions there are lots of threads in the
archive, quick search:

http://www.spinics.net/lists/linux-btrfs/msg56954.html
http://www.spinics.net/lists/linux-btrfs/msg57008.html




> Sysrescuecd is my
> normal rescue image and I haven't gotten that to panic, but I haven't
> tried to start any services from it. Unfortunately it is openrc based
> so getting systemd running on my btrfs image inside a container will
> be painful. Maybe I can find a systemd based rescue image that
> includes nspawn.

This ISO can be dd'd to a USB stick and will boot almost anything,
BIOS or UEFI, even a Mac.
https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20160921.n.0/compose/Workstation/x86_64/iso/Fedora-Workstation-Live-x86_64-Rawhide-20160921.n.0.iso


You could of course use an actual release version of Fedora; I just
happen to have the URL handy and have tested that ISO. It has kernel
4.8-rc7 and btrfs-progs 4.7.2. It is a "live" CD. You might prefer to
add 3 to the boot parameters to avoid a graphical boot. Login is either
root or liveuser, no password. Set a password for either one,
then run 'systemctl start sshd' and 'ip a' to get an IP address to log
in to remotely.


>
> I suspect I'll have the same issue though as several kernels have failed.
>
> I'll do more extensive RAM tests just in case.  However, the
> consistency of the error messages suggests a bug to me.

Or binary file corruption.

-- 
Chris Murphy


* Re: unable to handle kernel paging request - btrfs
  2016-09-22 12:18 unable to handle kernel paging request - btrfs Rich Freeman
  2016-09-22 12:44 ` Holger Hoffstätte
  2016-09-22 16:46 ` Rich Freeman
@ 2016-09-22 17:41 ` Jeff Mahoney
  2016-09-30 18:54   ` Rich Freeman
  2016-09-23  4:58 ` Duncan
  3 siblings, 1 reply; 20+ messages in thread
From: Jeff Mahoney @ 2016-09-22 17:41 UTC (permalink / raw)
  To: Rich Freeman, Btrfs BTRFS



On 9/22/16 8:18 AM, Rich Freeman wrote:
> I have been getting panics consistently after doing a btrfs replace
> operation on a raid1 and rebooting.  I linked a photo of the panic; I
> haven't been able to get a text capture of it.
> 
> https://ibin.co/2vx0HhDeViu3.jpg
> 
> I'm getting this error on the latest 4.4, 4.1, and even on an old
> 3.18.26 kernel I had lying around.
> 
> I tried the remove root_log_ctx from ctx list before btrfs_sync_log
> returns patch on 4.1 and that did not solve my problem either.
> 
> I'm able to boot into single-user mode and if I don't start any
> processes the system seems fairly stable.  I am also able to start a
> btrfs balance and run that for several hours without issue.  If I
> start launching services the system will tend to panic, though how
> many processes I can launch will vary.  I don't think that it is a
> particular file being accessed that is triggering the issue since the
> point where it fails varies.  I suspect it may be load-related.
> 
> Mounting with compress=no doesn't seem to help either.  Granted, I see
> lzo_decompress in the backtrace and that is probably a read operation.
> 
> Any suggestions?  Google hasn't been helpful on this one...

Can you boot with panic_on_oops=1, reproduce it, and capture that Oops?
The trace in your photo is a secondary Oops (tainted D), which means
that something else went wrong before that and now the system is
tripping over it.  Secondary Oopses don't really help the debugging
process because the system was already in a broken, undefined, state.
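
As a rough illustration of the "tainted D" remark: the D flag corresponds
to bit 7 of the kernel's taint mask (value 128), which is set once the
kernel has hit an Oops or BUG. The Python sketch below, with hypothetical
helper names, checks that bit in a mask such as the one exposed in
/proc/sys/kernel/tainted.

```python
TAINT_DIE_BIT = 7  # 'D' flag: the kernel has already hit an Oops or BUG


def has_prior_oops(taint_mask):
    """True if the taint mask shows that an earlier Oops/BUG occurred."""
    return bool(taint_mask & (1 << TAINT_DIE_BIT))


def read_taint_mask(path="/proc/sys/kernel/tainted"):
    """Read the current taint mask (a decimal integer) from procfs."""
    with open(path) as f:
        return int(f.read().strip())


print(has_prior_oops(0))    # clean kernel
print(has_prior_oops(128))  # only the D bit set
print(has_prior_oops(129))  # D plus bit 0 (proprietary module)
```

If the D bit is already set when a trace is printed, that trace is not
the first failure, which is why the earlier Oops is the one worth
capturing.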

-Jeff

-- 
Jeff Mahoney
SUSE Labs




* Re: unable to handle kernel paging request - btrfs
  2016-09-22 12:18 unable to handle kernel paging request - btrfs Rich Freeman
                   ` (2 preceding siblings ...)
  2016-09-22 17:41 ` Jeff Mahoney
@ 2016-09-23  4:58 ` Duncan
  2016-09-25 13:55   ` Rich Freeman
  3 siblings, 1 reply; 20+ messages in thread
From: Duncan @ 2016-09-23  4:58 UTC (permalink / raw)
  To: linux-btrfs

Rich Freeman posted on Thu, 22 Sep 2016 07:18:35 -0500 as excerpted:

> I have been getting panics consistently after doing a btrfs replace
> operation on a raid1 and rebooting.  I linked a photo of the panic; I
> haven't been able to get a text capture of it.
> 
> https://ibin.co/2vx0HhDeViu3.jpg
> 
> I'm getting this error on the latest 4.4, 4.1, and even on an old
> 3.18.26 kernel I had lying around.
> 
> I tried the remove root_log_ctx from ctx list before btrfs_sync_log
> returns patch on 4.1 and that did not solve my problem either.
> 
> I'm able to boot into single-user mode and if I don't start any
> processes the system seems fairly stable.  I am also able to start a
> btrfs balance and run that for several hours without issue.  If I start
> launching services the system will tend to panic, though how many
> processes I can launch will vary.  I don't think that it is a particular
> file being accessed that is triggering the issue since the point where
> it fails varies.  I suspect it may be load-related.
> 
> Mounting with compress=no doesn't seem to help either.  Granted, I see
> lzo_decompress in the backtrace and that is probably a read operation.
> 
> Any suggestions?  Google hasn't been helpful on this one...

Btrfs raid1 you say, and you have existing compressed files it's trying 
to read in the backtrace?

Sounds like the issues I see sometimes and have posted about where after 
a crash that resulted in one device of my raid1 pair getting behind the 
other, the kernel will crash if it sees too many csum-errors, even tho 
it's /supposed/ to check the other copy and read from it if valid (which 
it is as a btrfs scrub resolves the issue).

When booted to rescue/single-user mode, can you run a scrub?  If it's the 
csum-related problem I see and the replace worked, a scrub should 
complete fine, repairing the bad copy from the mirror, and the problem 
should be resolved.  If the replace bugged out and you now have only one 
copy of some chunks, if scrub finds an error there it obviously won't be 
able to repair from the good mirror, but it should at least spot some csum 
errors it can't repair.

If a scrub crashes too, if it completes without finding any errors to 
correct, or if it finds and corrects errors but the issue persists, then 
it's unlikely to be the issue I've seen.

FWIW, the issue I've seen appears to be related to attempts to read
compressed files.  It does not appear to affect users who don't have any
such files, or who have them but never access them in ordinary operation.
It may or may not affect profiles other than raid1 (and likely raid10), but
those make it easiest to verify, because one copy can get out of sync with
the other and scrub can confirm that as the problem by repairing the bad
copy from the good one.  The kernel should do the same repair dynamically,
but that's where the bug is: too many dynamic csum errors trigger a crash
even when there's a second copy available that scrub later verifies as
valid.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: unable to handle kernel paging request - btrfs
  2016-09-23  4:58 ` Duncan
@ 2016-09-25 13:55   ` Rich Freeman
  2016-09-26  0:22     ` Jeff Mahoney
  2016-09-26  2:21     ` Duncan
  0 siblings, 2 replies; 20+ messages in thread
From: Rich Freeman @ 2016-09-25 13:55 UTC (permalink / raw)
  To: Duncan; +Cc: Btrfs BTRFS

On Fri, Sep 23, 2016 at 12:58 AM, Duncan <1i5t5.duncan@cox.net> wrote:
>
> Btrfs raid1 you say, and you have existing compressed files it's trying
> to read in the backtrace?
>
> Sounds like the issues I see sometimes and have posted about where after
> a crash that resulted in one device of my raid1 pair getting behind the
> other, the kernel will crash if it sees too many csum-errors, even tho
> it's /supposed/ to check the other copy and read from it if valid (which
> it is as a btrfs scrub resolves the issue).
>
> When booted to rescue/single-user mode, can you run a scrub?

After a few reboots trying to capture the initial panic message (even
when I set panic_on_oops=1 I was getting multiple ones with only the
tainted one staying on screen), the system managed to stay up.  I
completed a scrub and it found no errors.  I also haven't had any
issues with it but haven't attempted another reboot.  I figured the
safest course was to just leave it on for a good week so that whatever
was in the log/etc that was giving it trouble works its way out.  I'm
also doing a balance which may or may not help (and which is useful
anyway since I increased the size of the drive I replaced).

I'm still pretty skeptical of a hardware problem, but once I think the
system is able to be safely rebooted I'll go ahead and run a longer
memory test/etc.  This really doesn't seem like a memory problem, and
I don't see a corrupted binary as an issue since everything running in
kernel space is versioned and the problem happens with multiple kernel
versions (and the older ones haven't been touched on-disk in ages).  A
problem in glibc/etc shouldn't cause a kernel oops absent a bug.  But,
if there is a hardware problem I obviously want to know about it, and
I've had a few RAM failures over the years...

--
Rich


* Re: unable to handle kernel paging request - btrfs
  2016-09-25 13:55   ` Rich Freeman
@ 2016-09-26  0:22     ` Jeff Mahoney
  2016-09-26  0:37       ` Rich Freeman
  2016-09-26  2:21     ` Duncan
  1 sibling, 1 reply; 20+ messages in thread
From: Jeff Mahoney @ 2016-09-26  0:22 UTC (permalink / raw)
  To: Rich Freeman, Duncan; +Cc: Btrfs BTRFS



On 9/25/16 9:55 AM, Rich Freeman wrote:
> On Fri, Sep 23, 2016 at 12:58 AM, Duncan <1i5t5.duncan@cox.net> wrote:
>>
>> Btrfs raid1 you say, and you have existing compressed files it's trying
>> to read in the backtrace?
>>
>> Sounds like the issues I see sometimes and have posted about where after
>> a crash that resulted in one device of my raid1 pair getting behind the
>> other, the kernel will crash if it sees too many csum-errors, even tho
>> it's /supposed/ to check the other copy and read from it if valid (which
>> it is as a btrfs scrub resolves the issue).
>>
>> When booted to rescue/single-user mode, can you run a scrub?
> 
> After a few reboots trying to capture the initial panic message (even
> when I set panic_on_oops=1 I was getting multiple ones with only the
> tainted one staying on screen), the system managed to stay up.  I
> completed a scrub and it found no errors.  I also haven't had any
> issues with it but haven't attempted another reboot.  I figured the
> safest course was to just leave it on for a good week so that whatever
> was in the log/etc that was giving it trouble works its way out.  I'm
> also doing a balance which may or may not help (and which is useful
> anyway since I increased the size of the drive I replaced).

If it stays up, can you post the initial Oops then?

-Jeff

-- 
Jeff Mahoney
SUSE Labs




* Re: unable to handle kernel paging request - btrfs
  2016-09-26  0:22     ` Jeff Mahoney
@ 2016-09-26  0:37       ` Rich Freeman
  2016-09-26  0:39         ` Jeff Mahoney
  0 siblings, 1 reply; 20+ messages in thread
From: Rich Freeman @ 2016-09-26  0:37 UTC (permalink / raw)
  To: Jeff Mahoney; +Cc: Duncan, Btrfs BTRFS

On Sun, Sep 25, 2016 at 7:22 PM, Jeff Mahoney <jeffm@suse.com> wrote:
> On 9/25/16 9:55 AM, Rich Freeman wrote:
>> On Fri, Sep 23, 2016 at 12:58 AM, Duncan <1i5t5.duncan@cox.net> wrote:
>>>
>>> Btrfs raid1 you say, and you have existing compressed files it's trying
>>> to read in the backtrace?
>>>
>>> Sounds like the issues I see sometimes and have posted about where after
>>> a crash that resulted in one device of my raid1 pair getting behind the
>>> other, the kernel will crash if it sees too many csum-errors, even tho
>>> it's /supposed/ to check the other copy and read from it if valid (which
>>> it is as a btrfs scrub resolves the issue).
>>>
>>> When booted to rescue/single-user mode, can you run a scrub?
>>
>> After a few reboots trying to capture the initial panic message (even
>> when I set panic_on_oops=1 I was getting multiple ones with only the
>> tainted one staying on screen), the system managed to stay up.  I
>> completed a scrub and it found no errors.  I also haven't had any
>> issues with it but haven't attempted another reboot.  I figured the
>> safest course was to just leave it on for a good week so that whatever
>> was in the log/etc that was giving it trouble works its way out.  I'm
>> also doing a balance which may or may not help (and which is useful
>> anyway since I increased the size of the drive I replaced).
>
> If it stays up, can you post the initial Oops then?
>

Unfortunately, it stays up because there is no OOPS.  It was crashing
fairly consistently, but for whatever reason it didn't this time.
Since I needed the box working and wasn't having a lot of luck
capturing the OOPS I just let it run with minimal prodding, and
hopefully it is now in a state where it won't crash.

But, if it happens again I'll try to capture an initial OOPS output,
and I'll do a memory test in any case (though I really am not
expecting anything there).

If I were able to get kernel core dumping working on this machine,
would that contain information about the initial oops?  I forget if
the dumps contain the full ring buffer/etc.  I used to have it working, but
some change in either the kernel or the utils was causing issues with
it.  I still boot my kernels with space set aside for the crash
kernel...

--
Rich


* Re: unable to handle kernel paging request - btrfs
  2016-09-26  0:37       ` Rich Freeman
@ 2016-09-26  0:39         ` Jeff Mahoney
  2016-09-26  0:42           ` Rich Freeman
  0 siblings, 1 reply; 20+ messages in thread
From: Jeff Mahoney @ 2016-09-26  0:39 UTC (permalink / raw)
  To: Rich Freeman; +Cc: Duncan, Btrfs BTRFS



On 9/25/16 8:37 PM, Rich Freeman wrote:
> On Sun, Sep 25, 2016 at 7:22 PM, Jeff Mahoney <jeffm@suse.com> wrote:
>> On 9/25/16 9:55 AM, Rich Freeman wrote:
>>> On Fri, Sep 23, 2016 at 12:58 AM, Duncan <1i5t5.duncan@cox.net> wrote:
>>>>
>>>> Btrfs raid1 you say, and you have existing compressed files it's trying
>>>> to read in the backtrace?
>>>>
>>>> Sounds like the issues I see sometimes and have posted about where after
>>>> a crash that resulted in one device of my raid1 pair getting behind the
>>>> other, the kernel will crash if it sees too many csum-errors, even tho
>>>> it's /supposed/ to check the other copy and read from it if valid (which
>>>> it is as a btrfs scrub resolves the issue).
>>>>
>>>> When booted to rescue/single-user mode, can you run a scrub?
>>>
>>> After a few reboots trying to capture the initial panic message (even
>>> when I set panic_on_oops=1 I was getting multiple ones with only the
>>> tainted one staying on screen), the system managed to stay up.  I
>>> completed a scrub and it found no errors.  I also haven't had any
>>> issues with it but haven't attempted another reboot.  I figured the
>>> safest course was to just leave it on for a good week so that whatever
>>> was in the log/etc that was giving it trouble works its way out.  I'm
>>> also doing a balance which may or may not help (and which is useful
>>> anyway since I increased the size of the drive I replaced).
>>
>> If it stays up, can you post the initial Oops then?
>>
> 
> Unfortunately, it stays up because there is no OOPS.  It was crashing
> fairly consistently, but for whatever reason it didn't this time.
> Since I needed the box working and wasn't having a lot of luck
> capturing the OOPS I just let it run with minimal prodding, and
> hopefully it is now in a state where it won't crash.
> 
> But, if it happens again I'll try to capture an initial OOPS output,
> and I'll do a memory test in any case (though I really am not
> expecting anything there).
> 
> If I were able to get kernel core dumping working on this machine,
> would that contain information about the initial oops.  I forget if
> they contain the full ring buffer/etc.  I used to have it working but
> some change in either the kernel or the utils was causing issues with
> it.  I still boot my kernels with space set aside for the crash
> kernel...

I'm not sure about other distros, but at least with SLES/openSUSE you
can configure kdump to /just/ dump the dmesg.

-Jeff


-- 
Jeff Mahoney
SUSE Labs




* Re: unable to handle kernel paging request - btrfs
  2016-09-26  0:39         ` Jeff Mahoney
@ 2016-09-26  0:42           ` Rich Freeman
  0 siblings, 0 replies; 20+ messages in thread
From: Rich Freeman @ 2016-09-26  0:42 UTC (permalink / raw)
  To: Jeff Mahoney; +Cc: Duncan, Btrfs BTRFS

On Sun, Sep 25, 2016 at 7:39 PM, Jeff Mahoney <jeffm@suse.com> wrote:
>
> I'm not sure about other distros, but at least with SLES/openSUSE you
> can configure kdump to /just/ dump the dmesg.
>

Well, on Gentoo I wrote the official docs on how it works, quite some
time ago...  :)  It is purely manual, so you can of course capture
whatever you want.  Just have it reboot to a shell and then save what
you need to.  An automated cross-distro core capture tool would
probably be useful.  I wonder if there is a generic one floating
around somewhere.

--
Rich


* Re: unable to handle kernel paging request - btrfs
  2016-09-25 13:55   ` Rich Freeman
  2016-09-26  0:22     ` Jeff Mahoney
@ 2016-09-26  2:21     ` Duncan
  1 sibling, 0 replies; 20+ messages in thread
From: Duncan @ 2016-09-26  2:21 UTC (permalink / raw)
  To: linux-btrfs

Rich Freeman posted on Sun, 25 Sep 2016 09:55:42 -0400 as excerpted:

> On Fri, Sep 23, 2016 at 12:58 AM, Duncan <1i5t5.duncan@cox.net> wrote:
>>
>> Btrfs raid1 you say, and you have existing compressed files it's trying
>> to read in the backtrace?
>>
>> Sounds like the issues I see sometimes and have posted about where
>> after a crash that resulted in one device of my raid1 pair getting
>> behind the other, the kernel will crash if it sees too many
>> csum-errors, even tho it's /supposed/ to check the other copy and read
>> from it if valid (which it is as a btrfs scrub resolves the issue).
>>
>> When booted to rescue/single-user mode, can you run a scrub?
> 
> After a few reboots trying to capture the initial panic message (even
> when I set panic_on_oops=1 I was getting multiple ones with only the
> tainted one staying on screen), the system managed to stay up.  I
> completed a scrub and it found no errors.

Well, so much for that theory.  If it found and fixed errors you'd likely 
be seeing the same problem I see sometimes, but if it didn't find any to 
fix... unlikely.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: unable to handle kernel paging request - btrfs
  2016-09-22 17:41 ` Jeff Mahoney
@ 2016-09-30 18:54   ` Rich Freeman
  2016-09-30 20:55     ` Jeff Mahoney
  0 siblings, 1 reply; 20+ messages in thread
From: Rich Freeman @ 2016-09-30 18:54 UTC (permalink / raw)
  To: Jeff Mahoney; +Cc: Btrfs BTRFS

On Thu, Sep 22, 2016 at 1:41 PM, Jeff Mahoney <jeffm@suse.com> wrote:
> On 9/22/16 8:18 AM, Rich Freeman wrote:
>> I have been getting panics consistently after doing a btrfs replace
>> operation on a raid1 and rebooting.  I linked a photo of the panic; I
>> haven't been able to get a text capture of it.
>>
>> https://ibin.co/2vx0HhDeViu3.jpg
>>
>> I'm getting this error on the latest 4.4, 4.1, and even on an old
>> 3.18.26 kernel I had lying around.
>>
>> I tried the remove root_log_ctx from ctx list before btrfs_sync_log
>> returns patch on 4.1 and that did not solve my problem either.
>>
>> I'm able to boot into single-user mode and if I don't start any
>> processes the system seems fairly stable.  I am also able to start a
>> btrfs balance and run that for several hours without issue.  If I
>> start launching services the system will tend to panic, though how
>> many processes I can launch will vary.  I don't think that it is a
>> particular file being accessed that is triggering the issue since the
>> point where it fails varies.  I suspect it may be load-related.
>>
>> Mounting with compress=no doesn't seem to help either.  Granted, I see
>> lzo_decompress in the backtrace and that is probably a read operation.
>>
>> Any suggestions?  Google hasn't been helpful on this one...
>
> Can you boot with panic_on_oops=1, reproduce it, and capture that Oops?
> The trace in your photo is a secondary Oops (tainted D), which means
> that something else went wrong before that and now the system is
> tripping over it.  Secondary Oopses don't really help the debugging
> process because the system was already in a broken, undefined, state.
>

Ok, the system had been up for a week without issue, but it just panicked
and rebooted right towards the end of a balance (it literally had
about 30 of 2500 chunks left).

After it came up (and after waiting for it to fully mount, as there were a
bunch of free space warnings/etc) I managed to capture an initial oops
when it happened again:

https://ibin.co/2wt0n2IaCOA3.jpg

This is on a system without swap, though my understanding is that the
paging system is used for other things.

Note that I've updated my kernel since my last post.  When it panicked
during the balance it was running 4.4.21, and the oops I actually
captured was on 4.4.23 (I was just waiting for the balance
to finish before rebooting with a new kernel).

--
Rich


* Re: unable to handle kernel paging request - btrfs
  2016-09-30 18:54   ` Rich Freeman
@ 2016-09-30 20:55     ` Jeff Mahoney
  2016-09-30 21:07       ` Rich Freeman
  0 siblings, 1 reply; 20+ messages in thread
From: Jeff Mahoney @ 2016-09-30 20:55 UTC (permalink / raw)
  To: Rich Freeman; +Cc: Btrfs BTRFS



On 9/30/16 2:54 PM, Rich Freeman wrote:
> On Thu, Sep 22, 2016 at 1:41 PM, Jeff Mahoney <jeffm@suse.com> wrote:
>> On 9/22/16 8:18 AM, Rich Freeman wrote:
>>> I have been getting panics consistently after doing a btrfs replace
>>> operation on a raid1 and rebooting.  I linked a photo of the panic; I
>>> haven't been able to get a text capture of it.
>>>
>>> https://ibin.co/2vx0HhDeViu3.jpg
>>>
>>> I'm getting this error on the latest 4.4, 4.1, and even on an old
>>> 3.18.26 kernel I had lying around.
>>>
>>> I tried the remove root_log_ctx from ctx list before btrfs_sync_log
>>> returns patch on 4.1 and that did not solve my problem either.
>>>
>>> I'm able to boot into single-user mode and if I don't start any
>>> processes the system seems fairly stable.  I am also able to start a
>>> btrfs balance and run that for several hours without issue.  If I
>>> start launching services the system will tend to panic, though how
>>> many processes I can launch will vary.  I don't think that it is a
>>> particular file being accessed that is triggering the issue since the
>>> point where it fails varies.  I suspect it may be load-related.
>>>
>>> Mounting with compress=no doesn't seem to help either.  Granted, I see
>>> lzo_decompress in the backtrace and that is probably a read operation.
>>>
>>> Any suggestions?  Google hasn't been helpful on this one...
>>
>> Can you boot with panic_on_oops=1, reproduce it, and capture that Oops?
>> The trace in your photo is a secondary Oops (tainted D), which means
>> that something else went wrong before that and now the system is
>> tripping over it.  Secondary Oopses don't really help the debugging
>> process because the system was already in a broken, undefined, state.
>>
> 
> Ok, the system has been up for a week without issue, but just panicked
> and rebooted right towards the end of a balance (it literally had
> about 30 of 2500 chunks left).
> 
> After it came up (and waiting for it to fully mount as there were a
> bunch of free space warnings/etc) I managed to capture an initial oops
> when it happened again:
> 
> https://ibin.co/2wt0n2IaCOA3.jpg
> 
> This is on a system without swap, though my understanding is that the
> paging system is used for other things.

It's not paging in the way Microsoft uses the term.  In this context, it
just means that the kernel tried to resolve a virtual address and
failed.  When the address is < PAGE_SIZE, it prints the NULL pointer
dereference message instead.  It's literally the same code otherwise.
Short version: it's the same thing as a segfault in userspace code.

This looks like a use-after-free on one of the pages used for
compression.  Can you post the output of objdump -Dr
/lib/modules/$(uname -r)/kernel/fs/btrfs/btrfs.ko somewhere?

-Jeff

-- 
Jeff Mahoney
SUSE Labs




* Re: unable to handle kernel paging request - btrfs
  2016-09-30 20:55     ` Jeff Mahoney
@ 2016-09-30 21:07       ` Rich Freeman
  2016-10-01  0:38         ` Jeff Mahoney
  0 siblings, 1 reply; 20+ messages in thread
From: Rich Freeman @ 2016-09-30 21:07 UTC (permalink / raw)
  To: Jeff Mahoney; +Cc: Btrfs BTRFS

On Fri, Sep 30, 2016 at 4:55 PM, Jeff Mahoney <jeffm@suse.com> wrote:
> This looks like a use-after-free on one of the pages used for
> compression.  Can you post the output of objdump -Dr
> /lib/modules/$(uname -r)/kernel/fs/btrfs/btrfs.ko somewhere?
>

Sure:
https://drive.google.com/open?id=0BwUDImviY_gcR3JfT0Z1cUlRVEk

I was impressed by just how large it was.

I take it you're going to try to use the offsets in the oops to figure
out where it went wrong?  I really need to get kernel core dumping
working on this box...

--
Rich


* Re: unable to handle kernel paging request - btrfs
  2016-09-30 21:07       ` Rich Freeman
@ 2016-10-01  0:38         ` Jeff Mahoney
  2016-10-07 14:00           ` Rich Freeman
  0 siblings, 1 reply; 20+ messages in thread
From: Jeff Mahoney @ 2016-10-01  0:38 UTC (permalink / raw)
  To: Rich Freeman; +Cc: Btrfs BTRFS



On 9/30/16 5:07 PM, Rich Freeman wrote:
> On Fri, Sep 30, 2016 at 4:55 PM, Jeff Mahoney <jeffm@suse.com> wrote:
>> This looks like a use-after-free on one of the pages used for
>> compression.  Can you post the output of objdump -Dr
>> /lib/modules/$(uname -r)/kernel/fs/btrfs/btrfs.ko somewhere?
>>
> 
> Sure:
> https://drive.google.com/open?id=0BwUDImviY_gcR3JfT0Z1cUlRVEk
> 
> I was impressed by just how large it was.
> 
> I take it you're going to try to use the offsets in the oops to figure
> out where it went wrong?  I really need to get kernel core dumping
> working on this box...

Yep.  What I think is happening is that we have workspace getting freed
while it's in use.  The faulting address is in vmalloc space and it's
also the first argument to memcpy, which makes it the destination.  In
lzo_decompress_biovec, that means it's the workspace->cbuf.  Beyond that
I'll have to dig a bit more.

It's the same fault that your first photo showed as a secondary Oops,
but that's not always the case.


-- 
Jeff Mahoney
SUSE Labs




* Re: unable to handle kernel paging request - btrfs
  2016-10-01  0:38         ` Jeff Mahoney
@ 2016-10-07 14:00           ` Rich Freeman
  2016-10-08 21:55             ` Rich Freeman
  0 siblings, 1 reply; 20+ messages in thread
From: Rich Freeman @ 2016-10-07 14:00 UTC (permalink / raw)
  To: Jeff Mahoney; +Cc: Btrfs BTRFS

On Fri, Sep 30, 2016 at 8:38 PM, Jeff Mahoney <jeffm@suse.com> wrote:
> On 9/30/16 5:07 PM, Rich Freeman wrote:
>> On Fri, Sep 30, 2016 at 4:55 PM, Jeff Mahoney <jeffm@suse.com> wrote:
>>> This looks like a use-after-free on one of the pages used for
>>> compression.  Can you post the output of objdump -Dr
>>> /lib/modules/$(uname -r)/kernel/fs/btrfs/btrfs.ko somewhere?
>>>
>>
>> Sure:
>> https://drive.google.com/open?id=0BwUDImviY_gcR3JfT0Z1cUlRVEk
>>
>> I was impressed by just how large it was.
>>
>> I take it you're going to try to use the offsets in the oops to figure
>> out where it went wrong?  I really need to get kernel core dumping
>> working on this box...
>
> Yep.  What I think is happening is that we have workspace getting freed
> while it's in use.  The faulting address is in vmalloc space and it's
> also the first argument to memcpy, which makes it the destination.  In
> lzo_decompress_biovec, that means it's the workspace->cbuf.  Beyond that
> I'll have to dig a bit more.
>

I'll confess to not being much of a kernel hacker, but could this
error also be caused by a buffer overrun?  If working_bytes or
in_page_bytes_left is larger than the size of the buffer, then the
memcpy would run past the end of the buffer.  I don't know if that
generates a different error than the one reported.

What guarantee do we have that working_bytes is less than the size of
workspace->cbuf?  I'm just throwing stuff out there because as far as
I can tell the code never frees workspace (I'm guessing kunmap at the
very end might take care of it).

--
Rich


* Re: unable to handle kernel paging request - btrfs
  2016-10-07 14:00           ` Rich Freeman
@ 2016-10-08 21:55             ` Rich Freeman
  2016-10-10 12:54               ` Rich Freeman
  0 siblings, 1 reply; 20+ messages in thread
From: Rich Freeman @ 2016-10-08 21:55 UTC (permalink / raw)
  To: Jeff Mahoney; +Cc: Btrfs BTRFS

I'm not sure if this is related to the same issue or not, but I just
started getting a new BUG, followed by a panic.  (I've also enabled
network console capture so that you won't have to squint at photos.)

Original BUG is:


[14740.444257] ------------[ cut here ]------------
[14740.444293] kernel BUG at /usr/src/linux-stable/fs/btrfs/volumes.c:5509!
[14740.444323] invalid opcode: 0000 [#1] SMP
[14740.444348] Modules linked in: nfsd auth_rpcgss oid_registry lockd
grace sunrpc it87 hwmon_vid netconsole configfs tun ipt_MASQUERADE
nf_nat_masquerade_ipv4 xt_conntrack veth iptable_mangle iptable_nat
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack
iptable_filter ip_tables ext4 crc16 mbcache
jbd2 radeon nxt200x cx88_dvb cx88_vp3054_i2c videobuf2_dvb
dvb_core tuner_simple tuner_types tuner fbcon bitblit softcursor font
tileblit drm_kms_helper kvm_amd kvm cfbfillrect syscopyarea cfbimgblt
sysfillrect sysimgblt mousedev fb_sys_fops cfbcopyarea cx88_alsa ttm
cx8802 drm cx8800 videobuf2_dma_sg videobuf2_memops videobuf2_v4l2
cx88xx snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel
videobuf2_core snd_hda_codec tveeprom rc_core irqbypass v4l2_common
videodev k10temp i2c_algo_bit
[14740.444799]  snd_hwdep i2c_piix4 snd_hda_core hid_logitech_hidpp
snd_pcm r8169 8250 snd_timer snd mii 8250_base backlight serial_core
soundcore evdev sch_fq_codel hid_logitech_dj hid_generic usbhid btrfs
firewire_ohci atkbd ata_generic pata_acpi firewire_core
crc_itu_t xor zlib_deflate ohci_pci pata_atiixp
raid6_pq ehci_pci ohci_hcd ehci_hcd usbcore usb_common dm_mirror
dm_region_hash dm_log dm_mod
[14740.445028] CPU: 1 PID: 3213 Comm: kworker/u16:2 Not tainted 4.4.24 #1
[14740.445056] Hardware name: Gigabyte Technology Co., Ltd.
GA-880GM-UD2H/GA-880GM-UD2H, BIOS F8 10/11/2010
[14740.445116] Workqueue: btrfs-endio btrfs_endio_helper [btrfs]
[14740.445143] task: ffff8803ff527300 ti: ffff8803e3c8c000 task.ti:
ffff8803e3c8c000
[14740.445173] RIP: 0010:[<ffffffffa02c3ffd>]  [<ffffffffa02c3ffd>]
__btrfs_map_block+0xdfd/0x1140 [btrfs]
[14740.445226] RSP: 0018:ffff8803e3c8faa0  EFLAGS: 00010282
[14740.445248] RAX: 00000000cdf2f040 RBX: 0000000000000002 RCX: 0000000000000002
[14740.445277] RDX: 0000000000000000 RSI: 0000000021b27000 RDI: ffff8800cab4fb40
[14740.445306] RBP: ffff8803e3c8fb88 R08: 0000050743c00000 R09: 00000000cdf2f040
[14740.445334] R10: 0000000000010000 R11: 0000000000001e4d R12: 00000000cdf2f03f
[14740.445363] R13: 0000000000009000 R14: ffff8803e3c8fbd0 R15: 0000000000010000
[14740.445391] FS:  00007f9e2befc7c0(0000) GS:ffff880427c40000(0000)
knlGS:0000000000000000
[14740.445423] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[14740.445446] CR2: 00007fc533bf7000 CR3: 00000003e29e4000 CR4: 00000000000006e0
[14740.445474] Stack:
[14740.445484]  ffff8803e3c8fab0 ffffffff81084577 ffffffff8112acf0
0000000002011200
[14740.445526]  ffff880410cacc60 ffff880410cacc90 000000001e4e0000
ffff8803ff527300
[14740.445565]  0000000000000000 0000000000001e4e ffff880414e68ee8
ffffffff00000000
[14740.445603] Call Trace:
[14740.445618]  [<ffffffff81084577>] ? __enqueue_entity+0x67/0x70
[14740.445644]  [<ffffffff8112acf0>] ? mempool_alloc_slab+0x10/0x20
[14740.445680]  [<ffffffffa02c48e1>] btrfs_map_bio+0x71/0x320 [btrfs]
[14740.445707]  [<ffffffff8117e630>] ? kmem_cache_alloc+0x190/0x1f0
[14740.445742]  [<ffffffffa0290d4e>] ? btrfs_bio_wq_end_io+0x2e/0x80 [btrfs]
[14740.445780]  [<ffffffffa02e0f91>]
btrfs_submit_compressed_read+0x451/0x4a0 [btrfs]
[14740.445821]  [<ffffffffa029bae0>] btrfs_submit_bio_hook+0x1a0/0x1b0 [btrfs]
[14740.445860]  [<ffffffffa02b9870>] ? btrfs_io_bio_alloc+0x10/0x30 [btrfs]
[14740.445900]  [<ffffffffa02b9d93>] ? btrfs_create_repair_bio+0xc3/0xe0 [btrfs]
[14740.445940]  [<ffffffffa02ba1ff>] end_bio_extent_readpage+0x44f/0x510 [btrfs]
[14740.445981]  [<ffffffffa02b9db0>] ? btrfs_create_repair_bio+0xe0/0xe0 [btrfs]
[14740.446011]  [<ffffffff8125e3ca>] bio_endio+0x3a/0x70
[14740.446042]  [<ffffffffa028f1e7>] end_workqueue_fn+0x37/0x40 [btrfs]
[14740.446080]  [<ffffffffa02c908e>] normal_work_helper+0xae/0x2d0 [btrfs]
[14740.446118]  [<ffffffffa02c931d>] btrfs_endio_helper+0xd/0x10 [btrfs]
[14740.446145]  [<ffffffff8106eb58>] process_one_work+0x148/0x400
[14740.446170]  [<ffffffff8106f126>] worker_thread+0x46/0x430
[14740.446193]  [<ffffffff8106f0e0>] ? rescuer_thread+0x2d0/0x2d0
[14740.446217]  [<ffffffff8106f0e0>] ? rescuer_thread+0x2d0/0x2d0
[14740.446241]  [<ffffffff81073f54>] kthread+0xc4/0xe0
[14740.446262]  [<ffffffff81073e90>] ? kthread_park+0x50/0x50
[14740.446286]  [<ffffffff8155169f>] ret_from_fork+0x3f/0x70
[14740.446309]  [<ffffffff81073e90>] ? kthread_park+0x50/0x50
[14740.446332] Code: 60 ff ff ff 48 63 d3 48 2b 4d c0 48 0f af c1 48
39 c2 48 0f 46 c2 48 89 45 90 89 d9 c7 85 70 ff ff ff 00 00
00 00 e9 f9 f3 ff ff <0f> 0b bb f4 ff ff ff e9 c7
fa ff ff be 6a 16 00 00 48 c7 c7 18
[14740.446672] RIP  [<ffffffffa02c3ffd>] __btrfs_map_block+0xdfd/0x1140 [btrfs]
[14740.446714]  RSP <ffff8803e3c8faa0>
[14740.456756] ---[ end trace e349a675c6512569 ]---
[14740.456832] BUG: unable to handle kernel paging request at ffffffffffffffd8
[14740.456869] IP: [<ffffffff810746bb>] kthread_data+0xb/0x20
[14740.456896] PGD 1a0a067 PUD 1a0c067 PMD 0


* Re: unable to handle kernel paging request - btrfs
  2016-10-08 21:55             ` Rich Freeman
@ 2016-10-10 12:54               ` Rich Freeman
  0 siblings, 0 replies; 20+ messages in thread
From: Rich Freeman @ 2016-10-10 12:54 UTC (permalink / raw)
  To: Jeff Mahoney; +Cc: Btrfs BTRFS

Here is another trace, similar to the original issue.  I have a bit
more detail on this one, and it is available as text, which if nothing
else is more convenient, so I'll go ahead and paste it.  I don't
intend to keep pasting these unless I get something that looks
different.

I only posted the initial BUG.

Oct 10 05:11:15 hab nc[1250]: ip_tables ext4 crc16 mbcache jbd2 radeon
nxt200x cx88_dvb cx88_vp3054_i2c videobuf2_dvb dvb_core tuner_simple
tuner_types tuner cx8800 cx8802 videobuf2_dma_sg videobuf2_memops
videobuf2_v4l2 cx88_alsa cx88xx mousedev fbcon videobuf2_core bitblit
dm_region_hash dm_log dm_mod
Oct 10 05:11:15 hab nc[1250]: [81346.935203] CPU: 3 PID: 29648 Comm:
kworker/u16:3 Not tainted 4.4.24 #1
Oct 10 05:11:15 hab nc[1250]: [81346.935317] Hardware name: Gigabyte
Technology Co., Ltd. GA-880GM-UD2H/GA-880GM-UD2H, BIOS F8 10/11/2010
Oct 10 05:11:15 hab nc[1250]: [81346.935544] Workqueue: btrfs-endio
btrfs_endio_helper [btrfs]
Oct 10 05:11:15 hab nc[1250]: [81346.935657] task: ffff880415acae00
ti: ffff88019a584000 task.ti: ffff88019a584000
Oct 10 05:11:15 hab nc[1250]: [81346.935783] RIP:
0010:[<ffffffff81296e72>]  [<ffffffff81296e72>] __memcpy+0x12/0x20
Oct 10 05:11:15 hab nc[1250]: [81346.935930] RSP:
0018:ffff88019a587c68  EFLAGS: 00010246
Oct 10 05:11:15 hab nc[1250]: [81346.936023] RAX: ffffc90002ecfff8
RBX: 0000000000001000 RCX: 00000000000001ff
Oct 10 05:11:15 hab nc[1250]: [81346.936142] RDX: 0000000000000000
RSI: ffff88008c950008 RDI: ffffc90002ed0000
Oct 10 05:11:15 hab nc[1250]: [81346.936262] RBP: ffff88019a587d30
R08: 0000000041545345 R09: ffffc90002ece000
Oct 10 05:11:15 hab nc[1250]: [81346.936382] R10: ffffe8ffffcc09e0
R11: 0000000000001000 R12: ffff88008c950000
Oct 10 05:11:15 hab nc[1250]: [81346.936502] R13: 000000004154534d
R14: 0000000000000000 R15: ffff8802b25b2798
Oct 10 05:11:15 hab nc[1250]: [81346.936623] FS:
00007fe90a15d780(0000) GS:ffff880427cc0000(0000)
knlGS:0000000000000000
Oct 10 05:11:15 hab nc[1250]: [81346.936756] CS:  0010 DS: 0000 ES:
0000 CR0: 000000008005003b
Oct 10 05:11:16 hab nc[1250]: [81346.937182]  ffff8800102c5720
0000000000000004 0000000041545345 000000004154334d
Oct 10 05:11:16 hab nc[1250]: [81346.937347]  ffffc90002ece000
0000000000001000 0000000000000002 00000000003a0000
Oct 10 05:11:16 hab nc[1250]: [81346.937515] Call Trace:
Oct 10 05:11:16 hab nc[1250]: [81346.937621]  [<ffffffffa02ef741>] ?
lzo_decompress_biovec+0x1d1/0x2c0 [btrfs]
Oct 10 05:11:16 hab nc[1250]: [81346.944148]  [<ffffffffa02f06fc>]
end_compressed_bio_read+0x20c/0x2c0 [btrfs]
Oct 10 05:11:16 hab nc[1250]: [81346.950610]  [<ffffffff8107cb30>] ?
resched_curr+0x60/0xc0
Oct 10 05:11:16 hab nc[1250]: [81346.957055]  [<ffffffff8125e3ca>]
bio_endio+0x3a/0x70
Oct 10 05:11:16 hab nc[1250]: [81346.963516]  [<ffffffffa029f1e7>]
end_workqueue_fn+0x37/0x40 [btrfs]
Oct 10 05:11:16 hab nc[1250]: [81346.970009]  [<ffffffffa02d908e>]
normal_work_helper+0xae/0x2d0 [btrfs]
Oct 10 05:11:16 hab nc[1250]: [81346.976532]  [<ffffffffa02d931d>]
btrfs_endio_helper+0xd/0x10 [btrfs]
Oct 10 05:11:16 hab nc[1250]: [81346.983010]  [<ffffffff8106eb58>]
process_one_work+0x148/0x400
Oct 10 05:11:16 hab nc[1250]: [81346.989509]  [<ffffffff8106f126>]
worker_thread+0x46/0x430
Oct 10 05:11:16 hab nc[1250]: [81346.996013]  [<ffffffff8106f0e0>] ?
rescuer_thread+0x2d0/0x2d0
Oct 10 05:11:16 hab nc[1250]: [81347.034423] Code: ff ff 48 8b 43 60
48 2b 43 50 88 43 4e 5b 5d f3 c3 90 90 90 90 90 90 90 90 00 48 89 f8
48 89 d1 48 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00
00 48 89 f8 48 89 d1 f3
Oct 10 05:11:16 hab nc[1250]: [81347.041852] RIP  [<ffffffff81296e72>]
__memcpy+0x12/0x20
Oct 10 05:11:16 hab nc[1250]: [81347.048565]  RSP <ffff88019a587c68>
Oct 10 05:11:16 hab nc[1250]: [81347.055218] CR2: ffffc90002ed0000
Oct 10 05:11:16 hab nc[1250]: [81347.104741] ---[ end trace
9a43c0b6d874fe31 ]---
Oct 10 05:11:16 hab nc[1250]: [81347.104752] BUG: unable to handle
kernel paging request at ffffc90002c4a000
Oct 10 05:11:16 hab nc[1250]: [81347.104761] IP: [<ffffffff81296e72>]
__memcpy+0x12/0x20
Oct 10 05:11:16 hab nc[1250]: [81347.104767] PGD 417427067 PUD
417488067 PMD 410881067 PTE 0
Oct 10 05:11:16 hab nc[1250]: [81347.104771] Oops: 0002 [#2] SMP
Oct 10 05:11:16 hab nc[1250]: [81347.104825] Modules linked in:
netconsole configfs tun ipt_MASQUERADE nf_nat_masquerade_ipv4
xt_conntrack veth iptable_mangle iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter
ip_tables ext4 crc16 mbcache jbd2 radeon nxt200x cx88_dvb
cx88_vp3054_i2c videobuf2_dvb dvb_core tuner_simple tuner_types tuner
cx8800 cx8802 videobuf2_dma_sg videobuf2_memops videobuf2_v4l2
cx88_alsa cx88xx mousedev fbcon videobuf2_core bitblit softcursor
tveeprom font tileblit drm_kms_helper kvm_amd rc_core kvm v4l2_common
cfbfillrect syscopyarea videodev cfbimgblt sysfillrect
snd_hda_codec_realtek snd_hda_codec_generic irqbypass i2c_algo_bit
sysimgblt fb_sys_fops snd_hda_intel k10temp cfbcopyarea ttm
snd_hda_codec snd_hwdep i2c_piix4 snd_hda_core drm hid_logitech_hidpp
snd_pcm r8169
[81347.104954] CR2: ffffc90002c4a000 CR3:
00000000cb2fb000 CR4: 00000000000006e0
Oct 10 05:11:16 hab nc[1250]: [81347.104955] Stack:
Oct 10 05:11:16 hab nc[1250]: [81347.104960]  ffffffffa02ef741
ffff8801a4736280 ffff8800cb1cc200 0000160000000000
Oct 10 05:11:16 hab nc[1250]: [81347.104964]  ffff88011c5eae00
000000000000000e 00000000b93d27eb 00000000b93d07f3
Oct 10 05:11:16 hab nc[1250]: [81347.104968]  ffffc90002c48000
0000000000001000 0000000000000002 0000000000260000
Oct 10 05:11:16 hab nc[1250]: [81347.104969] Call Trace:
Oct 10 05:11:16 hab nc[1250]: [81347.105023]  [<ffffffffa02ef741>] ?
lzo_decompress_biovec+0x1d1/0x2c0 [btrfs]
Oct 10 05:11:16 hab nc[1250]: [81347.105073]  [<ffffffffa02f06fc>]
end_compressed_bio_read+0x20c/0x2c0 [btrfs]
Oct 10 05:11:16 hab nc[1250]: [81347.105079]  [<ffffffff8125e3ca>]
bio_endio+0x3a/0x70
Oct 10 05:11:16 hab nc[1250]: [81347.105122]  [<ffffffffa029f1e7>]
end_workqueue_fn+0x37/0x40 [btrfs]
Oct 10 05:11:16 hab nc[1250]: [81347.105170]  [<ffffffffa02d908e>]
normal_work_helper+0xae/0x2d0 [btrfs]
Oct 10 05:11:16 hab nc[1250]: [81347.105218]  [<ffffffffa02d931d>]
btrfs_endio_helper+0xd/0x10 [btrfs]
Oct 10 05:11:16 hab nc[1250]: [81347.105223]  [<ffffffff8106eb58>]
process_one_work+0x148/0x400
Oct 10 05:11:16 hab nc[1250]: [81347.105227]  [<ffffffff8106f126>]
worker_thread+0x46/0x430
Oct 10 05:11:16 hab nc[1250]: [81347.105230]  [<ffffffff8106f0e0>] ?
rescuer_thread+0x2d0/0x2d0
Oct 10 05:11:16 hab nc[1250]: [81347.105235]  [<ffffffff81073f54>]
kthread+0xc4/0xe0
Oct 10 05:11:16 hab nc[1250]: [81347.105239]  [<ffffffff81073e90>] ?
kthread_park+0x50/0x50
Oct 10 05:11:16 hab nc[1250]: [81347.105244]  [<ffffffff8155169f>]
ret_from_fork+0x3f/0x70
Oct 10 05:11:17 hab nc[1250]: [81347.105426] Hardware name: Gigabyte
Technology Co., Ltd. GA-880GM-UD2H/GA-880GM-UD2H, BIOS F8 10/11/2010
Oct 10 05:11:17 hab nc[1250]: [81347.105495] Workqueue: btrfs-endio
btrfs_endio_helper [btrfs]
Oct 10 05:11:17 hab nc[1250]: [81347.105498] task: ffff88041197b980
ti: ffff88019f5bc000 task.ti: ffff88019f5bc000
Oct 10 05:11:17 hab nc[1250]: [81347.105506] RIP:
0010:[<ffffffff81296e72>]  [<ffffffff81296e72>] __memcpy+0x12/0x20
Oct 10 05:11:17 hab nc[1250]: [81347.105508] RSP:
0018:ffff88019f5bfc68  EFLAGS: 00010246
Oct 10 05:11:17 hab nc[1250]: [81347.105511] RAX: ffffc90002e9cff8
RBX: 0000000000001000 RCX: 00000000000001ff
Oct 10 05:11:17 hab nc[1250]: [81347.105513] RDX: 0000000000000000
RSI: ffff8800a0f0c008 RDI: ffffc90002e9d000
Oct 10 05:11:17 hab nc[1250]: [81347.105515] RBP: ffff88019f5bfd30
R08: 00000000f86f2235 R09: ffffc90002e9b000
Oct 10 05:11:17 hab nc[1250]: [81347.105517] R10: ffffe8ffffc809e0
R11: 0000000000001000 R12: ffff8800a0f0c000
Oct 10 05:11:17 hab nc[1250]: [81347.105519] R13: 00000000f86f223d
R14: 0000000000000000 R15: ffff88034f468f98
Oct 10 05:11:17 hab nc[1250]: [81347.105522] FS:
00007f397f8df7c0(0000) GS:ffff880427c80000(0000)
knlGS:0000000000000000
Oct 10 05:11:17 hab nc[1250]: [81347.105524] CS:  0010 DS: 0000 ES:
0000 CR0: 000000008005003b
Oct 10 05:11:17 hab nc[1250]: [81347.105526] CR2: ffffc90002e9d000
CR3: 00000000cb2fb000 CR4: 00000000000006e0
Oct 10 05:11:17 hab nc[1250]: [81347.105527] Stack:
Oct 10 05:11:17 hab nc[1250]: [81347.105532]  ffffffffa02ef741
ffff880417403300 ffff880303e63040 0000160000000000
Oct 10 05:11:17 hab nc[1250]: [81347.105536]  ffff88034970b540
0000000000000008 00000000f86f2235 00000000f86f023d
Oct 10 05:11:17 hab nc[1250]: [81347.105540]  ffffc90002e9b000
0000000000001000 0000000000000002 0000000000380000
Oct 10 05:11:17 hab nc[1250]: [81347.105823]  [<ffffffff81073e90>] ?
kthread_park+0x50/0x50
Oct 10 05:11:17 hab nc[1250]: [81347.105829]  [<ffffffff8155169f>]
ret_from_fork+0x3f/0x70
Oct 10 05:11:17 hab nc[1250]: [81347.105833]  [<ffffffff81073e90>] ?
kthread_park+0x50/0x50
Oct 10 05:11:17 hab nc[1250]: [81347.105873] Code: ff ff 48 8b 43 60
48 2b 43 50 88 43 4e 5b 5d f3 c3 90 90 90 90 90 90 90 90 0f 1f 44 00
00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 66
0f 1f 44 00 00 48 89 f8 48 89 d1 f3
Oct 10 05:11:17 hab nc[1250]: [81347.105878] RIP  [<ffffffff81296e72>]
__memcpy+0x12/0x20
Oct 10 05:11:17 hab nc[1250]: [81347.105879]  RSP <ffff88019f5bfc68>
Oct 10 05:11:17 hab nc[1250]: [81347.105880] CR2: ffffc90002e9d000
Oct 10 05:11:17 hab nc[1250]: [81347.105884] ---[ end trace
9a43c0b6d874fe33 ]---


end of thread, other threads:[~2016-10-10 12:54 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-22 12:18 unable to handle kernel paging request - btrfs Rich Freeman
2016-09-22 12:44 ` Holger Hoffstätte
2016-09-22 16:23   ` David Sterba
2016-09-22 16:46 ` Rich Freeman
2016-09-22 17:29   ` Chris Murphy
2016-09-22 17:41 ` Jeff Mahoney
2016-09-30 18:54   ` Rich Freeman
2016-09-30 20:55     ` Jeff Mahoney
2016-09-30 21:07       ` Rich Freeman
2016-10-01  0:38         ` Jeff Mahoney
2016-10-07 14:00           ` Rich Freeman
2016-10-08 21:55             ` Rich Freeman
2016-10-10 12:54               ` Rich Freeman
2016-09-23  4:58 ` Duncan
2016-09-25 13:55   ` Rich Freeman
2016-09-26  0:22     ` Jeff Mahoney
2016-09-26  0:37       ` Rich Freeman
2016-09-26  0:39         ` Jeff Mahoney
2016-09-26  0:42           ` Rich Freeman
2016-09-26  2:21     ` Duncan
