* btrfs problems on new file system
@ 2015-12-25 10:03 covici
  2015-12-25 19:01 ` Henk Slager
  0 siblings, 1 reply; 14+ messages in thread
From: covici @ 2015-12-25 10:03 UTC (permalink / raw)
  To: linux-btrfs

Hi.  I created a file system using 4.3.1 version of btrfsprogs and have
been using it for some three days.  I have gotten the following errors
in the log this morning:
Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
transid verify failed on 51776421888 wanted 4983 found 4981
Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
transid verify failed on 51776421888 wanted 4983 found 4981
Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
transid verify failed on 51776421888 wanted 4983 found 4981
Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
transid verify failed on 51776421888 wanted 4983 found 4981
Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
transid verify failed on 51776421888 wanted 4983 found 4981
Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
transid verify failed on 51776421888 wanted 4983 found 4981
Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
transid verify failed on 51776421888 wanted 4983 found 4981
Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
transid verify failed on 51776421888 wanted 4983 found 4981
Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
transid verify failed on 51776421888 wanted 4983 found 4981
Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
transid verify failed on 51776421888 wanted 4983 found 4981

The file system was then made read only.  I unmounted, did a check
without repair which said it was fine, and remounted successfully in
read/write mode, but am I in trouble?  This was on a solid state drive
using lvm.

Thanks in advance for any suggestions.

-- 
Your life is like a penny.  You're going to lose it.  The question is:
How do you spend it?

         John Covici
         covici@ccs.covici.com


* Re: btrfs problems on new file system
  2015-12-25 10:03 btrfs problems on new file system covici
@ 2015-12-25 19:01 ` Henk Slager
  2015-12-25 21:14   ` covici
  0 siblings, 1 reply; 14+ messages in thread
From: Henk Slager @ 2015-12-25 19:01 UTC (permalink / raw)
  To: linux-btrfs

On Fri, Dec 25, 2015 at 11:03 AM,  <covici@ccs.covici.com> wrote:
> Hi.  I created a file system using 4.3.1 version of btrfsprogs and have
> been using it for some three days.  I have gotten the following errors
> in the log this morning:
> Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
> transid verify failed on 51776421888 wanted 4983 found 4981
> Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
> transid verify failed on 51776421888 wanted 4983 found 4981
> Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
> transid verify failed on 51776421888 wanted 4983 found 4981
> Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
> transid verify failed on 51776421888 wanted 4983 found 4981
> Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
> transid verify failed on 51776421888 wanted 4983 found 4981
> Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
> transid verify failed on 51776421888 wanted 4983 found 4981
> Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
> transid verify failed on 51776421888 wanted 4983 found 4981
> Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
> transid verify failed on 51776421888 wanted 4983 found 4981
> Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
> transid verify failed on 51776421888 wanted 4983 found 4981
> Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
> transid verify failed on 51776421888 wanted 4983 found 4981
>
> The file system was then made read only.  I unmounted, did a check
> without repair which said it was fine, and remounted successfully in
> read/write mode, but am I in trouble?  This was on a solid state drive
> using lvm.
What kernel version are you using?
I think you might have some hardware error or glitch somewhere;
otherwise I don't know why you would have such errors. These kinds of
errors remind me of SATA/cable failures over quite a period of time
(multiple days). Or something with lvm or trim on the SSD.
Anything unusual with the SSD if you run smartctl?
A btrfs check will indeed likely report OK for this case.
What about running a read-only scrub?
Maybe running memtest86+ can rule out the worst case.
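
For example, something along these lines (the device and mountpoint
are placeholders, adjust them to your setup):

  smartctl -a /dev/sdX                   # SMART attributes and error log
  btrfs scrub start -Bd -r /mountpoint   # read-only scrub, in the foreground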


* Re: btrfs problems on new file system
  2015-12-25 19:01 ` Henk Slager
@ 2015-12-25 21:14   ` covici
  2015-12-26  3:53     ` Chris Murphy
  2015-12-26  5:20     ` Duncan
  0 siblings, 2 replies; 14+ messages in thread
From: covici @ 2015-12-25 21:14 UTC (permalink / raw)
  To: Henk Slager; +Cc: linux-btrfs

Henk Slager <eye1tm@gmail.com> wrote:

> On Fri, Dec 25, 2015 at 11:03 AM,  <covici@ccs.covici.com> wrote:
> > Hi.  I created a file system using 4.3.1 version of btrfsprogs and have
> > been using it for some three days.  I have gotten the following errors
> > in the log this morning:
> > Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
> > transid verify failed on 51776421888 wanted 4983 found 4981
> > Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
> > transid verify failed on 51776421888 wanted 4983 found 4981
> > Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
> > transid verify failed on 51776421888 wanted 4983 found 4981
> > Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
> > transid verify failed on 51776421888 wanted 4983 found 4981
> > Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
> > transid verify failed on 51776421888 wanted 4983 found 4981
> > Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
> > transid verify failed on 51776421888 wanted 4983 found 4981
> > Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
> > transid verify failed on 51776421888 wanted 4983 found 4981
> > Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
> > transid verify failed on 51776421888 wanted 4983 found 4981
> > Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
> > transid verify failed on 51776421888 wanted 4983 found 4981
> > Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
> > transid verify failed on 51776421888 wanted 4983 found 4981
> >
> > The file system was then made read only.  I unmounted, did a check
> > without repair which said it was fine, and remounted successfully in
> > read/write mode, but am I in trouble?  This was on a solid state drive
> > using lvm.
> What kernel version are you using?
> I think you might have some hardware error or glitch somewhere;
> otherwise I don't know why you would have such errors. These kinds of
> errors remind me of SATA/cable failures over quite a period of time
> (multiple days). Or something with lvm or trim on the SSD.
> Anything unusual with the SSD if you run smartctl?
> A btrfs check will indeed likely report OK for this case.
> What about running a read-only scrub?
> Maybe running memtest86+ can rule out the worst case.

I am running 4.1.12-gentoo and btrfs progs 4.3.1.  Same thing happened
on another filesystem, so I switched them over to ext4 and no troubles
since.  As far as I know the ssd drives are fine, I have been using them
for months.  Maybe btrfs needs some more work.  I did do scrubs on the
filesystems after I took them offline and remounted them, and they were
successful, and I got no errors from the lower layers at all.  Maybe
I'll try this in a year or so.



-- 
Your life is like a penny.  You're going to lose it.  The question is:
How do you spend it?

         John Covici
         covici@ccs.covici.com


* Re: btrfs problems on new file system
  2015-12-25 21:14   ` covici
@ 2015-12-26  3:53     ` Chris Murphy
  2015-12-26  7:29       ` covici
  2015-12-26  5:20     ` Duncan
  1 sibling, 1 reply; 14+ messages in thread
From: Chris Murphy @ 2015-12-26  3:53 UTC (permalink / raw)
  To: covici; +Cc: Henk Slager, linux-btrfs

If you can post the entire dmesg somewhere that'd be useful. MUAs tend
to wrap that text and make it unreadable on list. I think the problems
with your volume happened before the messages, but it's hard to say.
Also, a generation of nearly 5000 is not that new?

On another thread someone said you probably need to specify the device
to mount when using Btrfs and lvmcache? And the device to specify is
the combined HDD+SSD logical device, for lvmcache that's the "cache
LV", which is the OriginLV + CachePoolLV. If Btrfs decides to mount
the origin, it can result in corruption.
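
If lvmcache were in the picture, something like this would show which
LV is the cache LV and which is the bare origin (the VG name is just a
placeholder):

  lvs -a -o lv_name,vg_name,segtype,origin,devices yourvg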


Chris Murphy


* Re: btrfs problems on new file system
  2015-12-25 21:14   ` covici
  2015-12-26  3:53     ` Chris Murphy
@ 2015-12-26  5:20     ` Duncan
  2015-12-26  7:44       ` covici
  1 sibling, 1 reply; 14+ messages in thread
From: Duncan @ 2015-12-26  5:20 UTC (permalink / raw)
  To: linux-btrfs

covici posted on Fri, 25 Dec 2015 16:14:58 -0500 as excerpted:

> Henk Slager <eye1tm@gmail.com> wrote:
> 
>> On Fri, Dec 25, 2015 at 11:03 AM,  <covici@ccs.covici.com> wrote:
>> > Hi.  I created a file system using 4.3.1 version of btrfsprogs and
>> > have been using it for some three days.  I have gotten the following
>> > errors in the log this morning:

>> > Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
>> > transid verify failed on 51776421888 wanted 4983 found 4981

[Several of these within a second, same block and transids, wanted 4983, 
found 4981.]

>> > The file system was then made read only.  I unmounted, did a check
>> > without repair which said it was fine, and remounted successfully in
>> > read/write mode, but am I in trouble?  This was on a solid state
>> > drive using lvm.
>> What kernel version are you using?
>> I think you might have some hardware error or glitch somewhere;
>> otherwise I don't know why you would have such errors. These kinds of
>> errors remind me of SATA/cable failures over quite a period of time
>> (multiple days). Or something with lvm or trim on the SSD.
>> Anything unusual with the SSD if you run smartctl?
>> A btrfs check will indeed likely report OK for this case.
>> What about running a read-only scrub?
>> Maybe running memtest86+ can rule out the worst case.
> 
> I am running 4.1.12-gentoo and btrfs progs 4.3.1.  Same thing happened
> on another filesystem, so I switched them over to ext4 and no troubles
> since.  As far as I know the ssd drives are fine, I have been using them
> for months.  Maybe btrfs needs some more work.  I did do scrubs on the
> filesystems after I took them offline and remounted them, and they were
> successful, and I got no errors from the lower layers at all.  Maybe
> I'll try this in a year or so.

Well, as I seem to say every few posts, btrfs is "still stabilizing, not 
fully stable and mature", so it's a given that more work is needed, tho 
it's demonstrated to be "stable enough" for many in daily use, as long as 
they're generally aware of stability status and are following the admin's 
rule of backups[1] with the increased risk-factor of running "still 
stabilizing" filesystems in mind.

The very close generation/transid numbers, only two commits apart, for 
the exact same block, within the same second, indicate a quite recent 
block-write update failure, possibly only a minute or two old.  You could 
tell how recent by comparing the generation/transid in the superblock 
(using btrfs-show-super) at as close to the same time as possible, seeing 
how far ahead it is.
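
Something like this, run right after such a message appears, would
show the current superblock generation to compare against the
wanted/found values in the log (the dm-20 path is taken from your log;
the /dev/mapper name works as well):

  btrfs-show-super /dev/dm-20 | grep -i generation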

I'd check smartctl -A for the device(s), then run scrub and check it 
again, to see if the raw number for ID5, Reallocated_Sector_Ct (or 
similar for your device) changed.  (I have some experience with this.[2])
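
A minimal sketch of that, assuming /dev/sdX is the SSD under the lvm
volume and /mnt/point is where the btrfs filesystem is mounted:

  smartctl -A /dev/sdX | grep -i reallocated
  btrfs scrub start -Bd /mnt/point
  smartctl -A /dev/sdX | grep -i reallocated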

If the raw reallocated sector count goes up, it's obviously the device.  
If it doesn't but scrub fixes an error, then it's likely elsewhere in the 
hardware (cabling, power, memory or storage bus errors, sata/scsi 
controller...).  If scrub detects but can't fix the error the lack of fix 
is probably due to single mode, with the original error due possibly to a 
bad shutdown/umount or a btrfs bug.  If scrub says it's fine, then 
whatever it was was temporary and could be due to all sorts of things, from a 
cosmic-ray-induced memory error, to a btrfs bug, to...

In any case, if scrub fixes or doesn't detect an error, I'd not worry 
about it too much, as it doesn't seem to be affecting operation, you 
didn't get a lockup or backtrace, etc.  In fact, I'd take that as 
indication of btrfs normal problem detection and self-healing, likely due 
to being able to pull a valid copy from elsewhere due to raidN or dup 
redundancy or parity.
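
(Whether there actually is a second copy to heal from shows up in the
data/metadata profiles, for example:

  btrfs filesystem df /mnt/point

where /mnt/point stands in for the real mountpoint; with single
profiles scrub can only detect, not repair.)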

Tho there's no shame in simply deciding btrfs is still too "stabilizing, 
not fully stable and mature" for you, either.  I know I'd still hesitate 
to use it in a full production environment, unless I had both good/tested 
backups and failover in place.  "Good enough for daily use, provided 
there's backups if you don't consider the data throwaway", is just that; 
it's not really yet good enough for "I just need it to work, reliably, 
because it's big money and people's jobs if it doesn't."

---
[1] Admin's rule of backups:  For any given level of backup, you either 
have it, or by your actions are defining the data to be of less value 
than the hassle and resources taken to do the backup, multiplied by the 
risk factor of actually needing that backup.  As a consequence, after the 
fact protests to the contrary are simply lies, as actions spoke louder 
than words and they defined the time and hassle saved as more valuable, 
so the valuable was saved in any case and in this case the user should be 
happy they saved the more valuable hassle and resources even if the data 
got lost.

And of course with btrfs still stabilizing, that risk factor remains 
somewhat elevated, meaning more levels of backups need to be kept, for 
relatively lower value data.

But AFAIK, you've stated elsewhere that you have backups, so this is more 
for completeness and for other readers than for you, thus its footnoting, 
here.

[2] smartctl -A: ID5, reallocated sectors: 

For some months I ran a bad ssd that was gradually failing sectors and 
reallocating them, in btrfs raid1 mode for both data and metadata, using 
scrub to detect and rewrite the errors from the good copy on the other 
device, forcing device sector reallocation in the process.  I ran it down 
to about 85% spare sectors remaining, 36% being the reported threshold 
value.  (My cooked value dropped from 253, none replaced, to 100, percent 
remaining, with the first replacement, and continued dropping percentage 
from there over time.)

Primarily I was just curious to see how both the device and btrfs behaved 
a bit longer term with a failing device, and I took the opportunity 
afforded me by btrfs raid1 and the btrfs data integrity features to find 
out.  At about 85% I decided I had learned about all I was going to learn 
and it wasn't worth the hassle any longer, and replaced the ssd.

My primary takeaway, besides getting rather good at doing scrubs and 
looking at that particular smartctl -A failure mode, was that at least 
with that device, there were a *LOT* more spare sectors than I had 
imagined there'd be.  At 85% I had replaced several MiB worth, at half a 
KiB per sector, 2000 sectors per MiB, and it looked to have 100 to 
perhaps 128 MiB or so of spare sectors, on a 238 GiB ssd.  I'd have 
guessed perhaps 8-16 MiB worth, which I had already used up by the time I 
replaced it at 85% still available, so I didn't actually get to see what 
it did when they ran out, as I had hoped. =:^(  But I was tired of 
dealing with it and wasn't anywhere close to running out of sectors, when 
I gave up on it.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: btrfs problems on new file system
  2015-12-26  3:53     ` Chris Murphy
@ 2015-12-26  7:29       ` covici
  2015-12-26 10:47         ` Duncan
  0 siblings, 1 reply; 14+ messages in thread
From: covici @ 2015-12-26  7:29 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Henk Slager, linux-btrfs

Chris Murphy <lists@colorremedies.com> wrote:

> If you can post the entire dmesg somewhere that'd be useful. MUAs tend
> to wrap that text and make it unreadable on list. I think the problems
> with your volume happened before the messages, but it's hard to say.
> Also, a generation of nearly 5000 is not that new?

The file system was only a few days old.  It was on an lvm volume group
which consisted of two ssd drives, so I am not sure what you are saying
about lvm cache -- how could I do anything different?


> 
> On another thread someone said you probably need to specify the device
> to mount when using Btrfs and lvmcache? And the device to specify is
> the combined HDD+SSD logical device, for lvmcache that's the "cache
> LV", which is the OriginLV + CachePoolLV. If Btrfs decides to mount
> the origin, it can result in corruption.

See above.


-- 
Your life is like a penny.  You're going to lose it.  The question is:
How do you spend it?

         John Covici
         covici@ccs.covici.com


* Re: btrfs problems on new file system
  2015-12-26  5:20     ` Duncan
@ 2015-12-26  7:44       ` covici
  0 siblings, 0 replies; 14+ messages in thread
From: covici @ 2015-12-26  7:44 UTC (permalink / raw)
  To: Duncan; +Cc: linux-btrfs

Duncan <1i5t5.duncan@cox.net> wrote:

> covici posted on Fri, 25 Dec 2015 16:14:58 -0500 as excerpted:
> 
> > Henk Slager <eye1tm@gmail.com> wrote:
> > 
> >> On Fri, Dec 25, 2015 at 11:03 AM,  <covici@ccs.covici.com> wrote:
> >> > Hi.  I created a file system using 4.3.1 version of btrfsprogs and
> >> > have been using it for some three days.  I have gotten the following
> >> > errors in the log this morning:
> 
> >> > Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
> >> > transid verify failed on 51776421888 wanted 4983 found 4981
> 
> [Several of these within a second, same block and transids, wanted 4983, 
> found 4981.]
> 
> >> > The file system was then made read only.  I unmounted, did a check
> >> > without repair which said it was fine, and remounted successfully in
> >> > read/write mode, but am I in trouble?  This was on a solid state
> >> > drive using lvm.
> >> What kernel version are you using?
> >> I think you might have some hardware error or glitch somewhere;
> >> otherwise I don't know why you would have such errors. These kinds of
> >> errors remind me of SATA/cable failures over quite a period of time
> >> (multiple days). Or something with lvm or trim on the SSD.
> >> Anything unusual with the SSD if you run smartctl?
> >> A btrfs check will indeed likely report OK for this case.
> >> What about running a read-only scrub?
> >> Maybe running memtest86+ can rule out the worst case.
> > 
> > I am running 4.1.12-gentoo and btrfs progs 4.3.1.  Same thing happened
> > on another filesystem, so I switched them over to ext4 and no troubles
> > since.  As far as I know the ssd drives are fine, I have been using them
> > for months.  Maybe btrfs needs some more work.  I did do scrubs on the
> > filesystems after I took them offline and remounted them, and they were
> > successful, and I got no errors from the lower layers at all.  Maybe
> > I'll try this in a year or so.
> 
> Well, as I seem to say every few posts, btrfs is "still stabilizing, not 
> fully stable and mature", so it's a given that more work is needed, tho 
> it's demonstrated to be "stable enough" for many in daily use, as long as 
> they're generally aware of stability status and are following the admin's 
> rule of backups[1] with the increased risk-factor of running "still 
> stabilizing" filesystems in mind.
> 
> The very close generation/transid numbers, only two commits apart, for 
> the exact same block, within the same second, indicate a quite recent 
> block-write update failure, possibly only a minute or two old.  You could 
> tell how recent by comparing the generation/transid in the superblock 
> (using btrfs-show-super) at as close to the same time as possible, seeing 
> how far ahead it is.
> 
> I'd check smartctl -A for the device(s), then run scrub and check it 
> again, to see if the raw number for ID5, Reallocated_Sector_Ct (or 
> similar for your device) changed.  (I have some experience with this.[2])
> 
> If the raw reallocated sector count goes up, it's obviously the device.  
> If it doesn't but scrub fixes an error, then it's likely elsewhere in the 
> hardware (cabling, power, memory or storage bus errors, sata/scsi 
> controller...).  If scrub detects but can't fix the error the lack of fix 
> is probably due to single mode, with the original error due possibly to a 
> bad shutdown/umount or a btrfs bug.  If scrub says it's fine, then 
> whatever it was was temporary and could be due to all sorts of things, from a 
> cosmic-ray-induced memory error, to a btrfs bug, to...
> 
> In any case, if scrub fixes or doesn't detect an error, I'd not worry 
> about it too much, as it doesn't seem to be affecting operation, you 
> didn't get a lockup or backtrace, etc.  In fact, I'd take that as 
> indication of btrfs normal problem detection and self-healing, likely due 
> to being able to pull a valid copy from elsewhere due to raidN or dup 
> redundancy or parity.
> 
> Tho there's no shame in simply deciding btrfs is still too "stabilizing, 
> not fully stable and mature" for you, either.  I know I'd still hesitate 
> to use it in a full production environment, unless I had both good/tested 
> backups and failover in place.  "Good enough for daily use, provided 
> there's backups if you don't consider the data throwaway", is just that; 
> it's not really yet good enough for "I just need it to work, reliably, 
> because it's big money and people's jobs if it doesn't."
> 
> ---
> [1] Admin's rule of backups:  For any given level of backup, you either 
> have it, or by your actions are defining the data to be of less value 
> than the hassle and resources taken to do the backup, multiplied by the 
> risk factor of actually needing that backup.  As a consequence, after the 
> fact protests to the contrary are simply lies, as actions spoke louder 
> than words and they defined the time and hassle saved as more valuable, 
> so the valuable was saved in any case and in this case the user should be 
> happy they saved the more valuable hassle and resources even if the data 
> got lost.
> 
> And of course with btrfs still stabilizing, that risk factor remains 
> somewhat elevated, meaning more levels of backups need to be kept, for 
> relatively lower value data.
> 
> But AFAIK, you've stated elsewhere that you have backups, so this is more 
> for completeness and for other readers than for you, thus its footnoting, 
> here.

...
...

The show stopper for me was that the file system was put into read-only
mode.  Scrub would not run in read-only mode (even though, once I could
run it, it came back fine), so I had to unmount the fs, run the check
(which maybe I didn't really need to do), and remount, which for me is
not practical.  So, even though I had no actual data loss, I had to say
it was not worth it for the time being.

-- 
Your life is like a penny.  You're going to lose it.  The question is:
How do you spend it?

         John Covici
         covici@ccs.covici.com


* Re: btrfs problems on new file system
  2015-12-26  7:29       ` covici
@ 2015-12-26 10:47         ` Duncan
  2015-12-26 11:38           ` covici
  0 siblings, 1 reply; 14+ messages in thread
From: Duncan @ 2015-12-26 10:47 UTC (permalink / raw)
  To: linux-btrfs

covici posted on Sat, 26 Dec 2015 02:29:11 -0500 as excerpted:

> Chris Murphy <lists@colorremedies.com> wrote:
> 
>> If you can post the entire dmesg somewhere that'd be useful. MUAs tend
>> to wrap that text and make it unreadable on list. I think the problems
>> with your volume happened before the messages, but it's hard to say.
>> Also, a generation of nearly 5000 is not that new?
> 
> The file system was only a few days old.  It was on an lvm volume group
> which consisted of two ssd drives, so I am not sure what you are saying
> about lvm cache -- how could I do anything different?
> 
>> On another thread someone said you probably need to specify the device
>> to mount when using Btrfs and lvmcache? And the device to specify is
>> the combined HDD+SSD logical device, for lvmcache that's the "cache
>> LV", which is the OriginLV + CachePoolLV. If Btrfs decides to mount the
>> origin, it can result in corruption.
> 
> See above.

I think he mixed up two threads and thought you were running lvm-cache, 
not just regular lvm, which should be good unless you're exposing lvm 
snapshots and thus letting btrfs see multiple supposed UUIDs that aren't 
actually universal.  Since btrfs is multi-device and uses the UUID to 
track which devices belong to it (because they're _supposed_ to be 
universally unique, it's even in the _name_!), if it sees the same UUID 
it'll consider it part of the same filesystem, thus potentially causing 
corruption if it's a snapshot or something that's not actually supposed 
to be part of the (current) filesystem.
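
A quick way to check for that, assuming all the LVs show up under
/dev/mapper, would be something like:

  blkid -o value -s UUID /dev/mapper/* | sort | uniq -d

which should print nothing if no two block devices expose the same
filesystem UUID.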

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: btrfs problems on new file system
  2015-12-26 10:47         ` Duncan
@ 2015-12-26 11:38           ` covici
  2015-12-26 19:07             ` Chris Murphy
  0 siblings, 1 reply; 14+ messages in thread
From: covici @ 2015-12-26 11:38 UTC (permalink / raw)
  To: Duncan; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 1798 bytes --]

Duncan <1i5t5.duncan@cox.net> wrote:

> covici posted on Sat, 26 Dec 2015 02:29:11 -0500 as excerpted:
> 
> > Chris Murphy <lists@colorremedies.com> wrote:
> > 
> >> If you can post the entire dmesg somewhere that'd be useful. MUAs tend
> >> to wrap that text and make it unreadable on list. I think the problems
> >> with your volume happened before the messages, but it's hard to say.
> >> Also, a generation of nearly 5000 is not that new?
> > 
> > The file system was only a few days old.  It was on an lvm volume group
> > which consisted of two ssd drives, so I am not sure what you are saying
> > about lvm cache -- how could I do anything different?
> > 
> >> On another thread someone said you probably need to specify the device
> >> to mount when using Btrfs and lvmcache? And the device to specify is
> >> the combined HDD+SSD logical device, for lvmcache that's the "cache
> >> LV", which is the OriginLV + CachePoolLV. If Btrfs decides to mount the
> >> origin, it can result in corruption.
> > 
> > See above.
> 
> I think he mixed up two threads and thought you were running lvm-cache, 
> not just regular lvm, which should be good unless you're exposing lvm 
> snapshots and thus letting btrfs see multiple supposed UUIDs that aren't 
> actually universal.  Since btrfs is multi-device and uses the UUID to 
> track which devices belong to it (because they're _supposed_ to be 
> universally unique, it's even in the _name_!), if it sees the same UUID 
> it'll consider it part of the same filesystem, thus potentially causing 
> corruption if it's a snapshot or something that's not actually supposed 
> to be part of the (current) filesystem.

I found a few more log entries, perhaps these may be helpful to track
this down, or maybe prevent the filesystem from going read-only.

[-- Attachment #2: btrfslog.txt --]
[-- Type: text/plain, Size: 4949 bytes --]

------------[ cut here ]------------
Dec 25 03:57:42 ccs.covici.com kernel: WARNING: CPU: 1 PID: 16580 at fs/btrfs/super.c:260 __btrfs_abort_transaction+0x52/0x114 [btrfs]()
Dec 25 03:57:42 ccs.covici.com kernel: BTRFS: Transaction aborted (error -95)
Dec 25 03:57:42 ccs.covici.com kernel: Modules linked in: rfcomm ip6table_nat nf_nat_ipv6 ip6t_REJECT nf_reject_ipv6 ip6table_mangle ip6table_raw nf_conntrack_ipv6 nf_defrag_ipv6 nf_log_ipv6 ip6table_filter ip6_tables sit tunnel4 ip_tunnel vmnet(O) fuse vmw_vsock_vmci_transport vsock vmw_vmci vmmon(O) uinput cmac ecb xt_nat ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_recent xt_comment ipt_REJECT nf_reject_ipv4 xt_addrtype xt_mark xt_CT xt_multiport xt_NFLOG nfnetlink_log xt_LOG nf_log_ipv4 nf_log_common nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_udplite nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink
Dec 25 03:57:42 ccs.covici.com kernel:  nfnetlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_tcpudp xt_conntrack iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_raw iptable_filter ip_tables x_tables bnep ext4 jbd2 gpio_ich snd_emu10k1_synth snd_emux_synth snd_seq_midi_emul snd_seq_virmidi snd_pcm_oss snd_seq_dummy snd_seq_oss snd_seq_midi snd_seq_midi_event snd_seq snd_mixer_oss btusb joydev btintel btbcm snd_emu10k1 bluetooth intel_rapl rfkill iosf_mbi x86_pkg_temp_thermal crc16 snd_util_mem snd_hwdep coretemp snd_ac97_codec ac97_bus kvm_intel snd_rawmidi snd_seq_device kvm snd_pcm e1000e snd_timer r8169 emu10k1_gp snd ptp gameport microcode i2c_i801 pps_core pcspkr lpc_ich mii acpi_cpufreq 8250_fintek processor
Dec 25 03:57:42 ccs.covici.com kernel:  button sch_fq_codel nvidia(PO) drm agpgart hid_logitech_hidpp dm_snapshot dm_bufio hid_logitech_dj usbhid btrfs xor raid6_pq ata_generic pata_acpi uas usb_storage crct10dif_pclmul crc32_pclmul crc32c_intel cryptd xhci_pci xhci_hcd ehci_pci ehci_hcd ahci libahci pata_marvell libata usbcore usb_common dm_mirror dm_region_hash dm_log dm_mod ipv6 autofs4
Dec 25 03:57:42 ccs.covici.com kernel: CPU: 1 PID: 16580 Comm: kworker/u16:5 Tainted: P           O    4.1.12-gentoo #1
Dec 25 03:57:42 ccs.covici.com kernel: Hardware name: Supermicro C7P67/C7P67, BIOS 4.6.4 07/01/2011
Dec 25 03:57:42 ccs.covici.com kernel: Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
Dec 25 03:57:42 ccs.covici.com kernel:  0000000000000009 ffff88037ca27c28 ffffffff81458291 0000000080000000
Dec 25 03:57:42 ccs.covici.com kernel:  ffff88037ca27c78 ffff88037ca27c68 ffffffff81045b50 ffff88037ca27c58
Dec 25 03:57:42 ccs.covici.com kernel:  ffffffffa0370008 00000000ffffffa1 ffff880166d8e228 ffffffffa0400aa0
Dec 25 03:57:42 ccs.covici.com kernel: Call Trace:
Dec 25 03:57:42 ccs.covici.com kernel:  [<ffffffff81458291>] dump_stack+0x4f/0x7b
Dec 25 03:57:42 ccs.covici.com kernel:  [<ffffffff81045b50>] warn_slowpath_common+0xa1/0xbb
Dec 25 03:57:42 ccs.covici.com kernel:  [<ffffffffa0370008>] ? __btrfs_abort_transaction+0x52/0x114 [btrfs]
Dec 25 03:57:42 ccs.covici.com kernel:  [<ffffffff81045bb0>] warn_slowpath_fmt+0x46/0x48
Dec 25 03:57:42 ccs.covici.com kernel:  [<ffffffffa0370008>] __btrfs_abort_transaction+0x52/0x114 [btrfs]
Dec 25 03:57:42 ccs.covici.com kernel:  [<ffffffffa03a06cb>] btrfs_finish_ordered_io+0x340/0x457 [btrfs]
Dec 25 03:57:42 ccs.covici.com kernel:  [<ffffffffa03a09d7>] finish_ordered_fn+0x15/0x17 [btrfs]
Dec 25 03:57:42 ccs.covici.com kernel:  [<ffffffffa03c0f0a>] normal_work_helper+0xd7/0x2b8 [btrfs]
Dec 25 03:57:42 ccs.covici.com kernel:  [<ffffffffa03c1338>] btrfs_endio_write_helper+0x12/0x14 [btrfs]
Dec 25 03:57:42 ccs.covici.com kernel:  [<ffffffff81058cbb>] process_one_work+0x1b3/0x358
Dec 25 03:57:42 ccs.covici.com kernel:  [<ffffffff81059a87>] worker_thread+0x273/0x35b
Dec 25 03:57:42 ccs.covici.com kernel:  [<ffffffff81059814>] ? rescuer_thread+0x283/0x283
Dec 25 03:57:42 ccs.covici.com kernel:  [<ffffffff8105dbc4>] kthread+0xd2/0xda
Dec 25 03:57:42 ccs.covici.com kernel:  [<ffffffff81060000>] ? current_is_async+0x1e/0x3c
Dec 25 03:57:42 ccs.covici.com kernel:  [<ffffffff8105daf2>] ? kthread_create_on_node+0x180/0x180
Dec 25 03:57:42 ccs.covici.com kernel:  [<ffffffff8145da02>] ret_from_fork+0x42/0x70
Dec 25 03:57:42 ccs.covici.com kernel:  [<ffffffff8105daf2>] ? kthread_create_on_node+0x180/0x180
Dec 25 03:57:42 ccs.covici.com kernel: ---[ end trace 666a42c31af28f83 ]---
Dec 25 03:57:42 ccs.covici.com kernel: BTRFS: error (device dm-20) in btrfs_finish_ordered_io:2896: errno=-95 unknown
Dec 25 03:57:42 ccs.covici.com kernel: BTRFS info (device dm-20): forced readonly
Dec 25 03:57:42 ccs.covici.com kernel: pending csums is 1601536

[-- Attachment #3: Type: text/plain, Size: 150 bytes --]


-- 
Your life is like a penny.  You're going to lose it.  The question is:
How do you spend it?

         John Covici
         covici@ccs.covici.com


* Re: btrfs problems on new file system
  2015-12-26 11:38           ` covici
@ 2015-12-26 19:07             ` Chris Murphy
  2015-12-26 19:22               ` covici
  0 siblings, 1 reply; 14+ messages in thread
From: Chris Murphy @ 2015-12-26 19:07 UTC (permalink / raw)
  To: John Covici; +Cc: Duncan, Btrfs BTRFS

On Sat, Dec 26, 2015 at 4:38 AM,  <covici@ccs.covici.com> wrote:
> Duncan <1i5t5.duncan@cox.net> wrote:
>
>> covici posted on Sat, 26 Dec 2015 02:29:11 -0500 as excerpted:
>>
>> > Chris Murphy <lists@colorremedies.com> wrote:
>> >
>> >> If you can post the entire dmesg somewhere that'd be useful. MUAs tend
>> >> to wrap that text and make it unreadable on list. I think the problems
>> >> with your volume happened before the messages, but it's hard to say.
>> >> Also, a generation of nearly 5000 is not that new?
>> >
>> > The file system was only a few days old.  It was on an lvm volume group
>> > which consisted of two ssd drives, so I am not sure what you are saying
>> > about lvm cache -- how could I do anything different?
>> >
>> >> On another thread someone said you probably need to specify the device
>> >> to mount when using Btrfs and lvmcache? And the device to specify is
>> >> the combined HDD+SSD logical device, for lvmcache that's the "cache
>> >> LV", which is the OriginLV + CachePoolLV. If Btrfs decides to mount the
>> >> origin, it can result in corruption.
>> >
>> > See above.
>>
>> I think he mixed up two threads and thought you were running lvm-cache,
>> not just regular lvm, which should be good unless you're exposing lvm
>> snapshots and thus letting btrfs see multiple supposed UUIDs that aren't
>> actually universal.  Since btrfs is multi-device and uses the UUID to
>> track which devices belong to it (because they're _supposed_ to be
>> universally unique, it's even in the _name_!), if it sees the same UUID
>> it'll consider it part of the same filesystem, thus potentially causing
>> corruption if it's a snapshot or something that's not actually supposed
>> to be part of the (current) filesystem.
>
> I found a few more log entries, perhaps these may be helpful to track
> this down, or maybe prevent the filesystem from going read-only.

No, you need to post the entire dmesg. The "cut here" part is maybe
useful for a developer diagnosing Btrfs's response to the problem, but
the problem, or the pre-problem, happened before this.

-- 
Chris Murphy


* Re: btrfs problems on new file system
  2015-12-26 19:07             ` Chris Murphy
@ 2015-12-26 19:22               ` covici
  2015-12-26 19:50                 ` Chris Murphy
  0 siblings, 1 reply; 14+ messages in thread
From: covici @ 2015-12-26 19:22 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Duncan, Btrfs BTRFS

Chris Murphy <lists@colorremedies.com> wrote:

> On Sat, Dec 26, 2015 at 4:38 AM,  <covici@ccs.covici.com> wrote:
> > Duncan <1i5t5.duncan@cox.net> wrote:
> >
> >> covici posted on Sat, 26 Dec 2015 02:29:11 -0500 as excerpted:
> >>
> >> > Chris Murphy <lists@colorremedies.com> wrote:
> >> >
> >> >> If you can post the entire dmesg somewhere that'd be useful. MUAs tend
> >> >> to wrap that text and make it unreadable on list. I think the problems
> >> >> with your volume happened before the messages, but it's hard to say.
> >> >> Also, a generation of nearly 5000 is not that new?
> >> >
> >> > The file system was only a few days old.  It was on an lvm volume group
> >> > which consisted of two ssd drives, so I am not sure what you are saying
> >> > about lvm cache -- how could I do anything different?
> >> >
> >> >> On another thread someone said you probably need to specify the device
> >> >> to mount when using Btrfs and lvmcache? And the device to specify is
> >> >> the combined HDD+SSD logical device, for lvmcache that's the "cache
> >> >> LV", which is the OriginLV + CachePoolLV. If Btrfs decides to mount the
> >> >> origin, it can result in corruption.
> >> >
> >> > See above.
> >>
> >> I think he mixed up two threads and thought you were running lvm-cache,
> >> not just regular lvm, which should be good unless you're exposing lvm
> >> snapshots and thus letting btrfs see multiple supposed UUIDs that aren't
> >> actually universal.  Since btrfs is multi-device and uses the UUID to
> >> track which devices belong to it (because they're _supposed_ to be
> >> universally unique, it's even in the _name_!), if it sees the same UUID
> >> it'll consider it part of the same filesystem, thus potentially causing
> >> corruption if it's a snapshot or something that's not actually supposed
> >> to be part of the (current) filesystem.
> >
> > I found a few more log entries, perhaps these may be helpful to track
> > this down, or maybe prevent the filesystem from going read-only.
> 
> No, you need to post the entire dmesg. The "cut here" part is maybe
> useful for a developer diagnosing Btrfs's response to the problem, but
> the problem, or the pre-problem, happened before this.

It would be a 20 meg file, if I were to post the whole file.  But I can
tell you, no hardware errors at any time.


-- 
Your life is like a penny.  You're going to lose it.  The question is:
How do you spend it?

         John Covici
         covici@ccs.covici.com


* Re: btrfs problems on new file system
  2015-12-26 19:22               ` covici
@ 2015-12-26 19:50                 ` Chris Murphy
  2015-12-26 20:02                   ` covici
  0 siblings, 1 reply; 14+ messages in thread
From: Chris Murphy @ 2015-12-26 19:50 UTC (permalink / raw)
  To: John Covici; +Cc: Duncan, Btrfs BTRFS

On Sat, Dec 26, 2015 at 12:22 PM,  <covici@ccs.covici.com> wrote:
> Chris Murphy <lists@colorremedies.com> wrote:
>
>> On Sat, Dec 26, 2015 at 4:38 AM,  <covici@ccs.covici.com> wrote:
>> > Duncan <1i5t5.duncan@cox.net> wrote:
>> >
>> >> covici posted on Sat, 26 Dec 2015 02:29:11 -0500 as excerpted:
>> >>
>> >> > Chris Murphy <lists@colorremedies.com> wrote:
>> >> >
>> >> >> If you can post the entire dmesg somewhere that'd be useful. MUAs tend
>> >> >> to wrap that text and make it unreadable on list. I think the problems
>> >> >> with your volume happened before the messages, but it's hard to say.
>> >> >> Also, a generation of nearly 5000 is not that new?
>> >> >
>> >> > The file system was only a few days old.  It was on an lvm volume group
>> >> > which consisted of two ssd drives, so I am not sure what you are saying
>> >> > about lvm cache -- how could I do anything different?
>> >> >
>> >> >> On another thread someone said you probably need to specify the device
>> >> >> to mount when using Btrfs and lvmcache? And the device to specify is
>> >> >> the combined HDD+SSD logical device, for lvmcache that's the "cache
>> >> >> LV", which is the OriginLV + CachePoolLV. If Btrfs decides to mount the
>> >> >> origin, it can result in corruption.
>> >> >
>> >> > See above.
>> >>
>> >> I think he mixed up two threads and thought you were running lvm-cache,
>> >> not just regular lvm, which should be good unless you're exposing lvm
>> >> snapshots and thus letting btrfs see multiple supposed UUIDs that aren't
>> >> actually universal.  Since btrfs is multi-device and uses the UUID to
>> >> track which devices belong to it (because they're _supposed_ to be
>> >> universally unique, it's even in the _name_!), if it sees the same UUID
>> >> it'll consider it part of the same filesystem, thus potentially causing
>> >> corruption if it's a snapshot or something that's not actually supposed
>> >> to be part of the (current) filesystem.
>> >
>> > I found a few more log entries, perhaps these may be helpful to track
>> > this down, or maybe prevent the filesystem from going read-only.
>>
>> No, you need to post the entire dmesg. The "cut here" part is maybe
>> useful for a developer diagnosing Btrfs's response to the problem, but
>> the problem, or the pre-problem, happened before this.
>
> It would be a 20 meg file, if I were to post the whole file.  But I can
> tell you, no hardware errors at any time.

The kernel is tainted, looks like a proprietary kernel module, so you
have to have very good familiarity with the workings of that module to
know whether it might affect what's going on, or you'd have to retest
without that kernel module.

Anyway, asking for the whole dmesg isn't arbitrary, it saves time
having to ask for more later. The two things you've provided so far
aren't enough, any number of problems could result in those messages.
So my suggestion is when people ask for something, provide it or don't
provide it, but don't complain about what they're asking for. The
output from btrfs-debug-tree might be several hundred MB. The output
from btrfs-image might be several GB. So if you're not willing to
provide 100kB, let alone 20MB, of kernel messages that might give some
hint what's going on, the resistance itself is off-putting. It's like
having to pull your loose tooth for you; no one really wants to do
that.
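
For reference, if a developer does end up asking for those, the usual
invocations are roughly as follows (filesystem unmounted, device path
is a placeholder):

  btrfs-debug-tree /dev/dm-20 > debug-tree.txt
  btrfs-image -c9 -t4 /dev/dm-20 metadata.img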

-- 
Chris Murphy


* Re: btrfs problems on new file system
  2015-12-26 19:50                 ` Chris Murphy
@ 2015-12-26 20:02                   ` covici
  2015-12-26 20:33                     ` Chris Murphy
  0 siblings, 1 reply; 14+ messages in thread
From: covici @ 2015-12-26 20:02 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Duncan, Btrfs BTRFS

Chris Murphy <lists@colorremedies.com> wrote:

> On Sat, Dec 26, 2015 at 12:22 PM,  <covici@ccs.covici.com> wrote:
> > Chris Murphy <lists@colorremedies.com> wrote:
> >
> >> On Sat, Dec 26, 2015 at 4:38 AM,  <covici@ccs.covici.com> wrote:
> >> > Duncan <1i5t5.duncan@cox.net> wrote:
> >> >
> >> >> covici posted on Sat, 26 Dec 2015 02:29:11 -0500 as excerpted:
> >> >>
> >> >> > Chris Murphy <lists@colorremedies.com> wrote:
> >> >> >
> >> >> >> If you can post the entire dmesg somewhere that'd be useful. MUAs tend
> >> >> >> to wrap that text and make it unreadable on list. I think the problems
> >> >> >> with your volume happened before the messages, but it's hard to say.
> >> >> >> Also, a generation of nearly 5000 is not that new?
> >> >> >
> >> >> > The file system was only a few days old.  It was on an lvm volume group
> >> >> > which consisted of two ssd drives, so I am not sure what you are saying
> >> >> > about lvm cache -- how could I do anything different?
> >> >> >
> >> >> >> On another thread someone said you probably need to specify the device
> >> >> >> to mount when using Btrfs and lvmcache? And the device to specify is
> >> >> >> the combined HDD+SSD logical device, for lvmcache that's the "cache
> >> >> >> LV", which is the OriginLV + CachePoolLV. If Btrfs decides to mount the
> >> >> >> origin, it can result in corruption.
> >> >> >
> >> >> > See above.
> >> >>
> >> >> I think he mixed up two threads and thought you were running lvm-cache,
> >> >> not just regular lvm, which should be good unless you're exposing lvm
> >> >> snapshots and thus letting btrfs see multiple supposed UUIDs that aren't
> >> >> actually universal.  Since btrfs is multi-device and uses the UUID to
> >> >> track which devices belong to it (because they're _supposed_ to be
> >> >> universally unique, it's even in the _name_!), if it sees the same UUID
> >> >> it'll consider it part of the same filesystem, thus potentially causing
> >> >> corruption if it's a snapshot or something that's not actually supposed
> >> >> to be part of the (current) filesystem.
> >> >
> >> > I found a few more log entries, perhaps these may be helpful to track
> >> > this down, or maybe prevent the filesystem from going read-only.
> >>
> >> No, you need to post the entire dmesg. The "cut here" part is maybe
> >> useful for a developer diagnosing Btrfs's response to the problem, but
> >> the problem, or the pre-problem, happened before this.
> >
> > It would be a 20 meg file, if I were to post the whole file.  But I can
> > tell you, no hardware errors at any time.
> 
> The kernel is tainted, looks like a proprietary kernel module, so you
> have to have very good familiarity with the workings of that module to
> know whether it might affect what's going on, or you'd have to retest
> without that kernel module.
> 
> Anyway, asking for the whole dmesg isn't arbitrary, it saves time
> having to ask for more later. The two things you've provided so far
> aren't enough, any number of problems could result in those messages.
> So my suggestion is when people ask for something, provide it or don't
> provide it, but don't complain about what they're asking for. The
> output from btrfs-debug-tree might be several hundred MB. The output
> from btrfs-image might be several GB. So if you're not willing to
> provide 100kB, let alone 20MB, of kernel messages that might give some
> hint what's going on, the resistance itself is off-putting. It's like
> having to pull your loose tooth for you; no one really wants to do
> that.

How far back do you want to go in terms of the messages?


-- 
Your life is like a penny.  You're going to lose it.  The question is:
How do you spend it?

         John Covici
         covici@ccs.covici.com


* Re: btrfs problems on new file system
  2015-12-26 20:02                   ` covici
@ 2015-12-26 20:33                     ` Chris Murphy
  0 siblings, 0 replies; 14+ messages in thread
From: Chris Murphy @ 2015-12-26 20:33 UTC (permalink / raw)
  To: John Covici; +Cc: Chris Murphy, Duncan, Btrfs BTRFS

On Sat, Dec 26, 2015 at 1:02 PM,  <covici@ccs.covici.com> wrote:
> Chris Murphy <lists@colorremedies.com> wrote:
>
>> On Sat, Dec 26, 2015 at 12:22 PM,  <covici@ccs.covici.com> wrote:
>> > Chris Murphy <lists@colorremedies.com> wrote:
>> >
>> >> On Sat, Dec 26, 2015 at 4:38 AM,  <covici@ccs.covici.com> wrote:
>> >> > Duncan <1i5t5.duncan@cox.net> wrote:
>> >> >
>> >> >> covici posted on Sat, 26 Dec 2015 02:29:11 -0500 as excerpted:
>> >> >>
>> >> >> > Chris Murphy <lists@colorremedies.com> wrote:
>> >> >> >
>> >> >> >> If you can post the entire dmesg somewhere that'd be useful. MUAs tend
>> >> >> >> to wrap that text and make it unreadable on list. I think the problems
>> >> >> >> with your volume happened before the messages, but it's hard to say.
>> >> >> >> Also, a generation of nearly 5000 is not that new?
>> >> >> >
>> >> >> > The file system was only a few days old.  It was on an lvm volume group
>> >> >> > which consisted of two ssd drives, so I am not sure what you are saying
>> >> >> > about lvm cache -- how could I do anything different?
>> >> >> >
>> >> >> >> On another thread someone said you probably need to specify the device
>> >> >> >> to mount when using Btrfs and lvmcache? And the device to specify is
>> >> >> >> the combined HDD+SSD logical device, for lvmcache that's the "cache
>> >> >> >> LV", which is the OriginLV + CachePoolLV. If Btrfs decides to mount the
>> >> >> >> origin, it can result in corruption.
>> >> >> >
>> >> >> > See above.
>> >> >>
>> >> >> I think he mixed up two threads and thought you were running lvm-cache,
>> >> >> not just regular lvm, which should be good unless you're exposing lvm
>> >> >> snapshots and thus letting btrfs see multiple supposed UUIDs that aren't
>> >> >> actually universal.  Since btrfs is multi-device and uses the UUID to
>> >> >> track which devices belong to it (because they're _supposed_ to be
>> >> >> universally unique, it's even in the _name_!), if it sees the same UUID
>> >> >> it'll consider it part of the same filesystem, thus potentially causing
>> >> >> corruption if it's a snapshot or something that's not actually supposed
>> >> >> to be part of the (current) filesystem.
>> >> >
>> >> > I found a few more log entries, perhaps these may be helpful to track
>> >> > this down, or maybe prevent the filesystem from going read-only.
>> >>
>> >> No, you need to post the entire dmesg. The "cut here" part is maybe
>> >> useful for a developer diagnosing Btrfs's response to the problem, but
>> >> the problem, or the pre-problem, happened before this.
>> >
>> > It would be a 20 meg file, if I were to post the whole file.  But I can
>> > tell you, no hardware errors at any time.
>>
>> The kernel is tainted, looks like a proprietary kernel module, so you
>> have to have very good familiarity with the workings of that module to
>> know whether it might affect what's going on, or you'd have to retest
>> without that kernel module.
>>
>> Anyway, asking for the whole dmesg isn't arbitrary, it saves time
>> having to ask for more later. The two things you've provided so far
>> aren't enough, any number of problems could result in those messages.
>> So my suggestion is when people ask for something, provide it or don't
>> provide it, but don't complain about what they're asking for. The
>> output from btrfs-debug-tree might be several hundred MB. The output
>> from btrfs-image might be several GB. So if you're not willing to
>> provide 100kB, let alone 20MB, of kernel messages that might give some
>> hint what's going on, the resistance itself is off-putting. It's like
>> having to pull your loose tooth for you; no one really wants to do
>> that.
>
> How far back do you want to go in terms of the messages?

The kernel log buffer isn't that big by default, which is why I asked
for the entire dmesg, not the entire /var/log/messages file. But if
you can reproduce the problem with a new boot, that'd certainly make
the kernel log shorter and cleaner if that's the concern.
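
Something like this, captured right after reproducing the problem on a
fresh boot, would be enough:

  dmesg > btrfs-dmesg.txt

(or the journalctl -k -b equivalent if you keep persistent journals).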

The errno -95 is itself sufficiently rare that there's no real way to
answer your question, because I don't think anyone would even know what
they're looking for until they find it. It's even possible it won't be
found by looking at kernel messages.

How was the fs created? Conversion? If mkfs.btrfs, what version of
progs and what options were used to create it?  And what was happening
at the time of the first errno=-95?

-- 
Chris Murphy


end of thread, other threads:[~2015-12-26 20:33 UTC | newest]

Thread overview: 14+ messages
2015-12-25 10:03 btrfs problems on new file system covici
2015-12-25 19:01 ` Henk Slager
2015-12-25 21:14   ` covici
2015-12-26  3:53     ` Chris Murphy
2015-12-26  7:29       ` covici
2015-12-26 10:47         ` Duncan
2015-12-26 11:38           ` covici
2015-12-26 19:07             ` Chris Murphy
2015-12-26 19:22               ` covici
2015-12-26 19:50                 ` Chris Murphy
2015-12-26 20:02                   ` covici
2015-12-26 20:33                     ` Chris Murphy
2015-12-26  5:20     ` Duncan
2015-12-26  7:44       ` covici
