* Recover btrfs volume which can only be mounted in read-only mode
@ 2015-10-14 14:28 Dmitry Katsubo
  2015-10-14 14:40 ` Anand Jain
  0 siblings, 1 reply; 12+ messages in thread
From: Dmitry Katsubo @ 2015-10-14 14:28 UTC (permalink / raw)
  To: linux-btrfs

Dear btrfs community,

I am facing several problems regarding btrfs, and I will be very
thankful if someone can help me with them. Also, while playing with
btrfs I have a few suggestions – it would be nice if someone could
comment on those.

While starting the system, /var (which is a btrfs volume) failed to be
mounted. That btrfs volume was created with the following options:

# mkfs.btrfs -d raid1 -m raid1 /dev/sdc2 /dev/sda /dev/sdd1

Here comes what is recorded in systemd journal during the startup:

[    2.931097] BTRFS: device fsid 57b828ee-5984-4f50-89ff-4c9be0fd3084
devid 2 transid 394288 /dev/sda
[    9.810439] BTRFS: device fsid 57b828ee-5984-4f50-89ff-4c9be0fd3084
devid 1 transid 394288 /dev/sdc2
Oct 11 13:00:22 systemd[1]: Job
dev-disk-by\x2duuid-57b828ee\x2d5984\x2d4f50\x2d89ff\x2d4c9be0fd3084.device/start
timed out.
Oct 11 13:00:22 systemd[1]: Timed out waiting for device
dev-disk-by\x2duuid-57b828ee\x2d5984\x2d4f50\x2d89ff\x2d4c9be0fd3084.device.

After the system started in runlevel 1, I attempted to mount the filesystem:

# mount /var
Oct 11 13:53:55 kernel: BTRFS info (device sdc2): disk space caching is enabled
Oct 11 13:53:55 kernel: BTRFS: failed to read chunk tree on sdc2
Oct 11 13:53:55 kernel: BTRFS: open_ctree failed

When I googled for "failed to read chunk tree", the feedback was that
something really bad is happening and it is time to restore the data /
give up on btrfs. In fact, this message is misleading because it
refers to /dev/sdc2, which is the mount device in fstab, but that is an
SSD drive, so it is very unlikely to cause a "read" error. I literally
read the message as "BTRFS: tried to read something from sdc2 and
failed". Maybe it would be better to re-phrase the message as "failed
to construct chunk tree on /var (sdc2,sda,sdd1)"?

Next I did a check:

# btrfs check /dev/sdc2
warning devid 3 not found already
checking extents
checking free space cache
Error reading 36818145280, -1
checking fs roots
checking csums
checking root refs
Checking filesystem on /dev/sdc2
UUID: 57b828ee-5984-4f50-89ff-4c9be0fd3084
failed to load free space cache for block group 36536582144
found 29602081783 bytes used err is 0
total csum bytes: 57681304
total tree bytes: 1047363584
total fs tree bytes: 843694080
total extent tree bytes: 121159680
btree space waste bytes: 207443742
file data blocks allocated: 77774524416
 referenced 60893913088

The message "devid 3 not found already" does not tell much to me. If I
understand correctly, btrfs does not store the list of devices in the
metadata, but maybe it would be a good idea to save the last seen
information about devices so that I would not need to guess what
"devid 3" means?

Next I tried to list all devices in my btrfs volume. I found this is
not possible (unless the volume is mounted). It would be nice if "btrfs
device scan" printed the detected volumes / devices to stdout (e.g.
with a "-v" option), or if there were some other way to do that.

Then I mounted the volume in degraded mode, and only after that could I
understand what the error message means:

# mount /var -o degraded
# btrfs device stats /var
[/dev/sdc2].write_io_errs   0
[/dev/sdc2].read_io_errs    0
[/dev/sdc2].flush_io_errs   0
[/dev/sdc2].corruption_errs 0
[/dev/sdc2].generation_errs 0
[/dev/sda].write_io_errs   0
[/dev/sda].read_io_errs    0
[/dev/sda].flush_io_errs   0
[/dev/sda].corruption_errs 0
[/dev/sda].generation_errs 0
[].write_io_errs   3160958
[].read_io_errs    0
[].flush_io_errs   0
[].corruption_errs 0
[].generation_errs 0

Now I can see that the device with devid 3 is actually /dev/sdd1,
which btrfs considered not ready. Is it possible to improve the btrfs
output so that it lists the "last seen device", e.g.

[/dev/sdd1*].write_io_errs   3160958
[/dev/sdd1*].read_io_errs    0
...

where "*" means that device is missing.

I listed all partitions and /dev/sdd1 was among them. I also ran

# badblocks /dev/sdd

and it found no bad blocks. Why btrfs considers the device "not ready"
remains a question to me.
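Note that plain badblocks only performs a read test by default, so a
drive can pass it and still fail on writes. As a rough sketch of
further checks (assuming smartmontools is installed and the device is
not mounted; badblocks -n is a non-destructive read-write test):

# smartctl -a /dev/sdd
# badblocks -nsv /dev/sdd

This is only a guess at where to look next, not an explanation of why
btrfs marks the device "not ready".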

Afterwards I decided to run a scrub:

# btrfs scrub start /var
# btrfs scrub status /var
scrub status for 57b828ee-5984-4f50-89ff-4c9be0fd3084
    scrub started at Sun Oct 11 14:55:45 2015 and was aborted after 1365 seconds
    total bytes scrubbed: 89.52GiB with 0 errors

I have noticed that btrfs always reports "was aborted after X
seconds", even while the scrub is still running (I checked that X and
the number of bytes scrubbed keep increasing). That is confusing. After
the scrub finished, I had no idea whether it scrubbed everything or was
really aborted, and if it was aborted, for what reason. Also, it would
be nice if the status displayed the number of data bytes (without
replicas) scrubbed, because the number 89.52GiB includes all replicas
(of raid1 in my case):

total bytes scrubbed: 89.52GiB (data 55.03GiB, system 16.00KiB,
metadata 998.83MiB) with 0 errors

Then I can compare this number with the "filesystem df" output to answer
the question: was all the data successfully scrubbed?

# btrfs filesystem df /var
Data, RAID1: total=70.00GiB, used=55.03GiB
Data, single: total=8.00MiB, used=0.00B
System, RAID1: total=32.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, RAID1: total=2.00GiB, used=998.83MiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=336.00MiB, used=0.00B
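
A sketch of how I would try to make that comparison, assuming a
btrfs-progs version where "scrub start" supports -B (stay in the
foreground) and -d (print statistics per device):

# btrfs scrub start -B -d /var
# btrfs filesystem df /var

With per-device numbers it is at least possible to see whether every
raid1 copy was visited, even if the totals still include replicas.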

Unfortunately, the btrfs version I have (3.17) does not support the
"device delete missing" command (it just printed the help text to the
console), so I re-added /dev/sdd1 and started a balance:

# btrfs device add /dev/sdd1 /var
# btrfs balance start /var
Done, had to relocate 76 out of 76 chunks

I assumed that the last command would return control quickly, but it
took quite some time to perform the necessary relocations.
Wishlist: make "balance start" truly asynchronous (initiate balancing
in the background and exit). Also, it would be nice if "balance
status" remembered and displayed the status of the last operation.

After that the picture was the following:

# btrfs fi show /var
Label: none  uuid: 57b828ee-5984-4f50-89ff-4c9be0fd3084
    Total devices 4 FS bytes used 55.99GiB
    devid    1 size 52.91GiB used 0.00B path /dev/sdc2
    devid    2 size 232.89GiB used 58.03GiB path /dev/sda
    devid    4 size 111.79GiB used 58.03GiB path /dev/sdd1
    *** Some devices missing

I was surprised to see that the balance operation moved everything
away from /dev/sdc2. That was not clever.

I thought the problem was solved and rebooted. Unfortunately,
/dev/sdd1 was again dropped from the volume. This time I was not
able to mount it in degraded mode, only in read-only mode:

# mount -o degraded /var
Oct 11 18:20:15 kernel: BTRFS: too many missing devices, writeable
mount is not allowed

# mount -o degraded,ro /var
# btrfs device add /dev/sdd1 /var
ERROR: error adding the device '/dev/sdd1' - Read-only file system

Now I am stuck: I cannot add a device to the volume to satisfy the raid prerequisite.

Please, advise.

P.S. I know that sdd1 device is failing (the write error counter is
3160958) and needs replacing.
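
P.P.S. The rough plan for the failing drive is something like the
following, assuming a kernel with working "btrfs replace" and a spare
drive showing up as /dev/sde1 (placeholder name):

# btrfs replace start /dev/sdd1 /dev/sde1 /var
# btrfs replace status /var

Whether that can work while the volume refuses a writable mount is
exactly the open question above.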

Extra information:
Debian jessie
Linux kernel v3.16.0-4-686-pae
btrfs v3.17-1.1

--
With best regards,
Dmitry


* Re: Recover btrfs volume which can only be mounted in read-only mode
  2015-10-14 14:28 Recover btrfs volume which can only be mounted in read-only mode Dmitry Katsubo
@ 2015-10-14 14:40 ` Anand Jain
  2015-10-14 20:27   ` Dmitry Katsubo
  0 siblings, 1 reply; 12+ messages in thread
From: Anand Jain @ 2015-10-14 14:40 UTC (permalink / raw)
  To: Dmitry Katsubo, linux-btrfs



> # mount -o degraded /var
> Oct 11 18:20:15 kernel: BTRFS: too many missing devices, writeable
> mount is not allowed
>
> # mount -o degraded,ro /var
> # btrfs device add /dev/sdd1 /var
> ERROR: error adding the device '/dev/sdd1' - Read-only file system
>
> Now I am stuck: I cannot add device to the volume to satisfy raid pre-requisite.

  This is a known issue. Would you be able to test the below set of
  patches and update us?

    [PATCH 0/5] Btrfs: Per-chunk degradable check

Thanks, Anand


* Re: Recover btrfs volume which can only be mounted in read-only mode
  2015-10-14 14:40 ` Anand Jain
@ 2015-10-14 20:27   ` Dmitry Katsubo
  2015-10-15  0:48     ` Duncan
  0 siblings, 1 reply; 12+ messages in thread
From: Dmitry Katsubo @ 2015-10-14 20:27 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs

On 14/10/2015 16:40, Anand Jain wrote:
>> # mount -o degraded /var
>> Oct 11 18:20:15 kernel: BTRFS: too many missing devices, writeable
>> mount is not allowed
>>
>> # mount -o degraded,ro /var
>> # btrfs device add /dev/sdd1 /var
>> ERROR: error adding the device '/dev/sdd1' - Read-only file system
>>
>> Now I am stuck: I cannot add device to the volume to satisfy raid
>> pre-requisite.
> 
>  This is a known issue. Would you be able to test below set of patches
>  and update us..
> 
>    [PATCH 0/5] Btrfs: Per-chunk degradable check

Many thanks for the reply. Unfortunately I have no environment in which
to recompile the kernel, and setting one up would perhaps take a day.
Can the latest kernel be pushed to Debian sid?

1. Is there any way to recover the btrfs volume at the moment? Or is
the easiest option to mount it read-only, copy all data to another
drive, re-create the btrfs volume and copy the data back?

2. How to avoid such a trap in the future?

3. How can I know which kernel version the patch "Per-chunk degradable
check" is targeting?

4. What is the best way to express/vote for new features or suggestions
(wikipage "Project_ideas" / bugzilla)?

Thanks!

-- 
With best regards,
Dmitry


* Re: Recover btrfs volume which can only be mounted in read-only mode
  2015-10-14 20:27   ` Dmitry Katsubo
@ 2015-10-15  0:48     ` Duncan
  2015-10-15 14:10       ` Dmitry Katsubo
  0 siblings, 1 reply; 12+ messages in thread
From: Duncan @ 2015-10-15  0:48 UTC (permalink / raw)
  To: linux-btrfs

Dmitry Katsubo posted on Wed, 14 Oct 2015 22:27:29 +0200 as excerpted:

> On 14/10/2015 16:40, Anand Jain wrote:
>>> # mount -o degraded /var
>>> Oct 11 18:20:15 kernel: BTRFS: too many missing devices, writeable
>>> mount is not allowed
>>>
>>> # mount -o degraded,ro /var
>>> # btrfs device add /dev/sdd1 /var
>>> ERROR: error adding the device '/dev/sdd1' - Read-only file system
>>>
>>> Now I am stuck: I cannot add device to the volume to satisfy raid
>>> pre-requisite.
>> 
>>  This is a known issue. Would you be able to test below set of patches
>>  and update us..
>> 
>>    [PATCH 0/5] Btrfs: Per-chunk degradable check
> 
> Many thanks for the reply. Unfortunately I have no environment to
> recompile the kernel, and setting it up will perhaps take a day. Can the
> latest kernel be pushed to Debian sid?

In the way of general information...

While btrfs is no longer entirely unstable (since 3.12, when the 
experimental tag was removed) and kernel patch backports are generally 
done where stability is a factor, it's not yet fully stable and mature, 
either.  As such, wishing to remain on kernels more than one LTS series 
behind the latest LTS kernel series (4.1, with 3.18 the one-LTS-back 
version) can be considered incompatible with wishing to run btrfs, which 
is still under heavy development and not yet fully stable and mature, at 
least as soon as problems are reported.  A request to upgrade to current 
and/or to try various not-yet-mainlined patches is thus to be expected 
on report of problems.

As for userspace, the division between btrfs kernel and userspace works 
like this:  Under normal operating conditions, userspace simply makes 
requests of the kernel, which does the actual work.  Thus, under normal 
conditions, updated kernel code is most important.  However, once a 
problem occurs and repair/recovery is attempted, it's generally userspace 
code itself directly operating on the unmounted filesystem, so having the 
latest userspace code fixes becomes most important once something has 
gone wrong and you're trying to fix it.

So upgrading to a 3.18 series kernel, at minimum, is very strongly 
recommended for those running btrfs, with an expectation that an upgrade 
to 4.1 should be being planned and tested, for deployment as soon as it's 
passing on-site pre-deployment testing.  And an upgrade to current or 
close to current btrfs-progs 4.2.2 userspace is recommended as soon as 
you need its features, which include the latest patches for repair and 
recovery, so as soon as you have a filesystem that's not working as 
expected, if not before.  (Note that earlier btrfs-progs 4.2 releases, 
before 4.2.2, had a buggy mkfs.btrfs, so they should be skipped if you 
will be doing mkfs.btrfs with them, and any btrfs created with those 
versions should have what's on them backed up if it's not already, and 
the filesystems recreated with 4.2.2, as they'll be unstable and are 
subject to failure.)

> 1. Is there any way to recover btrfs at the moment? Or the easiest I can
> do is to mount ro, copy all data to another drive, re-create btrfs
> volume and copy back?

Sysadmin's rule of backups:  If data isn't backed up, by definition you 
value the data less than the cost of time/hassle/resources to do the 
backup, so loss of a filesystem is never a big problem, because if the 
data was of any value, it was backed up and can be restored from that 
backup, and if it wasn't backed up, then by definition you have already 
saved the more important to you commodity, the hassle/time/resources you 
would have spent doing the backup.  Therefore, loss of a filesystem is 
loss of throw-away data in any case, either because it was backed up (and 
a would-be backup that hasn't been tested restorable isn't yet a 
completed backup, so doesn't count), or because the data really was throw-
away data, not worth the hassle of backing up in the first place, even at 
risk of loss should the un-backed-up data be lost.

No exceptions.  Any after-the-fact protests to the contrary simply put 
the lie to claims that the data was considered valuable, since actions 
spoke louder than words and actions defined the data as throw-away.

Therefore, no worries.  Worst-case, you either recover the data from 
backup, or if it wasn't backed up, by definition, it wasn't valuable data 
in the first place.  Either way, no valuable data was, or can be, lost.

(It's worth noting that this rule nicely takes care of the loss of both 
the working copy and N'th backup case, as well, since again, either it 
was worth the cost of N+1 levels of backup, or that N+1 backup wasn't 
made, which automatically defines the data as not worth the cost of 
the N+1 backup, at least relative to the risk factor that it might 
actually be needed.  That remains the case, regardless of whether N=0 or 
N=10^1000, since even at N=10^1000, backup to level N+1 is either worth 
the cost vs. risk -- the data really is THAT valuable -- or it's not.)

Thus, the easiest way is very possibly to blow away the filesystem, 
recreate and restore from backup, assuming the data was valuable enough 
to make that backup in the first place.  If it wasn't, then we already 
know the value of the data is relatively limited, and the question 
becomes one of whether the chance of recovery of the already known to be 
very limited value data is worth the hassle cost of trying to do that 
recovery.

FWIW, here, I do have backups, but I don't always keep them as current as 
I might.  By doing so, I know my actions are defining the value of the 
data in the delta between the backups and current status as very limited, 
but that's the choice I'm making.

Fortunately for me, btrfs restore (the actual btrfs restore command), 
working on the unmounted filesystem, can often restore the data from the 
filesystem even if it won't mount, so the risk of actual loss of that 
data is much lower than the risk of not actually being able to mount the 
filesystem, of course letting me get away with delaying backup updates 
even longer, as the risk of total loss of the data in the delta between 
the backup and current is much lower than it would be otherwise, thereby 
making the cost of backup updates relatively higher in comparison, 
meaning I can and do space them further apart.

FWIW I've had to use btrfs restore twice, since I started using btrfs.  
Newer btrfs restore (from newer btrfs-progs) works better than older 
versions, too, letting you optionally restore ownership/permissions and 
symlinks, where previously both were lost, symlinks simply not restored, 
and ownership/permissions the default for the btrfs restore process 
(root, obviously, umask defaults).  See what I mean about current 
userspace being recommended. =:^)
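
A minimal sketch of such a restore run, assuming a recent btrfs-progs 
where the metadata and symlink options exist, the filesystem is 
unmounted, and /mnt/rescue is a scratch directory on some other 
filesystem:

# btrfs restore -D -v /dev/sdc2 /mnt/rescue
# btrfs restore -m -S -v /dev/sdc2 /mnt/rescue

The first invocation is a dry run that only lists what would be 
restored; the second actually writes files, restoring ownership/
permissions (-m) and symlinks (-S).  Check the btrfs-restore manpage 
for your version, since older releases lack those options.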

Since in your case you can mount, even if it must be read-only, the same 
logic applies, except that grabbing the data off the filesystem is easier 
since you can simply copy it off and don't need btrfs restore to do it.
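
The copy-off itself can be as simple as the following, with /mnt/backup 
standing in for wherever the spare disk is mounted:

# mount -o degraded,ro /var
# rsync -aHAXv /var/ /mnt/backup/var/

Plain cp -a would work too; rsync just makes it easy to resume.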

Of course the existence of those patches gives you another alternative as 
well, letting you judge the hassle cost of setting up the build 
environment and updating, against that of doing the copy off the read-
only mounted filesystem, against that of simply declaring the filesystem 
a loss and blowing it away, to either restore from backup, or if it 
wasn't backed up, simply losing what is already defined as data of very 
limited value anyway.

> 2. How to avoid such a trap in the future?

Keep current. =:^)  At least to latest LTS kernel and last release of 
last-but-one userspace series (which would be 4.1.2 IIRC as I don't 
remember a 4.1.3 being released).

Or at the bigger picture, ask yourself whether running btrfs is really 
appropriate for you until it further stabilizes, since it's not fully 
stable and mature yet, and running it is thereby incompatible with the 
conservative stability objectives of those who wish to run older tried 
and tested really stable versions.  Perhaps ext4 (or even ext3), or 
reiserfs (my previous filesystem of choice, with which I've had extremely 
good experience) or xfs are more appropriate choices for you, if you 
really need that stability and maturity.

> 3. How can I know what version of kernel the patch "Per-chunk degradable
> check" is targeting?

It may be worth (re)reading the btrfs wiki page on sources.  Generally 
speaking, there's an integration branch, where patches deemed mostly 
ready (after on-list review) are included, before they're accepted into 
the mainline Linus kernel.  Otherwise, patches are generally based on 
mainline, currently 4.3-rcX, unless otherwise noted.  If you follow the 
list, you'll see the pull requests as they are posted, and for the Linus 
kernel, pulls are usually accepted within a day or so, if you're 
following Linus kernel git, as I am.

For userspace, git master branch is always the current release.  There's 
a devel branch that's effectively the same as current integration, except 
that it's no longer updated on the kernel.org mirrors.  The github mirror 
or .cz mirrors (again, as listed on the wiki) have the current devel 
branch, however, and that's what gentoo's "live" ebuild now points at, 
and what I'm running here (after I filed a gentoo bug because the live 
ebuild was pointed at the stale devel branch of the kernel.org kdave 
mirror and thus was no longer updating, that got the live ebuild pointed 
at the current devel branch on the .cz mirrors).

So you can either run current release and cherry-pick patches you want/
need as they are posted to the list, or if you want something live but a 
bit more managed than that, run the integration branches and/or for 
userspace, the devel branch.

> 4. What is the best way to express/vote for new features or suggestions
> (wikipage "Project_ideas" / bugzilla)?

Well, the wiki page is user-editable, if you register.  (Tho last I knew, 
there was some problem with at least some wiki user registrations, 
requiring admin intervention in some cases as posted to the list.)  
Personally, I'm more a list person, however, and have never registered on 
the wiki.

In general, however, there's only a few btrfs devs, and between bug 
tracking and fixing and development of the features they're already 
working on or have already roadmapped as their next project, with each 
feature typically taking a kernel cycle and often several kernel cycles 
to develop and stabilize, they don't so often pick "new" features to work 
on.

There are independent devs that sometimes pick a particular feature 
they're interested in, and submit patches for it, but those features may 
or may not be immediately integrated, depending on maturity of the patch 
set, how it meshes with the existing roadmap, whether the dev intends to 
continue to support that feature or leave it to existing devs to support 
after development, and in general, how well that dev works with existing 
longer-term btrfs devs.  In general, a dev interested in such a project 
should either be prepared to carry and maintain the patches as an 
independent patch set for some time if they're not immediately 
integrated, or should plan on a one-time "proof of concept" patch set 
that will then go stale if it's not integrated, tho it may still be 
better than starting from scratch, should somebody later want to pick up 
the set and update it for integration.

So definitely, I'd say add it to the wiki page, so it doesn't get lost 
and can be picked up when it fits into the roadmap, but be prepared for 
it to sit there, unimplemented, for some years, as there's simply way 
more ideas than resources to implement them, and the most in-demand 
features will obviously be already listed by now.

For more minor suggestions, tweaks to current functionality or output, 
etc, run current so your suggestions are on top of a current base, and 
either post the suggestions here, or where they fit, add them as comments 
to proposed patches as they are posted.  Of course, if you're a dev and 
can code them up as patches yourself, so much the better! =:^)
(I'm not, FWIW. =:^( )

Many of your suggestions above fit this category, minor improvements to 
current output.  However, in some cases the wording in current is already 
better than what you were running, so your suggestions read as stale, and 
in others, they don't quite read (to me at least, tho I already said I'm 
not a dev) as practical.

In particular, tracking last seen device doesn't appear practical to me, 
since in many instances, device assignment is dynamic, and what was
/dev/sdc3 a couple boots ago may well be /dev/sde3 this time around, in 
which case listing /dev/sdc3 could well only confuse the user even more.

Tho that isn't to say that the suggestions don't have some merit, 
pointing out where some change of wording, if not to your suggested 
wording, might be useful.

In particular, btrfs filesystem show, should work with both mounted and 
unmounted filesystems, and would have perhaps given you some hints about 
what devices should have been in the filesystem.  The assumption seems to 
be implicit that a user will know to run that, now, but perhaps an 
explicit suggestion to run btrfs filesystem show, would be worthwhile.  
The case can of course be argued that such an explicit suggestion isn't 
appropriate for dmesg, as well, but at least to my thinking, it's at 
least practical and could be debated on the merits, where I don't 
consider the tracking of last seen device as practical at all.

Anyway, btrfs filesystem show, should work for unmounted as well as 
mounted filesystems, and is already designed to do what you were 
expecting btrfs device scan to do, in terms of output.  Meanwhile, btrfs 
device scan is designed purely to update the btrfs-kernel-module's idea 
of what btrfs filesystems are available, and as such, it doesn't output 
anything, tho if there was some change that the kernel module didn't know 
about, a btrfs filesystem show, followed by a btrfs device scan and 
another btrfs filesystem show, would produce different results for the 
two show outputs.  (Meanwhile, show's --mounted and --all-devices options 
can change what's listed as well, and if you're interested in just one 
filesystem, you can feed that to show as well, to get output for just it, 
instead of for all btrfs the system knows about.  See the manpage...)
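
Concretely, something along these lines, with /var standing in for 
whatever mount point, UUID or label you care about:

# btrfs filesystem show --all-devices
# btrfs device scan
# btrfs filesystem show /var

The first show scans the block devices itself, the device scan 
refreshes the kernel module's view, and the last show is limited to the 
one filesystem of interest.  Exact behavior and option availability 
depend on the btrfs-progs version, so again, see the manpage.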

Similarly, your btrfs scrub "was aborted after X seconds" issue is known, 
and I believe fixed in something that's not effectively ancient history, 
in terms of btrfs development.  So remarking on it simply highlights the 
fact that you're running ancient versions and complaining about long-
since-fixed issues, instead of running current versions where at least 
your complaints might still have some validity.  And if you were running 
current and still had the problem, at least I'd know that the fix I 
remember being discussed could not have made it into current yet, since 
the bad output would then still be reported there.  (I don't recall 
seeing that output in older versions either, possibly because I run 
multiple small btrfs on partitioned ssds, so the other scrubs completed 
fast enough that I didn't have a chance to see the "aborted" status 
after one completed/aborted but before the others did.)  I /think/ it 
has been fixed since it was discussed, but I didn't actually track that 
individual fix to see whether it's in current or not, since I never saw 
the problem in my case anyway.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: Recover btrfs volume which can only be mounted in read-only mode
  2015-10-15  0:48     ` Duncan
@ 2015-10-15 14:10       ` Dmitry Katsubo
  2015-10-15 14:55         ` Hugo Mills
  2015-10-16  8:18         ` Duncan
  0 siblings, 2 replies; 12+ messages in thread
From: Dmitry Katsubo @ 2015-10-15 14:10 UTC (permalink / raw)
  To: linux-btrfs

On 15 October 2015 at 02:48, Duncan <1i5t5.duncan@cox.net> wrote:
> Dmitry Katsubo posted on Wed, 14 Oct 2015 22:27:29 +0200 as excerpted:
>
>> On 14/10/2015 16:40, Anand Jain wrote:
>>>> # mount -o degraded /var
>>>> Oct 11 18:20:15 kernel: BTRFS: too many missing devices, writeable
>>>> mount is not allowed
>>>>
>>>> # mount -o degraded,ro /var
>>>> # btrfs device add /dev/sdd1 /var
>>>> ERROR: error adding the device '/dev/sdd1' - Read-only file system
>>>>
>>>> Now I am stuck: I cannot add device to the volume to satisfy raid
>>>> pre-requisite.
>>>
>>>  This is a known issue. Would you be able to test below set of patches
>>>  and update us..
>>>
>>>    [PATCH 0/5] Btrfs: Per-chunk degradable check
>>
>> Many thanks for the reply. Unfortunately I have no environment to
>> recompile the kernel, and setting it up will perhaps take a day. Can the
>> latest kernel be pushed to Debian sid?

Duncan, many thanks for verbose answer. I appreciate a lot.

> In the way of general information...
>
> While btrfs is no longer entirely unstable (since 3.12 when the
> experimental tag was removed) and kernel patch backports are generally
> done where stability is a factor, it's not yet fully stable and mature,
> either.  As such, an expectation of true stability such that wishing to
> remain on kernels more than one LTS series behind the latest LTS kernel
> series (4.1, with 3.18 the one LTS series back version) can be considered
> incompatible with wishing to run the still under heavy development and
> not yet fully stable and mature btrfs, at least as soon as problems are
> reported.  A request to upgrade to current and/or to try various not yet
> mainline integrated patches is thus to be expected on report of problems.
>
> As for userspace, the division between btrfs kernel and userspace works
> like this:  Under normal operating conditions, userspace simply makes
> requests of the kernel, which does the actual work.  Thus, under normal
> conditions, updated kernel code is most important.  However, once a
> problem occurs and repair/recovery is attempted, it's generally userspace
> code itself directly operating on the unmounted filesystem, so having the
> latest userspace code fixes becomes most important once something has
> gone wrong and you're trying to fix it.
>
> So upgrading to a 3.18 series kernel, at minimum, is very strongly
> recommended for those running btrfs, with an expectation that an upgrade
> to 4.1 should be being planned and tested, for deployment as soon as it's
> passing on-site pre-deployment testing.  And an upgrade to current or
> close to current btrfs-progs 4.2.2 userspace is recommended as soon as
> you need its features, which include the latest patches for repair and
> recovery, so as soon as you have a filesystem that's not working as
> expected, if not before.  (Note that earlier btrfs-progs 4.2 releases,
> before 4.2.2, had a buggy mkfs.btrfs, so they should be skipped if you
> will be doing mkfs.btrfs with them, and any btrfs created with those
> versions should have what's on them backed up if it's not already, and
> the filesystems recreated with 4.2.2, as they'll be unstable and are
> subject to failure.)

Thanks for this information. As far as I can see, btrfs-tools v4.1.2
is now in the experimental Debian repo (but you anyway suggest at least
4.2.2, which was released in master git just 10 days ago). Kernel image
3.18 is still not there, perhaps because Debian jessie was frozen
before it was released (2014-12-07).

>> 1. Is there any way to recover btrfs at the moment? Or the easiest I can
>> do is to mount ro, copy all data to another drive, re-create btrfs
>> volume and copy back?
>
> Sysadmin's rule of backups:  If data isn't backed up, by definition you
> value the data less than the cost of time/hassle/resources to do the
> backup, so loss of a filesystem is never a big problem, because if the
> data was of any value, it was backed up and can be restored from that
> backup, and if it wasn't backed up, then by definition you have already
> saved the more important to you commodity, the hassle/time/resources you
> would have spent doing the backup.  Therefore, loss of a filesystem is
> loss of throw-away data in any case, either because it was backed up (and
> a would-be backup that hasn't been tested restorable isn't yet a
> completed backup, so doesn't count), or because the data really was throw-
> away data, not worth the hassle of backing up in the first place, even at
> risk of loss should the un-backed-up data be lost.
>
> No exceptions.  Any after-the-fact protests to the contrary simply put
> the lie to claims that the value was considered valuable, since actions
> spoke louder than words and actions defined the data as throw-away.
>
> Therefore, no worries.  Worst-case, you either recover the data from
> backup, or if it wasn't backed up, by definition, it wasn't valuable data
> in the first place.  Either way, no valuable data was, or can be, lost.
>
> (It's worth noting that this rule nicely takes care of the loss of both
> the working copy and N'th backup case, as well, since again, either it
> was worth the cost of N+1 levels of backup, or that N+1 backup wasn't
> made, which automatically defines the data as not worth the cost of the
> the N+1 backup, at least relative to the risk factor that it might
> actually be needed.  That remains the case, regardless of whether N=0 or
> N=10^1000, since even at N=10^1000, backup to level N+1 is either worth
> the cost vs. risk -- the data really is THAT valuable -- or it's not.)
>
> Thus, the easiest way is very possibly to blow away the filesystem,
> recreate and restore from backup, assuming the data was valuable enough
> to make that backup in the first place.  If it wasn't, then we already
> know the value of the data is relatively limited, and the question
> becomes one of whether the chance of recovery of the already known to be
> very limited value data is worth the hassle cost of trying to do that
> recovery.
>
> FWIW, here, I do have backups, but I don't always keep them as current as
> I might.  By doing so, I know my actions are defining the value of the
> data in the delta between the backups and current status as very limited,
> but that's the choice I'm making.
>
> Fortunately for me, btrfs restore (the actual btrfs restore command),
> working on the unmounted filesystem, can often restore the data from the
> filesystem even if it won't mount, so the risk of actual loss of that
> data is much lower than the risk of not actually being able to mount the
> filesystem, of course letting me get away with delaying backup updates
> even longer, as the risk of total loss of the data in the delta between
> the backup and current is much lower than it would be otherwise, thereby
> making the cost of backup updates relatively higher in comparison,
> meaning I can and do space them further apart.
>
> FWIW I've had to use btrfs restore twice, since I started using btrfs.
> Newer btrfs restore (from newer btrfs-progs) works better than older
> versions, too, letting you optionally restore ownership/permissions and
> symlinks, where previously both were lost, symlinks simply not restored,
> and ownership/permissions the default for the btrfs restore process
> (root, obviously, umask defaults).  See what I mean about current
> userspace being recommended. =:^)
>
> Since in your case you can mount, even if it must be read-only, the same
> logic applies, except that grabbing the data off the filesystem is easier
> since you can simply copy it off and don't need btrfs restore to do it.
>
> Of course the existence of those patches gives you another alternative as
> well, letting you judge the hassle cost of setting up the build
> environment and updating, against that of doing the copy off the read-
> only mounted filesystem, against that of simply declaring the filesystem
> a loss and blowing it away, to either restore from backup, or if it
> wasn't backed up, simply losing what is already defined as data of very
> limited value anyway.

Thanks for the information concerning the restore function. I will
certainly follow your advice if I ever need that function. I am using
btrfs mostly as a playground, so I am prepared for it to fail (part of
the data is synchronized with the cloud and the rest is not super
important). It is more of a challenge for me: can I somehow recover
using btrfs-only tools, given that btrfs is designed to be resilient
to failures?

If I may ask:

Provided that btrfs allowed the volume to be mounted in read-only mode,
does it mean that all data blocks are present (i.e. it has ensured that
all files / directories can be read)?

Do you have any idea why "btrfs balance" pulled all the data onto two
drives (and did not balance it across all three)?

Does btrfs have the following optimization for mirrored data: if a
drive is non-rotational, then prefer reads from it? Or does it simply
schedule the read to the drive that performs faster (irrespective of
rotational status)?

>> 2. How to avoid such a trap in the future?
>
> Keep current. =:^)  At least to latest LTS kernel and last release of
> last-but-one userspace series (which would be 4.1.2 IIRC as I don't
> remember a 4.1.3 being released).
>
> Or at the bigger picture, ask yourself whether running btrfs is really
> appropriate for you until it further stabilizes, since it's not fully
> stable and mature yet, and running it is thereby incompatible with the
> conservative stability objectives of those who wish to run older tried
> and tested really stable versions.  Perhaps ext4 (or even ext3), or
> reiserfs (my previous filesystem of choice, with which I've had extremely
> good experience) or xfs are more appropriate choices for you, if you
> really need that stability and maturity.

No, it was specifically my decision to use btrfs, for various reasons.
First of all, I am using raid1 for all data. Second, I benefit from
transparent compression. Third, I need CRC consistency: some of the
drives (like /dev/sdd in my case) seem to be failing, and I once had a
buggy DIMM, so btrfs helps me not to lose data "silently". Anyway, it
is much better than md-raid.

>> 3. How can I know what version of kernel the patch "Per-chunk degradable
>> check" is targeting?
>
> It may be worth (re)reading the btrfs wiki page on sources.  Generally
> speaking, there's an integration branch, where patches deemed mostly
> ready (after on-list review) are included, before they're accepted into
> the mainline Linus kernel.  Otherwise, patches are generally based on
> mainline, currently 4.3-rcX, unless otherwise noted.  If you follow the
> list, you'll see the pull requests as they are posted, and for the Linus
> kernel, pulls are usually accepted within a day or so, if you're
> following Linus kernel git, as I am.
>
> For userspace, git master branch is always the current release.  There's
> a devel branch that's effectively the same as current integration, except
> that it's no longer updated on the kernel.org mirrors.  The github mirror
> or .cz mirrors (again, as listed on the wiki) have the current devel
> branch, however, and that's what gentoo's "live" ebuild now points at,
> and what I'm running here (after I filed a gentoo bug because the live
> ebuild was pointed at the stale devel branch of the kernel.org kdave
> mirror and thus was no longer updating, that got the live ebuild pointed
> at the current devel branch on the .cz mirrors).
>
> So you can either run current release and cherry-pick patches you want/
> need as they are posted to the list, or if you want something live but a
> bit more managed than that, run the integration branches and/or for
> userspace, the devel branch.
>
>> 4. What is the best way to express/vote for new features or suggestions
>> (wikipage "Project_ideas" / bugzilla)?
>
> Well, the wiki page is user-editable, if you register.  (Tho last I knew,
> there was some problem with at least some wiki user registrations,
> requiring admin intervention in some cases as posted to the list.)
> Personally, I'm more a list person, however, and have never registered on
> the wiki.

I would be happy to add to the wiki, but first it is better to check
with the mailing list because, as you noted below, some of the features
/ bugs have already been addressed.

> In general, however, there's only a few btrfs devs, and between bug
> tracking and fixing and development of the features they're already
> working on or have already roadmapped as their next project, with each
> feature typically taking a kernel cycle and often several kernel cycles
> to develop and stabilize, they don't so often pick "new" features to work
> on.
>
> There are independent devs that sometimes pick a particular feature
> they're interested in, and submit patches for it, but those features may
> or may not be immediately integrated, depending on maturity of the patch
> set, how it meshes with the existing roadmap, whether the dev intends to
> continue to support that feature or leave it to existing devs to support
> after development, and in general, how well that dev works with existing
> longer-term btrfs devs.  In general, a dev interested in such a project
> should either be prepared to carry and maintain the patches as an
> independent patch set for some time if they're not immediately
> integrated, or should plan on a one-time "proof of concept" patch set
> that will then go stale if it's not integrated, tho it may still be
> better than starting from scratch, should somebody later want to pick up
> the set and update it for integration.
>
> So definitely, I'd say add it to the wiki page, so it doesn't get lost
> and can be picked up when it fits into the roadmap, but be prepared for
> it to sit there, unimplemented, for some years, as there's simply way
> more ideas than resources to implement them, and the most in-demand
> features will obviously be already listed by now.
>
> For more minor suggestions, tweaks to current functionality or output,
> etc, run current so you're suggestions are on top of a current base, and
> either post the suggestions here, or where they fit, add them as comments
> to proposed patches as they are posted.  Of course, if you're a dev and
> can code them up as patches yourself, so much the better! =:^)
> (I'm not, FWIW. =:^( )
>
> Many of your suggestions above fit this category, minor improvements to
> current output. However, in some cases the wording in current is already
> better than what you were running, so your suggestions read as stale, and
> in others, they don't quite read (to me at least, tho I already said I'm
> not a dev) as practical.
>
> In particular, tracking last seen device doesn't appear practical to me,
> since in many instances, device assignment is dynamic, and what was
> /dev/sdc3 a couple boots ago may well be /dev/sde3 this time around, in
> which case listing /dev/sdc3 could well only confuse the user even more.

Well, in that case btrfs could remember the UUIDs of the drives and
translate them to devices (if they are present) or display the UUIDs.
I think this would help administrators who manage dozens of btrfs
volumes in one system, each volume consisting of several drives. What
if two or more drives are kicked out? The administrator at least needs
to remember which devices formed which volumes.

And dynamic assignment has not been a problem since udev was
introduced (so one can add extra persistent symlinks):

https://wiki.debian.org/Persistent_disk_names
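
For example, something like this; the fstab line below simply reuses
the filesystem UUID already shown earlier in this thread, and any of
the by-id / by-uuid symlinks would do for identifying member drives
(the second line is an fstab entry, not a command):

# ls -l /dev/disk/by-id/ /dev/disk/by-uuid/
UUID=57b828ee-5984-4f50-89ff-4c9be0fd3084  /var  btrfs  defaults  0  0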

> Tho that isn't to say that the suggestions don't have some merit,
> pointing out where some change of wording, if not to your suggested
> wording, might be useful.
>
> In particular, btrfs filesystem show, should work with both mounted and
> unmounted filesystems, and would have perhaps given you some hints about
> what devices should have been in the filesystem.  The assumption seems to
> be implicit that a user will know to run that, now, but perhaps an
> explicit suggestion to run btrfs filesystem show, would be worthwhile.
> The case can of course be argued that such an explicit suggestion isn't
> appropriate for dmesg, as well, but at least to my thinking, it's at
> least practical and could be debated on the merits, where I don't
> consider the tracking of last seen device as practical at all.
>
> Anyway, btrfs filesystem show, should work for unmounted as well as
> mounted filesystems, and is already designed to do what you were
> expecting btrfs device scan to do, in terms of output.  Meanwhile, btrfs
> device scan is designed purely to update the btrfs-kernel-module's idea
> of what btrfs filesystems are available, and as such, it doesn't output
> anything, tho if there was some change that the kernel module didn't know
> about, a btrfs filesystem show, followed by a btrfs device scan and
> another btrfs filesystem show, would produce different results for the
> two show outputs.  (Meanwhile, show's --mounted and --all-devices options
> can change what's listed as well, and if you're interested in just one
> filesystem, you can feed that to show as well, to get output for just it,
> instead of for all btrfs the system knows about.  See the manpage...)

If "btrfs device scan" is user-space, then I think doing some output
is better then outputting nothing :) (perhaps with "-v" flag). If it
is kernel-space, then I agree that logging to dmesg is not very
evident (from perspective that user should remember where to look),
but I think has a value.

> Similarly, your btrfs scrub "was aborted after X seconds" issue is known,
> and I believe fixed in something that's not effectively ancient history,
> in terms of btrfs development.  So remarking on it simply highlights the
> fact that you're running ancient versions and complaining about long
> since fixed issues, instead of running current versions where at least
> your complaints might still have some validity.  And if you were running
> current and still had the problem, well at least I'd know that while I
> remember it being discussed, the fix could not have made it into current
> yet, since the bad output (which I don't recall seeing in older versions
> either, possibly because I run multiple small btrfs on partitioned ssds,
> so the other scrubs completed fast enough I didn't have a chance to see
> the aborted after one completed/aborted but before the others did) would
> then still be reported in current, tho I /think/ it has been fixed since
> it was discussed, but I didn't actually track that individual fix to see
> if it's in current or not, since I never saw the problem in my case
> anyway.

Thanks. I have carefully read the changelog wiki page and found this:

btrfs-progs 4.2.2:
scrub: report status 'running' until all devices are finished

The idea concerning balance is listed on the "Project ideas" wiki page:

balance: allow to run it in background (fork) and report status periodically

So you're right: most of the issues are already recorded.


* Re: Recover btrfs volume which can only be mounted in read-only mode
  2015-10-15 14:10       ` Dmitry Katsubo
@ 2015-10-15 14:55         ` Hugo Mills
  2015-10-16  8:18         ` Duncan
  1 sibling, 0 replies; 12+ messages in thread
From: Hugo Mills @ 2015-10-15 14:55 UTC (permalink / raw)
  To: Dmitry Katsubo; +Cc: linux-btrfs


On Thu, Oct 15, 2015 at 04:10:13PM +0200, Dmitry Katsubo wrote:
[snip]
> If I may ask:
> 
> Provided that btrfs allowed to mount a volume in read-only mode – does
> it mean that add data blocks are present (e.g. it has assured that add
> files / directories can be read)?
> 
> Do you have any ideas why "btrfs balance" has pulled all data to two
> drives (and not balanced between three)?

   If you're using a non-striped RAID level (single, 1), btrfs will
start by filling up the largest devices first: balance attempts to
make the free space equal across the devices, not to make the used
space equal.

   If you're using a striped RAID level (0, 5, 6), then the FS will
fill up the devices equally, until one is full, and then switch to
using the remaining devices (until one is full, etc).

> Does btrfs has the following optimization for mirrored data: if drive
> is non-rotational, then prefer reads from it? Or it simply schedules
> the read to the drive that performs faster (irrelative to rotational
> status)?

   No, it'll read arbitrarily from the available devices at the moment.

   Hugo.

-- 
Hugo Mills             | People are too unreliable to be replaced by
hugo@... carfax.org.uk | machines.
http://carfax.org.uk/  |
PGP: E2AB1DE4          |                              Nathan Spring, Star Cops



* Re: Recover btrfs volume which can only be mounted in read-only mode
  2015-10-15 14:10       ` Dmitry Katsubo
  2015-10-15 14:55         ` Hugo Mills
@ 2015-10-16  8:18         ` Duncan
  2015-10-18  9:44           ` Dmitry Katsubo
  1 sibling, 1 reply; 12+ messages in thread
From: Duncan @ 2015-10-16  8:18 UTC (permalink / raw)
  To: linux-btrfs

Dmitry Katsubo posted on Thu, 15 Oct 2015 16:10:13 +0200 as excerpted:

> On 15 October 2015 at 02:48, Duncan <1i5t5.duncan@cox.net> wrote:
> 
>> [snipped] 
> 
> Thanks for this information. As far as I can see, btrfs-tools v4.1.2 in
> now in experimental Debian repo (but you anyway suggest at least 4.2.2,
> which is just 10 days ago released in master git). Kernel image 3.18 is
> still not there, perhaps because Debian jessie was frozen before is was
> released (2014-12-07).

For userspace, as long as it's supporting the features you need at 
runtime (where it generally simply has to know how to make the call to 
the kernel, to do the actual work), and you're not running into anything 
really hairy that you're trying to offline-recover, which is where the 
latest userspace code becomes critical...

Running a userspace series behind, or even more (as long as it's not 
/too/ far), isn't all /that/ critical a problem.

It generally becomes a problem in one of three ways: 1) You have a bad 
filesystem and want the best chance at fixing it, in which case you 
really want the latest code, including the absolute latest fixups for the 
most recently discovered possible problems. 2) You want/need a new 
feature that's simply not supported in your old userspace.  3) The 
userspace gets so old that the output from its diagnostics commands no 
longer easily compares with that of current tools, giving people on-list 
difficulties when trying to compare the output in your posts to the 
output they get.

As a very general rule, at least try to keep the userspace version 
comparable to the kernel version you are running.  Since the userspace 
version numbering syncs to kernelspace version numbering, and userspace 
of a particular version is normally released shortly after the similarly 
numbered kernel series is released, with a couple minor updates before 
the next kernel-series-synced release, keeping userspace to at least the 
kernel space version, means you're at least running the userspace release 
that was made with that kernel series release in mind.

Then, as long as you don't get too far behind on kernel version, you 
should remain at least /somewhat/ current on userspace as well, since 
you'll be upgrading to near the same userspace (at least), when you 
upgrade the kernel.

Using that loose guideline, since you're aiming for the 3.18 stable 
kernel, you should be running at least a 3.18 btrfs-progs as well.

In that context, btrfs-progs 4.1.2 should be fine, as long as you're not 
trying to fix any problems that a newer version fixed.  And, my 
recommendation of the latest 4.2.2 was in the "fixing problems" context, 
in which case, yes, getting your hands on 4.2.2, even if it means 
building from sources to do so, could be critical, depending of course on 
the problem you're trying to fix.  But otherwise, 4.1.2, or even back to 
the last 3.18.whatever release since that's the kernel version you're 
targeting, should be fine.

Just be sure that whenever you do upgrade to later, you avoid the known-
bad-mkfs.btrfs in 4.2.0 and/or 4.2.1 -- be sure if you're doing the btrfs-
progs-4.2 series, that you get 4.2.2 or later.

As for finding a current 3.18 series kernel released for Debian, I'm not 
a Debian user so my knowledge of the ecosystem around it is limited, 
but I've been very much under the impression that there are various 
optional repos available that you can choose to include and update from 
as well, and I'm quite sure based on previous discussions with others 
that there's a well recognized and fairly commonly enabled repo that 
includes debian kernel updates thru current release, or close to it.

Of course you could also simply run a mainstream Linus kernel and build 
it yourself, and it's not too horribly hard to do either, as there's all 
sorts of places with instructions for doing so out there, and back when I 
switched from MS to freedomware Linux in late 2001, I learned the skill, 
at least at the reasonably basic level of mostly taking a working config 
from my distro's kernel and using it as a basis for my mainstream kernel 
config as well, within about two months of switching.
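
For the archives, the basic recipe I mean is roughly the following, run 
from an unpacked mainline source tree, assuming the usual build tools 
are installed and that the distro keeps its kernel config in /boot as 
Debian does:

  cp /boot/config-$(uname -r) .config
  make olddefconfig        # reuse the distro config, accept new defaults
  make -j$(nproc)
  make modules_install install

Details such as bootloader updates vary per distro, so treat this as a 
sketch only.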

Tho of course just because you can doesn't mean you want to, and for 
many, finding their distro's experimental/current kernel repos and simply 
installing the packages from it, will be far simpler.

But regardless of the method used, finding or building and keeping 
current with your own copy of at least the latest couple of LTS 
releases, shouldn't be /horribly/ difficult.  While I've not used them as 
actual package resources in years, I do still know a couple rpm-based 
package resources from my time back on Mandrake (and do still check them 
in contexts like this for others, or to quickly see what files a package 
I don't have installed on gentoo might include, etc), and would point you 
at them if Debian was an rpm-based distro, but of course it's not, so 
they won't do any good.  But I'd guess a google might. =:^)

> If I may ask:
> 
> Provided that btrfs allowed to mount a volume in read-only mode – does
> it mean that add data blocks are present (e.g. it has assured that add
> files / directories can be read)

I'm not /absolutely/ sure I understand your question, here.  But assuming 
it's what I believe it is... here's an answer in typical Duncan fashion, 
answering the question... and rather more! =:^)

In this particular scenario, yes, everything should still be accessible, 
as at least one copy of every raid1 chunk should exist on a still 
detected and included device.  This is because of the balance after the 
loss of the first device, making sure there was two copies of each chunk 
on remaining devices, before loss of the second device.  But because 
btrfs device delete missing didn't work, you couldn't remove that first 
device, even tho you now had two copies of each chunk on existing 
devices.  So when another device dropped, you had two missing devices, 
but because of the balance between, you still had at least one copy of 
all chunks.

The reason it's not letting you mount read-write is that btrfs sees now 
two devices missing on a raid1, the one that you actually replaced but 
couldn't device delete, and the new missing one that it didn't detect 
this time.  To btrfs' rather simple way of thinking about it, that means 
anything with one of the only two raid1 copies on each of the two missing 
devices is now entirely gone, and to avoid making changes that would 
complicate things and prevent return of at least one of those missing 
devices, it won't let you mount writable, even in degraded mode.  It 
doesn't understand that there's actually still at least one copy of 
everything available, as it simply sees the two missing devices and gives 
up without actually checking.

And in the situation where btrfs' fears were correct, where chunks 
existed with each of the two copies on one of the now missing devices, 
no, not everything /would/ be accessible, and btrfs forcing read-only 
mounting is its way of not letting you make the problem even worse, 
forcing you to copy the data you can actually get to off to somewhere 
else, while you can still get to it in read-only mode, at least.  Also, 
of course, forcing the filesystem read-only when there's two devices 
missing, at least in theory preserves a state where a device might be 
able to return, allowing repair of the filesystem, while allowing 
writable could prevent a returning device allowing the healing of the 
filesystem.

So in this particular scenario, yes, all your data should be there, 
intact.  However, a forced read-only mount normally indicates a serious 
issue, and in other scenarios, it could well indicate that some of the 
data is now indeed *NOT* accessible.

Which is where AJ's patch comes in.  That teaches btrfs to actually check 
each chunk.  Once it sees that there's actually at least one copy of each 
chunk available, it'll allow mounting degraded, writable, again, so you 
can fix the problem.

(Tho the more direct scenario that the patch addresses is a bit 
different, loss of one device of a two-device raid1, in which case 
mounting degraded writable will force new chunks to be written in single 
mode, because there's not a second device to write to so writing raid1 is 
no longer possible.  So far, so good.  But then on an unmount and attempt 
to mount again, btrfs sees single mode chunks on a two-device btrfs, and 
knows that single mode normally won't allow a missing device, so forces 
read-only, thus blocking adding a new device and rebalancing all the 
single chunks back to raid1.  But in actuality, the only single mode 
chunks there are the ones written when the second device wasn't 
available, so they HAD to be written to the available device, and it's 
not POSSIBLE for any to be on the missing device.  Again, the patch 
teaches btrfs to actually look at what's there and see that it can 
actually deal with it, thus allowing writable mounting, instead of 
jumping to conclusions and giving up, as soon as it sees a situation 
that /could/, in a different situation, mean entirely missing chunks with 
no available copies on remaining devices.)
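
To make that concrete: once a kernel with those patches (or any other 
means) allows the degraded filesystem to be mounted writable, the 
cleanup would look roughly like the following, where /dev/sde1 is a 
placeholder for the replacement drive, the missing devid is whatever 
btrfs filesystem show reports, and the "soft" filter limits the balance 
to chunks not already in the target profile:

# mount -o degraded /var
# btrfs replace start <missing-devid> /dev/sde1 /var
# btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /var

Alternatively, btrfs device add followed by btrfs device delete missing 
does the same job in two steps.  Treat this as a sketch, not a tested 
recipe for this particular filesystem.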

Again, these patches are in newer kernel versions, so there (assuming no 
further bugs) they "just work".  On older kernels, however, you either 
have to cherry-pick the patches yourself, or manually avoid or work 
around the problem they fix.  This is why we typically stress new 
versions so much -- they really /do/ fix active bugs and make problems 
/much/ easier to deal with. =:^)

> Do you have any ideas why "btrfs balance" has pulled all data to two
> drives (and not balanced between three)?

Hugo did much better answering that, than I would have initially done, as 
most of my btrfs are raid1 here, but they're all exactly two-device, with 
the two devices exactly the same size, so I'm not used to thinking in 
terms of different sizes and didn't actually notice the situation, thus 
leaving me clueless, until Hugo pointed it out.

But he's right.  Here's my much more detailed way of saying the same 
thing, now that he reminded me of why that would be the deciding factor 
here.

Given that (1) your devices are different sizes, that (2) btrfs raid1 
means exactly two copies, not one per device, and that (3), the btrfs 
chunk-allocator allocates chunks from the device with the most free space 
left, subject to the restriction that both copies of a raid1 chunk can't 
be allocated to the same device...

A rebalance of raid1 chunks would indeed start filling the two biggest 
devices first, until the space available on the smallest of the two 
biggest devices (thus the second largest) was equal to the space 
available on the third largest device, at which point it would continue 
allocating from the largest for one copy (until it too reached equivalent 
space available), while alternating between the others for the second 
copy.

Given that the amount of data you had fit a copy each on the two largest 
devices, before the space available on either one dwindled to that 
available on the third largest device, only the two largest devices 
actually had chunk allocations, leaving the third device, still with less 
space total than the other two each had remaining available, entirely 
empty.
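
In code terms, the raid1 allocation rule is roughly the sketch below 
(illustrative C only, not the actual allocator; the device sizes are 
made-up numbers and chunk_size stands in for whatever the real chunk 
size is):

#define NDEV 3

/* Example free space per device: ~900 GiB, ~500 GiB, ~120 GiB SSD. */
static long long free_space[NDEV] = {
        900LL << 30, 500LL << 30, 120LL << 30,
};

/* Index of the device with the most free space, optionally skipping one. */
static int pick_most_free(int skip)
{
        int best = -1;
        for (int i = 0; i < NDEV; i++) {
                if (i == skip)
                        continue;
                if (best < 0 || free_space[i] > free_space[best])
                        best = i;
        }
        return best;
}

/* Allocate raid1 chunk pairs until no two devices have room left;
 * returns how many pairs (i.e. how much data) fit. */
static long allocate_all(long long chunk_size)
{
        long pairs = 0;

        for (;;) {
                int a = pick_most_free(-1);
                int b = pick_most_free(a);

                if (b < 0 || free_space[a] < chunk_size ||
                    free_space[b] < chunk_size)
                        break;

                free_space[a] -= chunk_size;   /* first copy */
                free_space[b] -= chunk_size;   /* second copy */
                pairs++;
        }
        return pairs;
}

Run against those example numbers, the 120 GiB device doesn't receive 
anything until the 500 GiB device is down to 120 GiB free -- which is 
exactly the behaviour described above.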

> Does btrfs have the following optimization for mirrored data: if a drive
> is non-rotational, then prefer reads from it? Or does it simply schedule
> the read to the drive that performs faster (irrespective of rotational
> status)?

Such optimizations have in general not yet been done to btrfs -- not even 
scheduling to the faster drive.  In fact, the lack of such optimizations 
is arguably the biggest "objective" proof that btrfs devs themselves 
don't yet consider btrfs truly stable.

As any good dev knows there's a real danger to "premature optimization", 
with that danger appearing in one or both of two forms: (a) We've now 
severely limited the alternative code paths we can take, because 
implementing things differently will force throwing away all that 
optimization work we did as it won't work with what would otherwise be 
the better alternative, and (b) We're now throwing away all that 
optimization work we did, making it a waste, because the previous 
implementation didn't work, and the new one does, but doesn't work with 
the current optimization code, so that work must now be redone as well.

Thus, good devs tend to leave moderate to complex optimization code until 
they know the implementation is stable and won't be changing out from 
under the optimization.  To do differently is "premature optimization", 
and devs tend to be well aware of the problem, often because of the 
number of times they did it themselves earlier in their career.

It follows that looking at whether devs (assuming you consider them good 
enough to be aware of the dangers of premature optimization, which if 
they're doing the code that runs your filesystem, you better HOPE they're 
at least that good, or you and your data are in serious trouble!) have 
actually /done/ that sort of optimization, ends up being a pretty good 
indicator of whether they consider the code actually stable enough to 
avoid the dangers of premature optimization, or not.

In this case, definitely not, since these sorts of optimizations in 
general remain to be done.

Meanwhile, the present btrfs raid1 read-scheduler is both pretty simple 
to code up and pretty simple to arrange tests for that run either one 
side or the other, but not both, or that are well balanced to both.  
However, it's pretty poor in terms of ensuring optimized real-world 
deployment read-scheduling.

What it does is simply this.  Remember, btrfs raid1 is specifically two 
copies.  It chooses which copy of the two will be read very simply, based 
on the PID making the request.  Odd PIDs get assigned one copy, even PIDs 
the other.  As I said, simple to code, great for ensuring testing of one 
copy or the other or both, but not really optimized at all for real-world 
usage.

If your workload happens to be a bunch of all odd or all even PIDs, well, 
enjoy your testing-grade read-scheduler, bottlenecking everything reading 
one copy, while the other sits entirely idle.
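
For illustration, the selection logic boils down to something like this 
(a sketch only, not the actual btrfs source; the function name is made 
up):

#include <sys/types.h>

/* Pick which of the two raid1 copies a read goes to, purely from the
 * parity of the requesting process' PID: even PIDs get copy 0, odd
 * PIDs get copy 1. */
static int pick_raid1_mirror(pid_t pid, int num_copies)
{
        return pid % num_copies;       /* num_copies is 2 for raid1 */
}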

(Of course on fast SSDs with their zero seek-time, which is what I'm 
using for my own btrfs, that's not the issue it'd be on spinning rust.  
I'm still using my former reiserfs standard for spinning rust, which I 
use for backup and media files.  But normal operations are on btrfs on 
ssd, and despite btrfs lack of optimization, on ssd, it's fast /enough/ 
for my usage, and I particularly like the data integrity features of 
btrfs raid1 mode, so...)

> No, it was specifically my decision to use btrfs, for various reasons.
> First of all, I am using raid1 on all data. Second, I benefit from
> transparent compression. Third, I need CRC consistency: some of the
> drives (like /dev/sdd in my case) seem to be failing, and I once had a
> buggy DIMM, so btrfs helps me not to lose data "silently". Anyway,
> it is much better than md-raid.

The fact that mdraid couldn't be configured to runtime-verify integrity 
using the parity or redundancy it already had, let alone checksums 
(which it didn't have at all), was a very strong disappointment for me.

To me, the fact that btrfs /does/ do runtime checksumming on write and 
data integrity checking on read, and in raid1/10 mode, will actually 
fallback to the second copy if the first one fails checksum verification, 
is one of its best features, and why I use btrfs raid1 (or on a couple 
single-device btrfs, mixed-bg mode dup). =:^)

That's also why my personally most hotly anticipated feature is N-way-
mirroring, with 3-way being my ideal balance, since that will give me a 
fallback to the fallback, if both the first read copy and the first 
fallback copy fail verification.  Four-way would be too much, but I just 
don't quite rest as easy as I otherwise could, because I know that if 
both the primary-read copy and the fallback happen to be bad, same 
logical place at the same time, there's no third copy to fall back on!  
It seems as much of a shame not to have that on btrfs with its data 
integrity, as it did to have mdraid with N-way-mirroring but no runtime 
data integrity.  But at least btrfs does have N-way-mirroring on the 
roadmap, actually for after raid56, which is now done, so N-way-mirroring 
should be coming up rather soon (even if on btrfs, "soon" is relative), 
while AFAIK, mdraid has no plans to implement runtime data integrity 
checking.

> And dynamic assignment is not a problem since udev was introduced (so
> one can add extra persistent symlinks):
> 
> https://wiki.debian.org/Persistent_disk_names

FWIW, I actually use labels as my own form of "human-readable" UUID, 
here.  I came up with the scheme back when I was on reiserfs, with 15-
character label limits, so that's what mine are.  Using this scheme, I 
encode the purpose of the filesystem (root/home/media/whatever), the size 
and brand of the media, the sequence number of the media (since I often 
have more than one of the same brand and size), the machine the media is 
targeted at, the date I did the formatting, and the sequence-number of 
the partition (root-working, root-backup1, root-backup2, etc).

hm0238gcnx+35l0

home, on a 238 gig corsair neutron, #x (the filesystem is multidevice, 
across #0 and #1), targeted at + (the workstation), originally 
partitioned in (201)3, on May (5) 21 (l), working copy (0)

I use GPT partitioning, which takes partition labels (aka names) as 
well.  The two partitions hosting that filesystem are on identically 
partitioned corsair neutrons, 256 GB = 238 GiB.  The gpt labels on those 
two partitions are identical to the above, except one will have a 0 
replacing the x, while the other has a 1, as they are my first and second 
media of that size and brand.

hm0238gcn0+35l0
hm0238gcn1+35l0

The primary backup of home, on a different pair of partitions on the same 
physical devices, is labeled identically, except the partition number is 
one:

hm0238gcnx+35l1

... and its partitions:

hm0238gcn0+35l1
hm0238gcn1+35l1

The secondary backup is on a reiserfs, on spinning rust:

hm0465gsg0+47f0

In that case the partition label and filesystem label are the same, since 
the partition and its filesystem correspond 1:1.  It's home on the 465 
GiB (aka 500 GB) seagate #0, targeted at the workstation, first formatted 
in (201)4, on July 15, first (0) copy there.  (I could make it #3 instead 
of #0, indicating second backup, but didn't, as I know that 0465gsg0+ is 
the media and backups spinning rust device for the workstation.)

Both my internal and USB attached devices have the same labeling scheme, 
media identified by size, brand, media sequence number and what it's 
targeting, partition/filesystem identified by purpose, original 
partition/format date, and partition sequence number.

As I said, it's effectively human-readable GUID, my own scheme for my own 
devices.

And I use LABEL= in fstab as well, running gdisk -l to get a listing of 
partitions with their gpt-labels when I need to associate actual sdN 
mapping to specific partitions (if I don't already have the mapping from 
mount or whatever).
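
For example, an fstab entry using the filesystem label above might look 
like this (the mountpoint and mount options here are just illustrative, 
not my actual ones):

LABEL=hm0238gcnx+35l0  /home  btrfs  defaults,noatime  0 0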

Which makes it nice when btrfs fi show outputs filesystem label as well. 
=:^)

The actual GUID is, to me, simply machine-readable "noise" the human 
shouldn't have to deal with, as the label (of either the gpt partition 
or the filesystem it hosts) gives me *FAR* more, and more useful, 
information, while being entirely unique within my ID system.

> If "btrfs device scan" is user-space, then I think doing some output is
> better then outputting nothing :) (perhaps with "-v" flag). If it is
> kernel-space, then I agree that logging to dmesg is not very evident
> (from perspective that user should remember where to look),
> but I think has a value.

Well, btrfs is a userspace tool, but in this case, btrfs device scan's 
use is purely to make a particular kernel call, which triggers the btrfs 
module to do a device rescan to update its own records, *not* for human 
consumption.  -v to force output could work if it had been designed that 
way, but getting that output is precisely what btrfs filesystem show is 
for, printing for both mounted and unmounted filesystems unless told 
otherwise.

Put it this way.  If neither your initr* nor some service started before 
whatever mounts local filesystems does a btrfs device scan, then 
attempting to mount a multi-device btrfs will fail, unless all its 
component devices have been fed in using device= options.  Why?  Because 
mount takes exactly one device to mount.  With traditional filesystems, 
that's enough, since they only consist of a single device.  And with 
single-device btrfs, it's enough as well.  But with a multi-device btrfs, 
something has to supply the other devices to btrfs, along with the one 
that mount tells it about.  It is possible to list all those component 
devices in device= options, but those take /dev/sd* style device nodes, 
and those may change from boot to boot, so that's not very reliable.  
Which is where btrfs device scan comes in.  It tells the btrfs module to 
do a general scan and map out internally which devices belong to which 
filesystems, after which a mount supplying just one of them can work, 
since this internal map, the generation or refresh of which is triggered 
by btrfs device scan, supplies the others.
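
As a concrete illustration (device names and mountpoint here are 
hypothetical), either of these gets a three-device btrfs mounted -- 
first with a scan, then by feeding every member to mount explicitly:

# btrfs device scan
# mount /dev/sdb1 /mnt

# mount -o device=/dev/sdb1,device=/dev/sdc1,device=/dev/sdd1 \
        /dev/sdb1 /mnt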

IOW, btrfs device scan needs no output, because all the userspace command 
does is call a kernel function, which triggers the mapping internal to 
the btrfs kernel module, so it can then handle mounts with just one of 
the possibly many devices handed to it from mount.

Outputting that mapping is an entirely different function, with the 
userspace side of that being btrfs filesystem show, which calls a kernel 
function that generates output back to the btrfs userspace app, which 
then further formats it for output back to the user.

> Thanks. I have carefully read changelog wiki page and found that:
> 
> btrfs-progs 4.2.2:
> scrub: report status 'running' until all devices are finished

Thanks.  As I said, I had seen the patch on the list, and /thought/ it 
was now in, but had lost track of specifically when it went in, or 
indeed, /whether/ it had gone in.

Now I know it's in 4.2.2, without having to actually go look it up in the 
git log again, myself.

> Idea concerning balance is listed on wiki page "Project ideas":
> 
> balance: allow to run it in background (fork) and report status
> periodically

FWIW, it sort of does that today, except that the btrfs bal start doesn't 
actually return to the command prompt.  But again, what it actually does 
is call a kernel function to initiate the balance, and then it's simply 
waiting.  On my relatively small btrfs on partitioned ssd, the return is 
often within a minute or two anyway, but on multi-TB spinning rust...

In any case, once the kernel function has triggered the balance, ctrl-C 
should I believe terminate the userspace side and get you back to the 
prompt, without terminating the balance as that continues on in kernel 
space.

But it would still be useful to have balance start actually return 
quickly, instead of having to ctrl-C it.
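
In the meantime, a simple workaround is to background the userspace side 
yourself and poll for progress (mountpoint hypothetical):

# btrfs balance start /mnt &
# btrfs balance status /mnt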

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Recover btrfs volume which can only be mounded in read-only mode
  2015-10-16  8:18         ` Duncan
@ 2015-10-18  9:44           ` Dmitry Katsubo
  2015-10-26  7:09             ` Duncan
  2015-10-26  9:14             ` Duncan
  0 siblings, 2 replies; 12+ messages in thread
From: Dmitry Katsubo @ 2015-10-18  9:44 UTC (permalink / raw)
  To: linux-btrfs

On 16/10/2015 10:18, Duncan wrote:
> Dmitry Katsubo posted on Thu, 15 Oct 2015 16:10:13 +0200 as excerpted:
> 
>> On 15 October 2015 at 02:48, Duncan <1i5t5.duncan@cox.net> wrote:
>>
>>> [snipped] 
>>
>> Thanks for this information. As far as I can see, btrfs-tools v4.1.2 is
>> now in the experimental Debian repo (but you anyway suggest at least
>> 4.2.2, which was released in master git just 10 days ago). Kernel image
>> 3.18 is still not there, perhaps because Debian jessie was frozen before
>> it was released (2014-12-07).
> 
> For userspace, as long as it's supporting the features you need at 
> runtime (where it generally simply has to know how to make the call to 
> the kernel, to do the actual work), and you're not running into anything 
> really hairy that you're trying to offline-recover, which is where the 
> latest userspace code becomes critical...
> 
> Running a userspace series behind, or even more (as long as it's not 
> /too/ far), isn't all /that/ critical a problem.
> 
> It generally becomes a problem in one of three ways: 1) You have a bad 
> filesystem and want the best chance at fixing it, in which case you 
> really want the latest code, including the absolute latest fixups for the 
> most recently discovered possible problems. 2) You want/need a new 
> feature that's simply not supported in your old userspace.  3) The 
> userspace gets so old that the output from its diagnostics commands no 
> longer easily compares with that of current tools, giving people on-list 
> difficulties when trying to compare the output in your posts to the 
> output they get.
> 
> As a very general rule, at least try to keep the userspace version 
> comparable to the kernel version you are running.  Since the userspace 
> version numbering syncs to kernelspace version numbering, and userspace 
> of a particular version is normally released shortly after the similarly 
> numbered kernel series is released, with a couple minor updates before 
> the next kernel-series-synced release, keeping userspace to at least the 
> kernel space version, means you're at least running the userspace release 
> that was made with that kernel series release in mind.
> 
> Then, as long as you don't get too far behind on kernel version, you 
> should remain at least /somewhat/ current on userspace as well, since 
> you'll be upgrading to near the same userspace (at least), when you 
> upgrade the kernel.
> 
> Using that loose guideline, since you're aiming for the 3.18 stable 
> kernel, you should be running at least a 3.18 btrfs-progs as well.
> 
> In that context, btrfs-progs 4.1.2 should be fine, as long as you're not 
> trying to fix any problems that a newer version fixed.  And, my 
> recommendation of the latest 4.2.2 was in the "fixing problems" context, 
> in which case, yes, getting your hands on 4.2.2, even if it means 
> building from sources to do so, could be critical, depending of course on 
> the problem you're trying to fix.  But otherwise, 4.1.2, or even back to 
> the last 3.18.whatever release since that's the kernel version you're 
> targeting, should be fine.
> 
> Just be sure that whenever you do upgrade to later, you avoid the known-
> bad-mkfs.btrfs in 4.2.0 and/or 4.2.1 -- be sure if you're doing the btrfs-
> progs-4.2 series, that you get 4.2.2 or later.
> 
> As for finding a current 3.18 series kernel released for Debian, I'm not 
> a Debian user so my knowledge of the ecosystem around it is limited, 
> but I've been very much under the impression that there are various 
> optional repos available that you can choose to include and update from 
> as well, and I'm quite sure based on previous discussions with others 
> that there's a well recognized and fairly commonly enabled repo that 
> includes debian kernel updates thru current release, or close to it.
> 
> Of course you could also simply run a mainstream Linus kernel and build 
> it yourself, and it's not too horribly hard to do either, as there's all 
> sorts of places with instructions for doing so out there, and back when I 
> switched from MS to freedomware Linux in late 2001, I learned the skill, 
> at at least the reasonably basic level of mostly taking a working config 
> from my distro's kernel and using it as a basis for my mainstream kernel 
> config as well, within about two months of switching.
> 
> Tho of course just because you can doesn't mean you want to, and for 
> many, finding their distro's experimental/current kernel repos and simply 
> installing the packages from it, will be far simpler.
> 
> But regardless of the method used, finding or building and keeping 
> current with your own copy of at least the latest couple of LTS 
> releases, shouldn't be /horribly/ difficult.  While I've not used them as 
> actual package resources in years, I do still know a couple rpm-based 
> package resources from my time back on Mandrake (and do still check them 
> in contexts like this for others, or to quickly see what files a package 
> I don't have installed on gentoo might include, etc), and would point you 
> at them if Debian was an rpm-based distro, but of course it's not, so 
> they won't do any good.  But I'd guess a google might. =:^)

Thanks, Duncan. The information you give is of the greatest value to
me. Finally I have decided not to tempt fate: I will copy the data
off, re-create the btrfs volume and copy it back. That is a good
exercise anyway.

>> If I may ask:
>>
>> Provided that btrfs allowed a volume to be mounted in read-only mode –
>> does it mean that all data blocks are present (e.g. has it assured that
>> all files / directories can be read)?
> 
> I'm not /absolutely/ sure I understand your question, here.  But assuming 
> it's what I believe it is... here's an answer in typical Duncan fashion, 
> answering the question... and rather more! =:^)
> 
> In this particular scenario, yes, everything should still be accessible, 
> as at least one copy of every raid1 chunk should exist on a still 
> detected and included device.  This is because of the balance after the 
> loss of the first device, making sure there was two copies of each chunk 
> on remaining devices, before loss of the second device.  But because 
> btrfs device delete missing didn't work, you couldn't remove that first 
> device, even tho you now had two copies of each chunk on existing 
> devices.  So when another device dropped, you had two missing devices, 
> but because of the balance in between, you still had at least one copy of 
> all chunks.
> 
> The reason it's not letting you mount read-write is that btrfs sees now 
> two devices missing on a raid1, the one that you actually replaced but 
> couldn't device delete, and the new missing one that it didn't detect 
> this time.  To btrfs' rather simple way of thinking about it, that means 
> anything with one of the only two raid1 copies on each of the two missing 
> devices is now entirely gone, and to avoid making changes that would 
> complicate things and prevent return of at least one of those missing 
> devices, it won't let you mount writable, even in degraded mode.  It 
> doesn't understand that there's actually still at least one copy of 
> everything available, as it simply sees the two missing devices and gives 
> up without actually checking.
> 
> And in the situation where btrfs' fears were correct, where chunks 
> existed with each of the two copies on one of the now missing devices, 
> no, not everything /would/ be accessible, and btrfs forcing read-only 
> mounting is its way of not letting you make the problem even worse, 
> forcing you to copy the data you can actually get to off to somewhere 
> else, while you can still get to it in read-only mode, at least.  Also, 
> of course, forcing the filesystem read-only when there's two devices 
> missing, at least in theory preserves a state where a device might be 
> able to return, allowing repair of the filesystem, while mounting 
> writable could prevent a returning device from healing the filesystem.
> 
> So in this particular scenario, yes, all your data should be there, 
> intact.  However, a forced read-only mount normally indicates a serious 
> issue, and in other scenarios, it could well indicate that some of the 
> data is now indeed *NOT* accessible.
> 
> Which is where AJ's patch comes in.  That teaches btrfs to actually check 
> each chunk.  Once it sees that there's actually at least one copy of each 
> chunk available, it'll allow mounting degraded, writable, again, so you 
> can fix the problem.
> 
> (Tho the more direct scenario that the patch addresses is a bit 
> different, loss of one device of a two-device raid1, in which case 
> mounting degraded writable will force new chunks to be written in single 
> mode, because there's not a second device to write to so writing raid1 is 
> no longer possible.  So far, so good.  But then on an unmount and attempt 
> to mount again, btrfs sees single mode chunks on a two-device btrfs, and 
> knows that single mode normally won't allow a missing device, so forces 
> read-only, thus blocking adding a new device and rebalancing all the 
> single chunks back to raid1.  But in actuality, the only single mode 
> chunks there are the ones written when the second device wasn't 
> available, so they HAD to be written to the available device, and it's 
> not POSSIBLE for any to be on the missing device.  Again, the patch 
> teaches btrfs to actually look at what's there and see that it can 
> actually deal with it, thus allowing writable mounting, instead of 
> jumping to conclusions and giving up, as soon as it sees a situation 
> that /could/, in a different situation, mean entirely missing chunks with 
> no available copies on remaining devices.)
> 
> Again, these patches are in newer kernel versions, so there (assuming no 
> further bugs) they "just work".  On older kernels, however, you either 
> have to cherry-pick the patches yourself, or manually avoid or work 
> around the problem they fix.  This is why we typically stress new 
> versions so much -- they really /do/ fix active bugs and make problems 
> /much/ easier to deal with. =:^)

Thanks for the explanation. You understood the question correctly:
basically I wondered if btrfs checks that all data can be read before
allowing a read-only mount. In my case I was lucky and just copied the
data from the mounted volume to another place and then copied it back.

>> Do you have any ideas why "btrfs balance" has pulled all data to two
>> drives (and not balanced between three)?
> 
> Hugo did much better answering that, than I would have initially done, as 
> most of my btrfs are raid1 here, but they're all exactly two-device, with 
> the two devices exactly the same size, so I'm not used to thinking in 
> terms of different sizes and didn't actually notice the situation, thus 
> leaving me clueless, until Hugo pointed it out.
> 
> But he's right.  Here's my much more detailed way of saying the same 
> thing, now that he reminded me of why that would be the deciding factor 
> here.
> 
> Given that (1) your devices are different sizes, that (2) btrfs raid1 
> means exactly two copies, not one per device, and that (3), the btrfs 
> chunk-allocator allocates chunks from the device with the most free space 
> left, subject to the restriction that both copies of a raid1 chunk can't 
> be allocated to the same device...
> 
> A rebalance of raid1 chunks would indeed start filling the two biggest 
> devices first, until the space available on the smallest of the two 
> biggest devices (thus the second largest) was equal to the space 
> available on the third largest device, at which point it would continue 
> allocating from the largest for one copy (until it too reached equivalent 
> space available), while alternating between the others for the second 
> copy.
> 
> Given that the amount of data you had fit a copy each on the two largest 
> devices, before the space available on either one dwindled to that 
> available on the third largest device, only the two largest devices 
> actually had chunk allocations, leaving the third device, still with less 
> space total than the other two each had remaining available, entirely 
> empty.

I think the mentioned strategy (fill the device with the most free space)
is not the most effective. If the data were spread equally, the read
performance would be higher (reading from 3 disks instead of 2). In my
case this is even crucial, because the smallest drive is an SSD (and it
is not loaded at all).

Maybe I don't see the benefit of the strategy which is currently
implemented (besides that it is robust and well-tested)?

>> Does btrfs have the following optimization for mirrored data: if a drive
>> is non-rotational, then prefer reads from it? Or does it simply schedule
>> the read to the drive that performs faster (irrespective of rotational
>> status)?
> 
> Such optimizations have in general not yet been done to btrfs -- not even 
> scheduling to the faster drive.  In fact, the lack of such optimizations 
> is arguably the biggest "objective" proof that btrfs devs themselves 
> don't yet consider btrfs truly stable.
> 
> As any good dev knows there's a real danger to "premature optimization", 
> with that danger appearing in one or both of two forms: (a) We've now 
> severely limited the alternative code paths we can take, because 
> implementing things differently will force throwing away all that 
> optimization work we did as it won't work with what would otherwise be 
> the better alternative, and (b) We're now throwing away all that 
> optimization work we did, making it a waste, because the previous 
> implementation didn't work, and the new one does, but doesn't work with 
> the current optimization code, so that work must now be redone as well.
> 
> Thus, good devs tend to leave moderate to complex optimization code until 
> they know the implementation is stable and won't be changing out from 
> under the optimization.  To do differently is "premature optimization", 
> and devs tend to be well aware of the problem, often because of the 
> number of times they did it themselves earlier in their career.
> 
> It follows that looking at whether devs (assuming you consider them good 
> enough to be aware of the dangers of premature optimization, which if 
> they're doing the code that runs your filesystem, you better HOPE they're 
> at least that good, or you and your data are in serious trouble!) have 
> actually /done/ that sort of optimization, ends up being a pretty good 
> indicator of whether they consider the code actually stable enough to 
> avoid the dangers of premature optimization, or not.
> 
> In this case, definitely not, since these sorts of optimizations in 
> general remain to be done.
> 
> Meanwhile, the present btrfs raid1 read-scheduler is both pretty simple 
> to code up and pretty simple to arrange tests for that run either one 
> side or the other, but not both, or that are well balanced to both.  
> However, it's pretty poor in terms of ensuring optimized real-world 
> deployment read-scheduling.
> 
> What it does is simply this.  Remember, btrfs raid1 is specifically two 
> copies.  It chooses which copy of the two will be read very simply, based 
> on the PID making the request.  Odd PIDs get assigned one copy, even PIDs 
> the other.  As I said, simple to code, great for ensuring testing of one 
> copy or the other or both, but not really optimized at all for real-world 
> usage.
> 
> If your workload happens to be a bunch of all odd or all even PIDs, well, 
> enjoy your testing-grade read-scheduler, bottlenecking everything reading 
> one copy, while the other sits entirely idle.
> 
> (Of course on fast SSDs with their zero seek-time, which is what I'm 
> using for my own btrfs, that's not the issue it'd be on spinning rust.  
> I'm still using my former reiserfs standard for spinning rust, which I 
> use for backup and media files.  But normal operations are on btrfs on 
> ssd, and despite btrfs lack of optimization, on ssd, it's fast /enough/ 
> for my usage, and I particularly like the data integrity features of 
> btrfs raid1 mode, so...)

I think the PID-based solution is not the best one. Why not simply take a
random device? Then at least all drives in the volume are equally loaded
(on average).

From what you said I believe that certain servers will not benefit from
btrfs, e.g. a dedicated server that runs only one "fat" Java process, or
one "huge" MySQL database.

In general I think that btrfs should not check the rotational flag, as
even SATA-III is two times faster than SATA-II. So an ideal scheduler
should assign read requests to the drive that simply copes with reads
faster :) If an SSD drive can read 10 blocks in the time a normal HDD
reads only one - let it do it.

Maybe my case is a corner one, as I am mixing "fast" and "slow" drives
in one volume; moreover, the faster drive is the smallest. If I had
drives of the same performance, the strategy I suggest would not
matter.

>> No, it was specifically my decision to use btrfs, for various reasons.
>> First of all, I am using raid1 on all data. Second, I benefit from
>> transparent compression. Third, I need CRC consistency: some of the
>> drives (like /dev/sdd in my case) seem to be failing, and I once had a
>> buggy DIMM, so btrfs helps me not to lose data "silently". Anyway,
>> it is much better than md-raid.
> 
> The fact that mdraid couldn't be configured to runtime-verify integrity 
> using the parity or redundancy it already had, let alone checksums 
> (which it didn't have at all), was a very strong disappointment for me.
> 
> To me, the fact that btrfs /does/ do runtime checksumming on write and 
> data integrity checking on read, and in raid1/10 mode, will actually 
> fallback to the second copy if the first one fails checksum verification, 
> is one of its best features, and why I use btrfs raid1 (or on a couple 
> single-device btrfs, mixed-bg mode dup). =:^)
> 
> That's also why my personally most hotly anticipated feature is N-way-
> mirroring, with 3-way being my ideal balance, since that will give me a 
> fallback to the fallback, if both the first read copy and the first 
> fallback copy fail verification.  Four-way would be too much, but I just 
> don't quite rest as easy as I otherwise could, because I know that if 
> both the primary-read copy and the fallback happen to be bad, same 
> logical place at the same time, there's no third copy to fall back on!  
> It seems as much of a shame not to have that on btrfs with its data 
> integrity, as it did to have mdraid with N-way-mirroring but no runtime 
> data integrity.  But at least btrfs does have N-way-mirroring on the 
> roadmap, actually for after raid56, which is now done, so N-way-mirroring 
> should be coming up rather soon (even if on btrfs, "soon" is relative), 
> while AFAIK, mdraid has no plans to implement runtime data integrity 
> checking.
> 
>> And dynamic assignment is not a problem since udev was introduced (so
>> one can add extra persistent symlinks):
>>
>> https://wiki.debian.org/Persistent_disk_names
> 
> FWIW, I actually use labels as my own form of "human-readable" UUID, 
> here.  I came up with the scheme back when I was on reiserfs, with 15-
> character label limits, so that's what mine are.  Using this scheme, I 
> encode the purpose of the filesystem (root/home/media/whatever), the size 
> and brand of the media, the sequence number of the media (since I often 
> have more than one of the same brand and size), the machine the media is 
> targeted at, the date I did the formatting, and the sequence-number of 
> the partition (root-working, root-backup1, root-backup2, etc).
> 
> hm0238gcnx+35l0
> 
> home, on a 238 gig corsair neutron, #x (the filesystem is multidevice, 
> across #0 and #1), targeted at + (the workstation), originally 
> partitioned in (201)3, on May (5) 21 (l), working copy (0)
> 
> I use GPT partitioning, which takes partition labels (aka names) as 
> well.  The two partitions hosting that filesystem are on identically 
> partitioned corsair neutrons, 256 GB = 238 GiB.  The gpt labels on those 
> two partitions are identical to the above, except one will have a 0 
> replacing the x, while the other has a 1, as they are my first and second 
> media of that size and brand.
> 
> hm0238gcn0+35l0
> hm0238gcn1+35l0
> 
> The primary backup of home, on a different pair of partitions on the same 
> physical devices, is labeled identically, except the partition number is 
> one:
> 
> hm0238gcnx+35l1
> 
> ... and its partitions:
> 
> hm0238gcn0+35l1
> hm0238gcn1+35l1
> 
> The secondary backup is on a reiserfs, on spinning rust:
> 
> hm0465gsg0+47f0
> 
> In that case the partition label and filesystem label are the same, since 
> the partition and its filesystem correspond 1:1.  It's home on the 465 
> GiB (aka 500 GB) seagate #0, targeted at the workstation, first formatted 
> in (201)4, on July 15, first (0) copy there.  (I could make it #3 instead 
> of #0, indicating second backup, but didn't, as I know that 0465gsg0+ is 
> the media and backups spinning rust device for the workstation.)
> 
> Both my internal and USB attached devices have the same labeling scheme, 
> media identified by size, brand, media sequence number and what it's 
> targeting, partition/filesystem identified by purpose, original 
> partition/format date, and partition sequence number.
> 
> As I said, it's effectively human-readable GUID, my own scheme for my own 
> devices.
> 
> And I use LABEL= in fstab as well, running gdisk -l to get a listing of 
> partitions with their gpt-labels when I need to associate actual sdN 
> mapping to specific partitions (if I don't already have the mapping from 
> mount or whatever).
> 
> Which makes it nice when btrfs fi show outputs filesystem label as well. 
> =:^)
> 
> The actual GUID is, to me, simply machine-readable "noise" the human 
> shouldn't have to deal with, as the label (of either the gpt partition 
> or the filesystem it hosts) gives me *FAR* more, and more useful, 
> information, while being entirely unique within my ID system.
> 
>> If "btrfs device scan" is user-space, then I think doing some output is
>> better then outputting nothing :) (perhaps with "-v" flag). If it is
>> kernel-space, then I agree that logging to dmesg is not very evident
>> (from perspective that user should remember where to look),
>> but I think has a value.
> 
> Well, btrfs is a userspace tool, but in this case, btrfs device scan's 
> use is purely to make a particular kernel call, which triggers the btrfs 
> module to do a device rescan to update its own records, *not* for human 
> consumption.  -v to force output could work if it had been designed that 
> way, but getting that output is precisely what btrfs filesystem show is 
> for, printing for both mounted and unmounted filesystems unless told 
> otherwise.
> 
> Put it this way.  If neither your initr* nor some service started before 
> whatever mounts local filesystems does a btrfs device scan, then 
> attempting to mount a multi-device btrfs will fail, unless all its 
> component devices have been fed in using device= options.  Why?  Because 
> mount takes exactly one device to mount.  With traditional filesystems, 
> that's enough, since they only consist of a single device.  And with 
> single-device btrfs, it's enough as well.  But with a multi-device btrfs, 
> something has to supply the other devices to btrfs, along with the one 
> that mount tells it about.  It is possible to list all those component 
> devices in device= options, but those take /dev/sd* style device nodes, 
> and those may change from boot to boot, so that's not very reliable.  
> Which is where btrfs device scan comes in.  It tells the btrfs module to 
> do a general scan and map out internally which devices belong to which 
> filesystems, after which a mount supplying just one of them can work, 
> since this internal map, the generation or refresh of which is triggered 
> by btrfs device scan, supplies the others.
> 
> IOW, btrfs device scan needs no output, because all the userspace command 
> does is call a kernel function, which triggers the mapping internal to 
> the btrfs kernel module, so it can then handle mounts with just one of 
> the possibly many devices handed to it from mount.
> 
> Outputting that mapping is an entirely different function, with the 
> userspace side of that being btrfs filesystem show, which calls a kernel 
> function that generates output back to the btrfs userspace app, which 
> then further formats it for output back to the user.

I understand that. If btrfs could show the mapping for an *unmounted*
volume (e.g. "btrfs fi show /dev/sdb") that would be great. Also I think
that the btrfs kernel side could be smart enough to perform a scan if a
mount is attempted without a prior scan. Then one should be able to
mount (provided that all devices are present) without a hassle.

>> Thanks. I have carefully read changelog wiki page and found that:
>>
>> btrfs-progs 4.2.2:
>> scrub: report status 'running' until all devices are finished
> 
> Thanks.  As I said, I had seen the patch on the list, and /thought/ it 
> was now in, but had lost track of specifically when it went in, or 
> indeed, /whether/ it had gone in.
> 
> Now I know it's in 4.2.2, without having to actually go look it up in the 
> git log again, myself.
> 
>> Idea concerning balance is listed on wiki page "Project ideas":
>>
>> balance: allow to run it in background (fork) and report status
>> periodically
> 
> FWIW, it sort of does that today, except that the btrfs bal start doesn't 
> actually return to the command prompt.  But again, what it actually does 
> is call a kernel function to initiate the balance, and then it's simply 
> waiting.  On my relatively small btrfs on partitioned ssd, the return is 
> often within a minute or two anyway, but on multi-TB spinning rust...
> 
> In any case, once the kernel function has triggered the balance, ctrl-C 
> should I believe terminate the userspace side and get you back to the 
> prompt, without terminating the balance as that continues on in kernel 
> space.
> 
> But it would still be useful to have balance start actually return 
> quickly, instead of having to ctrl-C it.

Thanks for expressing your thoughts. I will keep an eye on new feature
development.

-- 
With best regards,
Dmitry

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Recover btrfs volume which can only be mounded in read-only mode
  2015-10-18  9:44           ` Dmitry Katsubo
@ 2015-10-26  7:09             ` Duncan
  2015-10-26  9:14             ` Duncan
  1 sibling, 0 replies; 12+ messages in thread
From: Duncan @ 2015-10-26  7:09 UTC (permalink / raw)
  To: linux-btrfs

Dmitry Katsubo posted on Sun, 18 Oct 2015 11:44:08 +0200 as excerpted:

[Regarding the btrfs raid1 "device-with-the-most-space" chunk-allocation 
strategy.]

> I think the mentioned strategy (fill the device with the most free space)
> is not the most effective. If the data were spread equally, the read
> performance would be higher (reading from 3 disks instead of 2). In my
> case this is even crucial, because the smallest drive is an SSD (and it
> is not loaded at all).
> 
> Maybe I don't see the benefit of the strategy which is currently
> implemented (besides that it is robust and well-tested)?

Two comments:

1) As Hugo alluded to, in striped mode (raid0/5/6 and I believe 10), the 
chunk allocator goes wide, allocating a chunk from each device with free 
space, then striping at something smaller (64 KiB maybe?).  When the 
smallest device is full, it reduces the width by one and continues 
allocating, down to the minimum stripe width for the raid type.  However, 
raid1 and single do device-with-the-most-space first, thus, particularly 
for raid1, ensuring maximum usage of available space.

Were raid1 to do width-first, capacity would be far lower and much more 
of the largest device would remain unusable, because some chunk pairs 
would be allocated entirely on the smaller devices, meaning less of the 
largest device would be used before the smaller devices fill up and no 
more raid1 chunks could be allocated as only the single largest device 
has free space left and raid1 requires allocation on two separate devices.

In the three-device raid1 case, the difference in usable capacity would 
be 1/3 the capacity of the smallest device, since until it is full, 1/3 
of all allocations would be to the two smaller devices, leaving that much 
more space unusable on the largest device.

So you see there's a reason for most-space-first, that being that it 
forces one chunk from each pair-allocation to the largest device, thereby 
most efficiently distributing space so as to leave as little space as 
possible unusable due to only one device left when pair-allocation is 
required.
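
To put made-up numbers on that: take devices of 1000, 600 and 600 GiB.  
Most-space-first puts one copy of every chunk on the 1000 GiB device 
until its free space drops to match the others, then rotates among all 
three, so all 2200 GiB of raw space gets used and 1100 GiB of raid1 data 
fits.  A hypothetical width-first allocator rotating evenly through the 
three possible device pairs would fill both 600 GiB devices after only 
900 GiB of data, stranding 400 GiB of raw space on the largest device -- 
200 GiB of lost raid1 capacity, exactly a third of the smallest device.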

2) There has been talk of a more flexible chunk allocator with an admin-
specified strategy allowing smart use of hybrid ssd/disk filesystems, for 
instance.  Perhaps put the metadata on the ssds, for instance, since 
btrfs metadata is relatively hot as in addition to the traditional 
metadata, it contains the checksums which btrfs of course checks on read.

However, this sort of thing is likely to be some time off, as it's 
relatively lower priority than various other possible features.  
Unfortunately, given the rate of btrfs development, "some time off" is in 
practice likely to be at least five years out.

In the meantime, there are technologies such as bcache that allow hybrid 
caching of "hot" data, designed to present themselves as virtual block 
devices so btrfs, as well as other filesystems, can layer on top.
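
(Roughly, and with purely hypothetical device names: format the slow 
device as the backing device and the ssd as the cache in one go, then 
put btrfs on the resulting virtual device.

# make-bcache -B /dev/sdX -C /dev/sdY
# mkfs.btrfs /dev/bcache0
# mount /dev/bcache0 /mnt

On current setups udev registers the bcache device automatically; on 
older ones an extra registration step may be needed.)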

And in fact, we have some regular users that have btrfs on top of bcache 
actually deployed, and from reports, it now works quite well.  (There 
were some problems awhile in the past, but they're several years in the 
past now, back well before the last couple LTS kernel series that's the 
oldest recommended for btrfs deployment.)

If you're interested, start a new thread with btrfs on bcache in the 
subject line, and you'll likely get some very useful replies. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Recover btrfs volume which can only be mounded in read-only mode
  2015-10-18  9:44           ` Dmitry Katsubo
  2015-10-26  7:09             ` Duncan
@ 2015-10-26  9:14             ` Duncan
  2015-10-26  9:24               ` Hugo Mills
  1 sibling, 1 reply; 12+ messages in thread
From: Duncan @ 2015-10-26  9:14 UTC (permalink / raw)
  To: linux-btrfs

Dmitry Katsubo posted on Sun, 18 Oct 2015 11:44:08 +0200 as excerpted:

>> Meanwhile, the present btrfs raid1 read-scheduler is both pretty simple
>> to code up and pretty simple to arrange tests for that run either one
>> side or the other, but not both, or that are well balanced to both.
>> However, it's pretty poor in terms of ensuring optimized real-world
>> deployment read-scheduling.
>> 
>> What it does is simply this.  Remember, btrfs raid1 is specifically two
>> copies.  It chooses which copy of the two will be read very simply,
>> based on the PID making the request.  Odd PIDs get assigned one copy,
>> even PIDs the other.  As I said, simple to code, great for ensuring
>> testing of one copy or the other or both, but not really optimized at
>> all for real-world usage.
>> 
>> If your workload happens to be a bunch of all odd or all even PIDs,
>> well, enjoy your testing-grade read-scheduler, bottlenecking everything
>> reading one copy, while the other sits entirely idle.
> 
> > I think the PID-based solution is not the best one. Why not simply take a
> > random device? Then at least all drives in the volume are equally loaded
> > (on average).

Nobody argues that the even/odd-PID-based read-scheduling solution is 
/optimal/, in a production sense at least.  But at the time and for the 
purpose it was written it was pretty good, arguably reasonably close to 
"best", because the implementation is at once simple and transparent for 
debugging purposes, and real easy to test either one side or the other, 
or both, and equally important, to duplicate the results of those tests, 
by simply arranging for the testing to have either all even or all odd 
PIDs, or both.  And for ordinary use, it's good /enough/, as ordinarily, 
PIDs will be evenly distributed even/odd.

In that context, your random device read-scheduling algorithm would be 
far worse, because while being reasonably simple, it's anything *but* 
easy to ensure reads go to only one side or equally to both, or for that 
matter, to duplicate the tests, because randomization, by definition 
does /not/ lend itself to duplication.

And with both simplicity/transparency/debuggability and duplicatability 
of testing being primary factors when the code went in...

And again, the fact that it hasn't been optimized since then, in the 
context of "premature optimization", really says quite a bit about what 
the btrfs devs themselves consider btrfs' status to be -- obviously *not* 
production-grade stable and mature, or optimizations like this would have 
already been done.

Like it or not, that's btrfs' status at the moment.

Actually, the coming N-way-mirroring may very well be why they've not yet 
optimized the even/odd-PID mechanism already, because doing an optimized 
two-way would obviously be premature-optimization given the coming N-way, 
and doing an N-way clearly couldn't be properly tested at present, 
because only two-way is possible.  Introducing an optimized N-way 
scheduler together with the N-way-mirroring code necessary to properly 
test it thus becomes a no-brainer.

> From what you said I believe that certain servers will not benefit from
> btrfs, e.g. a dedicated server that runs only one "fat" Java process, or
> one "huge" MySQL database.

Indeed.  But with btrfs still "stabilizing, but not entirely stable and 
mature", and indeed, various features still set to drop, and various 
optimizations still yet to do including this one, nobody, leastwise not 
the btrfs devs and knowledgeable regulars on this list, is /claiming/ 
that btrfs is at this time the be-all and end-all optimal solution for 
every single use-case.  Rather far from it!

As for the claims of salespeople... should any of them be making wild 
claims about btrfs, who in their sane mind takes salespeople's claims at 
face value in any case?

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Recover btrfs volume which can only be mounded in read-only mode
  2015-10-26  9:14             ` Duncan
@ 2015-10-26  9:24               ` Hugo Mills
  2015-10-27  5:58                 ` Duncan
  0 siblings, 1 reply; 12+ messages in thread
From: Hugo Mills @ 2015-10-26  9:24 UTC (permalink / raw)
  To: Duncan; +Cc: linux-btrfs


On Mon, Oct 26, 2015 at 09:14:00AM +0000, Duncan wrote:
> Dmitry Katsubo posted on Sun, 18 Oct 2015 11:44:08 +0200 as excerpted:
> 
> >> Meanwhile, the present btrfs raid1 read-scheduler is both pretty simple
> >> to code up and pretty simple to arrange tests for that run either one
> >> side or the other, but not both, or that are well balanced to both.
> >> However, it's pretty poor in terms of ensuring optimized real-world
> >> deployment read-scheduling.
> >> 
> >> What it does is simply this.  Remember, btrfs raid1 is specifically two
> >> copies.  It chooses which copy of the two will be read very simply,
> >> based on the PID making the request.  Odd PIDs get assigned one copy,
> >> even PIDs the other.  As I said, simple to code, great for ensuring
> >> testing of one copy or the other or both, but not really optimized at
> >> all for real-world usage.
> >> 
> >> If your workload happens to be a bunch of all odd or all even PIDs,
> >> well, enjoy your testing-grade read-scheduler, bottlenecking everything
> >> reading one copy, while the other sits entirely idle.
> > 
> > > I think the PID-based solution is not the best one. Why not simply take a
> > > random device? Then at least all drives in the volume are equally loaded
> > > (on average).
> 
> Nobody argues that the even/odd-PID-based read-scheduling solution is 
> /optimal/, in a production sense at least.  But at the time and for the 
> purpose it was written it was pretty good, arguably reasonably close to 
> "best", because the implementation is at once simple and transparent for 
> debugging purposes, and real easy to test either one side or the other, 
> or both, and equally important, to duplicate the results of those tests, 
> by simply arranging for the testing to have either all even or all odd 
> PIDs, or both.  And for ordinary use, it's good /enough/, as ordinarily, 
> PIDs will be evenly distributed even/odd.
> 
> In that context, your random device read-scheduling algorithm would be 
> far worse, because while being reasonably simple, it's anything *but* 
> easy to ensure reads go to only one side or equally to both, or for that 
> matter, to duplicate the tests, because randomization, by definition 
> does /not/ lend itself to duplication.

   For what it's worth, David tried implementing round-robin (IIRC)
some time ago, and found that it performed *worse* than the pid-based
system. (It may have been random, but memory says it was round-robin).

   Hugo.

-- 
Hugo Mills             | Great films about cricket: The Umpire Strikes Back
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: E2AB1DE4          |


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Recover btrfs volume which can only be mounded in read-only mode
  2015-10-26  9:24               ` Hugo Mills
@ 2015-10-27  5:58                 ` Duncan
  0 siblings, 0 replies; 12+ messages in thread
From: Duncan @ 2015-10-27  5:58 UTC (permalink / raw)
  To: linux-btrfs

Hugo Mills posted on Mon, 26 Oct 2015 09:24:57 +0000 as excerpted:

> On Mon, Oct 26, 2015 at 09:14:00AM +0000, Duncan wrote:
>> Dmitry Katsubo posted on Sun, 18 Oct 2015 11:44:08 +0200 as excerpted:
>> 
>>> I think the PID-based solution is not the best one. Why not simply take a
>>> random device? Then at least all drives in the volume are equally
>>> loaded (on average).
>> 
>> Nobody argues that the even/odd-PID-based read-scheduling solution is
>> /optimal/, in a production sense at least.  But [it's near ideal for
>> testing, and "good enough" for the most general case].
> 
> For what it's worth, David tried implementing round-robin (IIRC)
> some time ago, and found that it performed *worse* than the pid-based
> system. (It may have been random, but memory says it was round-robin).

What I'd like to know is what mdraid1 uses, and if btrfs can get that.  
Because a few hardware upgrades ago, after trying mdraid6 for the main system 
and mdraid0 for some parts (with mdraid1 for boot since grub1 could deal 
with it, but not the others), I eventually settled on 4-way mdraid1 for 
everything, using the same disks I had used for the raid6 and raid0.

And I was rather blown away by the mdraid1 speed in comparison, 
especially compared to raid0, which I thought would be better than 
raid1.  I guess my use-case is multi-thread read-heavy enough that, with 
whatever scheduler mdraid1 uses, I was getting up to four separate reads 
(one per spindle) going at once, while writes still happened at single-
spindle speed: with SATA (as opposed to the older IDE, this was when 
SATA was still new), each spindle had its own channel, so they could 
write in parallel, with the bottleneck being the speed at which the 
slowest of the four completed its write.  So writes were single-spindle-
speed, still far faster than the raid6 read-modify-write cycle, while 
reads... it really did appear to multitask one per spindle.

Also, the mdraid1 may have actually taken into account spindle head 
location as well, and scheduled reads to the spindle with the head 
already positioned closest to the target, tho I'm not sure on that.

But whatever mdraid1 scheduling does, I was totally astonished at how 
efficient it was, and it really did turn my thinking on most efficient 
raid choices upside down.  So if btrfs could simply take that scheduler 
and modify it as necessary for btrfs specifics, provided the 
modifications weren't /too/ heavy (and the fact that btrfs does read-time 
checksum verification could very well mean the algorithm as directly 
adapted as possible may not reach anything like the same efficiency), I 
really do think that'd be the ideal.  And of course it's freedomware code 
in the same kernel, so reusing the mdraid read-scheduler shouldn't be the 
problem it might be in other circumstances, tho the possible caveat of 
btrfs specific implementation issues does remain.

And of course someone would have to take the time to adapt it to work 
with btrfs, which gets us back onto the practical side of things, the 
"opportunity rich, developer-time poor" situation that is btrfs coding 
reality, premature optimization, possibly doing it at the same time as N-
way-mirroring, etc.

But anyway, mdraid's raid1 read-scheduler really does seem to be 
impressively efficient, the benchmark to try to match, if possible.  If 
that can be done by reusing some of the same code, so much the better. 
=:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2015-10-27  5:58 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-14 14:28 Recover btrfs volume which can only be mounded in read-only mode Dmitry Katsubo
2015-10-14 14:40 ` Anand Jain
2015-10-14 20:27   ` Dmitry Katsubo
2015-10-15  0:48     ` Duncan
2015-10-15 14:10       ` Dmitry Katsubo
2015-10-15 14:55         ` Hugo Mills
2015-10-16  8:18         ` Duncan
2015-10-18  9:44           ` Dmitry Katsubo
2015-10-26  7:09             ` Duncan
2015-10-26  9:14             ` Duncan
2015-10-26  9:24               ` Hugo Mills
2015-10-27  5:58                 ` Duncan
