* strange 3.16.3 problem
@ 2014-10-18  3:54 Russell Coker
       [not found] ` <CAHGunUkzXZ-ybUR_y3tHzGwtn_45gq8YQJyEqteBX3zqWzUakA@mail.gmail.com>
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Russell Coker @ 2014-10-18  3:54 UTC (permalink / raw)
  To: Btrfs BTRFS

I have a system running the Debian 3.16.3-2 AMD64 kernel for the Xen Dom0 and 
the DomUs.

The Dom0 has a pair of 500G SATA disks in a BTRFS RAID-1 array.  The RAID-1 
array has some subvols exported by NFS as well as a subvol for the disk images 
for the DomUs - I am not using NoCOW as performance is fine without it and I 
like having checksums on everything.

I have started having some problems with a mail server that is running in a 
DomU.  The mail server has 32bit user-space because it was copied from a 32bit 
system and I had no reason to upgrade it to 64bit, but it's running a 64bit 
kernel so I don't think that 32bit user-space is related to my problem.

# find . -name "*546"
./1412233213.M638209P10546
# ls -l ./1412233213.M638209P10546
ls: cannot access ./1412233213.M638209P10546: No such file or directory

Above is the problem: find says that the file in question exists but ls 
doesn't think so.  The file is part of a Maildir spool that's NFS 
mounted.  This problem persisted across a reboot of the DomU, so it's a 
problem with the Dom0 (the NFS server).

The dmesg output on the Dom0 doesn't appear to have anything relevant, and a 
find command doesn't find the file.  I don't know if this is an NFS problem or 
a BTRFS problem.  I haven't rebooted the Dom0 yet because a remote reboot of a 
server running a kernel from Debian/Unstable is something I try to avoid.

Any suggestions?

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/


* Re: strange 3.16.3 problem
       [not found] ` <CAHGunUkzXZ-ybUR_y3tHzGwtn_45gq8YQJyEqteBX3zqWzUakA@mail.gmail.com>
@ 2014-10-18 10:29   ` Russell Coker
  0 siblings, 0 replies; 22+ messages in thread
From: Russell Coker @ 2014-10-18 10:29 UTC (permalink / raw)
  To: Michael Johnson - MJ; +Cc: Btrfs BTRFS

On Sat, 18 Oct 2014, "Michael Johnson - MJ" <mj@revmj.com> wrote:
> The NFS client is part of the kernel iirc, so it should be 64 bit.  This
> would allow the creation of files larger than 4gb and create possible
> issues with a 32 bit user space utility.

A correctly written 32bit application will handle files >4G in size.

While some applications may have problems, I'm fairly sure that ls will be ok.

# dd if=/dev/zero of=/tmp/test bs=1024k count=1 seek=5000
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00383089 s, 274 MB/s
# /bin/ls -lh /tmp/test
-rw-r--r--. 1 root root 4.9G Oct 18 20:47 /tmp/test
# file /bin/ls
/bin/ls: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically 
linked (uses shared libs), for GNU/Linux 2.6.26, 
BuildID[sha1]=0xd3280633faaabf56a14a26693d2f810a32222e51, stripped

A quick test shows that a 32bit ls can handle this.

> I would mount from a client with 64 bit user space and see if the problem
> occurs there.  If so, it is probably not a btrfs issue (if I am
> understanding your environment correctly).

I'll try that later.

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/


* Re: strange 3.16.3 problem
  2014-10-18  3:54 strange 3.16.3 problem Russell Coker
       [not found] ` <CAHGunUkzXZ-ybUR_y3tHzGwtn_45gq8YQJyEqteBX3zqWzUakA@mail.gmail.com>
@ 2014-10-18 13:33 ` Robert White
  2014-10-18 23:41   ` Russell Coker
  2014-10-19 10:46 ` Chris Samuel
  2014-10-20  4:38 ` Duncan
  3 siblings, 1 reply; 22+ messages in thread
From: Robert White @ 2014-10-18 13:33 UTC (permalink / raw)
  To: russell, Btrfs BTRFS

On 10/17/2014 08:54 PM, Russell Coker wrote:
> # find . -name "*546"
> ./1412233213.M638209P10546
> # ls -l ./1412233213.M638209P10546
> ls: cannot access ./1412233213.M638209P10546: No such file or directory
>
> Any suggestions?
>

Does "ls -l *546" show the file to exist? e.g. what happens if you use 
the exact same wildcard in the ls command as you used in the find?

It is possible (and back in the day it was quite common) for files to be 
created with non-renderable nonsense in the name. For instance, if the 
first four characters of the name were "13^H4" (where ^H is the single 
backspace character), the file would look like it was named 14* but it 
would be listed by ls using "13*". If the file name is "damaged", which 
is usually a failing in the program that created the file, then it can 
be "hidden in plain sight".

Note that this sort of name is hidden from the copy-paste done in the 
terminal window because the binary nonsense is just not in the output 
any more by the time you select it with the mouse.

It doesn't have to be a backspace, BTW, it can be any character that the 
terminal window will not render.
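A minimal way to make such characters visible, assuming GNU find and od: dump the raw bytes of every matching name, so a backspace shows up as the escape \b instead of being rendered by the terminal. reveal_names is a hypothetical helper name; GNU ls -b (--escape) gives a similar view.

```shell
# Hypothetical helper: dump the bytes of every name under DIR that
# matches PATTERN, so control characters appear as od escapes.
reveal_names() {
    # od -c prints a backspace as \b and other unprintable bytes in octal,
    # instead of letting the terminal render them
    find "$1" -name "$2" -print | od -c
}

# Usage: reveal_names . '*546'
```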

If things get really ugly you may need to remove the file using

find . -name "*546" -exec rm "{}" \;

(This takes the wildcard expansion out of the hands of the shell and 
makes it happen in the find command, which may have different 
functionality in your build.)

Anyway, this sort of mangled file name can happen in any file system as 
the various binary and non-printable name elements are completely legal 
in the POSIX standard.

-- Rob.


* Re: strange 3.16.3 problem
  2014-10-18 13:33 ` Robert White
@ 2014-10-18 23:41   ` Russell Coker
  2014-10-19  5:37     ` Duncan
  2014-10-20 17:37     ` Robert White
  0 siblings, 2 replies; 22+ messages in thread
From: Russell Coker @ 2014-10-18 23:41 UTC (permalink / raw)
  To: Robert White; +Cc: Btrfs BTRFS

On Sun, 19 Oct 2014, Robert White <rwhite@pobox.com> wrote:
> On 10/17/2014 08:54 PM, Russell Coker wrote:
> > # find . -name "*546"
> > ./1412233213.M638209P10546
> > # ls -l ./1412233213.M638209P10546
> > ls: cannot access ./1412233213.M638209P10546: No such file or directory
> > 
> > Any suggestions?
> 
> Does "ls -l *546" show the file to exist? e.g. what happens if you use
> the exact same wildcard in the ls command as you used in the find?

# ls -l *546 
ls: cannot access 1412233213.M638209P10546: No such file or directory

That gives the same result as find: the shell matches the file name but then 
ls can't view it.

lstat64("1412233213.M638209P10546", 0x9fab0c8) = -1 ENOENT (No such file or 
directory)

From strace, the lstat64 system call fails.
 
> It is possible (and back in the day it was quite common) for files to be
> created with non-renderable nonsense in the name. For instance, if the
> first four characters of the name were "13^H4" (where ^H is the single
> backspace character), the file would look like it was named 14* but it
> would be listed by ls using "13*". If the file name is "damaged", which
> is usually a failing in the program that created the file, then it can
> be "hidden in plain sight".

If that's the case then it's still a kernel bug somewhere.  Maildrop and 
Dovecot don't create files with any unusual characters in the names.

> Note that this sort of name is hidden from the copy-paste done in the
> terminal window because the binary nonsense is just not in the output
> any more by the time you select it with the mouse.
> 
> It doesn't have to be a backspace, BTW, it can be any character that the
> terminal window will not render.
> 
> If things get really ugly you may need to remove the file using
> 
> find . -name "*546" -exec rm "{}" \;

# find . -name "*546" -exec rm "{}" \;
rm: cannot remove `./1412233213.M638209P10546': No such file or directory

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/


* Re: strange 3.16.3 problem
  2014-10-18 23:41   ` Russell Coker
@ 2014-10-19  5:37     ` Duncan
  2014-10-19 10:19       ` Duncan
  2014-10-20 17:37     ` Robert White
  1 sibling, 1 reply; 22+ messages in thread
From: Duncan @ 2014-10-19  5:37 UTC (permalink / raw)
  To: linux-btrfs

Russell Coker posted on Sun, 19 Oct 2014 10:41:41 +1100 as excerpted:

> # find . -name "*546" -exec rm "{}" \;
> rm: cannot remove `./1412233213.M638209P10546': No such file or
> directory

Going with the non-printable-character theory, what happens if you expand 
that *546 find one character at a time?  Does *0546 work? *10546? etc.

Additionally, I'd say use the default print instead of the -exec rm.  
Because once you find it, you might want to do other tests (doing a file 
on it to find type, finding the size, possibly catting it...) to figure 
out what it is and possibly how it came to get there, before ultimate 
removal.

When you find a boundary where it goes from working to not-working, what 
happens if you stick a wildcard in that boundary?  Assuming *0546 doesn't 
work, for instance, thus creating a boundary between the 0 and the 5, 
what about *0*546 or *0?546?
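The one-character-at-a-time expansion can be scripted. This is a minimal sketch assuming a POSIX shell and GNU find (for -mindepth); bisect_suffix is a hypothetical helper, NAME is the file name as it appears on screen, and glob metacharacters (* ? [) in NAME would need escaping first.

```shell
# Hypothetical helper: try *6, *46, *546, ... under DIR and report the
# first suffix of the displayed NAME that find can no longer match.
# The body runs in a subshell so its variables do not leak.
bisect_suffix() (
    dir=$1 name=$2
    len=${#name}
    n=1
    while [ "$n" -le "$len" ]; do
        # keep the last n characters of the displayed name
        suffix=$(printf '%s\n' "$name" | cut -c "$((len - n + 1))"-)
        if [ -z "$(find "$dir" -mindepth 1 -name "*$suffix")" ]; then
            printf 'no match for *%s\n' "$suffix"
            exit 1
        fi
        n=$((n + 1))
    done
    printf 'all suffixes match\n'
)

# Usage: bisect_suffix . 1412233213.M638209P10546
```

The boundary it reports is where a hidden character is likely to sit, which is the place to try the *0*546 and *0?546 style patterns.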

... Just things I'd be trying were I to see such a thing here.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: strange 3.16.3 problem
  2014-10-19  5:37     ` Duncan
@ 2014-10-19 10:19       ` Duncan
  0 siblings, 0 replies; 22+ messages in thread
From: Duncan @ 2014-10-19 10:19 UTC (permalink / raw)
  To: linux-btrfs

Duncan posted on Sun, 19 Oct 2014 05:37:36 +0000 as excerpted:

> Russell Coker posted on Sun, 19 Oct 2014 10:41:41 +1100 as excerpted:
> 
>> # find . -name "*546" -exec rm "{}" \;
>> rm: cannot remove `./1412233213.M638209P10546': No such file or
>> directory
> 
> Going with the non-printable-character theory, what happens if you
> expand that *546 find one character at a time?  Does *0546 work? *10546?
> etc.
> 
> When you find a boundary where it goes from working to not-working, what
> happens if you stick a wildcard in that boundary?  Assuming *0546
> doesn't work, for instance, thus creating a boundary between the 0 and
> the 5, what about *0*546 or *0?546?

FWIW, I just had something similar happen here, except ls could see the 
files and tell me what happened, tho for a moment I was wondering...  In 
my case it was a couple symlinks, dead because the partition they pointed 
into wasn't mounted.  But with this thread fresh in my mind, of course it 
was the first thing to come to mind...


Another idea for potentially figuring out what's going on...

If you have tab-completion active, what sort of auto-completes does it 
offer with for instance ls 141<tab> ?  If necessary, again you can try 
expanding one character at a time, except of course from the left here 
instead of from the right as above.

For things like colons, I know bash-completion here fills in \: in place 
of simply colon.  I just tested what it'd do with a backspace char 
embedded in a filename, and tab-completion substitutes ^H (while ls 
substitutes ? ).

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: strange 3.16.3 problem
  2014-10-18  3:54 strange 3.16.3 problem Russell Coker
       [not found] ` <CAHGunUkzXZ-ybUR_y3tHzGwtn_45gq8YQJyEqteBX3zqWzUakA@mail.gmail.com>
  2014-10-18 13:33 ` Robert White
@ 2014-10-19 10:46 ` Chris Samuel
  2014-10-20  4:38 ` Duncan
  3 siblings, 0 replies; 22+ messages in thread
From: Chris Samuel @ 2014-10-19 10:46 UTC (permalink / raw)
  To: linux-btrfs

Hiya Russell,

On Sat, 18 Oct 2014 02:54:19 PM Russell Coker wrote:

> # find . -name "*546"
> ./1412233213.M638209P10546
> # ls -l ./1412233213.M638209P10546
> ls: cannot access ./1412233213.M638209P10546: No such file or directory

Does:

find . -name "*546" -ls

work at all?

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC




* Re: strange 3.16.3 problem
  2014-10-18  3:54 strange 3.16.3 problem Russell Coker
                   ` (2 preceding siblings ...)
  2014-10-19 10:46 ` Chris Samuel
@ 2014-10-20  4:38 ` Duncan
  2014-10-20 13:02   ` Zygo Blaxell
  3 siblings, 1 reply; 22+ messages in thread
From: Duncan @ 2014-10-20  4:38 UTC (permalink / raw)
  To: linux-btrfs

Russell Coker posted on Sat, 18 Oct 2014 14:54:19 +1100 as excerpted:

> # find . -name "*546"
> ./1412233213.M638209P10546
> # ls -l ./1412233213.M638209P10546
> ls: cannot access ./1412233213.M638209P10546: No such file or directory

Does your mail server do a lot of renames?  Is one perhaps stuck?  If so, 
that sounds like the same thing "Zygo Blaxell" is reporting in the 
"3.16.3..3.17.1 hang in renameat2()" thread, OP on Sun, 19 Oct 2014 
15:25:26 -400, Msg-ID: <20141019192525.GA29401@hungrycats.org>, as linked 
here:

<http://permalink.gmane.org/gmane.comp.file-systems.btrfs/39539>

I pointed him at this thread too.  I hadn't seen you mention a hung 
rename, but the other symptoms sound similar.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: strange 3.16.3 problem
  2014-10-20  4:38 ` Duncan
@ 2014-10-20 13:02   ` Zygo Blaxell
  2014-10-20 13:19     ` Austin S Hemmelgarn
  2014-10-21 10:13     ` Russell Coker
  0 siblings, 2 replies; 22+ messages in thread
From: Zygo Blaxell @ 2014-10-20 13:02 UTC (permalink / raw)
  To: Duncan; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 2728 bytes --]

On Mon, Oct 20, 2014 at 04:38:28AM +0000, Duncan wrote:
> Russell Coker posted on Sat, 18 Oct 2014 14:54:19 +1100 as excerpted:
> 
> > # find . -name "*546"
> > ./1412233213.M638209P10546
> > # ls -l ./1412233213.M638209P10546
> > ls: cannot access ./1412233213.M638209P10546: No such file or directory
> 
> Does your mail server do a lot of renames?  Is one perhaps stuck?  If so, 
> that sounds like the same thing "Zygo Blaxell" is reporting in the 
> "3.16.3..3.17.1 hang in renameat2()" thread, OP on Sun, 19 Oct 2014 
> 15:25:26 -400, Msg-ID: <20141019192525.GA29401@hungrycats.org>, as linked 
> here:
> 
> <http://permalink.gmane.org/gmane.comp.file-systems.btrfs/39539>
> 
> I pointed him at this thread too.  I hadn't seen you mention a hung 
> rename, but the other symptoms sound similar.

Not really.  It looks like Russell is having an NFS client-side problem,
I'm having a server-side one (maybe).  Also, all Russell's system calls
seem to be returning promptly, while some of mine are not.  Even if
there were timeouts, an NFS server timeout gives a different error than
'No such file or directory'.  Finally, the one and only thing I _can_
do with my bug is 'ls' on the renamed files (for me, the find would get
stuck before returning any output).

For Russell's issue...most of the stuff I can think of has been
tried already.  I didn't see whether there was any attempt to ls the
file from the NFS server as well as the client side.  If ls is OK on
the server but not the client, it's an NFS issue (possibly interacting
with some btrfs-specific quirk); otherwise, it's likely a corrupted
filesystem (mail servers seem to be unusually good at making these).

Most of the I/O time on mail servers tends to land in the fsync() system
call, and some nasty fsync() btrfs bugs were fixed in 3.17 (i.e. after
3.16, and not in the 3.16.x stable update for x <= 5 (the last one
I've checked)).  That said, I'm not familiar with how fsync() translates
over NFS, so it might not be relevant after all.

If the NFS server's view of the filesystem is OK, check the NFS protocol
version from /proc/mounts on the client.  Sometimes NFS clients will
get some transient network error during connection and fall back to some
earlier (and potentially buggier) NFS version.  I've seen very different
behavior in some important corner cases from v4 and v3 clients, for
example, and if the client is falling all the way back to v2 the bugs
and their workarounds start to get just plain _weird_ (e.g. filenames
which produce specific values from some hash function or that contain
specific character sequences are unusable).  v2 is so old it may even
have issues with 64-bit inode numbers.
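A sketch of that /proc/mounts check, assuming the client kernel reports the negotiated protocol version as a vers= mount option, as current Linux NFS clients do (older kernels may format this differently). nfs_versions is a hypothetical helper that takes an optional mounts file path.

```shell
# Hypothetical helper: for each NFS mount in a mounts file (default
# /proc/mounts), print the mount point and its vers= option.
nfs_versions() {
    awk '$3 ~ /^nfs/ {
        # field 4 is the comma-separated option list
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] ~ /^vers=/)
                print $2, opts[i]
    }' "${1:-/proc/mounts}"
}

# Usage (on the NFS client): nfs_versions
```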


[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]


* Re: strange 3.16.3 problem
  2014-10-20 13:02   ` Zygo Blaxell
@ 2014-10-20 13:19     ` Austin S Hemmelgarn
  2014-10-21 10:13     ` Russell Coker
  1 sibling, 0 replies; 22+ messages in thread
From: Austin S Hemmelgarn @ 2014-10-20 13:19 UTC (permalink / raw)
  To: Zygo Blaxell, Duncan; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 3069 bytes --]

On 2014-10-20 09:02, Zygo Blaxell wrote:
> On Mon, Oct 20, 2014 at 04:38:28AM +0000, Duncan wrote:
>> Russell Coker posted on Sat, 18 Oct 2014 14:54:19 +1100 as excerpted:
>>
>>> # find . -name "*546"
>>> ./1412233213.M638209P10546
>>> # ls -l ./1412233213.M638209P10546
>>> ls: cannot access ./1412233213.M638209P10546: No such file or directory
>>
>> Does your mail server do a lot of renames?  Is one perhaps stuck?  If so,
>> that sounds like the same thing "Zygo Blaxell" is reporting in the
>> "3.16.3..3.17.1 hang in renameat2()" thread, OP on Sun, 19 Oct 2014
>> 15:25:26 -400, Msg-ID: <20141019192525.GA29401@hungrycats.org>, as linked
>> here:
>>
>> <http://permalink.gmane.org/gmane.comp.file-systems.btrfs/39539>
>>
>> I pointed him at this thread too.  I hadn't seen you mention a hung
>> rename, but the other symptoms sound similar.
>
> Not really.  It looks like Russell is having an NFS client-side problem,
> I'm having a server-side one (maybe).  Also, all Russell's system calls
> seem to be returning promptly, while some of mine are not.  Even if
> there were timeouts, an NFS server timeout gives a different error than
> 'No such file or directory'.  Finally, the one and only thing I _can_
> do with my bug is 'ls' on the renamed files (for me, the find would get
> stuck before returning any output).
>
> For Russell's issue...most of the stuff I can think of has been
> tried already.  I didn't see whether there was any attempt to ls the
> file from the NFS server as well as the client side.  If ls is OK on
> the server but not the client, it's an NFS issue (possibly interacting
> with some btrfs-specific quirk); otherwise, it's likely a corrupted
> filesystem (mail servers seem to be unusually good at making these).
>
> Most of the I/O time on mail servers tends to land in the fsync() system
> call, and some nasty fsync() btrfs bugs were fixed in 3.17 (i.e. after
> 3.16, and not in the 3.16.x stable update for x <= 5 (the last one
> I've checked)).  That said, I'm not familiar with how fsync() translates
> over NFS, so it might not be relevant after all.
>
> If the NFS server's view of the filesystem is OK, check the NFS protocol
> version from /proc/mounts on the client.  Sometimes NFS clients will
> get some transient network error during connection and fall back to some
> earlier (and potentially buggier) NFS version.  I've seen very different
> behavior in some important corner cases from v4 and v3 clients, for
> example, and if the client is falling all the way back to v2 the bugs
> and their workarounds start to get just plain _weird_ (e.g. filenames
> which produce specific values from some hash function or that contain
> specific character sequences are unusable).  v2 is so old it may even
> have issues with 64-bit inode numbers.
>
Just now saw this thread, but IIRC 'No such file or directory' also gets 
returned sometimes when trying to automount a share that can't be 
enumerated by the client, and also sometimes when there is a stale NFS 
file handle.


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 2455 bytes --]


* Re: strange 3.16.3 problem
  2014-10-18 23:41   ` Russell Coker
  2014-10-19  5:37     ` Duncan
@ 2014-10-20 17:37     ` Robert White
  2014-10-20 20:21       ` Goffredo Baroncelli
  1 sibling, 1 reply; 22+ messages in thread
From: Robert White @ 2014-10-20 17:37 UTC (permalink / raw)
  To: russell; +Cc: Btrfs BTRFS

On 10/18/2014 04:41 PM, Russell Coker wrote:
> On Sun, 19 Oct 2014, Robert White <rwhite@pobox.com> wrote:
>> On 10/17/2014 08:54 PM, Russell Coker wrote:
>>> # find . -name "*546"
>>> ./1412233213.M638209P10546
>>> # ls -l ./1412233213.M638209P10546
>>> ls: cannot access ./1412233213.M638209P10546: No such file or directory
>>>
>>> Any suggestions?
>>
>> Does "ls -l *546" show the file to exist? e.g. what happens if you use
>> the exact same wildcard in the ls command as you used in the find?
>
> # ls -l *546
> ls: cannot access 1412233213.M638209P10546: No such file or directory
>
> That gives the same result as find: the shell matches the file name but then
> ls can't view it.
>
> lstat64("1412233213.M638209P10546", 0x9fab0c8) = -1 ENOENT (No such file or
> directory)
>
> From strace, the lstat64 system call fails.

Okay, from the strace output the shell _is_ finding the file in the 
directory read and expand (readdir) pass. That is "*546" is being 
expanded to the full file name text "1412233213.M638209P10546" but then 
the actual operation fails because the name is apparently not associated 
with anything.
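The readdir-versus-lstat mismatch can be listed for a whole directory. This is a minimal sketch assuming GNU find: find supplies the names it read from the directory (as it did in this thread), and each name is then tested with test -e (stat) and test -L (lstat). A dead symlink passes -L, so only names that neither call can resolve are printed. unresolvable_entries is a hypothetical helper, and names containing newlines are not handled.

```shell
# Hypothetical helper: print entries of DIR that readdir returns but
# that neither stat (test -e) nor lstat (test -L) can resolve.
unresolvable_entries() {
    find "$1" -mindepth 1 -maxdepth 1 -print |
        while IFS= read -r f; do
            # -e follows symlinks; -L catches dangling symlinks
            if [ ! -e "$f" ] && [ ! -L "$f" ]; then
                printf '%s\n' "$f"
            fi
        done
}

# Usage: unresolvable_entries /path/to/Maildir/cur
```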

So what pass of scrub or btrfsck checks directory connectedness? Does 
that pass give your file system a clean bill of health?

Also you said that you are using a 32bit user space "copied from another 
server" under a 64bit kernel. Is the "ls" command a 32 bit executable then?

What happens if you stop the Xen domain for the mail server and then 
mount the disks into a native 64bit environment and then ls the file name?

I ask because the man page for lstat64 says it's a "wrapper" for the 
underlying system call (fstatat64). It is not impossible that you might 
have a case where the wrapper is failing inside glibc due to some 32/64 
bit conversion taking place.

Since you copied the entire 32bit environment from another (older?) 
server there may be some nonsense happening where the two interfaces meet.

I'd check the file system against a native 64bit kernel and user-space 
next, possibly from a distro CD if necessary, just to isolate the 
potential file system causes from the user-space causes. If the native 
64bit environment fails then it's a file system issue; if the native 
64bit operations work, then it's a userspace problem and you win the fun 
of remaking the mail server from scratch.




* Re: strange 3.16.3 problem
  2014-10-20 17:37     ` Robert White
@ 2014-10-20 20:21       ` Goffredo Baroncelli
  2014-10-21  9:50         ` Duncan
  0 siblings, 1 reply; 22+ messages in thread
From: Goffredo Baroncelli @ 2014-10-20 20:21 UTC (permalink / raw)
  To: Robert White, russell; +Cc: Btrfs BTRFS

On 10/20/2014 07:37 PM, Robert White wrote:
> On 10/18/2014 04:41 PM, Russell Coker wrote:
[...]
> Also you said that you are using a 32bit user space "copied from
> another server" under a 64bit kernel. Is the "ls" command a 32 bit
> executable then?

Could this be related to the inode overflow on 32 bit systems
(see the inode_cache option)? If so, running a 64bit "ls -i" should
work....
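To see whether any inode numbers in the tree actually exceed the 32-bit range, a minimal sketch assuming GNU find (for -printf %i) and awk. inodes_above is a hypothetical helper; its default limit 4294967295 is 2^32 - 1, the largest value a 32-bit inode field can hold.

```shell
# Hypothetical helper: print "inode path" for every file under DIR whose
# inode number is above MAX (default 2^32 - 1).
inodes_above() {
    find "$1" -printf '%i %p\n' |
        awk -v max="${2:-4294967295}" '$1 + 0 > max + 0'
}

# Usage: inodes_above /path/to/Maildir
```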




-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


* Re: strange 3.16.3 problem
  2014-10-20 20:21       ` Goffredo Baroncelli
@ 2014-10-21  9:50         ` Duncan
  2014-10-21 10:16           ` inode_cache " Roman Mamedov
  2014-10-21 16:40           ` Goffredo Baroncelli
  0 siblings, 2 replies; 22+ messages in thread
From: Duncan @ 2014-10-21  9:50 UTC (permalink / raw)
  To: linux-btrfs

Goffredo Baroncelli posted on Mon, 20 Oct 2014 22:21:04 +0200 as
excerpted:

> On 10/20/2014 07:37 PM, Robert White wrote:
>> On 10/18/2014 04:41 PM, Russell Coker wrote:
> [...]
>> Also you said that you are using a 32bit user space "copied from
>> another server" under a 64bit kernel. Is the "ls" command a 32 bit
>> executable then?
> 
> Could this be related to the inode overflow on 32 bit systems (see
> the inode_cache option)? If so, running a 64bit "ls -i" should work....

Good point.  Russell might just owe you a beverage of choice.  =:^)

The inode_cache mount option isn't recommended for any bitness.

@ Russ, are you mounting with inode_cache?  If so, definitely try running 
without it and see if it changes the results.
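Checking for the option can be scripted against /proc/mounts. A minimal sketch with a hypothetical helper, btrfs_inode_cache; it wraps the option list in commas so that only the whole option inode_cache matches, not a longer option name that merely contains it.

```shell
# Hypothetical helper: for each btrfs mount in a mounts file (default
# /proc/mounts), report whether inode_cache is among its options.
btrfs_inode_cache() {
    awk '$3 == "btrfs" {
        # surround the option list with commas so only whole options match
        if (("," $4 ",") ~ /,inode_cache,/)
            print $2, "inode_cache"
        else
            print $2, "no inode_cache"
    }' "${1:-/proc/mounts}"
}

# Usage: btrfs_inode_cache
```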

(FWIW I wish that mount option would just go away as it would definitely 
remove an invitation to a Russian roulette party with their data for the 
unwary, but I suppose there's someone paying some bills somewhere that 
wants it kept for some specific use-case where the performance gain must 
be worth the calculated risk, thus continuing that invitation to data 
Russian roulette for everyone else.)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: strange 3.16.3 problem
  2014-10-20 13:02   ` Zygo Blaxell
  2014-10-20 13:19     ` Austin S Hemmelgarn
@ 2014-10-21 10:13     ` Russell Coker
  2014-10-21 10:42       ` Russell Coker
                         ` (2 more replies)
  1 sibling, 3 replies; 22+ messages in thread
From: Russell Coker @ 2014-10-21 10:13 UTC (permalink / raw)
  To: linux-btrfs

On Tue, 21 Oct 2014, Zygo Blaxell <zblaxell@furryterror.org> wrote:
> On Mon, Oct 20, 2014 at 04:38:28AM +0000, Duncan wrote:
> > Russell Coker posted on Sat, 18 Oct 2014 14:54:19 +1100 as excerpted:
> > > # find . -name "*546"
> > > ./1412233213.M638209P10546
> > > # ls -l ./1412233213.M638209P10546
> > > ls: cannot access ./1412233213.M638209P10546: No such file or directory
> > 
> > Does your mail server do a lot of renames?  Is one perhaps stuck?  If so,
> > that sounds like the same thing "Zygo Blaxell" is reporting in the
> > "3.16.3..3.17.1 hang in renameat2()" thread, OP on Sun, 19 Oct 2014
> > 15:25:26 -400, Msg-ID: <20141019192525.GA29401@hungrycats.org>, as linked
> > here:

It's a Maildir server so it does a lot of renames, but I don't think anything 
is stuck.  I've just rebooted the Dom0 and nothing has changed.

> For Russell's issue...most of the stuff I can think of has been
> tried already.  I didn't see whether there was any attempt to ls the
> file from the NFS server as well as the client side.  If ls is OK on
> the server but not the client, it's an NFS issue (possibly interacting
> with some btrfs-specific quirk); otherwise, it's likely a corrupted
> filesystem (mail servers seem to be unusually good at making these).

# ls -l *546
ls: cannot access *546: No such file or directory

Above is on the server.

# ls -l *546
ls: cannot access 1412233213.M638209P10546: No such file or directory

Above is on the client.  Note that wildcard expansion worked because readdir() 
found the file even though stat can't.

> Most of the I/O time on mail servers tends to land in the fsync() system
> call, and some nasty fsync() btrfs bugs were fixed in 3.17 (i.e. after
> 3.16, and not in the 3.16.x stable update for x <= 5 (the last one
> I've checked)).  That said, I'm not familiar with how fsync() translates
> over NFS, so it might not be relevant after all.

That's going to suck for people running mail servers on Debian.

> If the NFS server's view of the filesystem is OK, check the NFS protocol
> version from /proc/mounts on the client.  Sometimes NFS clients will
> get some transient network error during connection and fall back to some
> earlier (and potentially buggier) NFS version.  I've seen very different
> behavior in some important corner cases from v4 and v3 clients, for
> example, and if the client is falling all the way back to v2 the bugs
> and their workarounds start to get just plain _weird_ (e.g. filenames
> which produce specific values from some hash function or that contain
> specific character sequences are unusable).  v2 is so old it may even
> have issues with 64-bit inode numbers.

Rebooting the client multiple times and rebooting the server once doesn't 
change it.  I don't think it's any transient error.

On Tue, 21 Oct 2014, Austin S Hemmelgarn <ahferroin7@gmail.com> wrote:
> Just now saw this thread, but IIRC 'No such file or directory' also gets 
> returned sometimes when trying to automount a share that can't be 
> enumerated by the client, and also sometimes when there is a stale NFS 
> file handle.

I think that rebooting both client and server precludes the possibility of a 
stale file handle.  Even rebooting the client (which I have done several 
times) should fix it.

On Tue, 21 Oct 2014, Robert White <rwhite@pobox.com> wrote:
> Okay, from the strace output the shell _is_ finding the file in the
> directory read and expand (readdir) pass. That is "*546" is being
> expanded to the full file name text "1412233213.M638209P10546" but then
> the actual operation fails because the name is apparently not associated
> with anything.
> 
> So what pass of scrub or btrfsck checks directory connectedness? Does
> that pass give your file system a clean bill of health?

That's inconvenient for a remote system with a single BTRFS filesystem.

> Also you said that you are using a 32bit user space "copied from another
> server" under a 64bit kernel. Is the "ls" command a 32 bit executable then?

Yes.

> What happens if you stop the Xen domain for the mail server and then
> mount the disks into a native 64bit environment and then ls the file name?

The filesystem in question is NFS mounted from a server with 64bit kernel+user 
to a virtual server with 64bit kernel+32bit user.  On the file server (the Xen 
Dom0) ls doesn't even see that file in readdir.

> I ask because the man page for lstat64 says it's a "wrapper" for the
> underlying system call (fstatat64). It is not impossible that you might
> have a case where the wrapper is failing inside glibc due to some 32/64
> bit conversion taking place.

If there is a 32/64 conversion then we have another problem.  The mail server 
is configured to reject messages bigger than about 50M, I don't recall the 
exact number but it's a lot smaller than 2G.

On Tue, 21 Oct 2014, Goffredo Baroncelli <kreijack@inwind.it> wrote:
> Could this be related to the inode overflow on 32 bit systems
> (see the inode_cache option)? If so, running a 64bit "ls -i" should
> work....

I've just installed coreutils:amd64 on the NFS client and I get the same 
results.

On Tue, 21 Oct 2014, Duncan <1i5t5.duncan@cox.net> wrote:
> The inode_cache mount option isn't recommended for any bitness.
> 
> @ Russ, are you mounting with inode_cache?  If so, definitely try running 
> without it and see if it changes the results.

/dev/sda3 / btrfs rw,seclabel,noatime,space_cache,skip_balance 0 0

The above is in /proc/mounts.  I have configured my systems to use 
skip_balance because in the past I've had a balance cause big problems on 
several occasions and I've never had a resumed balance do any good.  I think 
that noatime is unlikely to cause any problems.  I don't know what space_cache 
is about, is that something the kernel adds automatically?

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/

* inode_cache  Re: strange 3.16.3 problem
  2014-10-21  9:50         ` Duncan
@ 2014-10-21 10:16           ` Roman Mamedov
  2014-10-21 12:08             ` Duncan
  2014-10-21 16:40           ` Goffredo Baroncelli
  1 sibling, 1 reply; 22+ messages in thread
From: Roman Mamedov @ 2014-10-21 10:16 UTC
  To: Duncan; +Cc: linux-btrfs

On Tue, 21 Oct 2014 09:50:37 +0000 (UTC)
Duncan <1i5t5.duncan@cox.net> wrote:

> (FWIW I wish that mount option would just go away as it would definitely 
> remove an invitation to a Russian roulette party with their data for the 
> unwary, but I suppose there's someone paying some bills somewhere that 
> wants it kept for some specific use-case where the performance gain must 
> be worth the calculated risk, thus continuing that invitation to data 
> Russian roulette for everyone else.)

Why do you think it is so dangerous? Just because of possible bugs? But bugs
can be anywhere in Btrfs, so why specifically single out one mount option?

Let's take a look at its description in the wiki:

"inode_cache (since 3.0)
Enable free inode number caching. Not recommended to use unless files on your
filesystem get assigned inode numbers that are approaching 2^64. Normally, new
files in each subvolume get assigned incrementally (plus one from the last
time) and are not reused. The mount option turns on caching of the existing
inode numbers and reuse of inode numbers of deleted files. This option may
slow down your system at first run, or after mounting without the option."
https://btrfs.wiki.kernel.org/index.php/Mount_options

As you can see it's not about performance, but rather more of a recognition
that a filesystem with some pre-determined finite lifetime expectancy is not a
good thing to have; even though 2^64 is a lot, there are various scenarios out
there, including millions of files and constant creation and removal of
snapshots, that may make the FS hit the limit faster than you would expect.

-- 
With respect,
Roman

* Re: strange 3.16.3 problem
  2014-10-21 10:13     ` Russell Coker
@ 2014-10-21 10:42       ` Russell Coker
  2014-10-21 15:23         ` strange 3.16.3 problem (er... never mind 8-) Robert White
  2014-10-21 12:25       ` strange 3.16.3 problem Duncan
  2014-10-21 15:10       ` Robert White
  2 siblings, 1 reply; 22+ messages in thread
From: Russell Coker @ 2014-10-21 10:42 UTC
  To: linux-btrfs

I've just upgraded the Dom0 (NFS server) from 3.16.3 to 3.16.5 and it all 
works.

Prior to upgrading the Dom0 I had the same problem occur with different file 
names.  All the names in question were truncated names of files that exist.  
It seems that 3.16.3 has a bug with NFS serving files with long names.

Thanks for all the suggestions.

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/

* Re: inode_cache  Re: strange 3.16.3 problem
  2014-10-21 10:16           ` inode_cache " Roman Mamedov
@ 2014-10-21 12:08             ` Duncan
  0 siblings, 0 replies; 22+ messages in thread
From: Duncan @ 2014-10-21 12:08 UTC
  To: linux-btrfs

Roman Mamedov posted on Tue, 21 Oct 2014 16:16:11 +0600 as excerpted:

> On Tue, 21 Oct 2014 09:50:37 +0000 (UTC)
> Duncan <1i5t5.duncan@cox.net> wrote:
> 
>> (FWIW I wish that mount option would just go away as it would
>> definitely remove an invitation to a Russian roulette party with their
>> data for the unwary, but I suppose there's someone paying some bills
>> somewhere that wants it kept for some specific use-case where the
>> performance gain must be worth the calculated risk, thus continuing
>> that invitation to data Russian roulette for everyone else.)
> 
> Why do you think it is so dangerous? Just because of possible bugs? But
> bugs can be anywhere in Btrfs, so why specifically single out one mount
> option?
> 
> Let's take a look at its description in the wiki:
> 
> "inode_cache (since 3.0)
> Enable free inode number caching. Not recommended to use unless files on
> your filesystem get assigned inode numbers that are approaching 2^64.
> Normally, new files in each subvolume get assigned incrementally (plus
> one from the last time) and are not reused. The mount option turns on
> caching of the existing inode numbers and reuse of inode numbers of
> deleted files.
> This option may slow down your system at first run, or after mounting
> without the option."
> https://btrfs.wiki.kernel.org/index.php/Mount_options
> 
> As you can see it's not about performance, but rather more of a
> recognition that a filesystem with some pre-determined finite lifetime
> expectancy is not a good thing to have; even though 2^64 is a lot, there
> are various scenarios out there, including millions of files and
> constant creation and removal of snapshots, that may make the FS hit the
> limit faster than you would expect.

inode_cache is generally not needed on 64-bit, and it is known to cause 
problems: on 32-bit, a cache overflow and non-unique cached inode numbers 
are possible on large filesystems, and on 64-bit, boot-time slowdowns 
(including timeouts on mounting, for filesystems mounted at boot).

I guess the real trouble is that the problems with it aren't well 
documented and relatively few people know about them, mostly regulars on 
this list.  So people end up enabling it even on 64-bit, where about the 
only effects are a boot-time slowdown and an increased chance of crash-
corruption of yet another cache, as well as on 32-bit, where it's actually 
useful if somewhat risky (especially for large filesystems), and thus get 
themselves into needless trouble.  If there were a big IF YOU USE THIS AND 
IT GOES BAD YOU GET TO KEEP THE PIECES warning on it, I guess fewer 
people would use it, but then people would be asking why it's there in 
the first place.

And I don't know why, as I've only seen it cause needless problems, never 
actually help, and I know that everyone here recommends turning it off 
without any exception I've seen.  But the conspiracy theory side of me 
says if it's causing problems and not helping, and it's still there, 
there must be a reason...

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


* Re: strange 3.16.3 problem
  2014-10-21 10:13     ` Russell Coker
  2014-10-21 10:42       ` Russell Coker
@ 2014-10-21 12:25       ` Duncan
  2014-10-21 15:10       ` Robert White
  2 siblings, 0 replies; 22+ messages in thread
From: Duncan @ 2014-10-21 12:25 UTC
  To: linux-btrfs

Russell Coker posted on Tue, 21 Oct 2014 21:13:29 +1100 as excerpted:

>  I don't know what
> space_cache is about, is that something the kernel adds automatically?

Yes, space_cache is the default.

Apparently early in space_cache history you had to mount with space_cache 
once, and the kernel would then detect the existence of the space-cache-
tree and always use the option after that.

But for quite some time now (over a year; I've never added it to my mount 
options since I got the ssds and began using btrfs on them), the kernel 
seems to enable it automatically from the first mount, unless you 
specifically tell it not to.

Similarly for the ssd option, if the kernel detects that you are running 
an ssd (I believe it checks the ata/scsi rotational media property, which 
it should detect properly on raw hardware, but which can get lost if 
btrfs is layered over top of lvm/mdraid/dmcrypt/etc), it'll automatically 
enable the ssd mount option, which is exactly what it does here.
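The rotational property is exposed to user space in sysfs as /sys/block/<device>/queue/rotational. That file contains 1 for rotating media and 0 otherwise. The C sketch below just reads that file. The function names are hypothetical and the device name is only an example. It shows what the kernel reports for a block device, not the exact internal test btrfs uses when deciding whether to enable ssd.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: interpret the contents of a sysfs "rotational"
 * file. Returns 1 for rotating media, 0 for non-rotational (e.g. SSD),
 * and -1 if the text is not a recognised value. */
static int parse_rotational(const char *text)
{
    assert(text != NULL);
    if (text[0] == '1' && (text[1] == '\n' || text[1] == '\0'))
        return 1;
    if (text[0] == '0' && (text[1] == '\n' || text[1] == '\0'))
        return 0;
    return -1;
}

/* Read /sys/block/<dev>/queue/rotational for a whole-disk device name
 * such as "sda". Returns 1, 0, or -1 on any error. */
static int read_rotational(const char *dev)
{
    char path[256], buf[16];

    assert(dev != NULL);
    snprintf(path, sizeof path, "/sys/block/%s/queue/rotational", dev);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;                      /* no such device, or no sysfs */
    char *ok = fgets(buf, sizeof buf, f);
    fclose(f);
    return ok ? parse_rotational(buf) : -1;
}
```

Comparing this value for the underlying disks with the value for a dm-N or mdN device shows whether a layered device reports the same thing as the hardware beneath it. A mismatch there is the case described above.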

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


* Re: strange 3.16.3 problem
  2014-10-21 10:13     ` Russell Coker
  2014-10-21 10:42       ` Russell Coker
  2014-10-21 12:25       ` strange 3.16.3 problem Duncan
@ 2014-10-21 15:10       ` Robert White
  2 siblings, 0 replies; 22+ messages in thread
From: Robert White @ 2014-10-21 15:10 UTC
  To: russell, linux-btrfs

On 10/21/2014 03:13 AM, Russell Coker wrote:
> On Tue, 21 Oct 2014, Robert White <rwhite@pobox.com> wrote:
>> What happens if you stop the Xen domain for the mail server and then
>> mount the disks into a native 64bit environment and then ls the file name?
>
> The filesystem in question is NFS mounted from a server with 64bit kernel+user
> to a virtual server with 64bit kernel+32bit user.  On the file server (the Xen
> Dom0) ls doesn't even see that file in readdir.

So we need to do some variable isolation as I am now not sure what Xen 
would have to do with anything.

If the file doesn't exist under that name on the NFS server, then _that_ 
is where you need to do the find/ls checks for various name expansions. 
That is, all the various wildcard checks need to happen on the real 
server that has mounted the BTRFS in order to find the actual file that 
is leading to the phantom file. E.g. if the file "isn't there" on the 
BTRFS then the problem is really an NFS translation problem of some sort.
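The C sketch below performs that server-side check, assuming POSIX opendir/readdir/lstat. It walks one directory and reports every name that readdir returns but lstat cannot find. That is the "find sees it, ls -l says No such file or directory" symptom from the start of the thread. The function name `count_phantoms` is hypothetical. Running it against the same directory on the server and on the NFS client would show on which side the phantom names appear.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical diagnostic: print names that readdir() returns but that
 * lstat() reports as nonexistent (ENOENT). Returns the number of such
 * names, or -1 if the directory cannot be opened. */
static int count_phantoms(const char *dirpath)
{
    assert(dirpath != NULL);
    DIR *dir = opendir(dirpath);
    if (dir == NULL)
        return -1;

    int phantoms = 0;
    struct dirent *ent;
    char path[4096];
    struct stat st;

    while ((ent = readdir(dir)) != NULL) {
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        snprintf(path, sizeof path, "%s/%s", dirpath, ent->d_name);
        /* The name came from readdir, so ENOENT here is suspicious. */
        if (lstat(path, &st) != 0 && errno == ENOENT) {
            printf("phantom: %s\n", ent->d_name);
            phantoms++;
        }
    }
    closedir(dir);
    return phantoms;
}
```

In a live Maildir, messages are regularly renamed (tmp to new, new to cur, flag changes), so a single hit can be a race. Rerunning the check separates a persistent phantom from a file that merely moved.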

Does this problem involve two physical servers or just one?

Is the network connection between the two logical servers physical 
(real cables) or virtual (a Xen bridge etc.)?

Which NFS version are you using? Over UDP or TCP? With what options?

Are you using any sort of secondary cache on top of your NFS, e.g. a 
cachefiles directory on a little local slice somewhere on either 
system? If so, have you cleared that cache manually?

Have you cleared the NFS server state (typically found in 
/var/lib/nfs or some such)?

What are you using to synchronize time between the systems?

Understand that at this point you've described an NFS problem (possibly 
an NFS server problem with BTRFS) but not a BTRFS problem per-se, so we 
have to figure out what the server sees on the file system before we can 
guess why the client is seeing what it is seeing.

>
>> I ask because the man page for lstat64 says it's a "wrapper" for the
>> underlying system call (fstatat64). It is not impossible that you might
>> have a case where the wrapper is failing inside glibc due to some 32/64
>> bit conversion taking place.
>
> If there is a 32/64 conversion then we have another problem.  The mail server
> is configured to reject messages bigger than about 50M, I don't recall the
> exact number but it's a lot smaller than 2G.

This potential conversion issue has nothing to do with file size and 
everything to do with internal structure alignment and significant bits 
in things like file handles (though now I'm not sure what matters).

NFS is sort of old and crufty in some respects, particularly its own 
internal file handle operation, which was originally designed around 
absolute inodes-by-number. Technology moved on while NFS was just sort 
of cruft-patched to deal with what it could no longer understand. NFSv4 
is intended to fix lots of those problems (and if you aren't using it, 
it might be worth a stab, but it has its own departures and issues, 
particularly with trying to mount a v4 root without an initramfs stage).

(NOTE: I think there _is_ something NFS-server-from-BTRFS related as 
when I wireshark a particular problem I've been having with an NFS root 
environment, I've been getting some unexpected NOENT responses in the 
NFS data stream. If you are comfortable with wireshark/tcpdump etc you 
might want to look there as well. Coercing a mount point at the point of 
service and using fsid= in /etc/exports seems to have given me some 
progress, but it sounds like that might be a bit much for your problem.)

* Re: strange 3.16.3 problem (er... never mind 8-)
  2014-10-21 10:42       ` Russell Coker
@ 2014-10-21 15:23         ` Robert White
  0 siblings, 0 replies; 22+ messages in thread
From: Robert White @ 2014-10-21 15:23 UTC
  To: russell, linux-btrfs

On 10/21/2014 03:42 AM, Russell Coker wrote:
> I've just upgraded the Dom0 (NFS server) from 3.16.3 to 3.16.5 and it all
> works.
>
> Prior to upgrading the Dom0 I had the same problem occur with different file
> names.  All the names in question were truncated names of files that exist.
> It seems that 3.16.3 has a bug with NFS serving files with long names.
>
> Thanks for all the suggestions.
>

Well never mind my message from a few minutes ago...

But thanks for finding that problem/solution. I've been having an NFS 
problem of my own and it is from a server running 3.16.3... so you may 
have just made my day. 8-)

* Re: strange 3.16.3 problem
  2014-10-21  9:50         ` Duncan
  2014-10-21 10:16           ` inode_cache " Roman Mamedov
@ 2014-10-21 16:40           ` Goffredo Baroncelli
  2014-10-22  7:12             ` Duncan
  1 sibling, 1 reply; 22+ messages in thread
From: Goffredo Baroncelli @ 2014-10-21 16:40 UTC
  To: Duncan, linux-btrfs

On 10/21/2014 11:50 AM, Duncan wrote:
> Goffredo Baroncelli posted on Mon, 20 Oct 2014 22:21:04 +0200 as
> excerpted:
> 
[...]
>> > 
>> > Could this be related to the inode overflow in 32 bit system (see
>> > inode_cache options) ? If so running a 64bit "ls -i" should work....
> Good point.  Russell might just owe you a beverage of choice.  =:^)
> 
> The inode_cache mount option isn't recommended for any bitness.


Hi Duncan, 
could you elaborate on this sentence? From my understanding, 
inode_cache is *needed* on 32-bit systems in order to avoid inode number
overflow. Why are you saying that it is not recommended?
Even if there are bugs, these have to be corrected. A bug cannot be
a reason to remove a needed option.

Inode exhaustion is worse than slowness... Otherwise BTRFS would not be
suitable for a 32-bit system... But please tell me your opinion, because 
maybe I misunderstood something... 

BR
G.Baroncelli

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

* Re: strange 3.16.3 problem
  2014-10-21 16:40           ` Goffredo Baroncelli
@ 2014-10-22  7:12             ` Duncan
  0 siblings, 0 replies; 22+ messages in thread
From: Duncan @ 2014-10-22  7:12 UTC
  To: linux-btrfs

Goffredo Baroncelli posted on Tue, 21 Oct 2014 18:40:19 +0200 as
excerpted:

> On 10/21/2014 11:50 AM, Duncan wrote:
>> Goffredo Baroncelli posted on Mon, 20 Oct 2014 22:21:04 +0200 as
>> excerpted:
>> 
> [...]
>>> > 
>>> > Could this be related to the inode overflow in 32 bit system (see
>>> > inode_cache options) ? If so running a 64bit "ls -i" should work....
>> Good point.  Russell might just owe you a beverage of choice.  =:^)
>> 
>> The inode_cache mount option isn't recommended for any bitness.
> 
> 
> Hi Duncan,
> could you elaborate on this sentence? From my understanding inode_cache is
> *needed* on 32-bit systems in order to avoid inode number overflow. Why
> are you saying that it is not recommended?

My understanding of this is limited as I'm a sysadmin and list regular, 
not a dev let alone a btrfs dev, but see the btrfs (5) manpage (aka btrfs-
mount), under mount options:

"""""

inode_cache: Enable free inode number caching. Defaults to off due to an 
overflow problem when the free space crcs don’t fit inside a single page.

"""""

As I understand it based on developer comments to this effect, 64-bit 
doesn't need it at all, and on 32-bit, in theory there are cases where 
it'd be useful, but in practice, this overflow problem, among others (see 
the discussion below), limits its usefulness to such an extent that it's 
not recommended for use, even on 32-bit where in theory it could be of 
use.
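The following rough worked example illustrates the "free space crcs don't fit inside a single page" limit. It rests on three assumptions that do not come from the manpage. First, the cache keeps one 4-byte crc32c per page of cached data. Second, it stores an 8-byte generation number. Third, all of these are packed into the first 4 KiB page. Under those assumptions, the arithmetic looks like this as a C sketch with a hypothetical function name:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical arithmetic helper. Under the assumptions stated above,
 * how many cache pages can have their checksum recorded in the first
 * page? (page_size - header_size) bytes are available for checksums. */
static size_t max_checksummed_pages(size_t page_size, size_t crc_size,
                                    size_t header_size)
{
    assert(crc_size > 0 && header_size <= page_size);
    return (page_size - header_size) / crc_size;
}
```

With those numbers the limit is (4096 - 8) / 4 = 1022 pages, or 1022 * 4096 = 4186112 bytes, just under 4 MiB of cache. A cache needing more pages than that would overflow in the way the manpage describes. The real on-disk layout may differ between kernel versions.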

That's the extent of the theory I know on the subject.

Then there's the real-world reports and their effect on things:

* In part because inode_cache is pretty well universally negative-
recommended when it is seen, it's a poorly tested feature.  Reported 
bugs never get traced to it, because as soon as people see it they say 
to turn it off, as its problems are worse than the problem it's trying to 
cure.  So it's turned off, the bugs disappear and everybody's happy, 
without anyone tracing down the problem.

One solid example of that was a report that btrfs was consistently taking 
an unreasonably long time (five minutes plus) to mount, making it 
unworkable as a filesystem mounted at boot from fstab.  (I believe that 
user was on systemd and systemd was timing out the localmount service, 
but it would have been similar on any other init system, as very few will 
by default let anything but fsck go for five minutes without timing it 
out.)  inode_cache was apparently reinitializing at every mount, instead 
of just once.  Were that space_cache, the bug would have almost certainly 
been traced and ultimately fixed.  But being inode_cache which isn't 
recommended anyway, we recommended that he turn inode_cache off, and he 
did, and btrfs suddenly behaved itself, effectively confirming opinions 
that inode_cache isn't worth the trouble.

I believe I've also seen reports of failure to boot due to inode_cache 
corruption issues, very similar to the ones that used to plague 
space_cache (and that hit me at one point, so I know how bad they were). 
nospace_cache and/or clear_cache could fix the space_cache problem, but 
back then it was a manual fix, so you had to know about it.  The 
space_cache issues were traced and fixed since it's the default, and 
detection and recovery for space_cache corruption is normally automatic 
these days, while who knows _what_ happened to the same sorts of issues 
with inode_cache, because the recommendation is simply to turn it off and 
be done with the problem instead.  However, I'm not as sure on this one as 
on the long-mount-time issue, I believe because I was still getting my 
own btrfs bearings at the time, and the link wasn't as strong to me, as it 
got lost in the blur of everything else I was learning about btrfs at the 
same time.

I guess I just changed my own mind a bit on it as I wrote that, but the 
end-user effect is almost the same, except there's an exception now.  
Basically, the situation is still the same for ordinary users, don't 
touch it as it's likely to result in needless problems.  But for that 
stubborn but tech inclined user willing to be a guinea pig, particularly 
if they're a dev that can actively help trace down bugs in the code as 
well as usefully write up in sysadmin's or plainer English exactly where 
it makes sense to use this option and what its real problems are, there's 
definitely an opening to help make this a (hopefully much, but just about 
anything would be an improvement) better documented and less buggy mount 
option. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


end of thread

Thread overview: 22+ messages
2014-10-18  3:54 strange 3.16.3 problem Russell Coker
     [not found] ` <CAHGunUkzXZ-ybUR_y3tHzGwtn_45gq8YQJyEqteBX3zqWzUakA@mail.gmail.com>
2014-10-18 10:29   ` Russell Coker
2014-10-18 13:33 ` Robert White
2014-10-18 23:41   ` Russell Coker
2014-10-19  5:37     ` Duncan
2014-10-19 10:19       ` Duncan
2014-10-20 17:37     ` Robert White
2014-10-20 20:21       ` Goffredo Baroncelli
2014-10-21  9:50         ` Duncan
2014-10-21 10:16           ` inode_cache " Roman Mamedov
2014-10-21 12:08             ` Duncan
2014-10-21 16:40           ` Goffredo Baroncelli
2014-10-22  7:12             ` Duncan
2014-10-19 10:46 ` Chris Samuel
2014-10-20  4:38 ` Duncan
2014-10-20 13:02   ` Zygo Blaxell
2014-10-20 13:19     ` Austin S Hemmelgarn
2014-10-21 10:13     ` Russell Coker
2014-10-21 10:42       ` Russell Coker
2014-10-21 15:23         ` strange 3.16.3 problem (er... never mind 8-) Robert White
2014-10-21 12:25       ` strange 3.16.3 problem Duncan
2014-10-21 15:10       ` Robert White
