* [Bug 107301] New: system hang during ext4 xattr operation
@ 2015-11-05 13:54 bugzilla-daemon
  2015-11-05 15:45 ` [Bug 107301] " bugzilla-daemon
                   ` (27 more replies)
  0 siblings, 28 replies; 29+ messages in thread
From: bugzilla-daemon @ 2015-11-05 13:54 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=107301

            Bug ID: 107301
           Summary: system hang during ext4 xattr operation
           Product: File System
           Version: 2.5
    Kernel Version: 4.2.3 3.19 3.16
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: high
          Priority: P1
         Component: ext4
          Assignee: fs_ext4@kernel-bugs.osdl.org
          Reporter: sileht@sileht.net
        Regression: No

Created attachment 192191
  --> https://bugzilla.kernel.org/attachment.cgi?id=192191&action=edit
dmesg received via netconsole before the system hang

Hi,

We are running a ceph cluster on ext4 filesystems. We recently had a hardware
failure, and the ceph recovery process is causing a huge amount of data to be
written to all our ext4 filesystems (~40 disks).

Since then, random nodes have been hanging. We caught some partial backtraces
(which can be found on the ceph bug tracker), and recently we got the full dmesg
log via netconsole (attached to this BZ).

When the freeze occurs, ceph processes appear to lock up all the CPUs; each
CPU's backtrace is related to an xattr operation.

bug report on ceph side: http://tracker.ceph.com/issues/13662

We have some nodes on Debian and others on Ubuntu. We tried kernels 3.16,
3.19, and 4.2.3; the issue occurs with all of them.

Cheers,

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

* [Bug 107301] system hang during ext4 xattr operation
  2015-11-05 13:54 [Bug 107301] New: system hang during ext4 xattr operation bugzilla-daemon
@ 2015-11-05 15:45 ` bugzilla-daemon
  2015-11-05 17:02 ` bugzilla-daemon
                   ` (26 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2015-11-05 15:45 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=107301

Jan Kara <jack@suse.cz> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #192191|application/octet-stream    |text/plain
          mime type|                            |
                 CC|                            |jack@suse.cz

* [Bug 107301] system hang during ext4 xattr operation
  2015-11-05 13:54 [Bug 107301] New: system hang during ext4 xattr operation bugzilla-daemon
  2015-11-05 15:45 ` [Bug 107301] " bugzilla-daemon
@ 2015-11-05 17:02 ` bugzilla-daemon
  2015-11-05 17:41 ` bugzilla-daemon
                   ` (25 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2015-11-05 17:02 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=107301

Jan Kara <jack@suse.cz> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO

--- Comment #1 from Jan Kara <jack@suse.cz> ---
Hum, from the stack traces in the log it seems mb_cache_entry_alloc() is racing
with other operations on the LRU and restarting all the time. Another
possibility is that there are lots of entries in the LRU and it takes a long
time to scan (if I remember right ceph is a heavy user of xattrs).

The first problem would be easily fixed by adding cond_resched() at an
appropriate place; the second problem would require more intrusive changes in
how the LRU is handled.

Can you reproduce the issue?

* [Bug 107301] system hang during ext4 xattr operation
  2015-11-05 13:54 [Bug 107301] New: system hang during ext4 xattr operation bugzilla-daemon
  2015-11-05 15:45 ` [Bug 107301] " bugzilla-daemon
  2015-11-05 17:02 ` bugzilla-daemon
@ 2015-11-05 17:41 ` bugzilla-daemon
  2015-11-06  2:43 ` bugzilla-daemon
                   ` (24 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2015-11-05 17:41 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=107301

--- Comment #2 from Mehdi Abaakouk <sileht@sileht.net> ---
Yes, ceph uses xattrs heavily.

I don't have step-by-step instructions to reproduce it, but the issue has
occurred ~4-5 times per day on a ceph cluster since the first incident.

* [Bug 107301] system hang during ext4 xattr operation
  2015-11-05 13:54 [Bug 107301] New: system hang during ext4 xattr operation bugzilla-daemon
                   ` (2 preceding siblings ...)
  2015-11-05 17:41 ` bugzilla-daemon
@ 2015-11-06  2:43 ` bugzilla-daemon
  2015-11-06 20:29 ` bugzilla-daemon
                   ` (23 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2015-11-06  2:43 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=107301

Theodore Tso <tytso@mit.edu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tytso@mit.edu

--- Comment #3 from Theodore Tso <tytso@mit.edu> ---
If Ceph is using xattrs that are all different for each inode, the mbcache is
not really useful at all.   So I wonder if we should have a mount option which
disables the mbcache.   This will improve the scalability of the xattr functions,
and should be a pretty big performance win for Ceph.

* [Bug 107301] system hang during ext4 xattr operation
  2015-11-05 13:54 [Bug 107301] New: system hang during ext4 xattr operation bugzilla-daemon
                   ` (3 preceding siblings ...)
  2015-11-06  2:43 ` bugzilla-daemon
@ 2015-11-06 20:29 ` bugzilla-daemon
  2015-11-07 15:24 ` bugzilla-daemon
                   ` (22 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2015-11-06 20:29 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=107301

--- Comment #4 from Laurent GUERBY <laurent@guerby.net> ---
Don't know if this is still around:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/mbcache.c?id=335e92e8a515420bd47a6b0f01cb9a206c0ed6e4

* [Bug 107301] system hang during ext4 xattr operation
  2015-11-05 13:54 [Bug 107301] New: system hang during ext4 xattr operation bugzilla-daemon
                   ` (4 preceding siblings ...)
  2015-11-06 20:29 ` bugzilla-daemon
@ 2015-11-07 15:24 ` bugzilla-daemon
  2015-11-07 17:58 ` bugzilla-daemon
                   ` (21 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2015-11-07 15:24 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=107301

--- Comment #5 from Laurent GUERBY <laurent@guerby.net> ---
My reading of the mbcache code is that

cache->c_max_entries = bucket_count << 4

limits the cache to an average of 16 entries per bucket. For ext4, bucket_bits
is 10, so there are 1024 buckets.
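
To make the arithmetic concrete, here is a small Python sketch of that limit.
It assumes the expression quoted above and the bucket_bits value of 10 are as
stated in this comment; the function name is made up for illustration and is
not part of mbcache.

```python
def mbcache_max_entries(bucket_bits: int) -> int:
    """Model of the limit quoted above: c_max_entries = bucket_count << 4."""
    bucket_count = 1 << bucket_bits   # 2**bucket_bits hash buckets
    return bucket_count << 4          # i.e. 16 entries per bucket on average


# bucket_bits = 10 is the ext4 value given in this comment.
print(mbcache_max_entries(10))  # 16384
```

So under this reading the cache tops out at 16384 entries, which a workload
that writes a unique xattr block per file would fill quickly.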

Once the cache is full, which is very likely given ceph's use of xattrs with
differing values, all ceph threads will compete for the global mbcache
locks on *each* xattr operation.

Am I correct here?

ceph is designed to have one process managing one non-shared device, so the
mbcache design, with its global locks, is very likely limiting scalability
here. It also probably has a locking bug.

* [Bug 107301] system hang during ext4 xattr operation
  2015-11-05 13:54 [Bug 107301] New: system hang during ext4 xattr operation bugzilla-daemon
                   ` (5 preceding siblings ...)
  2015-11-07 15:24 ` bugzilla-daemon
@ 2015-11-07 17:58 ` bugzilla-daemon
  2015-11-09  8:50 ` bugzilla-daemon
                   ` (20 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2015-11-07 17:58 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=107301

--- Comment #6 from Laurent GUERBY <laurent@guerby.net> ---
Created attachment 192351
  --> https://bugzilla.kernel.org/attachment.cgi?id=192351&action=edit
Remove ext4 mbcache

Would this (untested/mechanical) patch be ok to test mbcache removal?

* [Bug 107301] system hang during ext4 xattr operation
  2015-11-05 13:54 [Bug 107301] New: system hang during ext4 xattr operation bugzilla-daemon
                   ` (6 preceding siblings ...)
  2015-11-07 17:58 ` bugzilla-daemon
@ 2015-11-09  8:50 ` bugzilla-daemon
  2015-11-09  9:24 ` bugzilla-daemon
                   ` (19 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2015-11-09  8:50 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=107301

--- Comment #7 from Jan Kara <jack@suse.cz> ---
Yeah, that would test what the performance would look like without mbcache.
BTW: Among the xattrs ceph is using, how many of them are the same? Mbcache is a
win for the common case where xattrs are mostly used for ACLs or SELinux labels
and thus the reuse is high (mbcache is essentially a deduplication layer for
xattrs).
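
As a rough illustration of the "deduplication layer" idea, here is a toy
user-space model in Python. It is only a sketch: the class and method names are
invented, block numbers are arbitrary, and the real mbcache works on on-disk
xattr blocks (and has to compare contents after a hash match, which this model
skips).

```python
import hashlib


class XattrBlockCache:
    """Toy model of a dedup cache: hash of xattr contents -> block number."""

    def __init__(self):
        self._by_hash = {}        # content digest -> block number
        self._next_block = 1000   # arbitrary starting block number
        self.hits = 0
        self.misses = 0

    def store(self, xattrs: dict) -> int:
        """Return a block holding exactly these xattrs, sharing if possible."""
        # Serialize deterministically so equal xattr sets hash identically.
        blob = repr(sorted(xattrs.items())).encode()
        key = hashlib.sha256(blob).hexdigest()
        block = self._by_hash.get(key)
        if block is not None:
            self.hits += 1        # identical xattr set seen before: share it
            return block
        self.misses += 1          # new contents: allocate and remember
        block = self._next_block
        self._next_block += 1
        self._by_hash[key] = block
        return block


cache = XattrBlockCache()
acl = {"system.posix_acl_access": b"same-acl"}
shared = {cache.store(acl) for _ in range(3)}
unique = {cache.store({"user.ceph._": bytes([i])}) for i in range(3)}
print(len(shared), len(unique))  # 1 3
```

Identical ACL-style xattrs collapse into one block, while per-file unique
values never hit and only add entries to the cache.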

And yes, it is in need of some updates to meet current scalability demands (I
don't think what you hit is a bug as such, rather an inefficiency that becomes
lethal at your scale); users other than ceph occasionally report issues as
well. Probably we should track things per-fs rather than globally, hook each
per-fs mbcache into the shrinker framework, and not introduce artificial upper
bounds on the number of entries, instead letting natural memory pressure deal
with it.

For now I'm not convinced adding a mount option to disable mbcache is the right
way to go. Rather we should make it lightweight enough that it doesn't add too
much overhead for the cases where it is not needed. With the mount option there
is always the trouble that someone has to know it to turn it on/off and
sometimes there even isn't a good choice as you can have heavy xattr reuse for
some inodes and also quite a few unique xattrs for other inodes...

* [Bug 107301] system hang during ext4 xattr operation
  2015-11-05 13:54 [Bug 107301] New: system hang during ext4 xattr operation bugzilla-daemon
                   ` (7 preceding siblings ...)
  2015-11-09  8:50 ` bugzilla-daemon
@ 2015-11-09  9:24 ` bugzilla-daemon
  2015-11-09 10:11 ` bugzilla-daemon
                   ` (18 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2015-11-09  9:24 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=107301

--- Comment #8 from Laurent GUERBY <laurent@guerby.net> ---
Created attachment 192491
  --> https://bugzilla.kernel.org/attachment.cgi?id=192491&action=edit
xattr -l on random ceph files

I ran xattr on various ceph files on our cluster. There are small xattrs that
look mostly constant (or drawn from a small set of values), but the big ones,
"user.ceph._", are clearly all different (they share a common beginning, but
there is probably a hash or something similar near the end).

* [Bug 107301] system hang during ext4 xattr operation
  2015-11-05 13:54 [Bug 107301] New: system hang during ext4 xattr operation bugzilla-daemon
                   ` (8 preceding siblings ...)
  2015-11-09  9:24 ` bugzilla-daemon
@ 2015-11-09 10:11 ` bugzilla-daemon
  2015-11-09 14:31 ` bugzilla-daemon
                   ` (17 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2015-11-09 10:11 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=107301

--- Comment #9 from Laurent GUERBY <laurent@guerby.net> ---
Note: ceph files are all big in our use case since we use it for rbd, about 4
MB each, so two or three additional xattr blocks per file won't noticeably
change our disk usage. For people using ceph as an object store with their
own use of xattrs, the situation could be different.

* [Bug 107301] system hang during ext4 xattr operation
  2015-11-05 13:54 [Bug 107301] New: system hang during ext4 xattr operation bugzilla-daemon
                   ` (9 preceding siblings ...)
  2015-11-09 10:11 ` bugzilla-daemon
@ 2015-11-09 14:31 ` bugzilla-daemon
  2015-11-09 15:22 ` bugzilla-daemon
                   ` (16 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2015-11-09 14:31 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=107301

Sage Weil <sage@newdream.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sage@newdream.net

--- Comment #10 from Sage Weil <sage@newdream.net> ---
The most common Ceph xattr is ~270 bytes and always has a unique value.  There
are also a couple that are a few bytes each (and always the same), but I would
expect the little ones to get inlined in the inode?

Anyway, it sounds like disabling mbcache would be a win!

* [Bug 107301] system hang during ext4 xattr operation
  2015-11-05 13:54 [Bug 107301] New: system hang during ext4 xattr operation bugzilla-daemon
                   ` (10 preceding siblings ...)
  2015-11-09 14:31 ` bugzilla-daemon
@ 2015-11-09 15:22 ` bugzilla-daemon
  2015-11-09 18:13 ` bugzilla-daemon
                   ` (15 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2015-11-09 15:22 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=107301

--- Comment #11 from Jan Kara <jack@suse.cz> ---
Actually, if most inodes have xattrs like this, then the biggest win would
probably be to create the filesystem with 512-byte inodes, where the ~270-byte
xattr will fit into the inode as well. Then you save on the extra block lookup
etc.
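
A back-of-the-envelope check of why 512-byte inodes help, as a Python sketch.
All the layout constants below are assumptions made for illustration (a
128-byte fixed inode part, 32 bytes of extra inode fields, a 4-byte in-inode
xattr header, a 16-byte entry header, 4-byte padding and a 4-byte end marker);
the exact numbers depend on the kernel and on how the filesystem was created.

```python
GOOD_OLD_INODE_SIZE = 128   # assumed size of the fixed part of the inode
EXTRA_ISIZE = 32            # assumed i_extra_isize; varies by kernel/mkfs
IBODY_HEADER = 4            # assumed in-inode xattr header
ENTRY_HEADER = 16           # assumed per-entry header
END_MARKER = 4              # assumed terminator after the last entry


def pad4(n: int) -> int:
    """Round n up to a multiple of 4."""
    return (n + 3) & ~3


def fits_in_inode(inode_size: int, name_len: int, value_len: int) -> bool:
    """True if one xattr of this size fits in the in-inode xattr area."""
    available = inode_size - GOOD_OLD_INODE_SIZE - EXTRA_ISIZE - IBODY_HEADER
    needed = ENTRY_HEADER + pad4(name_len) + pad4(value_len) + END_MARKER
    return needed <= available


# ~270-byte value from comment #10; a 6-byte name suffix is assumed here.
print(fits_in_inode(256, 6, 270))  # False
print(fits_in_inode(512, 6, 270))  # True
```

Under these assumptions a 256-byte inode leaves well under 100 bytes for
xattrs, while a 512-byte inode leaves enough for the large ceph xattr. The
inode size is chosen at filesystem creation time (mke2fs's -I option).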

WRT disabling mbcache, it seems disabling it would really help your usecase.
However I'm not a big fan of mount options for disabling/enabling various
functionality in the filesystem as it blows up a test matrix and tends to be a
usability issue as well (people don't know the option or don't know when to
enable / disable it). So I'd prefer if we just fixed mbcache...

* [Bug 107301] system hang during ext4 xattr operation
  2015-11-05 13:54 [Bug 107301] New: system hang during ext4 xattr operation bugzilla-daemon
                   ` (11 preceding siblings ...)
  2015-11-09 15:22 ` bugzilla-daemon
@ 2015-11-09 18:13 ` bugzilla-daemon
  2015-11-10 11:16 ` bugzilla-daemon
                   ` (14 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2015-11-09 18:13 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=107301

Andreas Dilger <adilger.kernelbugzilla@dilger.ca> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |adilger.kernelbugzilla@dilg
                   |                            |er.ca

--- Comment #12 from Andreas Dilger <adilger.kernelbugzilla@dilger.ca> ---
Created attachment 192551
  --> https://bugzilla.kernel.org/attachment.cgi?id=192551&action=edit
add "no_mbcache

We have a patch for Lustre ldiskfs (ext4 modified with Lustre patches) to
disable mbcache, since it has similar performance impact for Lustre servers,
and provides no value because the xattrs Lustre stores on each file are unique
and cannot be shared.  

Canonical patch location:
http://git.hpdd.intel.com/fs/lustre-release.git/blob_plain/HEAD:/ldiskfs/kernel_patches/patches/rhel7/ext4-disable-mb-cache.patch

While the referenced patch is for RHEL 7, it is small enough to port easily to
the upstream kernel.

This patch adds a "no_mbcache" mount option, which Lustre automatically adds to
the filesystem options when the servers are mounted.  There was a patch to
improve mbcache performance in ext4 by making the cache per-sb, but that
doesn't improve the contention within a filesystem, and doesn't avoid the fact
that mbcache provides absolutely no value for Lustre (or Ceph, it seems). 
Doing no work at all is better than doing it somewhat more efficiently.

The only way I could see this working automatically is if mbcache disabled
itself after having a low/zero cache hit ratio within a certain number of
inserts, and if not finding any shared xattr blocks when reading from disk.  In
the meantime, I think having a mount option is a viable alternative for those
people who know better than the auto-detection heuristics.

* [Bug 107301] system hang during ext4 xattr operation
  2015-11-05 13:54 [Bug 107301] New: system hang during ext4 xattr operation bugzilla-daemon
                   ` (12 preceding siblings ...)
  2015-11-09 18:13 ` bugzilla-daemon
@ 2015-11-10 11:16 ` bugzilla-daemon
  2015-11-11  2:37 ` bugzilla-daemon
                   ` (13 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2015-11-10 11:16 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=107301

--- Comment #13 from Laurent GUERBY <laurent@guerby.net> ---
We restarted all of our cluster machines with ubuntu 3.19 + my ext4 mbcache
removal patch: so far no data loss, no new freeze and no visible change in
performance (production too noisy to detect small changes).

* [Bug 107301] system hang during ext4 xattr operation
  2015-11-05 13:54 [Bug 107301] New: system hang during ext4 xattr operation bugzilla-daemon
                   ` (13 preceding siblings ...)
  2015-11-10 11:16 ` bugzilla-daemon
@ 2015-11-11  2:37 ` bugzilla-daemon
  2015-11-11 10:28 ` bugzilla-daemon
                   ` (12 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2015-11-11  2:37 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=107301

--- Comment #14 from Theodore Tso <tytso@mit.edu> ---
In response to Jan's concerns in comment #11, I wonder if we can use some
heuristics to decide when to use mbcache and when to skip it.   So if there are
ways that we can identify certain xattrs by name or type as being almost
always unique, then if there are any of those xattrs in the external xattr
block, there's no point using the mbcache.   If we had such a function, we
could also use it to bias putting the unique xattrs in the extended inode
space, which might make the external xattr block more likely to be
shareable.
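
A sketch of what such a function and the placement bias could look like, in
Python as a user-space model. Everything here is hypothetical: the prefix
table, the function names, and the crude size model (name length plus value
length, ignoring headers) are assumptions for illustration, not ext4 code.

```python
# Hypothetical hint table: name prefixes assumed to carry unique values.
LIKELY_UNIQUE_PREFIXES = ("user.ceph.",)


def likely_unique(name: str) -> bool:
    return name.startswith(LIKELY_UNIQUE_PREFIXES)


def place_xattrs(xattrs: dict, inode_budget: int):
    """Split xattrs into (in_inode, external, use_mbcache)."""
    in_inode, external = {}, {}
    # Consider likely-unique xattrs first so they claim in-inode space.
    ordered = sorted(xattrs.items(), key=lambda kv: not likely_unique(kv[0]))
    remaining = inode_budget
    for name, value in ordered:
        cost = len(name) + len(value)   # crude size model, ignores headers
        if cost <= remaining:
            in_inode[name] = value
            remaining -= cost
        else:
            external[name] = value
    # No point using the cache if a likely-unique xattr is in the block.
    use_mbcache = bool(external) and not any(likely_unique(n) for n in external)
    return in_inode, external, use_mbcache


xattrs = {"security.selinux": b"s" * 40, "user.ceph._": b"u" * 270}
print(place_xattrs(xattrs, 300)[2])  # True: only the SELinux label is external
print(place_xattrs(xattrs, 92)[2])   # False: the unique xattr went external
```

With enough in-inode space the unique xattr stays in the inode and the
external block holds only the shareable label; with a small budget the unique
xattr spills out and the cache is skipped.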

The final thing I'll note is that because of the mbcache locking, it can turn
into a real scalability bottleneck for ext4, and it's not clear this is a
soluble problem.   So if you are running on a high-CPU count machine, and you
are using xattrs, and (for example) the SELinux xattr is so piggy that it won't
fit in your 256-byte inodes, things can get pretty painful.

* [Bug 107301] system hang during ext4 xattr operation
  2015-11-05 13:54 [Bug 107301] New: system hang during ext4 xattr operation bugzilla-daemon
                   ` (14 preceding siblings ...)
  2015-11-11  2:37 ` bugzilla-daemon
@ 2015-11-11 10:28 ` bugzilla-daemon
  2015-11-11 11:37 ` bugzilla-daemon
                   ` (11 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2015-11-11 10:28 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=107301

--- Comment #15 from Jan Kara <jack@suse.cz> ---
So I guess there are two separate issues:
1) mbcache scales poorly - that is worth addressing regardless of whether ceph
/ lustre really need it or not since as you mention there are cases where
mbcache helps and scalability is an issue.
2) some usecases do not benefit from mbcache much (or at all) and so we could
possibly have a heuristic to disable mbcache altogether.

My current feeling is that if mbcache is implemented properly, then the
overhead of it is a hash-table insertion / deletion when creating / removing
xattr and that should be pretty cheap compared to all the other work we do to
create external xattr (although cache line contention could be an issue here to
some degree).

* [Bug 107301] system hang during ext4 xattr operation
  2015-11-05 13:54 [Bug 107301] New: system hang during ext4 xattr operation bugzilla-daemon
                   ` (15 preceding siblings ...)
  2015-11-11 10:28 ` bugzilla-daemon
@ 2015-11-11 11:37 ` bugzilla-daemon
  2015-11-11 14:53 ` bugzilla-daemon
                   ` (10 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2015-11-11 11:37 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=107301

--- Comment #16 from Andreas Dilger <adilger.kernelbugzilla@dilger.ca> ---
I think that regardless of how efficient mbcache can be made, it is not as
efficient (both CPU and RAM) as not using it at all in cases where it is not
providing any benefit.  I'm not against improving the efficiency, but I think
it still makes sense to have an option to disable it completely. Since mbcache
is already a per-sb cache, having a per-sb mount option makes sense and doesn't
interfere with other improvements to mbcache.

* [Bug 107301] system hang during ext4 xattr operation
  2015-11-05 13:54 [Bug 107301] New: system hang during ext4 xattr operation bugzilla-daemon
                   ` (16 preceding siblings ...)
  2015-11-11 11:37 ` bugzilla-daemon
@ 2015-11-11 14:53 ` bugzilla-daemon
  2015-11-18 20:48 ` bugzilla-daemon
                   ` (9 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2015-11-11 14:53 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=107301

--- Comment #17 from Jan Kara <jack@suse.cz> ---
Ultimately it is Ted's call, but your argument is like (picked at random off
the top of my head): "Proper locking to protect from hole punching costs us some
performance and our workload doesn't use hole punching so let's create a mount
option to disable the locking". Sure, it can be done and it will benefit your
workload, but how much do you gain? And how many users need this? This has to be
weighed against the cost of the new mount option in terms of testing and
usability (and code complexity, but that is fairly small in this case so I'm not
that concerned).

* [Bug 107301] system hang during ext4 xattr operation
  2015-11-05 13:54 [Bug 107301] New: system hang during ext4 xattr operation bugzilla-daemon
                   ` (17 preceding siblings ...)
  2015-11-11 14:53 ` bugzilla-daemon
@ 2015-11-18 20:48 ` bugzilla-daemon
  2015-11-19  0:49 ` bugzilla-daemon
                   ` (8 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2015-11-18 20:48 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=107301

--- Comment #18 from Laurent GUERBY <laurent@guerby.net> ---
Should I formally submit the mbcache removal patch?

* [Bug 107301] system hang during ext4 xattr operation
  2015-11-05 13:54 [Bug 107301] New: system hang during ext4 xattr operation bugzilla-daemon
                   ` (18 preceding siblings ...)
  2015-11-18 20:48 ` bugzilla-daemon
@ 2015-11-19  0:49 ` bugzilla-daemon
  2015-11-19 15:11 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2015-11-19  0:49 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=107301

--- Comment #19 from Andreas Dilger <adilger.kernelbugzilla@dilger.ca> ---
Jan, your example is a red herring, because the removal of mbcache at most
affects the performance and space efficiency, not correctness. 

I agree that making mbcache more efficient is a good goal, but I don't think
that it should block the landing of this patch. Rather, it makes sense to land
the patch and we can still fix mbcache performance if anyone actually wants to
use it. It will also serve as the basis for any heuristic to turn mbcache on
and off dynamically instead of (or in addition to) the mount option.

Laurent, I'd be grateful if you pushed the patch upstream for the latest kernel
as I'm traveling this week.

* [Bug 107301] system hang during ext4 xattr operation
  2015-11-05 13:54 [Bug 107301] New: system hang during ext4 xattr operation bugzilla-daemon
                   ` (19 preceding siblings ...)
  2015-11-19  0:49 ` bugzilla-daemon
@ 2015-11-19 15:11 ` bugzilla-daemon
  2015-12-10  2:51 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2015-11-19 15:11 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=107301

--- Comment #20 from Theodore Tso <tytso@mit.edu> ---
Apologies for not responding right away.  I'm currently on vacation.
So this also means that if someone sends me a patch right away, I'm
not likely to have time to look at it for a week or two.

As far as trying to make mbcache more scalable, this would be great,
but I suspect it's not going to be very easy because it requires a
global data structure, which is fundamentally hard to scale.  I can
imagine some schemes that sacrifice some of mbcache's ability to spot
duplicate xattr sets --- for example, you could just use a trylock,
and if there is lock contention just give up on detecting duplicate
xattrs.
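
The trylock idea can be modelled in user space as follows. This is only a
Python sketch with invented names; it uses threading.Lock as a stand-in for
whatever lock the kernel code would use, and it treats contention as a plain
cache miss so the caller falls back to a private, unshared xattr block.

```python
import threading


class TrylockDedupCache:
    """Dedup cache that gives up instead of waiting on a contended lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._blocks = {}   # content key -> block number
        self.skipped = 0    # operations abandoned because of contention

    def find_shared(self, key):
        """Return a shareable block number, or None on miss or contention."""
        if not self._lock.acquire(blocking=False):
            self.skipped += 1
            return None     # caller allocates a private block instead
        try:
            return self._blocks.get(key)
        finally:
            self._lock.release()

    def insert(self, key, block) -> bool:
        """Remember a block for sharing; returns False if we gave up."""
        if not self._lock.acquire(blocking=False):
            self.skipped += 1
            return False
        try:
            self._blocks.setdefault(key, block)
            return True
        finally:
            self._lock.release()
```

The trade-off is the one described above: under contention some duplicates go
undetected, but no thread ever spins or sleeps waiting for the cache.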

Or maybe we could use some heuristics based on the xattr type/key to
decide whether using mbcache is likely to be successful.  So if we
know that POSIX ACLs are very likely to be shared, and ceph xattrs
are very unlikely to be shared, this could be used to decide whether
or not to use mbcache automatically, without requiring a mount option.

Even better would be a scheme where, for unknown xattr type/key
combinations, we use some learning algorithm: after using mbcache
for N instances of that xattr, if the percentage of cache hits is
too low, we stop using mbcache for that type/key combination.
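
One possible shape for such a learning scheme, sketched in Python as a
user-space model. The class name, the window size N (sample_size) and the
threshold (min_hit_ratio) are all made-up values for illustration; nothing
here reflects an existing ext4 or mbcache interface.

```python
from collections import defaultdict


class AdaptiveMbcachePolicy:
    """Stop using the cache for keys whose hit ratio stays too low."""

    def __init__(self, sample_size=100, min_hit_ratio=0.05):
        self.sample_size = sample_size        # N lookups per decision window
        self.min_hit_ratio = min_hit_ratio    # below this, give up on the key
        self._lookups = defaultdict(int)
        self._hits = defaultdict(int)
        self._disabled = set()

    def should_use_cache(self, key) -> bool:
        return key not in self._disabled

    def record(self, key, hit: bool) -> None:
        """Record one cache lookup result for this xattr type/key."""
        if key in self._disabled:
            return
        self._lookups[key] += 1
        self._hits[key] += int(hit)
        if self._lookups[key] >= self.sample_size:
            ratio = self._hits[key] / self._lookups[key]
            if ratio < self.min_hit_ratio:
                self._disabled.add(key)
            # Start a fresh window so the decision tracks recent behaviour.
            self._lookups[key] = 0
            self._hits[key] = 0


policy = AdaptiveMbcachePolicy(sample_size=10, min_hit_ratio=0.2)
for _ in range(10):
    policy.record("user.ceph._", hit=False)
    policy.record("system.posix_acl_access", hit=True)
print(policy.should_use_cache("user.ceph._"))              # False
print(policy.should_use_cache("system.posix_acl_access"))  # True
```

This sketch never re-enables a key once it is disabled; a real design would
have to decide whether and when to probe again.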

* [Bug 107301] system hang during ext4 xattr operation
  2015-11-05 13:54 [Bug 107301] New: system hang during ext4 xattr operation bugzilla-daemon
                   ` (20 preceding siblings ...)
  2015-11-19 15:11 ` bugzilla-daemon
@ 2015-12-10  2:51 ` bugzilla-daemon
  2015-12-10 16:51 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2015-12-10  2:51 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=107301

--- Comment #21 from Andreas Gruenbacher <agruen@kernel.org> ---
As far as ceph is concerned, are there reasons for not using 512-byte inodes?
In-inode xattrs are generally much faster.

* [Bug 107301] system hang during ext4 xattr operation
  2015-11-05 13:54 [Bug 107301] New: system hang during ext4 xattr operation bugzilla-daemon
                   ` (21 preceding siblings ...)
  2015-12-10  2:51 ` bugzilla-daemon
@ 2015-12-10 16:51 ` bugzilla-daemon
  2016-03-31 13:53 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2015-12-10 16:51 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=107301

--- Comment #22 from Theodore Tso <tytso@mit.edu> ---
Using 512 byte inodes for Ceph is a really good idea.

Before folks go there, can people give Jan Kara's mbcache rewrite a try?

http://thread.gmane.org/gmane.comp.file-systems.ext4/51094

I'd really appreciate comments and any performance numbers you could give me on
your workloads.

Thanks!!

* [Bug 107301] system hang during ext4 xattr operation
  2015-11-05 13:54 [Bug 107301] New: system hang during ext4 xattr operation bugzilla-daemon
                   ` (22 preceding siblings ...)
  2015-12-10 16:51 ` bugzilla-daemon
@ 2016-03-31 13:53 ` bugzilla-daemon
  2016-04-08 10:09 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2016-03-31 13:53 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=107301

Erwan Velu <erwan@redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |erwan@redhat.com

--- Comment #23 from Erwan Velu <erwan@redhat.com> ---
I'm doing a short review of this issue, which Ceph users could trigger.

Regarding the current state, it seems that Jan's work has landed in the
4.6.x series.

Laurent, Medhi, could you give Jan's patch a try to confirm that it's OK
for you too?

Testing 4.6-rc1 would be ideal, as we could then tell Ceph users that 4.6
fixes the issue you had.

Thanks,

* [Bug 107301] system hang during ext4 xattr operation
  2015-11-05 13:54 [Bug 107301] New: system hang during ext4 xattr operation bugzilla-daemon
                   ` (23 preceding siblings ...)
  2016-03-31 13:53 ` bugzilla-daemon
@ 2016-04-08 10:09 ` bugzilla-daemon
  2016-04-08 20:46 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2016-04-08 10:09 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=107301

Philipp Matthias Hahn <pmhahn@pmhahn.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pmhahn@pmhahn.de

--- Comment #24 from Philipp Matthias Hahn <pmhahn@pmhahn.de> ---
Created attachment 212131
  --> https://bugzilla.kernel.org/attachment.cgi?id=212131&action=edit
Multiple NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [smbd:8822]

We have a similar situation with Samba4, which also makes heavy use of xattrs
to store the NTACLs.

From OCR (may contain errors):

[281537.032878] Call Trace:
[281537.032882]  [<ffffffffa0117c7c>] ? mb_cache_entry_get+0x1ac/0x1f0 [mbcache]
[281537.032883]  [<ffffffff81208f1f>] ? __find_get_block+0xef/0x120
[281537.032889]  [<ffffffffa033de70>] ? ext4_xattr_block_set+0x80/0xa50 [ext4]
[281537.032890]  [<ffffffff81208f1f>] ? __find_get_block+0xef/0x120
[281537.032896]  [<ffffffffa033d1da>] ? ext4_xattr_set_entry+0x2a/0x350 [ext4]
[281537.032901]  [<ffffffffa033f3e6>] ? ext4_xattr_set_handle+0x376/0x4d0 [ext4]
[281537.032907]  [<ffffffffa033f614>] ? ext4_xattr_set+0xd4/0x130 [ext4]
[281537.032909]  [<ffffffff811f917e>] ? generic_setxattr+0x6e/0xa0
[281537.032910]  [<ffffffff811f9d91>] ? __vfs_setxattr_noperm+0x71/0x1d0
[281537.032912]  [<ffffffff811f9fb4>] ? vfs_setxattr+0xc4/0xd0
[281537.032914]  [<ffffffff811fa0f4>] ? setxattr+0x134/0x1f0
[281537.032916]  [<ffffffff811e1301>] ? filename_lookup+0x31/0xd0
[281537.032917]  [<ffffffff811e50bc>] ? user_path_at_empty+0x60/0xc0
[281537.032918]  [<ffffffff811d7473>] ? __sb_start_write+0x53/0x100
[281537.032919]  [<ffffffff811fa240>] ? path_setxattr+0x90/0xc0
[281537.032921]  [<ffffffff811fa314>] ? SyS_setxattr+0x14/0x20
[281537.032922]  [<ffffffff8159de72>] ? system_call_fast_compare_end+0x0/0x6b
[281537.032933] Code: f0 0f c1 07 89 02 c1 ea 10 66 39 02 75 01 03 0f b7 f2 b8
00 80 00 00 0f b7 0f 41 89 08 41 31 d0 41 81 e0 f
e ff 00 00 74 10 f3 90 <83> e8 01 75 e7 0f 1f 80 00 00 00 00 eb d9 0f b7 f1 e8
08 75 ff

* [Bug 107301] system hang during ext4 xattr operation
  2015-11-05 13:54 [Bug 107301] New: system hang during ext4 xattr operation bugzilla-daemon
                   ` (24 preceding siblings ...)
  2016-04-08 10:09 ` bugzilla-daemon
@ 2016-04-08 20:46 ` bugzilla-daemon
  2016-04-12 12:52 ` bugzilla-daemon
  2016-04-13  9:07 ` bugzilla-daemon
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2016-04-08 20:46 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=107301

--- Comment #25 from Jan Kara <jack@suse.cz> ---
Philipp, I see you are using some variant of the 4.1 kernel. As comment 23
mentions, fixes for the problem have landed in 4.6-rc1. So is it possible for
you to test that kernel?

* [Bug 107301] system hang during ext4 xattr operation
  2015-11-05 13:54 [Bug 107301] New: system hang during ext4 xattr operation bugzilla-daemon
                   ` (25 preceding siblings ...)
  2016-04-08 20:46 ` bugzilla-daemon
@ 2016-04-12 12:52 ` bugzilla-daemon
  2016-04-13  9:07 ` bugzilla-daemon
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2016-04-12 12:52 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=107301

--- Comment #26 from Philipp Matthias Hahn <pmhahn@pmhahn.de> ---
(In reply to Jan Kara from comment #25)
> Philipp, I see you are using some variant of the 4.1 kernel. As comment 23
> mentions, fixes for the problem have landed in 4.6-rc1. So is it possible
> for you to test that kernel?

As that problem happened in production, I'm unwilling to install an -rc kernel
there.
Currently I don't have the time to try to reproduce it in my development
environment, but that may change if that lockup happens again.
Can you be more specific on which change in 4.6-rc1 you're referencing?

* [Bug 107301] system hang during ext4 xattr operation
  2015-11-05 13:54 [Bug 107301] New: system hang during ext4 xattr operation bugzilla-daemon
                   ` (26 preceding siblings ...)
  2016-04-12 12:52 ` bugzilla-daemon
@ 2016-04-13  9:07 ` bugzilla-daemon
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2016-04-13  9:07 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=107301

--- Comment #27 from Jan Kara <jack@suse.cz> ---
(In reply to Philipp Matthias Hahn from comment #26)
> Can you be more specific on which change in 4.6-rc1 you're referencing?

Well, it is a series of changes, commits:
f9a61eb4e2471c56a63cd804c7474128138c38ac
82939d7999dfc1f1998c4b1c12e2f19edbdff272
be0726d33cb8f411945884664924bed3cb8c70ee
ecd1e64412d5242b8afdef58a714bab3c5464f79
c2f3140fe2eceb3a6c1615b2648b9471544881c6
f0c8b46238db9d51ef9ea0858259958d0c601cec
7a2508e1b657cfc7e1371550f88c7a7bc4288f32
2335d05f3a83f5290ec28c1ed30c1c742a37edc9
dc8d5e565f00c9442fa1cbf9acc115475628527c
3fd164629d25b04f291a79a013dcc7ce1a301269
6048c64b26097a0ffbd966866b599f990e674e9b
