linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Internal error xfs_trans_cancel
@ 2016-06-01  5:52 Daniel Wagner
  2016-06-01  7:10 ` Dave Chinner
  2016-06-14  4:29 ` Josh Poimboeuf
  0 siblings, 2 replies; 12+ messages in thread
From: Daniel Wagner @ 2016-06-01  5:52 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-fsdevel, linux-kernel, xfs

Hi,

I got the error message below while compiling a kernel 
on that system. I can't really say if I did something
which made the file system unhappy before the crash.


[Jun 1 07:41] XFS (sde1): Internal error xfs_trans_cancel at line 984 of file fs/xfs/xfs_trans.c.  Caller xfs_rename+0x453/0x960 [xfs]
[  +0.000095] CPU: 22 PID: 8640 Comm: gcc Not tainted 4.7.0-rc1 #16
[  +0.000035] Hardware name: Dell Inc. PowerEdge R820/066N7P, BIOS 2.0.20 01/16/2014
[  +0.000048]  0000000000000286 00000000c8be6bc3 ffff885fa9473cb0 ffffffff813d146e
[  +0.000056]  ffff885fa9ac5ed0 0000000000000001 ffff885fa9473cc8 ffffffffa0213cdc
[  +0.000053]  ffffffffa02257b3 ffff885fa9473cf0 ffffffffa022eb36 ffff883faa502d00
[  +0.000053] Call Trace:
[  +0.000028]  [<ffffffff813d146e>] dump_stack+0x63/0x85
[  +0.000069]  [<ffffffffa0213cdc>] xfs_error_report+0x3c/0x40 [xfs]
[  +0.000065]  [<ffffffffa02257b3>] ? xfs_rename+0x453/0x960 [xfs]
[  +0.000064]  [<ffffffffa022eb36>] xfs_trans_cancel+0xb6/0xe0 [xfs]
[  +0.000065]  [<ffffffffa02257b3>] xfs_rename+0x453/0x960 [xfs]
[  +0.000062]  [<ffffffffa021fa63>] xfs_vn_rename+0xb3/0xf0 [xfs]
[  +0.000040]  [<ffffffff8124f92c>] vfs_rename+0x58c/0x8d0
[  +0.000032]  [<ffffffff81253fb1>] SyS_rename+0x371/0x390
[  +0.000036]  [<ffffffff817d2032>] entry_SYSCALL_64_fastpath+0x1a/0xa4
[  +0.000040] XFS (sde1): xfs_do_force_shutdown(0x8) called from line 985 of file fs/xfs/xfs_trans.c.  Return address = 0xffffffffa022eb4f
[  +0.027680] XFS (sde1): Corruption of in-memory data detected.  Shutting down filesystem
[  +0.000057] XFS (sde1): Please umount the filesystem and rectify the problem(s)
[Jun 1 07:42] XFS (sde1): xfs_log_force: error -5 returned.
[ +30.081016] XFS (sde1): xfs_log_force: error -5 returned.


cheers,
daniel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Internal error xfs_trans_cancel
  2016-06-01  5:52 Internal error xfs_trans_cancel Daniel Wagner
@ 2016-06-01  7:10 ` Dave Chinner
  2016-06-01 13:50   ` Daniel Wagner
  2016-06-14  4:29 ` Josh Poimboeuf
  1 sibling, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2016-06-01  7:10 UTC (permalink / raw)
  To: Daniel Wagner; +Cc: linux-fsdevel, linux-kernel, xfs

On Wed, Jun 01, 2016 at 07:52:31AM +0200, Daniel Wagner wrote:
> Hi,
> 
> I got the error message below while compiling a kernel 
> on that system. I can't really say if I did something
> which made the file system unhappy before the crash.
> 
> 
> [Jun 1 07:41] XFS (sde1): Internal error xfs_trans_cancel at line 984 of file fs/xfs/xfs_trans.c.  Caller xfs_rename+0x453/0x960 [xfs]

Anything in the log before this?

> [  +0.000095] CPU: 22 PID: 8640 Comm: gcc Not tainted 4.7.0-rc1 #16
> [  +0.000035] Hardware name: Dell Inc. PowerEdge R820/066N7P, BIOS 2.0.20 01/16/2014
> [  +0.000048]  0000000000000286 00000000c8be6bc3 ffff885fa9473cb0 ffffffff813d146e
> [  +0.000056]  ffff885fa9ac5ed0 0000000000000001 ffff885fa9473cc8 ffffffffa0213cdc
> [  +0.000053]  ffffffffa02257b3 ffff885fa9473cf0 ffffffffa022eb36 ffff883faa502d00
> [  +0.000053] Call Trace:
> [  +0.000028]  [<ffffffff813d146e>] dump_stack+0x63/0x85
> [  +0.000069]  [<ffffffffa0213cdc>] xfs_error_report+0x3c/0x40 [xfs]
> [  +0.000065]  [<ffffffffa02257b3>] ? xfs_rename+0x453/0x960 [xfs]
> [  +0.000064]  [<ffffffffa022eb36>] xfs_trans_cancel+0xb6/0xe0 [xfs]
> [  +0.000065]  [<ffffffffa02257b3>] xfs_rename+0x453/0x960 [xfs]
> [  +0.000062]  [<ffffffffa021fa63>] xfs_vn_rename+0xb3/0xf0 [xfs]
> [  +0.000040]  [<ffffffff8124f92c>] vfs_rename+0x58c/0x8d0
> [  +0.000032]  [<ffffffff81253fb1>] SyS_rename+0x371/0x390
> [  +0.000036]  [<ffffffff817d2032>] entry_SYSCALL_64_fastpath+0x1a/0xa4
> [  +0.000040] XFS (sde1): xfs_do_force_shutdown(0x8) called from line 985 of file fs/xfs/xfs_trans.c.  Return address = 0xffffffffa022eb4f
> [  +0.027680] XFS (sde1): Corruption of in-memory data detected.  Shutting down filesystem
> [  +0.000057] XFS (sde1): Please umount the filesystem and rectify the problem(s)
> [Jun 1 07:42] XFS (sde1): xfs_log_force: error -5 returned.
> [ +30.081016] XFS (sde1): xfs_log_force: error -5 returned.

Doesn't normally happen, and there's not a lot to go on here. Can
you provide the info listed in the link below so we have some idea
of what configuration the error occurred on?

http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F

You didn't run out of space or something unusual like that?  Does
'xfs_repair -n <dev>' report any errors?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Internal error xfs_trans_cancel
  2016-06-01  7:10 ` Dave Chinner
@ 2016-06-01 13:50   ` Daniel Wagner
  2016-06-01 14:13     ` Daniel Wagner
  0 siblings, 1 reply; 12+ messages in thread
From: Daniel Wagner @ 2016-06-01 13:50 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-fsdevel, linux-kernel, xfs


On 06/01/2016 09:10 AM, Dave Chinner wrote:
> On Wed, Jun 01, 2016 at 07:52:31AM +0200, Daniel Wagner wrote:
>> I got the error message below while compiling a kernel 
>> on that system. I can't really say if I did something
>> which made the file system unhappy before the crash.
>>
>>
>> [Jun 1 07:41] XFS (sde1): Internal error xfs_trans_cancel at line 984 of file fs/xfs/xfs_trans.c.  Caller xfs_rename+0x453/0x960 [xfs]
> 
> Anything in the log before this?

Just the usual stuff, as I remember. Sorry, I haven't copied the whole log.
 
>> [  +0.000095] CPU: 22 PID: 8640 Comm: gcc Not tainted 4.7.0-rc1 #16
>> [  +0.000035] Hardware name: Dell Inc. PowerEdge R820/066N7P, BIOS 2.0.20 01/16/2014
>> [  +0.000048]  0000000000000286 00000000c8be6bc3 ffff885fa9473cb0 ffffffff813d146e
>> [  +0.000056]  ffff885fa9ac5ed0 0000000000000001 ffff885fa9473cc8 ffffffffa0213cdc
>> [  +0.000053]  ffffffffa02257b3 ffff885fa9473cf0 ffffffffa022eb36 ffff883faa502d00
>> [  +0.000053] Call Trace:
>> [  +0.000028]  [<ffffffff813d146e>] dump_stack+0x63/0x85
>> [  +0.000069]  [<ffffffffa0213cdc>] xfs_error_report+0x3c/0x40 [xfs]
>> [  +0.000065]  [<ffffffffa02257b3>] ? xfs_rename+0x453/0x960 [xfs]
>> [  +0.000064]  [<ffffffffa022eb36>] xfs_trans_cancel+0xb6/0xe0 [xfs]
>> [  +0.000065]  [<ffffffffa02257b3>] xfs_rename+0x453/0x960 [xfs]
>> [  +0.000062]  [<ffffffffa021fa63>] xfs_vn_rename+0xb3/0xf0 [xfs]
>> [  +0.000040]  [<ffffffff8124f92c>] vfs_rename+0x58c/0x8d0
>> [  +0.000032]  [<ffffffff81253fb1>] SyS_rename+0x371/0x390
>> [  +0.000036]  [<ffffffff817d2032>] entry_SYSCALL_64_fastpath+0x1a/0xa4
>> [  +0.000040] XFS (sde1): xfs_do_force_shutdown(0x8) called from line 985 of file fs/xfs/xfs_trans.c.  Return address = 0xffffffffa022eb4f
>> [  +0.027680] XFS (sde1): Corruption of in-memory data detected.  Shutting down filesystem
>> [  +0.000057] XFS (sde1): Please umount the filesystem and rectify the problem(s)
>> [Jun 1 07:42] XFS (sde1): xfs_log_force: error -5 returned.
>> [ +30.081016] XFS (sde1): xfs_log_force: error -5 returned.
> 
> Doesn't normally happen, and there's not a lot to go on here.

Restarted the box and did a couple of kernel builds and
everything was fine.

> Can
> you provide the info listed in the link below so we have some idea
> of what configuration the error occurred on?

Sure, forgot that in the first post.

> http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F

# uname -r
4.7.0-rc1-00003-g1f55b0d

# xfs_repair -V
xfs_repair version 4.5.0 

# cat /proc/cpuinfo | grep CPU | wc -l
64

# cat /proc/meminfo 
MemTotal:       528344752 kB
MemFree:        526838036 kB
MemAvailable:   525265612 kB
Buffers:            2716 kB
Cached:           216896 kB
SwapCached:            0 kB
Active:           119924 kB
Inactive:         116552 kB
Active(anon):      17416 kB
Inactive(anon):     1108 kB
Active(file):     102508 kB
Inactive(file):   115444 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:         16972 kB
Mapped:            25288 kB
Shmem:              1616 kB
Slab:             184920 kB
SReclaimable:      60028 kB
SUnreclaim:       124892 kB
KernelStack:       13120 kB
PageTables:         2292 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    264172376 kB
Committed_AS:     270612 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      232256 kB
DirectMap2M:     7061504 kB
DirectMap1G:    531628032 kB

# cat /proc/mounts 
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
devtmpfs /dev devtmpfs rw,nosuid,size=264153644k,nr_inodes=66038411,mode=755 0 0
securityfs /sys/kernel/security securityfs rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,nosuid,nodev,mode=755 0 0
tmpfs /sys/fs/cgroup tmpfs ro,nosuid,nodev,noexec,mode=755 0 0
cgroup /sys/fs/cgroup/systemd cgroup rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd 0 0
pstore /sys/fs/pstore pstore rw,nosuid,nodev,noexec,relatime 0 0
cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0
cgroup /sys/fs/cgroup/net_cls,net_prio cgroup rw,nosuid,nodev,noexec,relatime,net_cls,net_prio 0 0
cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpu,cpuacct 0 0
cgroup /sys/fs/cgroup/freezer cgroup rw,nosuid,nodev,noexec,relatime,freezer 0 0
cgroup /sys/fs/cgroup/blkio cgroup rw,nosuid,nodev,noexec,relatime,blkio 0 0
cgroup /sys/fs/cgroup/pids cgroup rw,nosuid,nodev,noexec,relatime,pids 0 0
cgroup /sys/fs/cgroup/cpuset cgroup rw,nosuid,nodev,noexec,relatime,cpuset 0 0
cgroup /sys/fs/cgroup/perf_event cgroup rw,nosuid,nodev,noexec,relatime,perf_event 0 0
cgroup /sys/fs/cgroup/hugetlb cgroup rw,nosuid,nodev,noexec,relatime,hugetlb 0 0
cgroup /sys/fs/cgroup/devices cgroup rw,nosuid,nodev,noexec,relatime,devices 0 0
configfs /sys/kernel/config configfs rw,relatime 0 0
/dev/sda2 / xfs rw,relatime,attr2,inode64,noquota 0 0
systemd-1 /proc/sys/fs/binfmt_misc autofs rw,relatime,fd=27,pgrp=1,timeout=0,minproto=5,maxproto=5,direct 0 0
debugfs /sys/kernel/debug debugfs rw,relatime 0 0
mqueue /dev/mqueue mqueue rw,relatime 0 0
tmpfs /tmp tmpfs rw 0 0
hugetlbfs /dev/hugepages hugetlbfs rw,relatime 0 0
binfmt_misc /proc/sys/fs/binfmt_misc binfmt_misc rw,relatime 0 0
nfsd /proc/fs/nfsd nfsd rw,relatime 0 0
/dev/sdc1 /mnt/sdc1 xfs rw,relatime,attr2,inode64,noquota 0 0
/dev/sda1 /boot ext4 rw,relatime,data=ordered 0 0
/dev/sde2 /mnt/yocto xfs rw,relatime,attr2,inode64,noquota 0 0
sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0
tmpfs /run/user/0 tmpfs rw,nosuid,nodev,relatime,size=52834476k,mode=700 0 0
/dev/sde1 /home xfs rw,relatime,attr2,inode64,noquota 0 0

# cat /proc/partitions 
major minor  #blocks  name

  11        0    1048575 sr0
   8       64  249430016 sde
   8       65  104857600 sde1
   8       66  144571375 sde2
   8       48  142737408 sdd
   8       16  142737408 sdb
   8       32  142737408 sdc
   8       33  142736367 sdc1
   8        0  142737408 sda
   8        1    5120000 sda1
   8        2  104857600 sda2

No RAID
No LVM

HDD (sda, sdb, sdc, sdd):

Manufacturer 	TOSHIBA
Product ID 	MK1401GRRB 

SSD (sde):

Manufacturer	Samsung
Product ID 	Samsung SSD 850
Revision 	1B6Q

# hdparm -I /dev/sde 

/dev/sde:
SG_IO: bad/missing sense data, sb[]:  70 00 05 00 00 00 00 0d 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

ATA device, with non-removable media
Standards:
        Likely used: 1
Configuration:
        Logical         max     current
        cylinders       0       0
        heads           0       0
        sectors/track   0       0
        --
        Logical/Physical Sector size:           512 bytes
        device size with M = 1024*1024:           0 MBytes
        device size with M = 1000*1000:           0 MBytes 
        cache/buffer size  = unknown
Capabilities:
        IORDY not likely
        Cannot perform double-word IO
        R/W multiple sector transfer: not supported
        DMA: not supported
        PIO: pio0 

# xfs_info /dev/sde1
meta-data=/dev/sde1              isize=256    agcount=4, agsize=6553600 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=0        finobt=0 spinodes=0
data     =                       bsize=4096   blocks=26214400, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal               bsize=4096   blocks=12800, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0




> You didn't run out of space or something unusual like that? 

It should have enough space for building the kernel. I haven't
experienced any problems with that disk or partition in the
last half year. It's my test box, so it gets exposed to
many -rc kernels and test patches. I've never seen any
problems with XFS so far.

Filesystem      Size  Used Avail Use% Mounted on
/dev/sde1       100G   72G   29G  72% /home

> Does 'xfs_repair -n <dev>' report any errors?

# xfs_repair -n /dev/sde1
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 3
        - agno = 2
        - agno = 1
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

cheers,
daniel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Internal error xfs_trans_cancel
  2016-06-01 13:50   ` Daniel Wagner
@ 2016-06-01 14:13     ` Daniel Wagner
  2016-06-01 14:19       ` Daniel Wagner
  2016-06-02  0:26       ` Dave Chinner
  0 siblings, 2 replies; 12+ messages in thread
From: Daniel Wagner @ 2016-06-01 14:13 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-fsdevel, linux-kernel, xfs

[-- Attachment #1: Type: text/plain, Size: 8297 bytes --]

>> Anything in the log before this?
> 
> Just the usual stuff, as I remember. Sorry, I haven't copied the whole log.

Just triggered it again. My steps for it are:

- run all lockperf tests

  git://git.samba.org/jlayton/lockperf.git

  via my test script:
 
#!/bin/sh

run_tests () {
    echo $1

    for i in `seq 10`;
    do
        rm -rf /tmp/a;
        $1 /tmp/a > /dev/null
        sync
    done

    for i in `seq 100`;
    do
        rm -rf /tmp/a;
        $1 /tmp/a >> $2
        sync
    done
}


PATH=~/src/lockperf:$PATH

DIR=$1-`uname -r`
if [ ! -d "$DIR" ]; then
    mkdir $DIR
fi

CPUSET=`cat /sys/devices/system/node/node0/cpulist`
taskset -pc $CPUSET $$

sudo sh -c 'for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor ; do echo performance > $i ; done'

for c in `seq 8 32 128`; do
    for l in `seq 100 100 500`; do
        time run_tests "posix01 -n $c -l $l "     $DIR/posix01-$c-$l.data
        time run_tests "posix02 -n $c -l $l "     $DIR/posix02-$c-$l.data
        time run_tests "posix03 -n $c -l $l "     $DIR/posix03-$c-$l.data
        time run_tests "posix04 -n $c -l $l "     $DIR/posix04-$c-$l.data
        time run_tests "flock01 -n $c -l $l "     $DIR/flock01-$c-$l.data
        time run_tests "flock02 -n $c -l $l "     $DIR/flock02-$c-$l.data
        time run_tests "lease01 -n $c -l $l "     $DIR/lease01-$c-$l.data
        time run_tests "lease02 -n $c -l $l "     $DIR/lease02-$c-$l.data
    done
done

sudo sh -c 'for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor ; do echo powersave > $i ; done'

And after that I rebuilt a new kernel. That was all.
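
Roughly, the whole sequence looks like this (just a sketch; the paths
are placeholders from my setup):

./run-tests.sh lockperf     # runs all the lockperf tests (work dir /tmp/a)
cd ~/src/linux              # kernel tree on the XFS partition (assumed path)
make -j 64                  # the shutdown hit during a build like this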

This time I saved the logs. xfs_repair was not so happy either.

cheers,
daniel




[-- Attachment #2: dmesg.log.xz --]
[-- Type: application/x-xz, Size: 19596 bytes --]

[-- Attachment #3: xfs_repair.log.xz --]
[-- Type: application/x-xz, Size: 16412 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Internal error xfs_trans_cancel
  2016-06-01 14:13     ` Daniel Wagner
@ 2016-06-01 14:19       ` Daniel Wagner
  2016-06-02  0:26       ` Dave Chinner
  1 sibling, 0 replies; 12+ messages in thread
From: Daniel Wagner @ 2016-06-01 14:19 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-fsdevel, linux-kernel, xfs

>   via my test script:

Looks like my email client did not agree with my formatting of the script.

https://www.monom.org/data/lglock/run-tests.sh

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Internal error xfs_trans_cancel
  2016-06-01 14:13     ` Daniel Wagner
  2016-06-01 14:19       ` Daniel Wagner
@ 2016-06-02  0:26       ` Dave Chinner
  2016-06-02  5:23         ` Daniel Wagner
  1 sibling, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2016-06-02  0:26 UTC (permalink / raw)
  To: Daniel Wagner; +Cc: linux-fsdevel, linux-kernel, xfs

On Wed, Jun 01, 2016 at 04:13:10PM +0200, Daniel Wagner wrote:
> >> Anything in the log before this?
> > 
> > Just the usual stuff, as I remember. Sorry, I haven't copied the whole log.
> 
> Just triggered it again. My steps for it are:
> 
> - run all lockperf tests
> 
>   git://git.samba.org/jlayton/lockperf.git
> 
>   via my test script:
>  
> #!/bin/sh
>
> run_tests () {
.....
> for c in `seq 8 32 128`; do
>     for l in `seq 100 100 500`; do
>         time run_tests "posix01 -n $c -l $l "     $DIR/posix01-$c-$l.data
>         time run_tests "posix02 -n $c -l $l "     $DIR/posix02-$c-$l.data
>         time run_tests "posix03 -n $c -l $l "     $DIR/posix03-$c-$l.data
>         time run_tests "posix04 -n $c -l $l "     $DIR/posix04-$c-$l.data

posix03 and posix04 just emit error messages:

posix04 -n 40 -l 100
posix04: invalid option -- 'l'
posix04: Usage: posix04 [-i iterations] [-n nr_children] [-s] <filename>
.....


So I changed them to run "-i $l" instead, and that has a somewhat
undesired effect:

static void
kill_children()
{
        siginfo_t       infop;

        signal(SIGINT, SIG_IGN);
>>>>>   kill(0, SIGINT);
        while (waitid(P_ALL, 0, &infop, WEXITED) != -1);
}

Yeah, it sends a SIGINT to everything in the same process group. It
kills the parent shell:

$ ./run-lockperf-tests.sh /mnt/scratch/
pid 9597's current affinity list: 0-15
pid 9597's new affinity list: 0,4,8,12
sh: 1: cannot create /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor: Directory nonexistent
posix01 -n 8 -l 100
posix02 -n 8 -l 100
posix03 -n 8 -i 100

$

So, I've just removed those tests from your script. I'll see if I
have any luck with reproducing the problem now.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Internal error xfs_trans_cancel
  2016-06-02  0:26       ` Dave Chinner
@ 2016-06-02  5:23         ` Daniel Wagner
  2016-06-02  6:35           ` Dave Chinner
  0 siblings, 1 reply; 12+ messages in thread
From: Daniel Wagner @ 2016-06-02  5:23 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-fsdevel, linux-kernel, xfs

> posix03 and posix04 just emit error messages:
> 
> posix04 -n 40 -l 100
> posix04: invalid option -- 'l'
> posix04: Usage: posix04 [-i iterations] [-n nr_children] [-s] <filename>
> .....

I screwed this up. I have patched my version of lockperf to make
all tests use the same option names, though I forgot to send those
patches. Will do now.

In this case you can use '-i' instead of '-l'.

> So I changed them to run "-i $l" instead, and that has a somewhat
> undesired effect:
> 
> static void
> kill_children()
> {
>         siginfo_t       infop;
> 
>         signal(SIGINT, SIG_IGN);
>>>>>>   kill(0, SIGINT);
>         while (waitid(P_ALL, 0, &infop, WEXITED) != -1);
> }
> 
> Yeah, it sends a SIGINT to everything in the same process group. It
> kills the parent shell:

Ah, that rings a bell. I tuned the parameters so that I did not run into
this problem. I'll do a patch for this one. It's pretty annoying.
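
Something along these lines is what I have in mind -- only a sketch,
not the actual patch: put the benchmark into its own process group at
startup, so the kill(0, SIGINT) in kill_children() stays within the
benchmark and no longer takes down the invoking shell.

/* sketch: make the benchmark a process group leader early in main(),
 * before any children are forked; the children inherit the new group,
 * so kill(0, SIGINT) no longer reaches the parent shell */
#include <stdio.h>
#include <unistd.h>

static int detach_from_shell(void)
{
        if (setpgid(0, 0) == -1) {
                perror("setpgid");
                return -1;
        }
        return 0;
}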

> $ ./run-lockperf-tests.sh /mnt/scratch/
> pid 9597's current affinity list: 0-15
> pid 9597's new affinity list: 0,4,8,12
> sh: 1: cannot create /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor: Directory nonexistent
> posix01 -n 8 -l 100
> posix02 -n 8 -l 100
> posix03 -n 8 -i 100
> 
> $
> 
> So, I've just removed those tests from your script. I'll see if I
> have any luck with reproducing the problem now.

I was able to reproduce it again with the same steps.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Internal error xfs_trans_cancel
  2016-06-02  5:23         ` Daniel Wagner
@ 2016-06-02  6:35           ` Dave Chinner
  2016-06-02 13:29             ` Daniel Wagner
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2016-06-02  6:35 UTC (permalink / raw)
  To: Daniel Wagner; +Cc: linux-fsdevel, linux-kernel, xfs

On Thu, Jun 02, 2016 at 07:23:24AM +0200, Daniel Wagner wrote:
> > posix03 and posix04 just emit error messages:
> > 
> > posix04 -n 40 -l 100
> > posix04: invalid option -- 'l'
> > posix04: Usage: posix04 [-i iterations] [-n nr_children] [-s] <filename>
> > .....
> 
> I screwed this up. I have patched my version of lockperf to make
> all tests use the same option names, though I forgot to send those
> patches. Will do now.
> 
> In this case you can use '-i' instead of '-l'.
> 
> > So I changed them to run "-i $l" instead, and that has a somewhat
> > undesired effect:
> > 
> > static void
> > kill_children()
> > {
> >         siginfo_t       infop;
> > 
> >         signal(SIGINT, SIG_IGN);
> >>>>>>   kill(0, SIGINT);
> >         while (waitid(P_ALL, 0, &infop, WEXITED) != -1);
> > }
> > 
> > Yeah, it sends a SIGINT to everything in the same process group. It
> > kills the parent shell:
> 
> Ah, that rings a bell. I tuned the parameters so that I did not run into
> this problem. I'll do a patch for this one. It's pretty annoying.
> 
> > $ ./run-lockperf-tests.sh /mnt/scratch/
> > pid 9597's current affinity list: 0-15
> > pid 9597's new affinity list: 0,4,8,12
> > sh: 1: cannot create /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor: Directory nonexistent
> > posix01 -n 8 -l 100
> > posix02 -n 8 -l 100
> > posix03 -n 8 -i 100
> > 
> > $
> > 
> > So, I've just removed those tests from your script. I'll see if I
> > have any luck with reproducing the problem now.
> 
> I was able to reproduce it again with the same steps.

Hmmm, Ok. I've been running the lockperf test and kernel builds all
day on a filesystem that is identical in shape and size to yours
(i.e. xfs_info output is the same) but I haven't reproduced it yet.
Is it possible to get a metadump image of your filesystem to see if
I can reproduce it on that?
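
Something like the following usually does the trick (a sketch -- adjust
the device and paths to your setup, and unmount the filesystem first):

# umount /home
# xfs_metadump -g /dev/sde1 /tmp/sde1.metadump
# xz /tmp/sde1.metadump

xfs_metadump captures only the metadata and obfuscates filenames by
default, so it should be safe to pass along.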

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Internal error xfs_trans_cancel
  2016-06-02  6:35           ` Dave Chinner
@ 2016-06-02 13:29             ` Daniel Wagner
  2016-06-26 12:16               ` Thorsten Leemhuis
  0 siblings, 1 reply; 12+ messages in thread
From: Daniel Wagner @ 2016-06-02 13:29 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-fsdevel, linux-kernel, xfs

> Hmmm, Ok. I've been running the lockperf test and kernel builds all
> day on a filesystem that is identical in shape and size to yours
> (i.e. xfs_info output is the same) but I haven't reproduced it yet.

I don't know if this is important: I run the lockperf tests and after
they have finished I do a kernel build.

> Is it possible to get a metadump image of your filesystem to see if
> I can reproduce it on that?

Sure, see private mail.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Internal error xfs_trans_cancel
  2016-06-01  5:52 Internal error xfs_trans_cancel Daniel Wagner
  2016-06-01  7:10 ` Dave Chinner
@ 2016-06-14  4:29 ` Josh Poimboeuf
  1 sibling, 0 replies; 12+ messages in thread
From: Josh Poimboeuf @ 2016-06-14  4:29 UTC (permalink / raw)
  To: Daniel Wagner; +Cc: Dave Chinner, linux-fsdevel, linux-kernel, xfs

On Wed, Jun 01, 2016 at 07:52:31AM +0200, Daniel Wagner wrote:
> Hi,
> 
> I got the error message below while compiling a kernel 
> on that system. I can't really say if I did something
> which made the file system unhappy before the crash.
> 
> 
> [Jun 1 07:41] XFS (sde1): Internal error xfs_trans_cancel at line 984 of file fs/xfs/xfs_trans.c.  Caller xfs_rename+0x453/0x960 [xfs]
> [  +0.000095] CPU: 22 PID: 8640 Comm: gcc Not tainted 4.7.0-rc1 #16
> [  +0.000035] Hardware name: Dell Inc. PowerEdge R820/066N7P, BIOS 2.0.20 01/16/2014
> [  +0.000048]  0000000000000286 00000000c8be6bc3 ffff885fa9473cb0 ffffffff813d146e
> [  +0.000056]  ffff885fa9ac5ed0 0000000000000001 ffff885fa9473cc8 ffffffffa0213cdc
> [  +0.000053]  ffffffffa02257b3 ffff885fa9473cf0 ffffffffa022eb36 ffff883faa502d00
> [  +0.000053] Call Trace:
> [  +0.000028]  [<ffffffff813d146e>] dump_stack+0x63/0x85
> [  +0.000069]  [<ffffffffa0213cdc>] xfs_error_report+0x3c/0x40 [xfs]
> [  +0.000065]  [<ffffffffa02257b3>] ? xfs_rename+0x453/0x960 [xfs]
> [  +0.000064]  [<ffffffffa022eb36>] xfs_trans_cancel+0xb6/0xe0 [xfs]
> [  +0.000065]  [<ffffffffa02257b3>] xfs_rename+0x453/0x960 [xfs]
> [  +0.000062]  [<ffffffffa021fa63>] xfs_vn_rename+0xb3/0xf0 [xfs]
> [  +0.000040]  [<ffffffff8124f92c>] vfs_rename+0x58c/0x8d0
> [  +0.000032]  [<ffffffff81253fb1>] SyS_rename+0x371/0x390
> [  +0.000036]  [<ffffffff817d2032>] entry_SYSCALL_64_fastpath+0x1a/0xa4
> [  +0.000040] XFS (sde1): xfs_do_force_shutdown(0x8) called from line 985 of file fs/xfs/xfs_trans.c.  Return address = 0xffffffffa022eb4f
> [  +0.027680] XFS (sde1): Corruption of in-memory data detected.  Shutting down filesystem
> [  +0.000057] XFS (sde1): Please umount the filesystem and rectify the problem(s)
> [Jun 1 07:42] XFS (sde1): xfs_log_force: error -5 returned.
> [ +30.081016] XFS (sde1): xfs_log_force: error -5 returned.

I saw this today.  I was just building/installing kernels, rebooting,
running kexec, running perf.


[ 1359.005573] ------------[ cut here ]------------
[ 1359.010191] WARNING: CPU: 4 PID: 6031 at fs/inode.c:280 drop_nlink+0x3e/0x50
[ 1359.017231] Modules linked in: rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm intel_powerclamp coretemp kvm_intel kvm nfsd ipmi_ssif ipmi_devintf ipmi_si iTCO_wdt irqbypass iTCO_vendor_support ipmi_msghandler i7core_edac shpchp sg edac_core pcspkr wmi lpc_ich dcdbas mfd_core acpi_power_meter auth_rpcgss acpi_cpufreq nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod sr_mod cdrom iw_cxgb3 ib_core mgag200 ata_generic pata_acpi i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm mptsas scsi_transport_sas ata_piix mptscsih libata cxgb3 crc32c_intel i2c_core serio_raw mptbase bnx2 fjes mdio dm_mirror dm_region_hash dm_log dm_mod
[ 1359.088447] CPU: 4 PID: 6031 Comm: depmod Tainted: G          I     4.7.0-rc3+ #4
[ 1359.095911] Hardware name: Dell Inc. PowerEdge R410/0N051F, BIOS 1.11.0 07/20/2012
[ 1359.103461]  0000000000000286 00000000a0bc39d9 ffff8802143dfd18 ffffffff8134bb7f
[ 1359.110871]  0000000000000000 0000000000000000 ffff8802143dfd58 ffffffff8108b671
[ 1359.118280]  00000118575f7d13 ffff880222c9a6e8 ffff8803ec3874d8 ffff880428827000
[ 1359.125693] Call Trace:
[ 1359.128133]  [<ffffffff8134bb7f>] dump_stack+0x63/0x84
[ 1359.133259]  [<ffffffff8108b671>] __warn+0xd1/0xf0
[ 1359.138037]  [<ffffffff8108b7ad>] warn_slowpath_null+0x1d/0x20
[ 1359.143855]  [<ffffffff81238fde>] drop_nlink+0x3e/0x50
[ 1359.149017]  [<ffffffffa0327148>] xfs_droplink+0x28/0x60 [xfs]
[ 1359.154864]  [<ffffffffa0328c81>] xfs_remove+0x231/0x350 [xfs]
[ 1359.160682]  [<ffffffff812cd70a>] ? security_inode_permission+0x3a/0x60
[ 1359.167309]  [<ffffffffa03235e8>] xfs_vn_unlink+0x58/0xa0 [xfs]
[ 1359.173213]  [<ffffffff812d7e33>] ? selinux_inode_unlink+0x13/0x20
[ 1359.179379]  [<ffffffff8122b29a>] vfs_unlink+0xda/0x190
[ 1359.184590]  [<ffffffff8122df53>] do_unlinkat+0x263/0x2a0
[ 1359.189974]  [<ffffffff8122ea1b>] SyS_unlinkat+0x1b/0x30
[ 1359.195272]  [<ffffffff81003b12>] do_syscall_64+0x62/0x110
[ 1359.200743]  [<ffffffff816d7961>] entry_SYSCALL64_slow_path+0x25/0x25
[ 1359.207178] ---[ end trace 0d397afdaff9f340 ]---
[ 1359.211830] XFS (dm-0): Internal error xfs_trans_cancel at line 984 of file fs/xfs/xfs_trans.c.  Caller xfs_remove+0x1d1/0x350 [xfs]
[ 1359.223723] CPU: 4 PID: 6031 Comm: depmod Tainted: G        W I     4.7.0-rc3+ #4
[ 1359.231185] Hardware name: Dell Inc. PowerEdge R410/0N051F, BIOS 1.11.0 07/20/2012
[ 1359.238736]  0000000000000286 00000000a0bc39d9 ffff8802143dfd60 ffffffff8134bb7f
[ 1359.246147]  ffff8803ec3874d8 0000000000000001 ffff8802143dfd78 ffffffffa03176bb
[ 1359.253559]  ffffffffa0328c21 ffff8802143dfda0 ffffffffa03327a6 ffff880222e7e180
[ 1359.260969] Call Trace:
[ 1359.263407]  [<ffffffff8134bb7f>] dump_stack+0x63/0x84
[ 1359.268560]  [<ffffffffa03176bb>] xfs_error_report+0x3b/0x40 [xfs]
[ 1359.274755]  [<ffffffffa0328c21>] ? xfs_remove+0x1d1/0x350 [xfs]
[ 1359.280778]  [<ffffffffa03327a6>] xfs_trans_cancel+0xb6/0xe0 [xfs]
[ 1359.286973]  [<ffffffffa0328c21>] xfs_remove+0x1d1/0x350 [xfs]
[ 1359.292820]  [<ffffffffa03235e8>] xfs_vn_unlink+0x58/0xa0 [xfs]
[ 1359.298724]  [<ffffffff812d7e33>] ? selinux_inode_unlink+0x13/0x20
[ 1359.304890]  [<ffffffff8122b29a>] vfs_unlink+0xda/0x190
[ 1359.310100]  [<ffffffff8122df53>] do_unlinkat+0x263/0x2a0
[ 1359.315486]  [<ffffffff8122ea1b>] SyS_unlinkat+0x1b/0x30
[ 1359.320784]  [<ffffffff81003b12>] do_syscall_64+0x62/0x110
[ 1359.326256]  [<ffffffff816d7961>] entry_SYSCALL64_slow_path+0x25/0x25
[ 1359.332692] XFS (dm-0): xfs_do_force_shutdown(0x8) called from line 985 of file fs/xfs/xfs_trans.c.  Return address = 0xffffffffa03327bf
[ 1360.461638] XFS (dm-0): Corruption of in-memory data detected.  Shutting down filesystem
[ 1360.469729] XFS (dm-0): Please umount the filesystem and rectify the problem(s)
[ 1360.595843] XFS (dm-0): xfs_log_force: error -5 returned.


# uname -a
Linux dell-per410-01.khw.lab.eng.bos.redhat.com 4.7.0-rc3+ #5 SMP Mon Jun 13 23:35:14 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux

# xfs_repair -V
xfs_repair version 3.2.2

# cat /proc/cpuinfo | grep CPU | wc -l
16

# cat /proc/meminfo
MemTotal:       16415296 kB
MemFree:        15723380 kB
MemAvailable:   15796192 kB
Buffers:             964 kB
Cached:           350700 kB
SwapCached:            0 kB
Active:           248992 kB
Inactive:         223000 kB
Active(anon):     121176 kB
Inactive(anon):     8116 kB
Active(file):     127816 kB
Inactive(file):   214884 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       8257532 kB
SwapFree:        8257532 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:        120340 kB
Mapped:            40136 kB
Shmem:              8964 kB
Slab:              80092 kB
SReclaimable:      25208 kB
SUnreclaim:        54884 kB
KernelStack:        5872 kB
PageTables:         5468 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    16465180 kB
Committed_AS:     355084 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
HardwareCorrupted:     0 kB
AnonHugePages:     51200 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      177572 kB
DirectMap2M:    16586752 kB

# cat /proc/mounts
sysfs /sys sysfs rw,seclabel,nosuid,nodev,noexec,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
devtmpfs /dev devtmpfs rw,seclabel,nosuid,size=8161852k,nr_inodes=2040463,mode=755 0 0
securityfs /sys/kernel/security securityfs rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /dev/shm tmpfs rw,seclabel,nosuid,nodev 0 0
devpts /dev/pts devpts rw,seclabel,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,seclabel,nosuid,nodev,mode=755 0 0
tmpfs /sys/fs/cgroup tmpfs ro,seclabel,nosuid,nodev,noexec,mode=755 0 0
cgroup /sys/fs/cgroup/systemd cgroup rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd 0 0
pstore /sys/fs/pstore pstore rw,seclabel,nosuid,nodev,noexec,relatime 0 0
cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpu,cpuacct 0 0
cgroup /sys/fs/cgroup/net_cls cgroup rw,nosuid,nodev,noexec,relatime,net_cls 0 0
cgroup /sys/fs/cgroup/blkio cgroup rw,nosuid,nodev,noexec,relatime,blkio 0 0
cgroup /sys/fs/cgroup/pids cgroup rw,nosuid,nodev,noexec,relatime,pids 0 0
cgroup /sys/fs/cgroup/devices cgroup rw,nosuid,nodev,noexec,relatime,devices 0 0
cgroup /sys/fs/cgroup/perf_event cgroup rw,nosuid,nodev,noexec,relatime,perf_event 0 0
cgroup /sys/fs/cgroup/hugetlb cgroup rw,nosuid,nodev,noexec,relatime,hugetlb 0 0
cgroup /sys/fs/cgroup/freezer cgroup rw,nosuid,nodev,noexec,relatime,freezer 0 0
cgroup /sys/fs/cgroup/cpuset cgroup rw,nosuid,nodev,noexec,relatime,cpuset 0 0
cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0
configfs /sys/kernel/config configfs rw,relatime 0 0
/dev/mapper/rhel_dell--per410--01-root / xfs rw,seclabel,relatime,attr2,inode64,noquota 0 0
selinuxfs /sys/fs/selinux selinuxfs rw,relatime 0 0
systemd-1 /proc/sys/fs/binfmt_misc autofs rw,relatime,fd=28,pgrp=1,timeout=300,minproto=5,maxproto=5,direct 0 0
debugfs /sys/kernel/debug debugfs rw,seclabel,relatime 0 0
mqueue /dev/mqueue mqueue rw,seclabel,relatime 0 0
hugetlbfs /dev/hugepages hugetlbfs rw,seclabel,relatime 0 0
sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0
/dev/mapper/rhel_dell--per410--01-home /home xfs rw,seclabel,relatime,attr2,inode64,noquota 0 0
nfsd /proc/fs/nfsd nfsd rw,relatime 0 0
/dev/sda1 /boot xfs rw,seclabel,relatime,attr2,inode64,noquota 0 0
tmpfs /run/user/0 tmpfs rw,seclabel,nosuid,nodev,relatime,size=1641532k,mode=700 0 0
tracefs /sys/kernel/debug/tracing tracefs rw,relatime 0 0

# cat /proc/partitions
major minor  #blocks  name

  11        0    1048575 sr0
   8       16  143374740 sdb
   8       17  143373312 sdb1
   8        0  143374740 sda
   8        1     512000 sda1
   8        2  142861312 sda2
 253        0   52428800 dm-0
 253        1    8257536 dm-1
 253        2  225480704 dm-2

# pvdisplay
  --- Physical volume ---
  PV Name               /dev/sda2
  VG Name               rhel_dell-per410-01
  PV Size               136.24 GiB / not usable 0   
  Allocatable           yes 
  PE Size               4.00 MiB
  Total PE              34878
  Free PE               16
  Allocated PE          34862
  PV UUID               cTa6X3-dz3E-HmdE-bY1J-XEoo-USwY-dl2lRm
   
  --- Physical volume ---
  PV Name               /dev/sdb1
  VG Name               rhel_dell-per410-01
  PV Size               136.73 GiB / not usable 0   
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              35003
  Free PE               0
  Allocated PE          35003
  PV UUID               ZzXTQx-9CN1-TaMu-UfrN-Usuz-aFvl-A6PKJS
   
# lvdisplay
  --- Logical volume ---
  LV Path                /dev/rhel_dell-per410-01/swap
  LV Name                swap
  VG Name                rhel_dell-per410-01
  LV UUID                E6Y5qQ-URKt-9wc6-3fRc-2wbZ-ev2n-IliB7s
  LV Write Access        read/write
  LV Creation host, time dell-per410-01.khw.lab.eng.bos.redhat.com, 2016-06-13 12:55:31 -0400
  LV Status              available
  # open                 2
  LV Size                7.88 GiB
  Current LE             2016
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     8192
  Block device           253:1
   
  --- Logical volume ---
  LV Path                /dev/rhel_dell-per410-01/home
  LV Name                home
  VG Name                rhel_dell-per410-01
  LV UUID                Zq6BIP-0Yem-3NAp-gJ5K-2c6Q-Zc67-mdp51k
  LV Write Access        read/write
  LV Creation host, time dell-per410-01.khw.lab.eng.bos.redhat.com, 2016-06-13 12:55:31 -0400
  LV Status              available
  # open                 1
  LV Size                215.04 GiB
  Current LE             55049
  Segments               2
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     8192
  Block device           253:2
   
  --- Logical volume ---
  LV Path                /dev/rhel_dell-per410-01/root
  LV Name                root
  VG Name                rhel_dell-per410-01
  LV UUID                T4rKVg-cuiW-jc6c-grNW-DJQQ-mCt8-N8Ig5l
  LV Write Access        read/write
  LV Creation host, time dell-per410-01.khw.lab.eng.bos.redhat.com, 2016-06-13 12:55:35 -0400
  LV Status              available
  # open                 1
  LV Size                50.00 GiB
  Current LE             12800
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     8192
  Block device           253:0
   
# hdparm -i /dev/sda

/dev/sda:
SG_IO: bad/missing sense data, sb[]:  70 00 05 00 00 00 00 28 00 00 00 00 20 00 00 00 00 00 00 85 55 06 01 00 00 00 00 00 00 00 00 00
 HDIO_GET_IDENTITY failed: Invalid argument
# hdparm -i /dev/sdb

/dev/sdb:
SG_IO: bad/missing sense data, sb[]:  70 00 05 00 00 00 00 28 00 00 00 00 20 00 00 00 00 00 00 85 55 06 01 00 00 00 00 00 00 00 00 00
 HDIO_GET_IDENTITY failed: Invalid argument

(I don't know anything about the disks, but I can try to find out if you
need it.)

# xfs_info /dev/mapper/rhel_dell--per410--01-root
meta-data=/dev/mapper/rhel_dell--per410--01-root isize=256    agcount=4, agsize=3276800 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=0        finobt=0
data     =                       bsize=4096   blocks=13107200, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal               bsize=4096   blocks=6400, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

-- 
Josh

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Internal error xfs_trans_cancel
  2016-06-02 13:29             ` Daniel Wagner
@ 2016-06-26 12:16               ` Thorsten Leemhuis
  2016-06-26 15:13                 ` Daniel Wagner
  0 siblings, 1 reply; 12+ messages in thread
From: Thorsten Leemhuis @ 2016-06-26 12:16 UTC (permalink / raw)
  To: Daniel Wagner, Dave Chinner
  Cc: linux-fsdevel, linux-kernel, xfs, Josh Poimboeuf

On 02.06.2016 15:29, Daniel Wagner wrote:
>> Hmmm, Ok. I've been running the lockperf test and kernel builds all
>> day on a filesystem that is identical in shape and size to yours
>> (i.e. xfs_info output is the same) but I haven't reproduced it yet.
> I don't know if this is important: I run the lockperf tests and after
> they have finished I do a kernel build.
> 
>> Is it possible to get a metadump image of your filesystem to see if
>> I can reproduce it on that?
> Sure, see private mail.

Dave, Daniel, what's the latest status on this issue? It made it to my
list of known 4.7 regressions after Christoph suggested it should be
listed. But this thread looks stalled, as afaics nothing happened for
three weeks apart from Josh (added to CC) mentioning he also saw it. Or
is this discussed elsewhere? Or fixed already?

Sincerely, your regression tracker for Linux 4.7 (http://bit.ly/28JRmJo)
 Thorsten

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Internal error xfs_trans_cancel
  2016-06-26 12:16               ` Thorsten Leemhuis
@ 2016-06-26 15:13                 ` Daniel Wagner
  0 siblings, 0 replies; 12+ messages in thread
From: Daniel Wagner @ 2016-06-26 15:13 UTC (permalink / raw)
  To: Thorsten Leemhuis, Dave Chinner
  Cc: linux-fsdevel, linux-kernel, xfs, Josh Poimboeuf

On 06/26/2016 02:16 PM, Thorsten Leemhuis wrote:
> On 02.06.2016 15:29, Daniel Wagner wrote:
>>> Hmmm, Ok. I've been running the lockperf test and kernel builds all
>>> day on a filesystem that is identical in shape and size to yours
>>> (i.e. xfs_info output is the same) but I haven't reproduced it yet.
>> I don't know if this is important: I run the lockperf tests and after
>> they have finished I do a kernel build.
>>
>>> Is it possible to get a metadump image of your filesystem to see if
>>> I can reproduce it on that?
>> Sure, see private mail.
> 
> Dave, Daniel, what's the latest status on this issue? 

I had no time to do more testing in the last couple of weeks. Tomorrow
I'll try to reproduce it again, though the last time I tried I couldn't
trigger it.

> It made it to my
> list of know 4.7 regressions after Christoph suggested it should be
> listed. But this thread looks stalled, as afaics nothing happened for
> three weeks apart from Josh (added to CC) mentioning he also saw it. Or
> is this discussed elsewhere? Or fixed already?

The discussion wandered over to the thread called 'crash in xfs in
current', and there are some instructions from Al on what to test:

Message-ID: <20160622014253.GS12670@dastard>

cheers,
daniel

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2016-06-26 15:13 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-01  5:52 Internal error xfs_trans_cancel Daniel Wagner
2016-06-01  7:10 ` Dave Chinner
2016-06-01 13:50   ` Daniel Wagner
2016-06-01 14:13     ` Daniel Wagner
2016-06-01 14:19       ` Daniel Wagner
2016-06-02  0:26       ` Dave Chinner
2016-06-02  5:23         ` Daniel Wagner
2016-06-02  6:35           ` Dave Chinner
2016-06-02 13:29             ` Daniel Wagner
2016-06-26 12:16               ` Thorsten Leemhuis
2016-06-26 15:13                 ` Daniel Wagner
2016-06-14  4:29 ` Josh Poimboeuf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).