* Re: kernel crashes on commit
@ 2015-01-20 21:45 Mkrtchyan, Tigran
  2015-01-21 10:04 ` Mkrtchyan, Tigran
  0 siblings, 1 reply; 13+ messages in thread
From: Mkrtchyan, Tigran @ 2015-01-20 21:45 UTC
  To: Weston Andros Adamson; +Cc: Linux NFS Mailing List

I will check tomorrow with RHEL 6 kernel and let you know.

Thanks,
  Tigran

On Jan 20, 2015 9:43 PM, Weston Andros Adamson <dros@primarydata.com> wrote:
>
>
> > On Jan 20, 2015, at 2:22 PM, Mkrtchyan, Tigran <tigran.mkrtchyan@desy.de> wrote:
> >
> > Hi Dros,
> >
> > do you refer to this commit
> >
> > http://git.linux-nfs.org/?p=dros/linux-nfs.git;a=commit;h=d201c4de518c1d617aa216664869fa329d562d7d ?
>
> Yes, that’s the patch I was talking about. Good find, I was about to go looking for it.
>
> Is that patch in the kernels you’re testing?
>
> -dros
>
> > ----- Original Message -----
> >> From: "Weston Andros Adamson" <dros@primarydata.com>
> >> To: "Tigran Mkrtchyan" <tigran.mkrtchyan@desy.de>
> >> Cc: "linux-nfs list" <linux-nfs@vger.kernel.org>
> >> Sent: Tuesday, January 20, 2015 3:37:49 PM
> >> Subject: Re: kernel crashes on commit
> >
> >>> On Jan 20, 2015, at 9:00 AM, Mkrtchyan, Tigran <tigran.mkrtchyan@desy.de> wrote:
> >>>
> >>>
> >>>
> >>> Dear fellows,
> >>>
> >>> since we have enabled commit through DS code we
> >>> permanently observe kernel crashes with RHEL6/7 and ubuntu 14.04:
> >>>
> >>>
> >>> <1>BUG: unable to handle kernel paging request at 00000000dc364913
> >>> <1>IP: [<ffffffffa02b45df>] nfs_init_commit+0x1f/0xf0 [nfs]
> >>> <4>PGD 6393ae067 PUD 0
> >>> <4>Oops: 0000 [#1] SMP
> >>> <4>last sysfs file: /sys/devices/system/cpu/online
> >>> <4>CPU 1
> >>> <4>Modules linked in: vfat fat usb_storage mpt3sas mpt2sas raid_class mptctl
> >>> ipmi_devintf dell_rbu openafs(P)(U) autof
> >>> s4 nfs_layout_nfsv41_files nfs lockd fscache auth_rpcgss nfs_acl sunrpc bonding
> >>> 8021q garp stp llc ipv6 power_meter ac
> >>> pi_ipmi ipmi_si ipmi_msghandler iTCO_wdt iTCO_vendor_support microcode dcdbas sg
> >>> bnx2 lpc_ich mfd_core i7core_edac eda
> >>> c_core ext4 jbd2 mbcache sd_mod crc_t10dif wmi pata_acpi ata_generic ata_piix
> >>> mptsas mptscsih mptbase scsi_transport_s
> >>> as dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
> >>> <4>
> >>> <4>Pid: 18209, comm: flush-0:19 Tainted: P           ---------------
> >>> 2.6.32-504.3.3.el6.x86_64 #1 Dell Inc. PowerEdge M610/0N582M
> >>> <4>RIP: 0010:[<ffffffffa02b45df>]  [<ffffffffa02b45df>]
> >>> nfs_init_commit+0x1f/0xf0 [nfs]
> >>> <4>RSP: 0018:ffff88063988da30  EFLAGS: 00010246
> >>> <4>RAX: ffff88063988db60 RBX: ffff88009c492040 RCX: ffff88063988db30
> >>> <4>RDX: 0000000000000000 RSI: ffff88063988db60 RDI: 00000000dc364903
> >>> <4>RBP: ffff88063988da40 R08: ffff88063988da90 R09: f9aa37faa254d404
> >>> <4>R10: 0000000000000010 R11: 0000000000000000 R12: 0000000000000001
> >>> <4>R13: ffff880339f33a00 R14: ffff88063988db30 R15: ffff88063988d8c8
> >>> <4>FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
> >>> <4>CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> >>> <4>CR2: 00000000dc364913 CR3: 0000000639fbb000 CR4: 00000000000007e0
> >>> <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>> <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> >>> <4>Process flush-0:19 (pid: 18209, threadinfo ffff88063988c000, task
> >>> ffff88063837c040)
> >>> <4>Stack:
> >>> <4> 0000000000000000 ffff88009c492040 ffff88063988dad0 ffffffffa031fdb7
> >>> <4><d> ffff88063837c5f8 ffff88063988da90 ffff8800a6e34600 ffff880337f2a950
> >>> <4><d> ffff880637c99488 0000000037f2a940 ffff88063988db60 0000000000000000
> >>> <4>Call Trace:
> >>> <4> [<ffffffffa031fdb7>] filelayout_commit_pagelist+0x277/0x3c0
> >>> [nfs_layout_nfsv41_files]
> >>> <4> [<ffffffffa02b613b>] nfs_generic_commit_list+0xab/0x100 [nfs]
> >>> <4> [<ffffffffa02b627c>] nfs_commit_inode+0xec/0x150 [nfs]
> >>> <4> [<ffffffffa02b6aab>] nfs_write_inode+0xab/0x100 [nfs]
> >>> <4> [<ffffffff811baedc>] writeback_single_inode+0x20c/0x290
> >>> <4> [<ffffffff811bb1ad>] writeback_sb_inodes+0xbd/0x170
> >>> <4> [<ffffffff811bb30b>] writeback_inodes_wb+0xab/0x1b0
> >>> <4> [<ffffffff811bb703>] wb_writeback+0x2f3/0x410
> >>> <4> [<ffffffff8100b9ce>] ? common_interrupt+0xe/0x13
> >>> <4> [<ffffffff81088062>] ? del_timer_sync+0x22/0x30
> >>> <4> [<ffffffff811bb9c5>] wb_do_writeback+0x1a5/0x240
> >>> <4> [<ffffffff811bbac3>] bdi_writeback_task+0x63/0x1b0
> >>> <4> [<ffffffff8109e987>] ? bit_waitqueue+0x17/0xd0
> >>> <4> [<ffffffff81148360>] ? bdi_start_fn+0x0/0x100
> >>> <4> [<ffffffff811483e6>] bdi_start_fn+0x86/0x100
> >>> <4> [<ffffffff81148360>] ? bdi_start_fn+0x0/0x100
> >>> <4> [<ffffffff8109e66e>] kthread+0x9e/0xc0
> >>> <4> [<ffffffff8100c20a>] child_rip+0xa/0x20
> >>> <4> [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
> >>> <4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
> >>> <4>Code: c3 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 53 48 83 ec 08 0f 1f 44
> >>> 00 00 48 8b 06 48 89 fb 48 8b 78 18 48 39 c6 48 8b 7f 40 <48> 8b 7f 10 74 2b 4c
> >>> 8b 83 c8 01 00 00 4c 8b 4e 08 4c 8d 93 c8
> >>> <1>RIP  [<ffffffffa02b45df>] nfs_init_commit+0x1f/0xf0 [nfs]
> >>> <4> RSP <ffff88063988da30>
> >>> <4>CR2: 00000000dc364913
> >>>
> >>>
> >>> I have vmcore file as well, so let me know if you need some more information.
> >>>
> >>
> >> Hi Tigran!
> >>
> >> Have you tried a recent upstream kernel? IIRC I fixed a seeming similar
> >> filelayout
> >> commit issue not too long ago.
> >>
> >> The filelayout commit path seems to have been broken for a while - mostly
> >> because
> >> all the filelayout servers (that I know of) use stable writes, so that code path
> >> went
> >> untested...
> >>
> >> -dros
>


* Re: kernel crashes on commit
  2015-01-20 21:45 kernel crashes on commit Mkrtchyan, Tigran
@ 2015-01-21 10:04 ` Mkrtchyan, Tigran
  2015-01-21 15:20   ` Weston Andros Adamson
  0 siblings, 1 reply; 13+ messages in thread
From: Mkrtchyan, Tigran @ 2015-01-21 10:04 UTC
  To: Weston Andros Adamson; +Cc: Linux NFS Mailing List, Myklebust, Trond

Hi Dros,

after adopting patch for RHEL6 kernel, it works.
We have to push it into stable fixes. Do you know
the procedure?

Thanks,
   Tigran.

----- Original Message -----
> From: "Mkrtchyan, Tigran" <tigran.mkrtchyan@desy.de>
> To: "Weston Andros Adamson" <dros@primarydata.com>
> Cc: "Linux NFS Mailing List" <linux-nfs@vger.kernel.org>
> Sent: Tuesday, January 20, 2015 10:45:22 PM
> Subject: Re: kernel crashes on commit

> I will check tomorrow with RHEL 6 kernel and let you know.
>
> Thanks,
>  Tigran
>
> On Jan 20, 2015 9:43 PM, Weston Andros Adamson <dros@primarydata.com>
>  wrote:
>>
>>
>> > On Jan 20, 2015, at 2:22 PM, Mkrtchyan, Tigran <tigran.mkrtchyan@desy.de> wrote:
>> > 
>> > Hi Dros,
>> > 
>> > do you refer to this commit
>> > 
>> > http://git.linux-nfs.org/?p=dros/linux-nfs.git;a=commit;h=d201c4de518c1d617aa216664869fa329d562d7d
>> > ?
>>
>> Yes, that’s the patch I was talking about. Good find, I was about to go looking
>> for it.
>>
>> Is that patch in the kernels you’re testing?
>>
>> -dros
>>
>> > ----- Original Message -----
>> >> From: "Weston Andros Adamson" <dros@primarydata.com>
>> >> To: "Tigran Mkrtchyan" <tigran.mkrtchyan@desy.de>
>> >> Cc: "linux-nfs list" <linux-nfs@vger.kernel.org>
>> >> Sent: Tuesday, January 20, 2015 3:37:49 PM
>> >> Subject: Re: kernel crashes on commit
>> > 
>> >>> On Jan 20, 2015, at 9:00 AM, Mkrtchyan, Tigran <tigran.mkrtchyan@desy.de> wrote:
>> >>> 
>> >>> 
>> >>> 
>> >>> Dear fellows,
>> >>> 
>> >>> since we have enabled commit through DS code we
>> >>> permanently observe kernel crashes with RHEL6/7 and ubuntu 14.04:
>> >>> 
>> >>> 
>> >>> <1>BUG: unable to handle kernel paging request at 00000000dc364913
>> >>> <1>IP: [<ffffffffa02b45df>] nfs_init_commit+0x1f/0xf0 [nfs]
>> >>> <4>PGD 6393ae067 PUD 0
>> >>> <4>Oops: 0000 [#1] SMP
>> >>> <4>last sysfs file: /sys/devices/system/cpu/online
>> >>> <4>CPU 1
>> >>> <4>Modules linked in: vfat fat usb_storage mpt3sas mpt2sas raid_class mptctl
>> >>> ipmi_devintf dell_rbu openafs(P)(U) autof
>> >>> s4 nfs_layout_nfsv41_files nfs lockd fscache auth_rpcgss nfs_acl sunrpc bonding
>> >>> 8021q garp stp llc ipv6 power_meter ac
>> >>> pi_ipmi ipmi_si ipmi_msghandler iTCO_wdt iTCO_vendor_support microcode dcdbas sg
>> >>> bnx2 lpc_ich mfd_core i7core_edac eda
>> >>> c_core ext4 jbd2 mbcache sd_mod crc_t10dif wmi pata_acpi ata_generic ata_piix
>> >>> mptsas mptscsih mptbase scsi_transport_s
>> >>> as dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
>> >>> <4>
>> >>> <4>Pid: 18209, comm: flush-0:19 Tainted: P           ---------------
>> >>> 2.6.32-504.3.3.el6.x86_64 #1 Dell Inc. PowerEdge M610/0N582M
>> >>> <4>RIP: 0010:[<ffffffffa02b45df>]  [<ffffffffa02b45df>]
>> >>> nfs_init_commit+0x1f/0xf0 [nfs]
>> >>> <4>RSP: 0018:ffff88063988da30  EFLAGS: 00010246
>> >>> <4>RAX: ffff88063988db60 RBX: ffff88009c492040 RCX: ffff88063988db30
>> >>> <4>RDX: 0000000000000000 RSI: ffff88063988db60 RDI: 00000000dc364903
>> >>> <4>RBP: ffff88063988da40 R08: ffff88063988da90 R09: f9aa37faa254d404
>> >>> <4>R10: 0000000000000010 R11: 0000000000000000 R12: 0000000000000001
>> >>> <4>R13: ffff880339f33a00 R14: ffff88063988db30 R15: ffff88063988d8c8
>> >>> <4>FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
>> >>> <4>CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
>> >>> <4>CR2: 00000000dc364913 CR3: 0000000639fbb000 CR4: 00000000000007e0
>> >>> <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> >>> <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> >>> <4>Process flush-0:19 (pid: 18209, threadinfo ffff88063988c000, task
>> >>> ffff88063837c040)
>> >>> <4>Stack:
>> >>> <4> 0000000000000000 ffff88009c492040 ffff88063988dad0 ffffffffa031fdb7
>> >>> <4><d> ffff88063837c5f8 ffff88063988da90 ffff8800a6e34600 ffff880337f2a950
>> >>> <4><d> ffff880637c99488 0000000037f2a940 ffff88063988db60 0000000000000000
>> >>> <4>Call Trace:
>> >>> <4> [<ffffffffa031fdb7>] filelayout_commit_pagelist+0x277/0x3c0
>> >>> [nfs_layout_nfsv41_files]
>> >>> <4> [<ffffffffa02b613b>] nfs_generic_commit_list+0xab/0x100 [nfs]
>> >>> <4> [<ffffffffa02b627c>] nfs_commit_inode+0xec/0x150 [nfs]
>> >>> <4> [<ffffffffa02b6aab>] nfs_write_inode+0xab/0x100 [nfs]
>> >>> <4> [<ffffffff811baedc>] writeback_single_inode+0x20c/0x290
>> >>> <4> [<ffffffff811bb1ad>] writeback_sb_inodes+0xbd/0x170
>> >>> <4> [<ffffffff811bb30b>] writeback_inodes_wb+0xab/0x1b0
>> >>> <4> [<ffffffff811bb703>] wb_writeback+0x2f3/0x410
>> >>> <4> [<ffffffff8100b9ce>] ? common_interrupt+0xe/0x13
>> >>> <4> [<ffffffff81088062>] ? del_timer_sync+0x22/0x30
>> >>> <4> [<ffffffff811bb9c5>] wb_do_writeback+0x1a5/0x240
>> >>> <4> [<ffffffff811bbac3>] bdi_writeback_task+0x63/0x1b0
>> >>> <4> [<ffffffff8109e987>] ? bit_waitqueue+0x17/0xd0
>> >>> <4> [<ffffffff81148360>] ? bdi_start_fn+0x0/0x100
>> >>> <4> [<ffffffff811483e6>] bdi_start_fn+0x86/0x100
>> >>> <4> [<ffffffff81148360>] ? bdi_start_fn+0x0/0x100
>> >>> <4> [<ffffffff8109e66e>] kthread+0x9e/0xc0
>> >>> <4> [<ffffffff8100c20a>] child_rip+0xa/0x20
>> >>> <4> [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
>> >>> <4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
>> >>> <4>Code: c3 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 53 48 83 ec 08 0f 1f 44
>> >>> 00 00 48 8b 06 48 89 fb 48 8b 78 18 48 39 c6 48 8b 7f 40 <48> 8b 7f 10 74 2b 4c
>> >>> 8b 83 c8 01 00 00 4c 8b 4e 08 4c 8d 93 c8
>> >>> <1>RIP  [<ffffffffa02b45df>] nfs_init_commit+0x1f/0xf0 [nfs]
>> >>> <4> RSP <ffff88063988da30>
>> >>> <4>CR2: 00000000dc364913
>> >>> 
>> >>> 
>> >>> I have vmcore file as well, so let me know if you need some more information.
>> >>> 
>> >> 
>> >> Hi Tigran!
>> >> 
>> >> Have you tried a recent upstream kernel? IIRC I fixed a seeming similar
>> >> filelayout
>> >> commit issue not too long ago.
>> >> 
>> >> The filelayout commit path seems to have been broken for a while - mostly
>> >> because
>> >> all the filelayout servers (that I know of) use stable writes, so that code path
>> >> went
>> >> untested...
>> >> 
>> >> -dros
>>
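A reading of the oops quoted above (an annotation, not something stated in the thread): the bytes marked `<48> 8b 7f 10` in the Code: line decode to `mov 0x10(%rdi),%rdi`, so the faulting access should sit at RDI + 0x10. The Python sketch below checks that against the reported CR2 and tests whether RDI looks like a kernel pointer at all. The names `registers` and `KERNEL_HALF_START` are made up for illustration, and only three lines of the trace are copied in.

```python
import re

# Three lines copied from the oops quoted in this thread.
OOPS = """\
<1>BUG: unable to handle kernel paging request at 00000000dc364913
<4>RDX: 0000000000000000 RSI: ffff88063988db60 RDI: 00000000dc364903
<4>CR2: 00000000dc364913 CR3: 0000000639fbb000 CR4: 00000000000007e0
"""

# Start of the upper (kernel) half of the canonical x86-64 address space
# with 48-bit virtual addresses.
KERNEL_HALF_START = 0xFFFF800000000000


def registers(text):
    """Map register names to integers for every 'NAME: <16 hex digits>' pair."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}


regs = registers(OOPS)
offset = regs["CR2"] - regs["RDI"]
rdi_is_kernel_pointer = regs["RDI"] >= KERNEL_HALF_START

print(hex(offset))             # 0x10
print(rdi_is_kernel_pointer)   # False
```

The offset matches the displacement in the faulting instruction, and RDI lies outside the kernel half of the address space, which is consistent with nfs_init_commit following a bogus pointer rather than a NULL one.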


* Re: kernel crashes on commit
  2015-01-21 10:04 ` Mkrtchyan, Tigran
@ 2015-01-21 15:20   ` Weston Andros Adamson
  2015-03-26  9:28     ` Benjamin Coddington
  0 siblings, 1 reply; 13+ messages in thread
From: Weston Andros Adamson @ 2015-01-21 15:20 UTC
  To: Tigran Mkrtchyan, Steve Dickson; +Cc: linux-nfs list, Trond Myklebust


> On Jan 21, 2015, at 5:04 AM, Mkrtchyan, Tigran <tigran.mkrtchyan@desy.de> wrote:
> 
> Hi Dros,
> 
> after adopting patch for RHEL6 kernel, it works.

Great!

> We have to push it into stable fixes. Do you know
> the procedure?

I normally bug Steve D ;)

-dros


> ----- Original Message -----
>> From: "Mkrtchyan, Tigran" <tigran.mkrtchyan@desy.de>
>> To: "Weston Andros Adamson" <dros@primarydata.com>
>> Cc: "Linux NFS Mailing List" <linux-nfs@vger.kernel.org>
>> Sent: Tuesday, January 20, 2015 10:45:22 PM
>> Subject: Re: kernel crashes on commit
> 
>> I will check tomorrow with RHEL 6 kernel and let you know.
>> 
>> Thanks,
>> Tigran
>>
>> On Jan 20, 2015 9:43 PM, Weston Andros Adamson <dros@primarydata.com>
>> wrote:
>>> 
>>> 
>>>> On Jan 20, 2015, at 2:22 PM, Mkrtchyan, Tigran <tigran.mkrtchyan@desy.de> wrote:
>>>> 
>>>> Hi Dros,
>>>> 
>>>> do you refer to this commit
>>>> 
>>>> http://git.linux-nfs.org/?p=dros/linux-nfs.git;a=commit;h=d201c4de518c1d617aa216664869fa329d562d7d
>>>> ?
>>> 
>>> Yes, that’s the patch I was talking about. Good find, I was about to go looking
>>> for it.
>>> 
>>> Is that patch in the kernels you’re testing?
>>> 
>>> -dros
>>> 
>>>> ----- Original Message -----
>>>>> From: "Weston Andros Adamson" <dros@primarydata.com>
>>>>> To: "Tigran Mkrtchyan" <tigran.mkrtchyan@desy.de>
>>>>> Cc: "linux-nfs list" <linux-nfs@vger.kernel.org>
>>>>> Sent: Tuesday, January 20, 2015 3:37:49 PM
>>>>> Subject: Re: kernel crashes on commit
>>>> 
>>>>>> On Jan 20, 2015, at 9:00 AM, Mkrtchyan, Tigran <tigran.mkrtchyan@desy.de> wrote:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Dear fellows,
>>>>>> 
>>>>>> since we have enabled commit through DS code we
>>>>>> permanently observe kernel crashes with RHEL6/7 and ubuntu 14.04:
>>>>>> 
>>>>>> 
>>>>>> <1>BUG: unable to handle kernel paging request at 00000000dc364913
>>>>>> <1>IP: [<ffffffffa02b45df>] nfs_init_commit+0x1f/0xf0 [nfs]
>>>>>> <4>PGD 6393ae067 PUD 0
>>>>>> <4>Oops: 0000 [#1] SMP
>>>>>> <4>last sysfs file: /sys/devices/system/cpu/online
>>>>>> <4>CPU 1
>>>>>> <4>Modules linked in: vfat fat usb_storage mpt3sas mpt2sas raid_class mptctl
>>>>>> ipmi_devintf dell_rbu openafs(P)(U) autof
>>>>>> s4 nfs_layout_nfsv41_files nfs lockd fscache auth_rpcgss nfs_acl sunrpc bonding
>>>>>> 8021q garp stp llc ipv6 power_meter ac
>>>>>> pi_ipmi ipmi_si ipmi_msghandler iTCO_wdt iTCO_vendor_support microcode dcdbas sg
>>>>>> bnx2 lpc_ich mfd_core i7core_edac eda
>>>>>> c_core ext4 jbd2 mbcache sd_mod crc_t10dif wmi pata_acpi ata_generic ata_piix
>>>>>> mptsas mptscsih mptbase scsi_transport_s
>>>>>> as dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
>>>>>> <4>
>>>>>> <4>Pid: 18209, comm: flush-0:19 Tainted: P           ---------------
>>>>>> 2.6.32-504.3.3.el6.x86_64 #1 Dell Inc. PowerEdge M610/0N582M
>>>>>> <4>RIP: 0010:[<ffffffffa02b45df>]  [<ffffffffa02b45df>]
>>>>>> nfs_init_commit+0x1f/0xf0 [nfs]
>>>>>> <4>RSP: 0018:ffff88063988da30  EFLAGS: 00010246
>>>>>> <4>RAX: ffff88063988db60 RBX: ffff88009c492040 RCX: ffff88063988db30
>>>>>> <4>RDX: 0000000000000000 RSI: ffff88063988db60 RDI: 00000000dc364903
>>>>>> <4>RBP: ffff88063988da40 R08: ffff88063988da90 R09: f9aa37faa254d404
>>>>>> <4>R10: 0000000000000010 R11: 0000000000000000 R12: 0000000000000001
>>>>>> <4>R13: ffff880339f33a00 R14: ffff88063988db30 R15: ffff88063988d8c8
>>>>>> <4>FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
>>>>>> <4>CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
>>>>>> <4>CR2: 00000000dc364913 CR3: 0000000639fbb000 CR4: 00000000000007e0
>>>>>> <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>> <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>>>>> <4>Process flush-0:19 (pid: 18209, threadinfo ffff88063988c000, task
>>>>>> ffff88063837c040)
>>>>>> <4>Stack:
>>>>>> <4> 0000000000000000 ffff88009c492040 ffff88063988dad0 ffffffffa031fdb7
>>>>>> <4><d> ffff88063837c5f8 ffff88063988da90 ffff8800a6e34600 ffff880337f2a950
>>>>>> <4><d> ffff880637c99488 0000000037f2a940 ffff88063988db60 0000000000000000
>>>>>> <4>Call Trace:
>>>>>> <4> [<ffffffffa031fdb7>] filelayout_commit_pagelist+0x277/0x3c0
>>>>>> [nfs_layout_nfsv41_files]
>>>>>> <4> [<ffffffffa02b613b>] nfs_generic_commit_list+0xab/0x100 [nfs]
>>>>>> <4> [<ffffffffa02b627c>] nfs_commit_inode+0xec/0x150 [nfs]
>>>>>> <4> [<ffffffffa02b6aab>] nfs_write_inode+0xab/0x100 [nfs]
>>>>>> <4> [<ffffffff811baedc>] writeback_single_inode+0x20c/0x290
>>>>>> <4> [<ffffffff811bb1ad>] writeback_sb_inodes+0xbd/0x170
>>>>>> <4> [<ffffffff811bb30b>] writeback_inodes_wb+0xab/0x1b0
>>>>>> <4> [<ffffffff811bb703>] wb_writeback+0x2f3/0x410
>>>>>> <4> [<ffffffff8100b9ce>] ? common_interrupt+0xe/0x13
>>>>>> <4> [<ffffffff81088062>] ? del_timer_sync+0x22/0x30
>>>>>> <4> [<ffffffff811bb9c5>] wb_do_writeback+0x1a5/0x240
>>>>>> <4> [<ffffffff811bbac3>] bdi_writeback_task+0x63/0x1b0
>>>>>> <4> [<ffffffff8109e987>] ? bit_waitqueue+0x17/0xd0
>>>>>> <4> [<ffffffff81148360>] ? bdi_start_fn+0x0/0x100
>>>>>> <4> [<ffffffff811483e6>] bdi_start_fn+0x86/0x100
>>>>>> <4> [<ffffffff81148360>] ? bdi_start_fn+0x0/0x100
>>>>>> <4> [<ffffffff8109e66e>] kthread+0x9e/0xc0
>>>>>> <4> [<ffffffff8100c20a>] child_rip+0xa/0x20
>>>>>> <4> [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
>>>>>> <4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
>>>>>> <4>Code: c3 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 53 48 83 ec 08 0f 1f 44
>>>>>> 00 00 48 8b 06 48 89 fb 48 8b 78 18 48 39 c6 48 8b 7f 40 <48> 8b 7f 10 74 2b 4c
>>>>>> 8b 83 c8 01 00 00 4c 8b 4e 08 4c 8d 93 c8
>>>>>> <1>RIP  [<ffffffffa02b45df>] nfs_init_commit+0x1f/0xf0 [nfs]
>>>>>> <4> RSP <ffff88063988da30>
>>>>>> <4>CR2: 00000000dc364913
>>>>>> 
>>>>>> 
>>>>>> I have vmcore file as well, so let me know if you need some more information.
>>>>>> 
>>>>> 
>>>>> Hi Tigran!
>>>>> 
>>>>> Have you tried a recent upstream kernel? IIRC I fixed a seeming similar
>>>>> filelayout
>>>>> commit issue not too long ago.
>>>>> 
>>>>> The filelayout commit path seems to have been broken for a while - mostly
>>>>> because
>>>>> all the filelayout servers (that I know of) use stable writes, so that code path
>>>>> went
>>>>> untested...
>>>>> 
>>>>> -dros
>>> 
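For readers working from the quoted call trace: entries prefixed with `?` are addresses that were found on the stack but could not be confirmed as part of the call chain, so the likely path is the sequence of unmarked entries. The Python sketch below separates the two kinds and recovers each function's start address from the `symbol+offset/size` notation. The helper names are hypothetical, and only four sample lines from the trace are copied in.

```python
import re

# Four lines copied from the call trace quoted in this thread.
TRACE = """\
<4> [<ffffffffa02b613b>] nfs_generic_commit_list+0xab/0x100 [nfs]
<4> [<ffffffff811bb703>] wb_writeback+0x2f3/0x410
<4> [<ffffffff8100b9ce>] ? common_interrupt+0xe/0x13
<4> [<ffffffff811bb9c5>] wb_do_writeback+0x1a5/0x240
"""

# [<address>] optional "? " marker, then symbol+0xoffset/0xsize
FRAME = re.compile(
    r"\[<([0-9a-f]{16})>\] (\? )?(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def frames(text):
    """Return one dict per trace entry, in the order they appear."""
    result = []
    for addr, unsure, symbol, offset, size in FRAME.findall(text):
        result.append({
            "symbol": symbol,
            # Return address minus offset gives the function's start address.
            "start": int(addr, 16) - int(offset, 16),
            "size": int(size, 16),
            "reliable": not unsure,
        })
    return result


parsed = frames(TRACE)
reliable = [f["symbol"] for f in parsed if f["reliable"]]
print(reliable)
```

Run on the full trace, the same filter would leave the path from kthread up through nfs_write_inode, nfs_commit_inode and filelayout_commit_pagelist into nfs_init_commit.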



* Re: kernel crashes on commit
  2015-01-21 15:20   ` Weston Andros Adamson
@ 2015-03-26  9:28     ` Benjamin Coddington
  2015-03-26 10:11       ` Mkrtchyan, Tigran
  0 siblings, 1 reply; 13+ messages in thread
From: Benjamin Coddington @ 2015-03-26  9:28 UTC
  To: Weston Andros Adamson
  Cc: Tigran Mkrtchyan, Steve Dickson, linux-nfs list, Trond Myklebust


On Wed, 21 Jan 2015, Weston Andros Adamson wrote:

>
> > On Jan 21, 2015, at 5:04 AM, Mkrtchyan, Tigran <tigran.mkrtchyan@desy.de> wrote:
> >
> > Hi Dros,
> >
> > after adopting patch for RHEL6 kernel, it works.
>
> Great!
>
> > We have to push it into stable fixes. Do you know
> > the procedure?
>
> I normally bug Steve D ;)

Or you can open a bug against RHEL6; it should get picked up quickly, now
that you've done all the work.

Ben

>
> > ----- Original Message -----
> >> From: "Mkrtchyan, Tigran" <tigran.mkrtchyan@desy.de>
> >> To: "Weston Andros Adamson" <dros@primarydata.com>
> >> Cc: "Linux NFS Mailing List" <linux-nfs@vger.kernel.org>
> >> Sent: Tuesday, January 20, 2015 10:45:22 PM
> >> Subject: Re: kernel crashes on commit
> >
> >> I will check tomorrow with RHEL 6 kernel and let you know.
> >>
> >> Thanks,
> >> Tigran
> >>
> >> On Jan 20, 2015 9:43 PM, Weston Andros Adamson <dros@primarydata.com>
> >> wrote:
> >>>
> >>>
> >>>> On Jan 20, 2015, at 2:22 PM, Mkrtchyan, Tigran <tigran.mkrtchyan@desy.de> wrote:
> >>>>
> >>>> Hi Dros,
> >>>>
> >>>> do you refer to this commit
> >>>>
> >>>> http://git.linux-nfs.org/?p=dros/linux-nfs.git;a=commit;h=d201c4de518c1d617aa216664869fa329d562d7d
> >>>> ?
> >>>
> >>> Yes, that’s the patch I was talking about. Good find, I was about to go looking
> >>> for it.
> >>>
> >>> Is that patch in the kernels you’re testing?
> >>>
> >>> -dros
> >>>
> >>>> ----- Original Message -----
> >>>>> From: "Weston Andros Adamson" <dros@primarydata.com>
> >>>>> To: "Tigran Mkrtchyan" <tigran.mkrtchyan@desy.de>
> >>>>> Cc: "linux-nfs list" <linux-nfs@vger.kernel.org>
> >>>>> Sent: Tuesday, January 20, 2015 3:37:49 PM
> >>>>> Subject: Re: kernel crashes on commit
> >>>>
> >>>>>> On Jan 20, 2015, at 9:00 AM, Mkrtchyan, Tigran <tigran.mkrtchyan@desy.de> wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Dear fellows,
> >>>>>>
> >>>>>> since we have enabled commit through DS code we
> >>>>>> permanently observe kernel crashes with RHEL6/7 and ubuntu 14.04:
> >>>>>>
> >>>>>>
> >>>>>> <1>BUG: unable to handle kernel paging request at 00000000dc364913
> >>>>>> <1>IP: [<ffffffffa02b45df>] nfs_init_commit+0x1f/0xf0 [nfs]
> >>>>>> <4>PGD 6393ae067 PUD 0
> >>>>>> <4>Oops: 0000 [#1] SMP
> >>>>>> <4>last sysfs file: /sys/devices/system/cpu/online
> >>>>>> <4>CPU 1
> >>>>>> <4>Modules linked in: vfat fat usb_storage mpt3sas mpt2sas raid_class mptctl
> >>>>>> ipmi_devintf dell_rbu openafs(P)(U) autof
> >>>>>> s4 nfs_layout_nfsv41_files nfs lockd fscache auth_rpcgss nfs_acl sunrpc bonding
> >>>>>> 8021q garp stp llc ipv6 power_meter ac
> >>>>>> pi_ipmi ipmi_si ipmi_msghandler iTCO_wdt iTCO_vendor_support microcode dcdbas sg
> >>>>>> bnx2 lpc_ich mfd_core i7core_edac eda
> >>>>>> c_core ext4 jbd2 mbcache sd_mod crc_t10dif wmi pata_acpi ata_generic ata_piix
> >>>>>> mptsas mptscsih mptbase scsi_transport_s
> >>>>>> as dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
> >>>>>> <4>
> >>>>>> <4>Pid: 18209, comm: flush-0:19 Tainted: P           ---------------
> >>>>>> 2.6.32-504.3.3.el6.x86_64 #1 Dell Inc. PowerEdge M610/0N582M
> >>>>>> <4>RIP: 0010:[<ffffffffa02b45df>]  [<ffffffffa02b45df>]
> >>>>>> nfs_init_commit+0x1f/0xf0 [nfs]
> >>>>>> <4>RSP: 0018:ffff88063988da30  EFLAGS: 00010246
> >>>>>> <4>RAX: ffff88063988db60 RBX: ffff88009c492040 RCX: ffff88063988db30
> >>>>>> <4>RDX: 0000000000000000 RSI: ffff88063988db60 RDI: 00000000dc364903
> >>>>>> <4>RBP: ffff88063988da40 R08: ffff88063988da90 R09: f9aa37faa254d404
> >>>>>> <4>R10: 0000000000000010 R11: 0000000000000000 R12: 0000000000000001
> >>>>>> <4>R13: ffff880339f33a00 R14: ffff88063988db30 R15: ffff88063988d8c8
> >>>>>> <4>FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
> >>>>>> <4>CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> >>>>>> <4>CR2: 00000000dc364913 CR3: 0000000639fbb000 CR4: 00000000000007e0
> >>>>>> <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>>>>> <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> >>>>>> <4>Process flush-0:19 (pid: 18209, threadinfo ffff88063988c000, task
> >>>>>> ffff88063837c040)
> >>>>>> <4>Stack:
> >>>>>> <4> 0000000000000000 ffff88009c492040 ffff88063988dad0 ffffffffa031fdb7
> >>>>>> <4><d> ffff88063837c5f8 ffff88063988da90 ffff8800a6e34600 ffff880337f2a950
> >>>>>> <4><d> ffff880637c99488 0000000037f2a940 ffff88063988db60 0000000000000000
> >>>>>> <4>Call Trace:
> >>>>>> <4> [<ffffffffa031fdb7>] filelayout_commit_pagelist+0x277/0x3c0
> >>>>>> [nfs_layout_nfsv41_files]
> >>>>>> <4> [<ffffffffa02b613b>] nfs_generic_commit_list+0xab/0x100 [nfs]
> >>>>>> <4> [<ffffffffa02b627c>] nfs_commit_inode+0xec/0x150 [nfs]
> >>>>>> <4> [<ffffffffa02b6aab>] nfs_write_inode+0xab/0x100 [nfs]
> >>>>>> <4> [<ffffffff811baedc>] writeback_single_inode+0x20c/0x290
> >>>>>> <4> [<ffffffff811bb1ad>] writeback_sb_inodes+0xbd/0x170
> >>>>>> <4> [<ffffffff811bb30b>] writeback_inodes_wb+0xab/0x1b0
> >>>>>> <4> [<ffffffff811bb703>] wb_writeback+0x2f3/0x410
> >>>>>> <4> [<ffffffff8100b9ce>] ? common_interrupt+0xe/0x13
> >>>>>> <4> [<ffffffff81088062>] ? del_timer_sync+0x22/0x30
> >>>>>> <4> [<ffffffff811bb9c5>] wb_do_writeback+0x1a5/0x240
> >>>>>> <4> [<ffffffff811bbac3>] bdi_writeback_task+0x63/0x1b0
> >>>>>> <4> [<ffffffff8109e987>] ? bit_waitqueue+0x17/0xd0
> >>>>>> <4> [<ffffffff81148360>] ? bdi_start_fn+0x0/0x100
> >>>>>> <4> [<ffffffff811483e6>] bdi_start_fn+0x86/0x100
> >>>>>> <4> [<ffffffff81148360>] ? bdi_start_fn+0x0/0x100
> >>>>>> <4> [<ffffffff8109e66e>] kthread+0x9e/0xc0
> >>>>>> <4> [<ffffffff8100c20a>] child_rip+0xa/0x20
> >>>>>> <4> [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
> >>>>>> <4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
> >>>>>> <4>Code: c3 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 53 48 83 ec 08 0f 1f 44
> >>>>>> 00 00 48 8b 06 48 89 fb 48 8b 78 18 48 39 c6 48 8b 7f 40 <48> 8b 7f 10 74 2b 4c
> >>>>>> 8b 83 c8 01 00 00 4c 8b 4e 08 4c 8d 93 c8
> >>>>>> <1>RIP  [<ffffffffa02b45df>] nfs_init_commit+0x1f/0xf0 [nfs]
> >>>>>> <4> RSP <ffff88063988da30>
> >>>>>> <4>CR2: 00000000dc364913
> >>>>>>
> >>>>>>
> >>>>>> I have vmcore file as well, so let me know if you need some more information.
> >>>>>>
> >>>>>
> >>>>> Hi Tigran!
> >>>>>
> >>>>> Have you tried a recent upstream kernel? IIRC I fixed a seeming similar
> >>>>> filelayout
> >>>>> commit issue not too long ago.
> >>>>>
> >>>>> The filelayout commit path seems to have been broken for a while - mostly
> >>>>> because
> >>>>> all the filelayout servers (that I know of) use stable writes, so that code path
> >>>>> went
> >>>>> untested...
> >>>>>
> >>>>> -dros
> >>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


* Re: kernel crashes on commit
  2015-03-26  9:28     ` Benjamin Coddington
@ 2015-03-26 10:11       ` Mkrtchyan, Tigran
  2015-03-26 12:00         ` Benjamin Coddington
  2015-06-26 12:33         ` Benjamin Coddington
  0 siblings, 2 replies; 13+ messages in thread
From: Mkrtchyan, Tigran @ 2015-03-26 10:11 UTC
  To: Benjamin Coddington
  Cc: Weston Andros Adamson, Steve Dickson, linux-nfs list, Trond Myklebust

the bug was submitted and fixed in rhel 6.7

https://bugzilla.redhat.com/show_bug.cgi?id=1184394

I don't know about rhel7, which is affected as well.

Tigran.

----- Original Message -----
> From: "Benjamin Coddington" <bcodding@redhat.com>
> To: "Weston Andros Adamson" <dros@primarydata.com>
> Cc: "Tigran Mkrtchyan" <tigran.mkrtchyan@desy.de>, "Steve Dickson" <SteveD@redhat.com>, "linux-nfs list"
> <linux-nfs@vger.kernel.org>, "Trond Myklebust" <trond.myklebust@primarydata.com>
> Sent: Thursday, March 26, 2015 10:28:59 AM
> Subject: Re: kernel crashes on commit

> On Wed, 21 Jan 2015, Weston Andros Adamson wrote:
>
>>
>> > On Jan 21, 2015, at 5:04 AM, Mkrtchyan, Tigran <tigran.mkrtchyan@desy.de> wrote:
>> >
>> > Hi Dros,
>> >
>> > after adopting patch for RHEL6 kernel, it works.
>>
>> Great!
>>
>> > We have to push it into stable fixes. Do you know
>> > the procedure?
>>
>> I normally bug Steve D ;)
>
> Or you can open a bug against RHEL6; it should get picked up quickly, now
> that you've done all the work.
>
> Ben
>
>>
>> > ----- Original Message -----
>> >> From: "Mkrtchyan, Tigran" <tigran.mkrtchyan@desy.de>
>> >> To: "Weston Andros Adamson" <dros@primarydata.com>
>> >> Cc: "Linux NFS Mailing List" <linux-nfs@vger.kernel.org>
>> >> Sent: Tuesday, January 20, 2015 10:45:22 PM
>> >> Subject: Re: kernel crashes on commit
>> >
>> >> I will check tomorrow with RHEL 6 kernel and let you know.
>> >>
>> >> Thanks,
>> >> Tigran
>> >>
>> >> On Jan 20, 2015 9:43 PM, Weston Andros Adamson <dros@primarydata.com>
>> >> wrote:
>> >>>
>> >>>
>> >>>> On Jan 20, 2015, at 2:22 PM, Mkrtchyan, Tigran <tigran.mkrtchyan@desy.de> wrote:
>> >>>>
>> >>>> Hi Dros,
>> >>>>
>> >>>> do you refer to this commit
>> >>>>
>> >>>> http://git.linux-nfs.org/?p=dros/linux-nfs.git;a=commit;h=d201c4de518c1d617aa216664869fa329d562d7d
>> >>>> ?
>> >>>
>> >>> Yes, that’s the patch I was talking about. Good find, I was about to go looking
>> >>> for it.
>> >>>
>> >>> Is that patch in the kernels you’re testing?
>> >>>
>> >>> -dros
>> >>>
>> >>>> ----- Original Message -----
>> >>>>> From: "Weston Andros Adamson" <dros@primarydata.com>
>> >>>>> To: "Tigran Mkrtchyan" <tigran.mkrtchyan@desy.de>
>> >>>>> Cc: "linux-nfs list" <linux-nfs@vger.kernel.org>
>> >>>>> Sent: Tuesday, January 20, 2015 3:37:49 PM
>> >>>>> Subject: Re: kernel crashes on commit
>> >>>>
>> >>>>>> On Jan 20, 2015, at 9:00 AM, Mkrtchyan, Tigran <tigran.mkrtchyan@desy.de> wrote:
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> Dear fellows,
>> >>>>>>
>> >>>>>> since we have enabled commit through DS code we
>> >>>>>> permanently observe kernel crashes with RHEL6/7 and ubuntu 14.04:
>> >>>>>>
>> >>>>>>
>> >>>>>> <1>BUG: unable to handle kernel paging request at 00000000dc364913
>> >>>>>> <1>IP: [<ffffffffa02b45df>] nfs_init_commit+0x1f/0xf0 [nfs]
>> >>>>>> <4>PGD 6393ae067 PUD 0
>> >>>>>> <4>Oops: 0000 [#1] SMP
>> >>>>>> <4>last sysfs file: /sys/devices/system/cpu/online
>> >>>>>> <4>CPU 1
>> >>>>>> <4>Modules linked in: vfat fat usb_storage mpt3sas mpt2sas raid_class mptctl
>> >>>>>> ipmi_devintf dell_rbu openafs(P)(U) autofs4 nfs_layout_nfsv41_files nfs lockd
>> >>>>>> fscache auth_rpcgss nfs_acl sunrpc bonding 8021q garp stp llc ipv6 power_meter
>> >>>>>> acpi_ipmi ipmi_si ipmi_msghandler iTCO_wdt iTCO_vendor_support microcode dcdbas
>> >>>>>> sg bnx2 lpc_ich mfd_core i7core_edac edac_core ext4 jbd2 mbcache sd_mod
>> >>>>>> crc_t10dif wmi pata_acpi ata_generic ata_piix mptsas mptscsih mptbase
>> >>>>>> scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod
>> >>>>>> [last unloaded: scsi_wait_scan]
>> >>>>>> <4>
>> >>>>>> <4>Pid: 18209, comm: flush-0:19 Tainted: P           ---------------
>> >>>>>> 2.6.32-504.3.3.el6.x86_64 #1 Dell Inc. PowerEdge M610/0N582M
>> >>>>>> <4>RIP: 0010:[<ffffffffa02b45df>]  [<ffffffffa02b45df>]
>> >>>>>> nfs_init_commit+0x1f/0xf0 [nfs]
>> >>>>>> <4>RSP: 0018:ffff88063988da30  EFLAGS: 00010246
>> >>>>>> <4>RAX: ffff88063988db60 RBX: ffff88009c492040 RCX: ffff88063988db30
>> >>>>>> <4>RDX: 0000000000000000 RSI: ffff88063988db60 RDI: 00000000dc364903
>> >>>>>> <4>RBP: ffff88063988da40 R08: ffff88063988da90 R09: f9aa37faa254d404
>> >>>>>> <4>R10: 0000000000000010 R11: 0000000000000000 R12: 0000000000000001
>> >>>>>> <4>R13: ffff880339f33a00 R14: ffff88063988db30 R15: ffff88063988d8c8
>> >>>>>> <4>FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
>> >>>>>> <4>CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
>> >>>>>> <4>CR2: 00000000dc364913 CR3: 0000000639fbb000 CR4: 00000000000007e0
>> >>>>>> <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> >>>>>> <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> >>>>>> <4>Process flush-0:19 (pid: 18209, threadinfo ffff88063988c000, task
>> >>>>>> ffff88063837c040)
>> >>>>>> <4>Stack:
>> >>>>>> <4> 0000000000000000 ffff88009c492040 ffff88063988dad0 ffffffffa031fdb7
>> >>>>>> <4><d> ffff88063837c5f8 ffff88063988da90 ffff8800a6e34600 ffff880337f2a950
>> >>>>>> <4><d> ffff880637c99488 0000000037f2a940 ffff88063988db60 0000000000000000
>> >>>>>> <4>Call Trace:
>> >>>>>> <4> [<ffffffffa031fdb7>] filelayout_commit_pagelist+0x277/0x3c0
>> >>>>>> [nfs_layout_nfsv41_files]
>> >>>>>> <4> [<ffffffffa02b613b>] nfs_generic_commit_list+0xab/0x100 [nfs]
>> >>>>>> <4> [<ffffffffa02b627c>] nfs_commit_inode+0xec/0x150 [nfs]
>> >>>>>> <4> [<ffffffffa02b6aab>] nfs_write_inode+0xab/0x100 [nfs]
>> >>>>>> <4> [<ffffffff811baedc>] writeback_single_inode+0x20c/0x290
>> >>>>>> <4> [<ffffffff811bb1ad>] writeback_sb_inodes+0xbd/0x170
>> >>>>>> <4> [<ffffffff811bb30b>] writeback_inodes_wb+0xab/0x1b0
>> >>>>>> <4> [<ffffffff811bb703>] wb_writeback+0x2f3/0x410
>> >>>>>> <4> [<ffffffff8100b9ce>] ? common_interrupt+0xe/0x13
>> >>>>>> <4> [<ffffffff81088062>] ? del_timer_sync+0x22/0x30
>> >>>>>> <4> [<ffffffff811bb9c5>] wb_do_writeback+0x1a5/0x240
>> >>>>>> <4> [<ffffffff811bbac3>] bdi_writeback_task+0x63/0x1b0
>> >>>>>> <4> [<ffffffff8109e987>] ? bit_waitqueue+0x17/0xd0
>> >>>>>> <4> [<ffffffff81148360>] ? bdi_start_fn+0x0/0x100
>> >>>>>> <4> [<ffffffff811483e6>] bdi_start_fn+0x86/0x100
>> >>>>>> <4> [<ffffffff81148360>] ? bdi_start_fn+0x0/0x100
>> >>>>>> <4> [<ffffffff8109e66e>] kthread+0x9e/0xc0
>> >>>>>> <4> [<ffffffff8100c20a>] child_rip+0xa/0x20
>> >>>>>> <4> [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
>> >>>>>> <4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
>> >>>>>> <4>Code: c3 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 53 48 83 ec 08 0f 1f 44
>> >>>>>> 00 00 48 8b 06 48 89 fb 48 8b 78 18 48 39 c6 48 8b 7f 40 <48> 8b 7f 10 74 2b 4c
>> >>>>>> 8b 83 c8 01 00 00 4c 8b 4e 08 4c 8d 93 c8
>> >>>>>> <1>RIP  [<ffffffffa02b45df>] nfs_init_commit+0x1f/0xf0 [nfs]
>> >>>>>> <4> RSP <ffff88063988da30>
>> >>>>>> <4>CR2: 00000000dc364913
>> >>>>>>
>> >>>>>>
>> >>>>>> I have vmcore file as well, so let me know if you need some more information.
>> >>>>>>
>> >>>>>
>> >>>>> Hi Tigran!
>> >>>>>
>> >>>>> Have you tried a recent upstream kernel? IIRC I fixed a seemingly similar
>> >>>>> filelayout commit issue not too long ago.
>> >>>>>
>> >>>>> The filelayout commit path seems to have been broken for a while - mostly
>> >>>>> because all the filelayout servers (that I know of) use stable writes, so
>> >>>>> that code path went untested...
>> >>>>>
>> >>>>> -dros
>> >>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: kernel crashes on commit
  2015-03-26 10:11       ` Mkrtchyan, Tigran
@ 2015-03-26 12:00         ` Benjamin Coddington
  2015-06-26 12:33         ` Benjamin Coddington
  1 sibling, 0 replies; 13+ messages in thread
From: Benjamin Coddington @ 2015-03-26 12:00 UTC (permalink / raw)
  To: Mkrtchyan, Tigran
  Cc: Weston Andros Adamson, Steve Dickson, linux-nfs list, Trond Myklebust

[-- Attachment #1: Type: text/plain, Size: 8661 bytes --]

On Thu, 26 Mar 2015, Mkrtchyan, Tigran wrote:

> the bug was submitted and fixed in rhel 6.7
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1184394
>
> I don't know about rhel7, which is affected as well.

Ah! Thanks for pointing that out. RHEL7 is missing
d201c4d pnfs: fix race in filelayout commit path

Which will hopefully make it into 7.2; working on it here:
https://bugzilla.redhat.com/show_bug.cgi?id=1111712

We probably want it in earlier RHEL7 too.. I'll see about that.

Ben

> ----- Original Message -----
> > From: "Benjamin Coddington" <bcodding@redhat.com>
> > To: "Weston Andros Adamson" <dros@primarydata.com>
> > Cc: "Tigran Mkrtchyan" <tigran.mkrtchyan@desy.de>, "Steve Dickson" <SteveD@redhat.com>, "linux-nfs list"
> > <linux-nfs@vger.kernel.org>, "Trond Myklebust" <trond.myklebust@primarydata.com>
> > Sent: Thursday, March 26, 2015 10:28:59 AM
> > Subject: Re: kernel crashes on commit
>
> > On Wed, 21 Jan 2015, Weston Andros Adamson wrote:
> >
> >>
> >> > On Jan 21, 2015, at 5:04 AM, Mkrtchyan, Tigran <tigran.mkrtchyan@desy.de> wrote:
> >> >
> >> > Hi Dros,
> >> >
> >> > after adopting patch for RHEL6 kernel, it works.
> >>
> >> Great!
> >>
> >> > We have to push it into stable fixes. Do you know
> >> > the procedure?
> >>
> >> I normally bug Steve D ;)
> >
> > Or you can open a bug against RHEL6; it should get picked up quickly, now
> > that you've done all the work.
> >
> > Ben
> >
> >>
> >> > ----- Original Message -----
> >> >> From: "Mkrtchyan, Tigran" <tigran.mkrtchyan@desy.de>
> >> >> To: "Weston Andros Adamson" <dros@primarydata.com>
> >> >> Cc: "Linux NFS Mailing List" <linux-nfs@vger.kernel.org>
> >> >> Sent: Tuesday, January 20, 2015 10:45:22 PM
> >> >> Subject: Re: kernel crashes on commit
> >> >
> >> >> I will check tomorrow with RHEL 6 kernel and let you know.
> >> >>
> >> >> Thanks,
> >> >> Tigran
> >> >>
> >> >> On Jan 20, 2015 9:43 PM, Weston Andros Adamson <dros@primarydata.com>
> >> >> wrote:
> >> >>>
> >> >>>
> >> >>>> On Jan 20, 2015, at 2:22 PM, Mkrtchyan, Tigran <tigran.mkrtchyan@desy.de> wrote:
> >> >>>>
> >> >>>> Hi Dros,
> >> >>>>
> >> >>>> do you refer to this commit
> >> >>>>
> >> >>>> http://git.linux-nfs.org/?p=dros/linux-nfs.git;a=commit;h=d201c4de518c1d617aa216664869fa329d562d7d
> >> >>>> ?
> >> >>>
> >> >>> Yes, that’s the patch I was talking about. Good find, I was about to go looking
> >> >>> for it.
> >> >>>
> >> >>> Is that patch in the kernels you’re testing?
> >> >>>
> >> >>> -dros
> >> >>>
> >> >>>> ----- Original Message -----
> >> >>>>> From: "Weston Andros Adamson" <dros@primarydata.com>
> >> >>>>> To: "Tigran Mkrtchyan" <tigran.mkrtchyan@desy.de>
> >> >>>>> Cc: "linux-nfs list" <linux-nfs@vger.kernel.org>
> >> >>>>> Sent: Tuesday, January 20, 2015 3:37:49 PM
> >> >>>>> Subject: Re: kernel crashes on commit
> >> >>>>
> >> >>>>>> On Jan 20, 2015, at 9:00 AM, Mkrtchyan, Tigran <tigran.mkrtchyan@desy.de> wrote:
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> Dear fellows,
> >> >>>>>>
> >> >>>>>> since we have enabled commit through DS code we
> >> >>>>>> permanently observe kernel crashes with RHEL6/7 and ubuntu 14.04:
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> <1>BUG: unable to handle kernel paging request at 00000000dc364913
> >> >>>>>> <1>IP: [<ffffffffa02b45df>] nfs_init_commit+0x1f/0xf0 [nfs]
> >> >>>>>> <4>PGD 6393ae067 PUD 0
> >> >>>>>> <4>Oops: 0000 [#1] SMP
> >> >>>>>> <4>last sysfs file: /sys/devices/system/cpu/online
> >> >>>>>> <4>CPU 1
> >> >>>>>> <4>Modules linked in: vfat fat usb_storage mpt3sas mpt2sas raid_class mptctl
> >> >>>>>> ipmi_devintf dell_rbu openafs(P)(U) autofs4 nfs_layout_nfsv41_files nfs lockd
> >> >>>>>> fscache auth_rpcgss nfs_acl sunrpc bonding 8021q garp stp llc ipv6 power_meter
> >> >>>>>> acpi_ipmi ipmi_si ipmi_msghandler iTCO_wdt iTCO_vendor_support microcode dcdbas
> >> >>>>>> sg bnx2 lpc_ich mfd_core i7core_edac edac_core ext4 jbd2 mbcache sd_mod
> >> >>>>>> crc_t10dif wmi pata_acpi ata_generic ata_piix mptsas mptscsih mptbase
> >> >>>>>> scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod
> >> >>>>>> [last unloaded: scsi_wait_scan]
> >> >>>>>> <4>
> >> >>>>>> <4>Pid: 18209, comm: flush-0:19 Tainted: P           ---------------
> >> >>>>>> 2.6.32-504.3.3.el6.x86_64 #1 Dell Inc. PowerEdge M610/0N582M
> >> >>>>>> <4>RIP: 0010:[<ffffffffa02b45df>]  [<ffffffffa02b45df>]
> >> >>>>>> nfs_init_commit+0x1f/0xf0 [nfs]
> >> >>>>>> <4>RSP: 0018:ffff88063988da30  EFLAGS: 00010246
> >> >>>>>> <4>RAX: ffff88063988db60 RBX: ffff88009c492040 RCX: ffff88063988db30
> >> >>>>>> <4>RDX: 0000000000000000 RSI: ffff88063988db60 RDI: 00000000dc364903
> >> >>>>>> <4>RBP: ffff88063988da40 R08: ffff88063988da90 R09: f9aa37faa254d404
> >> >>>>>> <4>R10: 0000000000000010 R11: 0000000000000000 R12: 0000000000000001
> >> >>>>>> <4>R13: ffff880339f33a00 R14: ffff88063988db30 R15: ffff88063988d8c8
> >> >>>>>> <4>FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
> >> >>>>>> <4>CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> >> >>>>>> <4>CR2: 00000000dc364913 CR3: 0000000639fbb000 CR4: 00000000000007e0
> >> >>>>>> <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >> >>>>>> <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> >> >>>>>> <4>Process flush-0:19 (pid: 18209, threadinfo ffff88063988c000, task
> >> >>>>>> ffff88063837c040)
> >> >>>>>> <4>Stack:
> >> >>>>>> <4> 0000000000000000 ffff88009c492040 ffff88063988dad0 ffffffffa031fdb7
> >> >>>>>> <4><d> ffff88063837c5f8 ffff88063988da90 ffff8800a6e34600 ffff880337f2a950
> >> >>>>>> <4><d> ffff880637c99488 0000000037f2a940 ffff88063988db60 0000000000000000
> >> >>>>>> <4>Call Trace:
> >> >>>>>> <4> [<ffffffffa031fdb7>] filelayout_commit_pagelist+0x277/0x3c0
> >> >>>>>> [nfs_layout_nfsv41_files]
> >> >>>>>> <4> [<ffffffffa02b613b>] nfs_generic_commit_list+0xab/0x100 [nfs]
> >> >>>>>> <4> [<ffffffffa02b627c>] nfs_commit_inode+0xec/0x150 [nfs]
> >> >>>>>> <4> [<ffffffffa02b6aab>] nfs_write_inode+0xab/0x100 [nfs]
> >> >>>>>> <4> [<ffffffff811baedc>] writeback_single_inode+0x20c/0x290
> >> >>>>>> <4> [<ffffffff811bb1ad>] writeback_sb_inodes+0xbd/0x170
> >> >>>>>> <4> [<ffffffff811bb30b>] writeback_inodes_wb+0xab/0x1b0
> >> >>>>>> <4> [<ffffffff811bb703>] wb_writeback+0x2f3/0x410
> >> >>>>>> <4> [<ffffffff8100b9ce>] ? common_interrupt+0xe/0x13
> >> >>>>>> <4> [<ffffffff81088062>] ? del_timer_sync+0x22/0x30
> >> >>>>>> <4> [<ffffffff811bb9c5>] wb_do_writeback+0x1a5/0x240
> >> >>>>>> <4> [<ffffffff811bbac3>] bdi_writeback_task+0x63/0x1b0
> >> >>>>>> <4> [<ffffffff8109e987>] ? bit_waitqueue+0x17/0xd0
> >> >>>>>> <4> [<ffffffff81148360>] ? bdi_start_fn+0x0/0x100
> >> >>>>>> <4> [<ffffffff811483e6>] bdi_start_fn+0x86/0x100
> >> >>>>>> <4> [<ffffffff81148360>] ? bdi_start_fn+0x0/0x100
> >> >>>>>> <4> [<ffffffff8109e66e>] kthread+0x9e/0xc0
> >> >>>>>> <4> [<ffffffff8100c20a>] child_rip+0xa/0x20
> >> >>>>>> <4> [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
> >> >>>>>> <4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
> >> >>>>>> <4>Code: c3 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 53 48 83 ec 08 0f 1f 44
> >> >>>>>> 00 00 48 8b 06 48 89 fb 48 8b 78 18 48 39 c6 48 8b 7f 40 <48> 8b 7f 10 74 2b 4c
> >> >>>>>> 8b 83 c8 01 00 00 4c 8b 4e 08 4c 8d 93 c8
> >> >>>>>> <1>RIP  [<ffffffffa02b45df>] nfs_init_commit+0x1f/0xf0 [nfs]
> >> >>>>>> <4> RSP <ffff88063988da30>
> >> >>>>>> <4>CR2: 00000000dc364913
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> I have vmcore file as well, so let me know if you need some more information.
> >> >>>>>>
> >> >>>>>
> >> >>>>> Hi Tigran!
> >> >>>>>
> >> >>>>> Have you tried a recent upstream kernel? IIRC I fixed a seemingly similar
> >> >>>>> filelayout commit issue not too long ago.
> >> >>>>>
> >> >>>>> The filelayout commit path seems to have been broken for a while - mostly
> >> >>>>> because all the filelayout servers (that I know of) use stable writes, so
> >> >>>>> that code path went untested...
> >> >>>>>
> >> >>>>> -dros
> >> >>>


* Re: kernel crashes on commit
  2015-03-26 10:11       ` Mkrtchyan, Tigran
  2015-03-26 12:00         ` Benjamin Coddington
@ 2015-06-26 12:33         ` Benjamin Coddington
  1 sibling, 0 replies; 13+ messages in thread
From: Benjamin Coddington @ 2015-06-26 12:33 UTC (permalink / raw)
  To: Mkrtchyan, Tigran
  Cc: Weston Andros Adamson, Steve Dickson, linux-nfs list, Trond Myklebust

[-- Attachment #1: Type: text/plain, Size: 8414 bytes --]

Fixed in RHEL7.1

Ben

On Thu, 26 Mar 2015, Mkrtchyan, Tigran wrote:

> the bug was submitted and fixed in rhel 6.7
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1184394
>
> I don't know about rhel7, which is affected as well.
>
> Tigran.
>
> ----- Original Message -----
> > From: "Benjamin Coddington" <bcodding@redhat.com>
> > To: "Weston Andros Adamson" <dros@primarydata.com>
> > Cc: "Tigran Mkrtchyan" <tigran.mkrtchyan@desy.de>, "Steve Dickson" <SteveD@redhat.com>, "linux-nfs list"
> > <linux-nfs@vger.kernel.org>, "Trond Myklebust" <trond.myklebust@primarydata.com>
> > Sent: Thursday, March 26, 2015 10:28:59 AM
> > Subject: Re: kernel crashes on commit
>
> > On Wed, 21 Jan 2015, Weston Andros Adamson wrote:
> >
> >>
> >> > On Jan 21, 2015, at 5:04 AM, Mkrtchyan, Tigran <tigran.mkrtchyan@desy.de> wrote:
> >> >
> >> > Hi Dros,
> >> >
> >> > after adopting patch for RHEL6 kernel, it works.
> >>
> >> Great!
> >>
> >> > We have to push it into stable fixes. Do you know
> >> > the procedure?
> >>
> >> I normally bug Steve D ;)
> >
> > Or you can open a bug against RHEL6; it should get picked up quickly, now
> > that you've done all the work.
> >
> > Ben
> >
> >>
> >> > ----- Original Message -----
> >> >> From: "Mkrtchyan, Tigran" <tigran.mkrtchyan@desy.de>
> >> >> To: "Weston Andros Adamson" <dros@primarydata.com>
> >> >> Cc: "Linux NFS Mailing List" <linux-nfs@vger.kernel.org>
> >> >> Sent: Tuesday, January 20, 2015 10:45:22 PM
> >> >> Subject: Re: kernel crashes on commit
> >> >
> >> >> I will check tomorrow with RHEL 6 kernel and let you know.
> >> >>
> >> >> Thanks,
> >> >> Tigran
> >> >>
> >> >> On Jan 20, 2015 9:43 PM, Weston Andros Adamson <dros@primarydata.com>
> >> >> wrote:
> >> >>>
> >> >>>
> >> >>>> On Jan 20, 2015, at 2:22 PM, Mkrtchyan, Tigran <tigran.mkrtchyan@desy.de> wrote:
> >> >>>>
> >> >>>> Hi Dros,
> >> >>>>
> >> >>>> do you refer to this commit
> >> >>>>
> >> >>>> http://git.linux-nfs.org/?p=dros/linux-nfs.git;a=commit;h=d201c4de518c1d617aa216664869fa329d562d7d
> >> >>>> ?
> >> >>>
> >> >>> Yes, that’s the patch I was talking about. Good find, I was about to go looking
> >> >>> for it.
> >> >>>
> >> >>> Is that patch in the kernels you’re testing?
> >> >>>
> >> >>> -dros
> >> >>>
> >> >>>> ----- Original Message -----
> >> >>>>> From: "Weston Andros Adamson" <dros@primarydata.com>
> >> >>>>> To: "Tigran Mkrtchyan" <tigran.mkrtchyan@desy.de>
> >> >>>>> Cc: "linux-nfs list" <linux-nfs@vger.kernel.org>
> >> >>>>> Sent: Tuesday, January 20, 2015 3:37:49 PM
> >> >>>>> Subject: Re: kernel crashes on commit
> >> >>>>
> >> >>>>>> On Jan 20, 2015, at 9:00 AM, Mkrtchyan, Tigran <tigran.mkrtchyan@desy.de> wrote:
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> Dear fellows,
> >> >>>>>>
> >> >>>>>> since we have enabled commit through DS code we
> >> >>>>>> permanently observe kernel crashes with RHEL6/7 and ubuntu 14.04:
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> <1>BUG: unable to handle kernel paging request at 00000000dc364913
> >> >>>>>> <1>IP: [<ffffffffa02b45df>] nfs_init_commit+0x1f/0xf0 [nfs]
> >> >>>>>> <4>PGD 6393ae067 PUD 0
> >> >>>>>> <4>Oops: 0000 [#1] SMP
> >> >>>>>> <4>last sysfs file: /sys/devices/system/cpu/online
> >> >>>>>> <4>CPU 1
> >> >>>>>> <4>Modules linked in: vfat fat usb_storage mpt3sas mpt2sas raid_class mptctl
> >> >>>>>> ipmi_devintf dell_rbu openafs(P)(U) autofs4 nfs_layout_nfsv41_files nfs lockd
> >> >>>>>> fscache auth_rpcgss nfs_acl sunrpc bonding 8021q garp stp llc ipv6 power_meter
> >> >>>>>> acpi_ipmi ipmi_si ipmi_msghandler iTCO_wdt iTCO_vendor_support microcode dcdbas
> >> >>>>>> sg bnx2 lpc_ich mfd_core i7core_edac edac_core ext4 jbd2 mbcache sd_mod
> >> >>>>>> crc_t10dif wmi pata_acpi ata_generic ata_piix mptsas mptscsih mptbase
> >> >>>>>> scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod
> >> >>>>>> [last unloaded: scsi_wait_scan]
> >> >>>>>> <4>
> >> >>>>>> <4>Pid: 18209, comm: flush-0:19 Tainted: P           ---------------
> >> >>>>>> 2.6.32-504.3.3.el6.x86_64 #1 Dell Inc. PowerEdge M610/0N582M
> >> >>>>>> <4>RIP: 0010:[<ffffffffa02b45df>]  [<ffffffffa02b45df>]
> >> >>>>>> nfs_init_commit+0x1f/0xf0 [nfs]
> >> >>>>>> <4>RSP: 0018:ffff88063988da30  EFLAGS: 00010246
> >> >>>>>> <4>RAX: ffff88063988db60 RBX: ffff88009c492040 RCX: ffff88063988db30
> >> >>>>>> <4>RDX: 0000000000000000 RSI: ffff88063988db60 RDI: 00000000dc364903
> >> >>>>>> <4>RBP: ffff88063988da40 R08: ffff88063988da90 R09: f9aa37faa254d404
> >> >>>>>> <4>R10: 0000000000000010 R11: 0000000000000000 R12: 0000000000000001
> >> >>>>>> <4>R13: ffff880339f33a00 R14: ffff88063988db30 R15: ffff88063988d8c8
> >> >>>>>> <4>FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
> >> >>>>>> <4>CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> >> >>>>>> <4>CR2: 00000000dc364913 CR3: 0000000639fbb000 CR4: 00000000000007e0
> >> >>>>>> <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >> >>>>>> <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> >> >>>>>> <4>Process flush-0:19 (pid: 18209, threadinfo ffff88063988c000, task
> >> >>>>>> ffff88063837c040)
> >> >>>>>> <4>Stack:
> >> >>>>>> <4> 0000000000000000 ffff88009c492040 ffff88063988dad0 ffffffffa031fdb7
> >> >>>>>> <4><d> ffff88063837c5f8 ffff88063988da90 ffff8800a6e34600 ffff880337f2a950
> >> >>>>>> <4><d> ffff880637c99488 0000000037f2a940 ffff88063988db60 0000000000000000
> >> >>>>>> <4>Call Trace:
> >> >>>>>> <4> [<ffffffffa031fdb7>] filelayout_commit_pagelist+0x277/0x3c0
> >> >>>>>> [nfs_layout_nfsv41_files]
> >> >>>>>> <4> [<ffffffffa02b613b>] nfs_generic_commit_list+0xab/0x100 [nfs]
> >> >>>>>> <4> [<ffffffffa02b627c>] nfs_commit_inode+0xec/0x150 [nfs]
> >> >>>>>> <4> [<ffffffffa02b6aab>] nfs_write_inode+0xab/0x100 [nfs]
> >> >>>>>> <4> [<ffffffff811baedc>] writeback_single_inode+0x20c/0x290
> >> >>>>>> <4> [<ffffffff811bb1ad>] writeback_sb_inodes+0xbd/0x170
> >> >>>>>> <4> [<ffffffff811bb30b>] writeback_inodes_wb+0xab/0x1b0
> >> >>>>>> <4> [<ffffffff811bb703>] wb_writeback+0x2f3/0x410
> >> >>>>>> <4> [<ffffffff8100b9ce>] ? common_interrupt+0xe/0x13
> >> >>>>>> <4> [<ffffffff81088062>] ? del_timer_sync+0x22/0x30
> >> >>>>>> <4> [<ffffffff811bb9c5>] wb_do_writeback+0x1a5/0x240
> >> >>>>>> <4> [<ffffffff811bbac3>] bdi_writeback_task+0x63/0x1b0
> >> >>>>>> <4> [<ffffffff8109e987>] ? bit_waitqueue+0x17/0xd0
> >> >>>>>> <4> [<ffffffff81148360>] ? bdi_start_fn+0x0/0x100
> >> >>>>>> <4> [<ffffffff811483e6>] bdi_start_fn+0x86/0x100
> >> >>>>>> <4> [<ffffffff81148360>] ? bdi_start_fn+0x0/0x100
> >> >>>>>> <4> [<ffffffff8109e66e>] kthread+0x9e/0xc0
> >> >>>>>> <4> [<ffffffff8100c20a>] child_rip+0xa/0x20
> >> >>>>>> <4> [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
> >> >>>>>> <4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
> >> >>>>>> <4>Code: c3 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 53 48 83 ec 08 0f 1f 44
> >> >>>>>> 00 00 48 8b 06 48 89 fb 48 8b 78 18 48 39 c6 48 8b 7f 40 <48> 8b 7f 10 74 2b 4c
> >> >>>>>> 8b 83 c8 01 00 00 4c 8b 4e 08 4c 8d 93 c8
> >> >>>>>> <1>RIP  [<ffffffffa02b45df>] nfs_init_commit+0x1f/0xf0 [nfs]
> >> >>>>>> <4> RSP <ffff88063988da30>
> >> >>>>>> <4>CR2: 00000000dc364913
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> I have vmcore file as well, so let me know if you need some more information.
> >> >>>>>>
> >> >>>>>
> >> >>>>> Hi Tigran!
> >> >>>>>
> >> >>>>> Have you tried a recent upstream kernel? IIRC I fixed a seemingly similar
> >> >>>>> filelayout commit issue not too long ago.
> >> >>>>>
> >> >>>>> The filelayout commit path seems to have been broken for a while - mostly
> >> >>>>> because all the filelayout servers (that I know of) use stable writes, so
> >> >>>>> that code path went untested...
> >> >>>>>
> >> >>>>> -dros
> >> >>>


* Re: kernel crashes on commit
  2015-01-20 19:22   ` Mkrtchyan, Tigran
@ 2015-01-20 20:42     ` Weston Andros Adamson
  0 siblings, 0 replies; 13+ messages in thread
From: Weston Andros Adamson @ 2015-01-20 20:42 UTC (permalink / raw)
  To: Tigran Mkrtchyan; +Cc: linux-nfs list


> On Jan 20, 2015, at 2:22 PM, Mkrtchyan, Tigran <tigran.mkrtchyan@desy.de> wrote:
> 
> Hi Dros,
> 
> do you refer to this commit
> 
> http://git.linux-nfs.org/?p=dros/linux-nfs.git;a=commit;h=d201c4de518c1d617aa216664869fa329d562d7d ?

Yes, that’s the patch I was talking about. Good find, I was about to go looking for it.

Is that patch in the kernels you’re testing?

-dros

> ----- Original Message -----
>> From: "Weston Andros Adamson" <dros@primarydata.com>
>> To: "Tigran Mkrtchyan" <tigran.mkrtchyan@desy.de>
>> Cc: "linux-nfs list" <linux-nfs@vger.kernel.org>
>> Sent: Tuesday, January 20, 2015 3:37:49 PM
>> Subject: Re: kernel crashes on commit
> 
>>> On Jan 20, 2015, at 9:00 AM, Mkrtchyan, Tigran <tigran.mkrtchyan@desy.de> wrote:
>>> 
>>> 
>>> 
>>> Dear fellows,
>>> 
>>> since we have enabled commit through DS code we
>>> permanently observe kernel crashes with RHEL6/7 and ubuntu 14.04:
>>> 
>>> 
>>> <1>BUG: unable to handle kernel paging request at 00000000dc364913
>>> <1>IP: [<ffffffffa02b45df>] nfs_init_commit+0x1f/0xf0 [nfs]
>>> <4>PGD 6393ae067 PUD 0
>>> <4>Oops: 0000 [#1] SMP
>>> <4>last sysfs file: /sys/devices/system/cpu/online
>>> <4>CPU 1
>>> <4>Modules linked in: vfat fat usb_storage mpt3sas mpt2sas raid_class mptctl
>>> ipmi_devintf dell_rbu openafs(P)(U) autofs4 nfs_layout_nfsv41_files nfs lockd
>>> fscache auth_rpcgss nfs_acl sunrpc bonding 8021q garp stp llc ipv6 power_meter
>>> acpi_ipmi ipmi_si ipmi_msghandler iTCO_wdt iTCO_vendor_support microcode dcdbas
>>> sg bnx2 lpc_ich mfd_core i7core_edac edac_core ext4 jbd2 mbcache sd_mod
>>> crc_t10dif wmi pata_acpi ata_generic ata_piix mptsas mptscsih mptbase
>>> scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod
>>> [last unloaded: scsi_wait_scan]
>>> <4>
>>> <4>Pid: 18209, comm: flush-0:19 Tainted: P           ---------------
>>> 2.6.32-504.3.3.el6.x86_64 #1 Dell Inc. PowerEdge M610/0N582M
>>> <4>RIP: 0010:[<ffffffffa02b45df>]  [<ffffffffa02b45df>]
>>> nfs_init_commit+0x1f/0xf0 [nfs]
>>> <4>RSP: 0018:ffff88063988da30  EFLAGS: 00010246
>>> <4>RAX: ffff88063988db60 RBX: ffff88009c492040 RCX: ffff88063988db30
>>> <4>RDX: 0000000000000000 RSI: ffff88063988db60 RDI: 00000000dc364903
>>> <4>RBP: ffff88063988da40 R08: ffff88063988da90 R09: f9aa37faa254d404
>>> <4>R10: 0000000000000010 R11: 0000000000000000 R12: 0000000000000001
>>> <4>R13: ffff880339f33a00 R14: ffff88063988db30 R15: ffff88063988d8c8
>>> <4>FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
>>> <4>CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
>>> <4>CR2: 00000000dc364913 CR3: 0000000639fbb000 CR4: 00000000000007e0
>>> <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>> <4>Process flush-0:19 (pid: 18209, threadinfo ffff88063988c000, task
>>> ffff88063837c040)
>>> <4>Stack:
>>> <4> 0000000000000000 ffff88009c492040 ffff88063988dad0 ffffffffa031fdb7
>>> <4><d> ffff88063837c5f8 ffff88063988da90 ffff8800a6e34600 ffff880337f2a950
>>> <4><d> ffff880637c99488 0000000037f2a940 ffff88063988db60 0000000000000000
>>> <4>Call Trace:
>>> <4> [<ffffffffa031fdb7>] filelayout_commit_pagelist+0x277/0x3c0
>>> [nfs_layout_nfsv41_files]
>>> <4> [<ffffffffa02b613b>] nfs_generic_commit_list+0xab/0x100 [nfs]
>>> <4> [<ffffffffa02b627c>] nfs_commit_inode+0xec/0x150 [nfs]
>>> <4> [<ffffffffa02b6aab>] nfs_write_inode+0xab/0x100 [nfs]
>>> <4> [<ffffffff811baedc>] writeback_single_inode+0x20c/0x290
>>> <4> [<ffffffff811bb1ad>] writeback_sb_inodes+0xbd/0x170
>>> <4> [<ffffffff811bb30b>] writeback_inodes_wb+0xab/0x1b0
>>> <4> [<ffffffff811bb703>] wb_writeback+0x2f3/0x410
>>> <4> [<ffffffff8100b9ce>] ? common_interrupt+0xe/0x13
>>> <4> [<ffffffff81088062>] ? del_timer_sync+0x22/0x30
>>> <4> [<ffffffff811bb9c5>] wb_do_writeback+0x1a5/0x240
>>> <4> [<ffffffff811bbac3>] bdi_writeback_task+0x63/0x1b0
>>> <4> [<ffffffff8109e987>] ? bit_waitqueue+0x17/0xd0
>>> <4> [<ffffffff81148360>] ? bdi_start_fn+0x0/0x100
>>> <4> [<ffffffff811483e6>] bdi_start_fn+0x86/0x100
>>> <4> [<ffffffff81148360>] ? bdi_start_fn+0x0/0x100
>>> <4> [<ffffffff8109e66e>] kthread+0x9e/0xc0
>>> <4> [<ffffffff8100c20a>] child_rip+0xa/0x20
>>> <4> [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
>>> <4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
>>> <4>Code: c3 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 53 48 83 ec 08 0f 1f 44
>>> 00 00 48 8b 06 48 89 fb 48 8b 78 18 48 39 c6 48 8b 7f 40 <48> 8b 7f 10 74 2b 4c
>>> 8b 83 c8 01 00 00 4c 8b 4e 08 4c 8d 93 c8
>>> <1>RIP  [<ffffffffa02b45df>] nfs_init_commit+0x1f/0xf0 [nfs]
>>> <4> RSP <ffff88063988da30>
>>> <4>CR2: 00000000dc364913
>>> 
>>> 
>>> I have vmcore file as well, so let me know if you need some more information.
>>> 
>> 
>> Hi Tigran!
>> 
>> Have you tried a recent upstream kernel? IIRC I fixed a seemingly similar
>> filelayout commit issue not too long ago.
>>
>> The filelayout commit path seems to have been broken for a while - mostly
>> because all the filelayout servers (that I know of) use stable writes, so
>> that code path went untested...
>> 
>> -dros



* Re: kernel crashes on commit
  2015-01-20 14:37 ` Weston Andros Adamson
  2015-01-20 16:13   ` Mkrtchyan, Tigran
@ 2015-01-20 19:22   ` Mkrtchyan, Tigran
  2015-01-20 20:42     ` Weston Andros Adamson
  1 sibling, 1 reply; 13+ messages in thread
From: Mkrtchyan, Tigran @ 2015-01-20 19:22 UTC (permalink / raw)
  To: Weston Andros Adamson; +Cc: linux-nfs list

Hi Dros,

do you refer to this commit

http://git.linux-nfs.org/?p=dros/linux-nfs.git;a=commit;h=d201c4de518c1d617aa216664869fa329d562d7d ?

Tigran.


----- Original Message -----
> From: "Weston Andros Adamson" <dros@primarydata.com>
> To: "Tigran Mkrtchyan" <tigran.mkrtchyan@desy.de>
> Cc: "linux-nfs list" <linux-nfs@vger.kernel.org>
> Sent: Tuesday, January 20, 2015 3:37:49 PM
> Subject: Re: kernel crashes on commit

>> On Jan 20, 2015, at 9:00 AM, Mkrtchyan, Tigran <tigran.mkrtchyan@desy.de> wrote:
>> 
>> 
>> 
>> Dear fellows,
>> 
>> since we have enabled commit through DS code we
>> permanently observe kernel crashes with RHEL6/7 and ubuntu 14.04:
>> 
>> 
>> <1>BUG: unable to handle kernel paging request at 00000000dc364913
>> <1>IP: [<ffffffffa02b45df>] nfs_init_commit+0x1f/0xf0 [nfs]
>> <4>PGD 6393ae067 PUD 0
>> <4>Oops: 0000 [#1] SMP
>> <4>last sysfs file: /sys/devices/system/cpu/online
>> <4>CPU 1
>> <4>Modules linked in: vfat fat usb_storage mpt3sas mpt2sas raid_class mptctl
>> ipmi_devintf dell_rbu openafs(P)(U) autofs4 nfs_layout_nfsv41_files nfs lockd
>> fscache auth_rpcgss nfs_acl sunrpc bonding 8021q garp stp llc ipv6 power_meter
>> acpi_ipmi ipmi_si ipmi_msghandler iTCO_wdt iTCO_vendor_support microcode dcdbas
>> sg bnx2 lpc_ich mfd_core i7core_edac edac_core ext4 jbd2 mbcache sd_mod
>> crc_t10dif wmi pata_acpi ata_generic ata_piix mptsas mptscsih mptbase
>> scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod
>> [last unloaded: scsi_wait_scan]
>> <4>
>> <4>Pid: 18209, comm: flush-0:19 Tainted: P           ---------------
>> 2.6.32-504.3.3.el6.x86_64 #1 Dell Inc. PowerEdge M610/0N582M
>> <4>RIP: 0010:[<ffffffffa02b45df>]  [<ffffffffa02b45df>]
>> nfs_init_commit+0x1f/0xf0 [nfs]
>> <4>RSP: 0018:ffff88063988da30  EFLAGS: 00010246
>> <4>RAX: ffff88063988db60 RBX: ffff88009c492040 RCX: ffff88063988db30
>> <4>RDX: 0000000000000000 RSI: ffff88063988db60 RDI: 00000000dc364903
>> <4>RBP: ffff88063988da40 R08: ffff88063988da90 R09: f9aa37faa254d404
>> <4>R10: 0000000000000010 R11: 0000000000000000 R12: 0000000000000001
>> <4>R13: ffff880339f33a00 R14: ffff88063988db30 R15: ffff88063988d8c8
>> <4>FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
>> <4>CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
>> <4>CR2: 00000000dc364913 CR3: 0000000639fbb000 CR4: 00000000000007e0
>> <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> <4>Process flush-0:19 (pid: 18209, threadinfo ffff88063988c000, task
>> ffff88063837c040)
>> <4>Stack:
>> <4> 0000000000000000 ffff88009c492040 ffff88063988dad0 ffffffffa031fdb7
>> <4><d> ffff88063837c5f8 ffff88063988da90 ffff8800a6e34600 ffff880337f2a950
>> <4><d> ffff880637c99488 0000000037f2a940 ffff88063988db60 0000000000000000
>> <4>Call Trace:
>> <4> [<ffffffffa031fdb7>] filelayout_commit_pagelist+0x277/0x3c0 [nfs_layout_nfsv41_files]
>> <4> [<ffffffffa02b613b>] nfs_generic_commit_list+0xab/0x100 [nfs]
>> <4> [<ffffffffa02b627c>] nfs_commit_inode+0xec/0x150 [nfs]
>> <4> [<ffffffffa02b6aab>] nfs_write_inode+0xab/0x100 [nfs]
>> <4> [<ffffffff811baedc>] writeback_single_inode+0x20c/0x290
>> <4> [<ffffffff811bb1ad>] writeback_sb_inodes+0xbd/0x170
>> <4> [<ffffffff811bb30b>] writeback_inodes_wb+0xab/0x1b0
>> <4> [<ffffffff811bb703>] wb_writeback+0x2f3/0x410
>> <4> [<ffffffff8100b9ce>] ? common_interrupt+0xe/0x13
>> <4> [<ffffffff81088062>] ? del_timer_sync+0x22/0x30
>> <4> [<ffffffff811bb9c5>] wb_do_writeback+0x1a5/0x240
>> <4> [<ffffffff811bbac3>] bdi_writeback_task+0x63/0x1b0
>> <4> [<ffffffff8109e987>] ? bit_waitqueue+0x17/0xd0
>> <4> [<ffffffff81148360>] ? bdi_start_fn+0x0/0x100
>> <4> [<ffffffff811483e6>] bdi_start_fn+0x86/0x100
>> <4> [<ffffffff81148360>] ? bdi_start_fn+0x0/0x100
>> <4> [<ffffffff8109e66e>] kthread+0x9e/0xc0
>> <4> [<ffffffff8100c20a>] child_rip+0xa/0x20
>> <4> [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
>> <4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
>> <4>Code: c3 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 48 8b 06 48 89 fb 48 8b 78 18 48 39 c6 48 8b 7f 40 <48> 8b 7f 10 74 2b 4c 8b 83 c8 01 00 00 4c 8b 4e 08 4c 8d 93 c8
>> <1>RIP  [<ffffffffa02b45df>] nfs_init_commit+0x1f/0xf0 [nfs]
>> <4> RSP <ffff88063988da30>
>> <4>CR2: 00000000dc364913
>> 
>> 
>> I have vmcore file as well, so let me know if you need some more information.
>> 
> 
> Hi Tigran!
> 
> Have you tried a recent upstream kernel? IIRC I fixed a seemingly similar filelayout
> commit issue not too long ago.
> 
> The filelayout commit path seems to have been broken for a while - mostly because
> all the filelayout servers (that I know of) use stable writes, so that code path went
> untested...
> 
> -dros


* Re: kernel crashes on commit
  2015-01-20 14:00 Mkrtchyan, Tigran
  2015-01-20 14:37 ` Weston Andros Adamson
@ 2015-01-20 16:53 ` Peng Tao
  1 sibling, 0 replies; 13+ messages in thread
From: Peng Tao @ 2015-01-20 16:53 UTC (permalink / raw)
  To: Mkrtchyan, Tigran; +Cc: Linux NFS Mailing List

On Tue, Jan 20, 2015 at 10:00 PM, Mkrtchyan, Tigran
<tigran.mkrtchyan@desy.de> wrote:
>
>
> Dear fellows,
>
> since we have enabled commit through DS code we
> permanently observe kernel crashes with RHEL6/7 and ubuntu 14.04:
>
Hi Tigran,

I fixed an issue in the flexfiles layout driver: although the client is
supposed to have only one RW segment, it can end up with multiple
segments when one is attached to the layout header and others have been
unlinked due to layoutreturn/layoutrecall. If we don't check for this
before freeing the commit buckets, the client might crash when accessing
the DS commit info, as there is still a valid lseg. I'm not sure if it is
the same issue, but it looks similar.

Please see ff_layout_free_lseg() in Tom's patchset (the 49th patch)
where I take inode->i_lock and check for existing RW layouts before
freeing commit info buckets.
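
The check described above can be modelled in a few lines. This is only a toy
sketch in Python, not the kernel code: `Segment`, `LayoutHeader` and
`free_segment` are hypothetical names, a `threading.Lock` stands in for
inode->i_lock, and a plain list stands in for the commit info buckets.

```python
import threading
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)          # compare segments by identity, not by value
class Segment:
    iomode: str               # "RW" or "READ"


@dataclass
class LayoutHeader:
    # Stand-in for inode->i_lock (assumption: one lock guards both fields below).
    lock: threading.Lock = field(default_factory=threading.Lock)
    segments: list = field(default_factory=list)
    commit_buckets: Optional[list] = None


def free_segment(hdr: LayoutHeader, seg: Segment) -> bool:
    """Drop seg; free the commit buckets only if no RW segment is left.

    Returns True when the buckets were released.
    """
    with hdr.lock:
        hdr.segments = [s for s in hdr.segments if s is not seg]
        if any(s.iomode == "RW" for s in hdr.segments):
            # Another RW segment may still reference the buckets: keep them.
            return False
        hdr.commit_buckets = None
        return True


# Two RW segments exist at once, as in the layoutreturn/layoutrecall case.
hdr = LayoutHeader(commit_buckets=[[], []])
old, new = Segment("RW"), Segment("RW")
hdr.segments = [old, new]

freed_first = free_segment(hdr, old)    # buckets must survive this call
buckets_after_first = hdr.commit_buckets
freed_second = free_segment(hdr, new)   # last RW segment gone, buckets freed
```

The point of doing the scan and the release under the same lock is that no
new RW segment can appear between the check and the free.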

Cheers,
Tao

>
> <1>BUG: unable to handle kernel paging request at 00000000dc364913
> <1>IP: [<ffffffffa02b45df>] nfs_init_commit+0x1f/0xf0 [nfs]
> <4>PGD 6393ae067 PUD 0
> <4>Oops: 0000 [#1] SMP
> <4>last sysfs file: /sys/devices/system/cpu/online
> <4>CPU 1
> <4>Modules linked in: vfat fat usb_storage mpt3sas mpt2sas raid_class mptctl ipmi_devintf dell_rbu openafs(P)(U) autofs4 nfs_layout_nfsv41_files nfs lockd fscache auth_rpcgss nfs_acl sunrpc bonding 8021q garp stp llc ipv6 power_meter acpi_ipmi ipmi_si ipmi_msghandler iTCO_wdt iTCO_vendor_support microcode dcdbas sg bnx2 lpc_ich mfd_core i7core_edac edac_core ext4 jbd2 mbcache sd_mod crc_t10dif wmi pata_acpi ata_generic ata_piix mptsas mptscsih mptbase scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
> <4>
> <4>Pid: 18209, comm: flush-0:19 Tainted: P           ---------------    2.6.32-504.3.3.el6.x86_64 #1 Dell Inc. PowerEdge M610/0N582M
> <4>RIP: 0010:[<ffffffffa02b45df>]  [<ffffffffa02b45df>] nfs_init_commit+0x1f/0xf0 [nfs]
> <4>RSP: 0018:ffff88063988da30  EFLAGS: 00010246
> <4>RAX: ffff88063988db60 RBX: ffff88009c492040 RCX: ffff88063988db30
> <4>RDX: 0000000000000000 RSI: ffff88063988db60 RDI: 00000000dc364903
> <4>RBP: ffff88063988da40 R08: ffff88063988da90 R09: f9aa37faa254d404
> <4>R10: 0000000000000010 R11: 0000000000000000 R12: 0000000000000001
> <4>R13: ffff880339f33a00 R14: ffff88063988db30 R15: ffff88063988d8c8
> <4>FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
> <4>CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> <4>CR2: 00000000dc364913 CR3: 0000000639fbb000 CR4: 00000000000007e0
> <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> <4>Process flush-0:19 (pid: 18209, threadinfo ffff88063988c000, task ffff88063837c040)
> <4>Stack:
> <4> 0000000000000000 ffff88009c492040 ffff88063988dad0 ffffffffa031fdb7
> <4><d> ffff88063837c5f8 ffff88063988da90 ffff8800a6e34600 ffff880337f2a950
> <4><d> ffff880637c99488 0000000037f2a940 ffff88063988db60 0000000000000000
> <4>Call Trace:
> <4> [<ffffffffa031fdb7>] filelayout_commit_pagelist+0x277/0x3c0 [nfs_layout_nfsv41_files]
> <4> [<ffffffffa02b613b>] nfs_generic_commit_list+0xab/0x100 [nfs]
> <4> [<ffffffffa02b627c>] nfs_commit_inode+0xec/0x150 [nfs]
> <4> [<ffffffffa02b6aab>] nfs_write_inode+0xab/0x100 [nfs]
> <4> [<ffffffff811baedc>] writeback_single_inode+0x20c/0x290
> <4> [<ffffffff811bb1ad>] writeback_sb_inodes+0xbd/0x170
> <4> [<ffffffff811bb30b>] writeback_inodes_wb+0xab/0x1b0
> <4> [<ffffffff811bb703>] wb_writeback+0x2f3/0x410
> <4> [<ffffffff8100b9ce>] ? common_interrupt+0xe/0x13
> <4> [<ffffffff81088062>] ? del_timer_sync+0x22/0x30
> <4> [<ffffffff811bb9c5>] wb_do_writeback+0x1a5/0x240
> <4> [<ffffffff811bbac3>] bdi_writeback_task+0x63/0x1b0
> <4> [<ffffffff8109e987>] ? bit_waitqueue+0x17/0xd0
> <4> [<ffffffff81148360>] ? bdi_start_fn+0x0/0x100
> <4> [<ffffffff811483e6>] bdi_start_fn+0x86/0x100
> <4> [<ffffffff81148360>] ? bdi_start_fn+0x0/0x100
> <4> [<ffffffff8109e66e>] kthread+0x9e/0xc0
> <4> [<ffffffff8100c20a>] child_rip+0xa/0x20
> <4> [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
> <4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
> <4>Code: c3 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 48 8b 06 48 89 fb 48 8b 78 18 48 39 c6 48 8b 7f 40 <48> 8b 7f 10 74 2b 4c 8b 83 c8 01 00 00 4c 8b 4e 08 4c 8d 93 c8
> <1>RIP  [<ffffffffa02b45df>] nfs_init_commit+0x1f/0xf0 [nfs]
> <4> RSP <ffff88063988da30>
> <4>CR2: 00000000dc364913
>
>
> I have vmcore file as well, so let me know if you need some more information.
>
> Tigran.


* Re: kernel crashes on commit
  2015-01-20 14:37 ` Weston Andros Adamson
@ 2015-01-20 16:13   ` Mkrtchyan, Tigran
  2015-01-20 19:22   ` Mkrtchyan, Tigran
  1 sibling, 0 replies; 13+ messages in thread
From: Mkrtchyan, Tigran @ 2015-01-20 16:13 UTC (permalink / raw)
  To: Weston Andros Adamson; +Cc: linux-nfs list



----- Original Message -----
> From: "Weston Andros Adamson" <dros@primarydata.com>
> To: "Tigran Mkrtchyan" <tigran.mkrtchyan@desy.de>
> Cc: "linux-nfs list" <linux-nfs@vger.kernel.org>
> Sent: Tuesday, January 20, 2015 3:37:49 PM
> Subject: Re: kernel crashes on commit

>> On Jan 20, 2015, at 9:00 AM, Mkrtchyan, Tigran <tigran.mkrtchyan@desy.de> wrote:
>> 
>> 
>> 
>> Dear fellows,
>> 
>> since we have enabled commit through DS code we
>> permanently observe kernel crashes with RHEL6/7 and ubuntu 14.04:
>> 
>> 
>> <1>BUG: unable to handle kernel paging request at 00000000dc364913
>> <1>IP: [<ffffffffa02b45df>] nfs_init_commit+0x1f/0xf0 [nfs]
>> <4>PGD 6393ae067 PUD 0
>> <4>Oops: 0000 [#1] SMP
>> <4>last sysfs file: /sys/devices/system/cpu/online
>> <4>CPU 1
>> <4>Modules linked in: vfat fat usb_storage mpt3sas mpt2sas raid_class mptctl ipmi_devintf dell_rbu openafs(P)(U) autofs4 nfs_layout_nfsv41_files nfs lockd fscache auth_rpcgss nfs_acl sunrpc bonding 8021q garp stp llc ipv6 power_meter acpi_ipmi ipmi_si ipmi_msghandler iTCO_wdt iTCO_vendor_support microcode dcdbas sg bnx2 lpc_ich mfd_core i7core_edac edac_core ext4 jbd2 mbcache sd_mod crc_t10dif wmi pata_acpi ata_generic ata_piix mptsas mptscsih mptbase scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
>> <4>
>> <4>Pid: 18209, comm: flush-0:19 Tainted: P           ---------------    2.6.32-504.3.3.el6.x86_64 #1 Dell Inc. PowerEdge M610/0N582M
>> <4>RIP: 0010:[<ffffffffa02b45df>]  [<ffffffffa02b45df>] nfs_init_commit+0x1f/0xf0 [nfs]
>> <4>RSP: 0018:ffff88063988da30  EFLAGS: 00010246
>> <4>RAX: ffff88063988db60 RBX: ffff88009c492040 RCX: ffff88063988db30
>> <4>RDX: 0000000000000000 RSI: ffff88063988db60 RDI: 00000000dc364903
>> <4>RBP: ffff88063988da40 R08: ffff88063988da90 R09: f9aa37faa254d404
>> <4>R10: 0000000000000010 R11: 0000000000000000 R12: 0000000000000001
>> <4>R13: ffff880339f33a00 R14: ffff88063988db30 R15: ffff88063988d8c8
>> <4>FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
>> <4>CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
>> <4>CR2: 00000000dc364913 CR3: 0000000639fbb000 CR4: 00000000000007e0
>> <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> <4>Process flush-0:19 (pid: 18209, threadinfo ffff88063988c000, task ffff88063837c040)
>> <4>Stack:
>> <4> 0000000000000000 ffff88009c492040 ffff88063988dad0 ffffffffa031fdb7
>> <4><d> ffff88063837c5f8 ffff88063988da90 ffff8800a6e34600 ffff880337f2a950
>> <4><d> ffff880637c99488 0000000037f2a940 ffff88063988db60 0000000000000000
>> <4>Call Trace:
>> <4> [<ffffffffa031fdb7>] filelayout_commit_pagelist+0x277/0x3c0 [nfs_layout_nfsv41_files]
>> <4> [<ffffffffa02b613b>] nfs_generic_commit_list+0xab/0x100 [nfs]
>> <4> [<ffffffffa02b627c>] nfs_commit_inode+0xec/0x150 [nfs]
>> <4> [<ffffffffa02b6aab>] nfs_write_inode+0xab/0x100 [nfs]
>> <4> [<ffffffff811baedc>] writeback_single_inode+0x20c/0x290
>> <4> [<ffffffff811bb1ad>] writeback_sb_inodes+0xbd/0x170
>> <4> [<ffffffff811bb30b>] writeback_inodes_wb+0xab/0x1b0
>> <4> [<ffffffff811bb703>] wb_writeback+0x2f3/0x410
>> <4> [<ffffffff8100b9ce>] ? common_interrupt+0xe/0x13
>> <4> [<ffffffff81088062>] ? del_timer_sync+0x22/0x30
>> <4> [<ffffffff811bb9c5>] wb_do_writeback+0x1a5/0x240
>> <4> [<ffffffff811bbac3>] bdi_writeback_task+0x63/0x1b0
>> <4> [<ffffffff8109e987>] ? bit_waitqueue+0x17/0xd0
>> <4> [<ffffffff81148360>] ? bdi_start_fn+0x0/0x100
>> <4> [<ffffffff811483e6>] bdi_start_fn+0x86/0x100
>> <4> [<ffffffff81148360>] ? bdi_start_fn+0x0/0x100
>> <4> [<ffffffff8109e66e>] kthread+0x9e/0xc0
>> <4> [<ffffffff8100c20a>] child_rip+0xa/0x20
>> <4> [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
>> <4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
>> <4>Code: c3 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 48 8b 06 48 89 fb 48 8b 78 18 48 39 c6 48 8b 7f 40 <48> 8b 7f 10 74 2b 4c 8b 83 c8 01 00 00 4c 8b 4e 08 4c 8d 93 c8
>> <1>RIP  [<ffffffffa02b45df>] nfs_init_commit+0x1f/0xf0 [nfs]
>> <4> RSP <ffff88063988da30>
>> <4>CR2: 00000000dc364913
>> 
>> 
>> I have vmcore file as well, so let me know if you need some more information.
>> 
> 
> Hi Tigran!

Hi Dros,

> 
> Have you tried a recent upstream kernel? IIRC I fixed a seemingly similar filelayout
> commit issue not too long ago.

Indeed, it looks like my Fedora 21 desktop survives massive writes, while the
other hosts crash.

Now I recall seeing your mail, but I can't find it any more. Could you please point me to it?


> 
> The filelayout commit path seems to have been broken for a while - mostly because
> all the filelayout servers (that I know of) use stable writes, so that code path went
> untested...

We enabled unstable writes in December.

Tigran.


 
> 
> -dros


* Re: kernel crashes on commit
  2015-01-20 14:00 Mkrtchyan, Tigran
@ 2015-01-20 14:37 ` Weston Andros Adamson
  2015-01-20 16:13   ` Mkrtchyan, Tigran
  2015-01-20 19:22   ` Mkrtchyan, Tigran
  2015-01-20 16:53 ` Peng Tao
  1 sibling, 2 replies; 13+ messages in thread
From: Weston Andros Adamson @ 2015-01-20 14:37 UTC (permalink / raw)
  To: Tigran Mkrtchyan; +Cc: linux-nfs list


> On Jan 20, 2015, at 9:00 AM, Mkrtchyan, Tigran <tigran.mkrtchyan@desy.de> wrote:
> 
> 
> 
> Dear fellows,
> 
> since we have enabled commit through DS code we
> permanently observe kernel crashes with RHEL6/7 and ubuntu 14.04:
> 
> 
> <1>BUG: unable to handle kernel paging request at 00000000dc364913
> <1>IP: [<ffffffffa02b45df>] nfs_init_commit+0x1f/0xf0 [nfs]
> <4>PGD 6393ae067 PUD 0 
> <4>Oops: 0000 [#1] SMP 
> <4>last sysfs file: /sys/devices/system/cpu/online
> <4>CPU 1 
> <4>Modules linked in: vfat fat usb_storage mpt3sas mpt2sas raid_class mptctl ipmi_devintf dell_rbu openafs(P)(U) autofs4 nfs_layout_nfsv41_files nfs lockd fscache auth_rpcgss nfs_acl sunrpc bonding 8021q garp stp llc ipv6 power_meter acpi_ipmi ipmi_si ipmi_msghandler iTCO_wdt iTCO_vendor_support microcode dcdbas sg bnx2 lpc_ich mfd_core i7core_edac edac_core ext4 jbd2 mbcache sd_mod crc_t10dif wmi pata_acpi ata_generic ata_piix mptsas mptscsih mptbase scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
> <4>
> <4>Pid: 18209, comm: flush-0:19 Tainted: P           ---------------    2.6.32-504.3.3.el6.x86_64 #1 Dell Inc. PowerEdge M610/0N582M
> <4>RIP: 0010:[<ffffffffa02b45df>]  [<ffffffffa02b45df>] nfs_init_commit+0x1f/0xf0 [nfs]
> <4>RSP: 0018:ffff88063988da30  EFLAGS: 00010246
> <4>RAX: ffff88063988db60 RBX: ffff88009c492040 RCX: ffff88063988db30
> <4>RDX: 0000000000000000 RSI: ffff88063988db60 RDI: 00000000dc364903
> <4>RBP: ffff88063988da40 R08: ffff88063988da90 R09: f9aa37faa254d404
> <4>R10: 0000000000000010 R11: 0000000000000000 R12: 0000000000000001
> <4>R13: ffff880339f33a00 R14: ffff88063988db30 R15: ffff88063988d8c8
> <4>FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
> <4>CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> <4>CR2: 00000000dc364913 CR3: 0000000639fbb000 CR4: 00000000000007e0
> <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> <4>Process flush-0:19 (pid: 18209, threadinfo ffff88063988c000, task ffff88063837c040)
> <4>Stack:
> <4> 0000000000000000 ffff88009c492040 ffff88063988dad0 ffffffffa031fdb7
> <4><d> ffff88063837c5f8 ffff88063988da90 ffff8800a6e34600 ffff880337f2a950
> <4><d> ffff880637c99488 0000000037f2a940 ffff88063988db60 0000000000000000
> <4>Call Trace:
> <4> [<ffffffffa031fdb7>] filelayout_commit_pagelist+0x277/0x3c0 [nfs_layout_nfsv41_files]
> <4> [<ffffffffa02b613b>] nfs_generic_commit_list+0xab/0x100 [nfs]
> <4> [<ffffffffa02b627c>] nfs_commit_inode+0xec/0x150 [nfs]
> <4> [<ffffffffa02b6aab>] nfs_write_inode+0xab/0x100 [nfs]
> <4> [<ffffffff811baedc>] writeback_single_inode+0x20c/0x290
> <4> [<ffffffff811bb1ad>] writeback_sb_inodes+0xbd/0x170
> <4> [<ffffffff811bb30b>] writeback_inodes_wb+0xab/0x1b0
> <4> [<ffffffff811bb703>] wb_writeback+0x2f3/0x410
> <4> [<ffffffff8100b9ce>] ? common_interrupt+0xe/0x13
> <4> [<ffffffff81088062>] ? del_timer_sync+0x22/0x30
> <4> [<ffffffff811bb9c5>] wb_do_writeback+0x1a5/0x240
> <4> [<ffffffff811bbac3>] bdi_writeback_task+0x63/0x1b0
> <4> [<ffffffff8109e987>] ? bit_waitqueue+0x17/0xd0
> <4> [<ffffffff81148360>] ? bdi_start_fn+0x0/0x100
> <4> [<ffffffff811483e6>] bdi_start_fn+0x86/0x100
> <4> [<ffffffff81148360>] ? bdi_start_fn+0x0/0x100
> <4> [<ffffffff8109e66e>] kthread+0x9e/0xc0
> <4> [<ffffffff8100c20a>] child_rip+0xa/0x20
> <4> [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
> <4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
> <4>Code: c3 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 48 8b 06 48 89 fb 48 8b 78 18 48 39 c6 48 8b 7f 40 <48> 8b 7f 10 74 2b 4c 8b 83 c8 01 00 00 4c 8b 4e 08 4c 8d 93 c8 
> <1>RIP  [<ffffffffa02b45df>] nfs_init_commit+0x1f/0xf0 [nfs]
> <4> RSP <ffff88063988da30>
> <4>CR2: 00000000dc364913
> 
> 
> I have vmcore file as well, so let me know if you need some more information.
> 

Hi Tigran!

Have you tried a recent upstream kernel? IIRC I fixed a seemingly similar filelayout
commit issue not too long ago.

The filelayout commit path seems to have been broken for a while - mostly because
all the filelayout servers (that I know of) use stable writes, so that code path went
untested...

-dros





* kernel crashes on commit
@ 2015-01-20 14:00 Mkrtchyan, Tigran
  2015-01-20 14:37 ` Weston Andros Adamson
  2015-01-20 16:53 ` Peng Tao
  0 siblings, 2 replies; 13+ messages in thread
From: Mkrtchyan, Tigran @ 2015-01-20 14:00 UTC (permalink / raw)
  To: linux-nfs



Dear fellows,

since we enabled commit through the DS code, we have been
consistently observing kernel crashes with RHEL 6/7 and Ubuntu 14.04:


<1>BUG: unable to handle kernel paging request at 00000000dc364913
<1>IP: [<ffffffffa02b45df>] nfs_init_commit+0x1f/0xf0 [nfs]
<4>PGD 6393ae067 PUD 0 
<4>Oops: 0000 [#1] SMP 
<4>last sysfs file: /sys/devices/system/cpu/online
<4>CPU 1 
<4>Modules linked in: vfat fat usb_storage mpt3sas mpt2sas raid_class mptctl ipmi_devintf dell_rbu openafs(P)(U) autofs4 nfs_layout_nfsv41_files nfs lockd fscache auth_rpcgss nfs_acl sunrpc bonding 8021q garp stp llc ipv6 power_meter acpi_ipmi ipmi_si ipmi_msghandler iTCO_wdt iTCO_vendor_support microcode dcdbas sg bnx2 lpc_ich mfd_core i7core_edac edac_core ext4 jbd2 mbcache sd_mod crc_t10dif wmi pata_acpi ata_generic ata_piix mptsas mptscsih mptbase scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
<4>
<4>Pid: 18209, comm: flush-0:19 Tainted: P           ---------------    2.6.32-504.3.3.el6.x86_64 #1 Dell Inc. PowerEdge M610/0N582M
<4>RIP: 0010:[<ffffffffa02b45df>]  [<ffffffffa02b45df>] nfs_init_commit+0x1f/0xf0 [nfs]
<4>RSP: 0018:ffff88063988da30  EFLAGS: 00010246
<4>RAX: ffff88063988db60 RBX: ffff88009c492040 RCX: ffff88063988db30
<4>RDX: 0000000000000000 RSI: ffff88063988db60 RDI: 00000000dc364903
<4>RBP: ffff88063988da40 R08: ffff88063988da90 R09: f9aa37faa254d404
<4>R10: 0000000000000010 R11: 0000000000000000 R12: 0000000000000001
<4>R13: ffff880339f33a00 R14: ffff88063988db30 R15: ffff88063988d8c8
<4>FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
<4>CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<4>CR2: 00000000dc364913 CR3: 0000000639fbb000 CR4: 00000000000007e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process flush-0:19 (pid: 18209, threadinfo ffff88063988c000, task ffff88063837c040)
<4>Stack:
<4> 0000000000000000 ffff88009c492040 ffff88063988dad0 ffffffffa031fdb7
<4><d> ffff88063837c5f8 ffff88063988da90 ffff8800a6e34600 ffff880337f2a950
<4><d> ffff880637c99488 0000000037f2a940 ffff88063988db60 0000000000000000
<4>Call Trace:
<4> [<ffffffffa031fdb7>] filelayout_commit_pagelist+0x277/0x3c0 [nfs_layout_nfsv41_files]
<4> [<ffffffffa02b613b>] nfs_generic_commit_list+0xab/0x100 [nfs]
<4> [<ffffffffa02b627c>] nfs_commit_inode+0xec/0x150 [nfs]
<4> [<ffffffffa02b6aab>] nfs_write_inode+0xab/0x100 [nfs]
<4> [<ffffffff811baedc>] writeback_single_inode+0x20c/0x290
<4> [<ffffffff811bb1ad>] writeback_sb_inodes+0xbd/0x170
<4> [<ffffffff811bb30b>] writeback_inodes_wb+0xab/0x1b0
<4> [<ffffffff811bb703>] wb_writeback+0x2f3/0x410
<4> [<ffffffff8100b9ce>] ? common_interrupt+0xe/0x13
<4> [<ffffffff81088062>] ? del_timer_sync+0x22/0x30
<4> [<ffffffff811bb9c5>] wb_do_writeback+0x1a5/0x240
<4> [<ffffffff811bbac3>] bdi_writeback_task+0x63/0x1b0
<4> [<ffffffff8109e987>] ? bit_waitqueue+0x17/0xd0
<4> [<ffffffff81148360>] ? bdi_start_fn+0x0/0x100
<4> [<ffffffff811483e6>] bdi_start_fn+0x86/0x100
<4> [<ffffffff81148360>] ? bdi_start_fn+0x0/0x100
<4> [<ffffffff8109e66e>] kthread+0x9e/0xc0
<4> [<ffffffff8100c20a>] child_rip+0xa/0x20
<4> [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
<4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
<4>Code: c3 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 48 8b 06 48 89 fb 48 8b 78 18 48 39 c6 48 8b 7f 40 <48> 8b 7f 10 74 2b 4c 8b 83 c8 01 00 00 4c 8b 4e 08 4c 8d 93 c8 
<1>RIP  [<ffffffffa02b45df>] nfs_init_commit+0x1f/0xf0 [nfs]
<4> RSP <ffff88063988da30>
<4>CR2: 00000000dc364913


I have a vmcore file as well, so let me know if you need more information.

Tigran.
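
As a reading aid for the oops above, the faulting address can be cross-checked
against the register dump. The sketch below is an illustration only: the helper
names are hypothetical, the three input lines are copied from the trace with
their log-level prefixes dropped and the Code line shortened, and the decoder
understands just the one instruction form seen here (REX.W, opcode 0x8b, 8-bit
displacement).

```python
import re

# Lines copied from the oops (log-level prefixes dropped, Code line shortened).
OOPS = """\
RDX: 0000000000000000 RSI: ffff88063988db60 RDI: 00000000dc364903
CR2: 00000000dc364913 CR3: 0000000639fbb000 CR4: 00000000000007e0
Code: 48 8b 78 18 48 39 c6 48 8b 7f 40 <48> 8b 7f 10 74 2b 4c
"""

# x86-64 register numbering for ModRM.rm when no REX.B bit is set.
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def parse_registers(text):
    """Map register names such as 'RDI' or 'CR2' to integer values."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]{2}): ([0-9a-f]{16})", text)
    return {name: int(value, 16) for name, value in pairs}


def faulting_bytes(text):
    """Return the byte marked <..> (where RIP pointed) and the bytes after it."""
    m = re.search(r"<([0-9a-f]{2})>((?: [0-9a-f]{2})*)", text)
    return [int(m.group(1), 16)] + [int(b, 16) for b in m.group(2).split()]


def decode_mov_disp8(b):
    """Decode 'mov disp8(%base), %reg' only; anything else raises ValueError."""
    if b[0] != 0x48 or b[1] != 0x8B or (b[2] >> 6) != 0b01:
        raise ValueError("not REX.W MOV r64, r/m64 with disp8")
    rm = b[2] & 0b111
    if rm == 0b100:
        raise ValueError("SIB addressing is not handled in this sketch")
    disp = b[3] - 256 if b[3] >= 0x80 else b[3]   # sign-extend the 8-bit value
    return REG64[rm], disp


regs = parse_registers(OOPS)
base, disp = decode_mov_disp8(faulting_bytes(OOPS))
fault = regs[base.upper()] + disp
print(f"mov {disp:#x}(%{base}): {fault:#018x}  CR2: {regs['CR2']:#018x}")
```

The computed address equals CR2, so the faulting access is a load through RDI,
and the value in RDI (00000000dc364903) is not a kernel pointer. That much
follows from the dump; why the pointer was bad is not something this check
can tell.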


end of thread, other threads:[~2015-06-26 12:33 UTC | newest]

Thread overview: 13+ messages
2015-01-20 21:45 kernel crashes on commit Mkrtchyan, Tigran
2015-01-21 10:04 ` Mkrtchyan, Tigran
2015-01-21 15:20   ` Weston Andros Adamson
2015-03-26  9:28     ` Benjamin Coddington
2015-03-26 10:11       ` Mkrtchyan, Tigran
2015-03-26 12:00         ` Benjamin Coddington
2015-06-26 12:33         ` Benjamin Coddington
  -- strict thread matches above, loose matches on Subject: below --
2015-01-20 14:00 Mkrtchyan, Tigran
2015-01-20 14:37 ` Weston Andros Adamson
2015-01-20 16:13   ` Mkrtchyan, Tigran
2015-01-20 19:22   ` Mkrtchyan, Tigran
2015-01-20 20:42     ` Weston Andros Adamson
2015-01-20 16:53 ` Peng Tao
