From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Talyansky, Roman"
Subject: Write operation is stuck
Date: Wed, 10 Feb 2010 22:26:32 +0100
Message-ID:
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============0888458386243550241=="
Return-path:
Content-Language: en-US
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: ceph-devel-bounces@lists.sourceforge.net
To: "ceph-devel@lists.sourceforge.net"
List-Id: ceph-devel.vger.kernel.org

--===============0888458386243550241==
Content-Language: en-US
Content-Type: multipart/alternative; boundary="_000_C6A64D82E3A5D24B949315CFBC1FA1AD0729EFCD7DDEWDFECCR01wd_"

--_000_C6A64D82E3A5D24B949315CFBC1FA1AD0729EFCD7DDEWDFECCR01wd_
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

Hello,

Recently I ran three application instances simultaneously over a mounted CEPH file system and one of them got stuck calling a write operation.
I had the following CEPH configuration:
- The nodes have Debian installation - lenny, unstable
- Three nodes with osd servers
- Three client nodes
- One client node among the three mentioned above was located at a node where an osd server ran.

Can the origin of the problem be the client collocated with an osd server?
Can you help me to resolve this issue?

Thanks and regards,

Roman

--
Roman Talyansky
SAP Research, Israel
T +972 777 5538
M +972 3388 032
mailto:roman.talyansky@sap.com

--_000_C6A64D82E3A5D24B949315CFBC1FA1AD0729EFCD7DDEWDFECCR01wd_
Content-Type: text/html; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
--_000_C6A64D82E3A5D24B949315CFBC1FA1AD0729EFCD7DDEWDFECCR01wd_-- --===============0888458386243550241== Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline ------------------------------------------------------------------------------ SOLARIS 10 is the OS for Data Centers - provides features such as DTrace, Predictive Self Healing and Award Winning ZFS. Get Solaris 10 NOW http://p.sf.net/sfu/solaris-dev2dev --===============0888458386243550241== Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ Ceph-devel mailing list Ceph-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/ceph-devel --===============0888458386243550241==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: Sage Weil Subject: Re: Write operation is stuck Date: Wed, 10 Feb 2010 13:39:21 -0800 (PST) Message-ID: References: Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ceph-devel-bounces@lists.sourceforge.net To: "Talyansky, Roman" Cc: "ceph-devel@lists.sourceforge.net" List-Id: ceph-devel.vger.kernel.org Hi Roman, On Wed, 10 Feb 2010, Talyansky, Roman wrote: > Hello, > > Recently I ran three application instances simultaneously over a mounted CEPH file system and one of them got stuck calling a write operation. > I had the following CEPH configuration: > - The nodes have Debian installation - lenny , unstable > - Three nodes with osd servers > - Three client nodes > - One client node among the three mentioned above was located at a node where an osd server ran. > > Can the origin of the problem be the client collocated with an osd server? 
The collocated client+osd can theoretically cause problems when you run
out of memory, but it doesn't sound like that's the case here.

> Can you help me to resolve this issue?

I assume the OSDs and MDS are all still running?

We fixed a number of bugs recently with multiple clients interacting with
the same files. Is the hang reproducible? Can you try it with the latest
unstable client and servers? Or, enable mds debug logging and post that
somewhere (debug mds = 20, debug ms = 1)?

Thanks-
sage

From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Talyansky, Roman"
Subject: Re: Write operation is stuck
Date: Wed, 10 Feb 2010 23:44:06 +0100
Message-ID:
References:
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To:
Content-Language: en-US
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: ceph-devel-bounces@lists.sourceforge.net
To: Sage Weil
Cc: "ceph-devel@lists.sourceforge.net"
List-Id: ceph-devel.vger.kernel.org

Hi Sage,

Thanks for the reply.

> I assume the OSDs and MDS are all still running?
They are not running. Since I did not have trace files started for ceph, I decided to reproduce the hang with traces started. I am currently trying to reproduce the hang.

> Can you try it with the latest unstable client and servers?
I will definitely try.

> Or, enable mds debug logging and post that somewhere (debug mds = 20, debug ms = 1)?
Should I place these two lines into the ceph.conf file in the [mds] section?
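
A sketch of how those two settings might look in ceph.conf, assuming they go in the [mds] section as asked above (the indentation is cosmetic):

```ini
; Assumption: both settings are placed in the [mds] section,
; which is the placement being asked about in this message.
[mds]
        debug mds = 20
        debug ms = 1
```
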
Thanks,
Roman

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Sage Weil
Subject: Re: Write operation is stuck
Date: Wed, 10 Feb 2010 14:49:19 -0800 (PST)
Message-ID:
References:
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: ceph-devel-bounces@lists.sourceforge.net
To: "Talyansky, Roman"
Cc: "ceph-devel@lists.sourceforge.net"
List-Id: ceph-devel.vger.kernel.org

On Wed, 10 Feb 2010, Talyansky, Roman wrote:
> > Or, enable mds debug logging and post that somewhere (debug mds = 20,
> > debug ms = 1)?
> Should I place these two lines into the ceph.conf file in the [mds] section?

Yes.

Thanks-
sage

From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Talyansky, Roman"
Subject: Re: Write operation is stuck
Date: Tue, 16 Feb 2010 18:27:46 +0100
Message-ID:
References:
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="_002_C6A64D82E3A5D24B949315CFBC1FA1AD072A0E1D74DEWDFECCR01wd_"
Return-path:
In-Reply-To:
Content-Language: en-US
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: ceph-devel-bounces@lists.sourceforge.net
To: Sage Weil
Cc: "ceph-devel@lists.sourceforge.net"
List-Id: ceph-devel.vger.kernel.org

--_002_C6A64D82E3A5D24B949315CFBC1FA1AD072A0E1D74DEWDFECCR01wd_
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

Hi Sage,

I am trying to reproduce the hang with the latest client and servers.
I am able to start the servers, however mount fails with input/output error 5.
The dmesg listing shows the following info:

[17008.244739] ceph: loaded 0.18.0 (mon/mds/osd proto 15/30/22)
[17015.888143] ceph: mon0 10.55.147.70:6789 connection failed
[17025.880170] ceph: mon0 10.55.147.70:6789 connection failed
[17035.880121] ceph: mon0 10.55.147.70:6789 connection failed
[17045.880189] ceph: mon0 10.55.147.70:6789 connection failed
[17055.880130] ceph: mon0 10.55.147.70:6789 connection failed
[17065.880113] ceph: mon0 10.55.147.70:6789 connection failed
[17075.880170] ceph: mon0 10.55.147.70:6789 connection failed

The server is reachable, as the following command output shows:

$ nc 10.55.147.83 6789
ceph v027

I started running the experiments with ceph 0.18 using the configuration, where clients and servers run on separate nodes. It turns out that the performance is extremely bad. Looking at dmesg trace I see ceph-related faults (the partial trace is attached to the email).

Any suggestions how to proceed are more than welcome.

Thanks,
Roman

--_002_C6A64D82E3A5D24B949315CFBC1FA1AD072A0E1D74DEWDFECCR01wd_
Content-Type: text/plain; name="trace.txt"
Content-Description: trace.txt
Content-Disposition: attachment; filename="trace.txt"; size=4287; creation-date="Tue, 16 Feb 2010 16:02:44 GMT"; modification-date="Tue, 16 Feb 2010 16:02:54 GMT"
Content-Transfer-Encoding: base64

WzExMjY5MS41MTY1MzhdIGdlbmVyYWwgcHJvdGVjdGlvbiBmYXVsdDogMDAwMCBbIzczXSBTTVAN ClsxMTI2OTEuNTIwNTE3XSBsYXN0IHN5c2ZzIGZpbGU6IC9zeXMvZGV2aWNlcy92aXJ0dWFsL25l dC9sby9vcGVyc3RhdGUNClsxMTI2OTEuNTIwNTE3XSBDUFUgMQ0KWzExMjY5MS41MjA1MTddIE1v ZHVsZXMgbGlua2VkIGluOiBjZXBoIGNyYzMyYyBsaWJjcmMzMmMgbmZzIGxvY2tkIGZzY2FjaGUg bmZzX2FjbCBhdXRoX3JwY2dzcyBzdW5ycGMgYXV0b2ZzNCBleHQ0IGpiZDIgY3JjMTYgbG9vcCBw YXJwb3J0X3BjIGZzY2htZCBpMmNfaTgwMQ0KIHBhcnBvcnQgaTJjX2NvcmUgc25kX2hkYV9jb2Rl Y19yZWFsdGVrIGV2ZGV2IHRwbV9pbmZpbmVvbiBzbmRfaGRhX2ludGVsIHBzbW91c2Ugc2VyaW9f cmF3IHRwbSBzbmRfaGRhX2NvZGVjIHNuZF9wY3NwIHNuZF9od2RlcCBzbmRfcGNtIHNuZF90aW1l ciBzbmQgc291bmRjbw0KcmUgc25kX3BhZ2VfYWxsb2MgY29udGFpbmVyIHRwbV9iaW9zIHByb2Nl c3NvciBleHQzIGpiZCBtYmNhY2hlIHNnIHNkX21vZCBjcmNfdDEwZGlmIHNyX21vZCBjZHJvbSBp ZGVfcGNpX2dlbmVyaWMgaWRlX2NvcmUgYXRhX2dlbmVyaWMgdWhjaV9oY2QgZmxvcHB5IGF0YV9w aQ0KaXggYnV0dG9uIGUxMDAwZSBpbnRlbF9hZ3AgYWdwZ2FydCBsaWJhdGEgZWhjaV9oY2Qgc2Nz aV9tb2QgdXNiY29yZSBubHNfYmFzZSB0aGVybWFsIGZhbiB0aGVybWFsX3N5cyBbbGFzdCB1bmxv YWRlZDogc2NzaV93YWl0X3NjYW5dDQpbMTEyNjkxLjUyMDUxN10gUGlkOiAzNzgwLCBjb21tOiBp b3BsYXllcjIgVGFpbnRlZDogRyAgICAgIEQgICAgMi42LjMyLXRydW5rLWFtZDY0ICMxIEVTUFJJ TU8gUDU5MjUNClsxMTI2OTEuNTIwNTE3XSBSSVA6IDAwMTA6WzxmZmZmZmZmZmEwM2E2ZjE2Pl0g IFs8ZmZmZmZmZmZhMDNhNmYxNj5dIHplcm9fdXNlcl9zZWdtZW50KzB4NjIvMHg3NSBbY2VwaF0N
ClsxMTI2OTEuNTIwNTE3XSBSU1A6IDAwMTg6ZmZmZjg4MDAzNzg2MWM4OCAgRUZMQUdTOiAwMDAx MDI0Ng0KWzExMjY5MS41MjA1MTddIFJBWDogMDAwMDAwMDAwMDAwMDAwMCBSQlg6IDAwMDAwMDAw ZmZmYThkODcgUkNYOiAwMDAwMDAwMDAwMDAxMDAwDQpbMTEyNjkxLjUyMDUxN10gUkRYOiA2ZGI2 ZGI2ZGI2ZGI2ZGI3IFJTSTogMDAwMDAwMDAwMDAwMDAwMCBSREk6IDc2ZjE5NzMyZWI3YmMwMDAN ClsxMTI2OTEuNTIwNTE3XSBSQlA6IDAwMDAwMDAwMDAwMDEwMDAgUjA4OiAzMTIwMzkzNTMyMzgz MTIwIFIwOTogZmZmZmZmZmY4MTQzOTBiMA0KWzExMjY5MS41MjA1MTddIFIxMDogZmZmZjg4MDEw YjE1NzgwMCBSMTE6IGZmZmY4ODAwZDY4MDIwMDAgUjEyOiAwMDAwMDAwMGZmZmE4ZDg2DQpbMTEy NjkxLjUyMDUxN10gUjEzOiAwMDAwMDAwMDAwMDAyMDAwIFIxNDogZmZmZjg4MDEwYTkxOGIxMCBS MTU6IGZmZmY4ODAxMGE5MThiMTANClsxMTI2OTEuNTIwNTE3XSBGUzogIDAwMDA3ZjViZTI3ZmM5 MTAoMDAwMCkgR1M6ZmZmZjg4MDAwNTEwMDAwMCgwMDAwKSBrbmxHUzowMDAwMDAwMDAwMDAwMDAw DQpbMTEyNjkxLjUyMDUxN10gQ1M6ICAwMDEwIERTOiAwMDAwIEVTOiAwMDAwIENSMDogMDAwMDAw MDA4MDA1MDAzMw0KWzExMjY5MS41MjA1MTddIENSMjogMDAwMDdmNGJkM2E5Y2E5MCBDUjM6IDAw MDAwMDAxMDk0NjEwMDAgQ1I0OiAwMDAwMDAwMDAwMDAwNmUwDQpbMTEyNjkxLjUyMDUxN10gRFIw OiAwMDAwMDAwMDAwMDAwMDAwIERSMTogMDAwMDAwMDAwMDAwMDAwMCBEUjI6IDAwMDAwMDAwMDAw MDAwMDANClsxMTI2OTEuNTIwNTE3XSBEUjM6IDAwMDAwMDAwMDAwMDAwMDAgRFI2OiAwMDAwMDAw MGZmZmYwZmYwIERSNzogMDAwMDAwMDAwMDAwMDQwMA0KWzExMjY5MS41MjA1MTddIFByb2Nlc3Mg aW9wbGF5ZXIyIChwaWQ6IDM3ODAsIHRocmVhZGluZm8gZmZmZjg4MDAzNzg2MDAwMCwgdGFzayBm ZmZmODgwMDM3OTQ1YmQwKQ0KWzExMjY5MS41MjA1MTddIFN0YWNrOg0KWzExMjY5MS41MjA1MTdd ICAwMDAwMDEwMDAwMDAwMjMwIGZmZmZmZmZmYTAzYTZmOTMgMDAwMDAwMDAwMDAwMDAwMCAwMDAw MDAwMWE4ZDg2MDAwDQpbMTEyNjkxLjUyMDUxN10gPDA+IDAwMDAwMDAwMDAwMDAwMDEgMDAwMDAw MDAwMDAwMDAwMCAwMDAwMDAwMWE4ZDg2MDAwIGZmZmZmZmZmYTAzYTdiZjcNClsxMTI2OTEuNTIw NTE3XSA8MD4gZmZmZjg4MDAwMDAwMDAwMCBmZmZmZmZmZmZmZmZmZmZmIGZmZmY4ODAxMGE5MThi MTAgZmZmZjg4MDEwMDAwMDAwMg0KWzExMjY5MS41MjA1MTddIENhbGwgVHJhY2U6DQpbMTEyNjkx LjUyMDUxN10gIFs8ZmZmZmZmZmZhMDNhNmY5Mz5dID8gemVyb19wYWdlX3ZlY3Rvcl9yYW5nZSsw eDZhLzB4YTUgW2NlcGhdDQpbMTEyNjkxLjUyMDUxN10gIFs8ZmZmZmZmZmZhMDNhN2JmNz5dID8g 
Y2VwaF9haW9fcmVhZCsweDMzZC8weDRhYSBbY2VwaF0NClsxMTI2OTEuNTIwNTE3XSAgWzxmZmZm ZmZmZjgxMGViZjAxPl0gPyBkb19zeW5jX3JlYWQrMHhjZS8weDExMw0KWzExMjY5MS41MjA1MTdd ICBbPGZmZmZmZmZmODEwNjc2ZDQ+XSA/IGhydGltZXJfdHJ5X3RvX2NhbmNlbCsweDNhLzB4NDMN ClsxMTI2OTEuNTIwNTE3XSAgWzxmZmZmZmZmZjgxMDY0YWFlPl0gPyBhdXRvcmVtb3ZlX3dha2Vf ZnVuY3Rpb24rMHgwLzB4MmUNClsxMTI2OTEuNTIwNTE3XSAgWzxmZmZmZmZmZjgxMDY3NmU5Pl0g PyBocnRpbWVyX2NhbmNlbCsweGMvMHgxNg0KWzExMjY5MS41MjA1MTddICBbPGZmZmZmZmZmODEy ZTYyYWE+XSA/IGRvX25hbm9zbGVlcCsweDZkLzB4YTMNClsxMTI2OTEuNTIwNTE3XSAgWzxmZmZm ZmZmZjgxMDNhYTlhPl0gPyBwaWNrX25leHRfdGFzaysweDI0LzB4M2YNClsxMTI2OTEuNTIwNTE3 XSAgWzxmZmZmZmZmZjgxMGVjOTRhPl0gPyB2ZnNfcmVhZCsweGE2LzB4ZmYNClsxMTI2OTEuNTIw NTE3XSAgWzxmZmZmZmZmZjgxMGVjYTVmPl0gPyBzeXNfcmVhZCsweDQ1LzB4NmUNClsxMTI2OTEu NTIwNTE3XSAgWzxmZmZmZmZmZjgxMDEwYjAyPl0gPyBzeXN0ZW1fY2FsbF9mYXN0cGF0aCsweDE2 LzB4MWINClsxMTI2OTEuNTIwNTE3XSBDb2RlOiBiNiA2ZCBkYiBiNiA2ZCA0OSA4ZCAwNCAwMCA4 OSBmNyAyOSBmMSA0OCBjMSBmOCAwMyA0OCAwZiBhZiBjMiA0OCBjMSBlMCAwYyA0OCAwMSBjNyA0 OCBiOCAwMCAwMCAwMCAwMCAwMCA4OCBmZiBmZiA0OCAwMSBjNyAzMSBjMCA8ZjM+IGENCmEgNjUg NDggOGIgMDQgMjUgYzggY2IgMDAgMDAgZmYgODggNDQgZTAgZmYgZmYgNTkgYzMgNDEgNTYNClsx MTI2OTEuNTIwNTE3XSBSSVAgIFs8ZmZmZmZmZmZhMDNhNmYxNj5dIHplcm9fdXNlcl9zZWdtZW50 KzB4NjIvMHg3NSBbY2VwaF0NClsxMTI2OTEuNTIwNTE3XSAgUlNQIDxmZmZmODgwMDM3ODYxYzg4 Pg0KWzExMjY5MS44NzEyMzhdIC0tLVsgZW5kIHRyYWNlIDA5NDg2OTgzYThjZGJlMDQgXS0tLQ0K WzExMjY5MS44NzYzMzFdIG5vdGU6IGlvcGxheWVyMlszNzgwXSBleGl0ZWQgd2l0aCBwcmVlbXB0 X2NvdW50IDENClsxMTI2OTEuODgxNDA5XSBCVUc6IHNjaGVkdWxpbmcgd2hpbGUgYXRvbWljOiBp b3BsYXllcjIvMzc4MC8weDEwMDAwMDAxDQpbMTEyNjkxLjg4NjQ1Ml0gTW9kdWxlcyBsaW5rZWQg aW46IGNlcGggY3JjMzJjIGxpYmNyYzMyYyBuZnMgbG9ja2QgZnNjYWNoZSBuZnNfYWNsIGF1dGhf cnBjZ3NzIHN1bnJwYyBhdXRvZnM0IGV4dDQgamJkMiBjcmMxNiBsb29wIHBhcnBvcnRfcGMgZnNj aG1kIGkyY19pODAxDQogcGFycG9ydCBpMmNfY29yZSBzbmRfaGRhX2NvZGVjX3JlYWx0ZWsgZXZk ZXYgdHBtX2luZmluZW9uIHNuZF9oZGFfaW50ZWwgcHNtb3VzZSBzZXJpb19yYXcgdHBtIHNuZF9o 
ZGFfY29kZWMgc25kX3Bjc3Agc25kX2h3ZGVwIHNuZF9wY20gc25kX3RpbWVyIHNuZCBzb3VuZGNv DQpyZSBzbmRfcGFnZV9hbGxvYyBjb250YWluZXIgdHBtX2Jpb3MgcHJvY2Vzc29yIGV4dDMgamJk IG1iY2FjaGUgc2cgc2RfbW9kIGNyY190MTBkaWYgc3JfbW9kIGNkcm9tIGlkZV9wY2lfZ2VuZXJp YyBpZGVfY29yZSBhdGFfZ2VuZXJpYyB1aGNpX2hjZCBmbG9wcHkgYXRhX3BpDQppeCBidXR0b24g ZTEwMDBlIGludGVsX2FncCBhZ3BnYXJ0IGxpYmF0YSBlaGNpX2hjZCBzY3NpX21vZCB1c2Jjb3Jl IG5sc19iYXNlIHRoZXJtYWwgZmFuIHRoZXJtYWxfc3lzIFtsYXN0IHVubG9hZGVkOiBzY3NpX3dh aXRfc2Nhbl0NClsxMTI2OTEuOTI4NDM0XSBQaWQ6IDM3ODAsIGNvbW06IGlvcGxheWVyMiBUYWlu dGVkOiBHICAgICAgRCAgICAyLjYuMzItdHJ1bmstYW1kNjQgIzENClsxMTI2OTEuOTM4ODAyXSBD YWxsIFRyYWNlOg0K --_002_C6A64D82E3A5D24B949315CFBC1FA1AD072A0E1D74DEWDFECCR01wd_ Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline ------------------------------------------------------------------------------ SOLARIS 10 is the OS for Data Centers - provides features such as DTrace, Predictive Self Healing and Award Winning ZFS. 
Get Solaris 10 NOW http://p.sf.net/sfu/solaris-dev2dev --_002_C6A64D82E3A5D24B949315CFBC1FA1AD072A0E1D74DEWDFECCR01wd_ Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ Ceph-devel mailing list Ceph-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/ceph-devel --_002_C6A64D82E3A5D24B949315CFBC1FA1AD072A0E1D74DEWDFECCR01wd_-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: Sage Weil Subject: Re: Write operation is stuck Date: Tue, 16 Feb 2010 10:35:00 -0800 (PST) Message-ID: References: Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ceph-devel-bounces@lists.sourceforge.net To: "Talyansky, Roman" Cc: "ceph-devel@lists.sourceforge.net" List-Id: ceph-devel.vger.kernel.org On Tue, 16 Feb 2010, Talyansky, Roman wrote: > Hi Sage, > > I am trying to reproduce the hang with the latest client and servers. > I am able to start the servers, however mount fails with input/output error 5. The dmesg listing shows the following info: > > [17008.244739] ceph: loaded 0.18.0 (mon/mds/osd proto 15/30/22) > [17015.888143] ceph: mon0 10.55.147.70:6789 connection failed > [17025.880170] ceph: mon0 10.55.147.70:6789 connection failed > [17035.880121] ceph: mon0 10.55.147.70:6789 connection failed > [17045.880189] ceph: mon0 10.55.147.70:6789 connection failed > [17055.880130] ceph: mon0 10.55.147.70:6789 connection failed > [17065.880113] ceph: mon0 10.55.147.70:6789 connection failed > [17075.880170] ceph: mon0 10.55.147.70:6789 connection failed > > The server is reachable, as the following command output shows: > > $ nc 10.55.147.83 6789 > ceph v027 It looks like dmesg shows it trying to connect to the monitor at .70, but you tested .83? 
> I started running the experiments with ceph 0.18 using the
> configuration, where clients and servers run on separate nodes. It turns
> out that the performance is extremely bad. Looking at dmesg trace I see
> ceph-related faults (the partial trace is attached to the email).

The oops in the attached trace.txt was fixed last week in the unstable code.
It also looks like the IO is synchronous, which may have something to do with your performance. Are you mounting with -o sync or using direct IO, or are multiple clients reading and writing to the same file or something?

Thanks-
sage

From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Talyansky, Roman"
Subject: Re: Write operation is stuck
Date: Fri, 19 Feb 2010 16:40:12 +0100
Message-ID:
References:
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To:
Content-Language: en-US
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: ceph-devel-bounces@lists.sourceforge.net
To: Sage Weil
Cc: "ceph-devel@lists.sourceforge.net"
List-Id: ceph-devel.vger.kernel.org

Hi Sage,

Thanks for the answer.

> It looks like dmesg shows it trying to connect to the monitor at .70, but you tested .83?
Since I test several ceph versions simultaneously I could confuse the error checking at different nodes.
I'll double check this and let you know.

> It also looks like the IO is synchronous, which may have something
> to do with your performance. Are you mounting with -o sync or using
> direct IO, or are multiple clients reading and writing to the same file or
> something?
The IO is indeed synchronous. However the performance under ceph is much worse than even under nfs, which looks strange.
I do not mount with -o sync. And in our experiments multiple clients read and write the same file.

Thanks,
Roman

------------------------------------------------------------------------------
Download Intel® Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Sage Weil
Subject: Re: Write operation is stuck
Date: Fri, 19 Feb 2010 10:39:29 -0800 (PST)
Message-ID:
References:
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: ceph-devel-bounces@lists.sourceforge.net
To: "Talyansky, Roman"
Cc: "ceph-devel@lists.sourceforge.net"
List-Id: ceph-devel.vger.kernel.org

Hi Roman,

On Fri, 19 Feb 2010, Talyansky, Roman wrote:
> Since I test several ceph versions simultaneously I could confuse the error checking at different nodes.
> I'll double check this and let you know.

Thanks. If you haven't switched to the just-released 0.19, now might be
the time to do that.

> > It also looks like the IO is synchronous, which may have something
> > to do with your performance. Are you mounting with -o sync or using
> > direct IO, or are multiple clients reading and writing to the same file or
> > something?
>
> The IO is indeed synchronous. However the performance under ceph is much
> worse than even under nfs, which looks strange. I do not mount with -o
> sync. And in our experiments multiple clients read and write the same
> file.

If you are accessing the same file from multiple clients, then any
comparison with nfs is going to be misleading. NFS provides only
close-to-open consistency, so IO will be buffered and inconsistent. Ceph
provides fully consistent semantics by switching to synchronous IO when
there are multiple clients. Ceph will be slower, but correct; nfs will be
fast, but incorrect.

If your application is smart enough to handle its own consistency (each
client is writing to a different region of the file) then you probably
want something along the lines of O_LAZY [1], so that the application can
tell the FS not to worry about consistency and stick with buffered IO.
Unfortunately O_LAZY doesn't exist in Linux at this point. There is some
preliminary support for it in Ceph... if that's what you're looking for,
we can cook up some patches for you.

If you can find us in #ceph on irc.oftc.net that might be a quicker way to
diagnose the performance problems with your workload.

Thanks!
sage

[1] http://www.pdl.cmu.edu/posix/docs/posix_lazy_io.pdf

From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Talyansky, Roman"
Subject: Re: Write operation is stuck
Date: Tue, 23 Feb 2010 15:11:43 +0100
Message-ID:
References:
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="_002_C6A64D82E3A5D24B949315CFBC1FA1AD072A34414FDEWDFECCR01wd_"
Return-path:
In-Reply-To:
Content-Language: en-US
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: ceph-devel-bounces@lists.sourceforge.net
To: Sage Weil
Cc: "ceph-devel@lists.sourceforge.net"
List-Id: ceph-devel.vger.kernel.org

--_002_C6A64D82E3A5D24B949315CFBC1FA1AD072A34414FDEWDFECCR01wd_
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

Hi Sage,

As you advised us, we switched to the release 0.19 of ceph and ran into another bug in the ceph client. When writing to a file with the O_SYNC flag, "0" is always returned although the data is written to disk.
This poses a problem in our benchmark which uses the return value as number of bytes written. Also it seems that such behavior infringes the POSIX write() contract.

Attached is a small unit test in C++.
The unit test creates 2 files which are exactly the same, both filled randomly with numbers 0-9. Afterwards both files are closed.
Then one file is reopened and filled with 1's.

Running the test:
$ g++ temp.cc
$ ./a.out 100    (this is the number of bytes in the files)

Each time 0 is returned it is printed out on the screen.
Run the executable a.out from within a directory on a ceph file system.
After the program finishes you will find 2 files:
./test - filled with ones
./test.start - filled with random numeric data

If you run this test on NFS and ceph you will see that no errors are printed out on the NFS file system, and 100 errors are printed out on ceph.

Thanks,
Roman & Roman

-----Original Message-----
From: Sage Weil [mailto:sage@newdream.net]
Sent: Friday, February 19, 2010 8:39 PM
To: Talyansky, Roman
Cc: ceph-devel@lists.sourceforge.net
Subject: Re: [ceph-devel] Write operation is stuck

Hi Roman,

On Fri, 19 Feb 2010, Talyansky, Roman wrote:
> Since I test several ceph versions simultaneously I could confuse the error checking at different nodes.
> I'll double check this and let you know.

Thanks. If you haven't switched to the just-released 0.19, now might be
the time to do that.

> > It also looks like the IO is synchronous, which may have something
> > to do with your performance. Are you mounting with -o sync or using
> > direct IO, or are multiple clients reading and writing to the same file or
> > something?
>
> The IO is indeed synchronous. However the performance under ceph is much
> worse than even under nfs, which looks strange. I do not mount with -o
> sync. And in our experiments multiple clients read and write the same
> file.

If you are accessing the same file from multiple clients, then any
comparison with nfs is going to be misleading. NFS provides only
close-to-open consistency, so IO will be buffered and inconsistent. Ceph
provides fully consistent semantics by switching to synchronous IO when
there are multiple clients. Ceph will be slower, but correct; nfs will be
fast, but incorrect.

If your application is smart enough to handle its own consistency (each
client is writing to a different region of the file) then you probably
want something along the lines of O_LAZY [1], so that the application can
tell the FS not to worry about consistency and stick with buffered IO.
Unfortunately O_LAZY doesn't exist in Linux at this point. There is some
preliminary support for it in Ceph... if that's what you're looking for,
we can cook up some patches for you.

If you can find us in #ceph on irc.oftc.net that might be a quicker way to
diagnose the performance problems with your workload.

Thanks!
sage [1] http://www.pdl.cmu.edu/posix/docs/posix_lazy_io.pdf --_002_C6A64D82E3A5D24B949315CFBC1FA1AD072A34414FDEWDFECCR01wd_ Content-Type: application/octet-stream; name="temp.cc" Content-Description: temp.cc Content-Disposition: attachment; filename="temp.cc"; size=1658; creation-date="Tue, 23 Feb 2010 14:22:44 GMT"; modification-date="Tue, 23 Feb 2010 14:21:26 GMT" Content-Transfer-Encoding: base64 I2luY2x1ZGUgPHN5cy90eXBlcy5oPgojaW5jbHVkZSA8ZGlyZW50Lmg+CiNpbmNsdWRlIDxlcnJu by5oPgojaW5jbHVkZSA8dmVjdG9yPgojaW5jbHVkZSA8c3RyaW5nPgojaW5jbHVkZSA8aW9zdHJl YW0+CiNpbmNsdWRlIDxmc3RyZWFtPgojaW5jbHVkZSA8c3RkbGliLmg+CiNpbmNsdWRlIDxmY250 bC5oPgojaW5jbHVkZSA8dW5pc3RkLmg+CiNpbmNsdWRlIDxlcnJuby5oPgojaW5jbHVkZSA8c3Ry aW5nLmg+CnVzaW5nIG5hbWVzcGFjZSBzdGQ7CgoKI2RlZmluZSBCVUZfTEVOIDEwMDAwMAojZGVm aW5lIEJFR0lOX0lOUFVUX0ZfTkFNRSAidGVzdC5zdGFydCIKI2RlZmluZSBGX05BTUUgInRlc3Qi CmludCBtYWluKGludCBhcmdjLCBjaGFyKiogYXJndikKewoKCQkJCXN3aXRjaChhcmdjKXsKCQkJ CQkJCQljYXNlIDI6CgkJCQkJCQkJCQkJCWNvdXQ8PCJGaWxlIHNpemUgaXMgIjw8YXRvaShhcmd2 WzFdKTw8ZW5kbDsJCQoJCQkJCQkJCQkJCQlicmVhazsJCgkJCQkJCQkJZGVmYXVsdDoKCQkJCQkJ CQkJCQkJY2Vycjw8IlVzYWdlOiAiPDxhcmd2WzBdPDwiIHNpemUgb2YgZmlsZSI8PGVuZGw7CgkJ CQkJCQkJCQkJCWV4aXQoMSk7CgkJCQl9CgkJCQlpbnQgZlNpemUJPSBhdG9pKGFyZ3ZbMV0pOwoJ CQkJb2ZzdHJlYW0gYmVnaW5GaWxlOwoJCQkJb2ZzdHJlYW0gd29ya0ZpbGU7CgkJCQliZWdpbkZp bGUub3BlbihCRUdJTl9JTlBVVF9GX05BTUUpOwoJCQkJd29ya0ZpbGUub3BlbihGX05BTUUpOwoK CQkJCWludCByYW49MDsKCQkJCWZvcihpbnQgaT0wO2k8ZlNpemU7aSsrKXsKCQkJCQkJCQlyYW49 cmFuZCgpOwoJCQkJCQkJCXJhbj00OCtyYW4lMTA7CgkJCQkJCQkJYmVnaW5GaWxlPDwoY2hhcily YW47CgkJCQkJCQkJd29ya0ZpbGU8PChjaGFyKXJhbjsKCQkJCX0KCQkJCWJlZ2luRmlsZS5jbG9z ZSgpOwoJCQkJd29ya0ZpbGUuY2xvc2UoKTsKCgkJCQljaGFyIGJ1ZmZbXT17NDl9OwoJCQkJLy9T dGFydCBmaWxsaW5nIGZpbGVzIHdpdGggb25lcwoJCQkJLy8KCgkJCQlpbnQgZmxhZ3MgPSBPX1NZ TkN8T19SRFdSOwoJCQkJaW50IGZkID0gOjpvcGVuKEZfTkFNRSwgZmxhZ3MpOwoJCQkJaWYgKGZk IDw9IDApIHsKCQkJCQkJCQljZXJyIDw8ICIgb3BlbiBwcm9ibGVtIHdpdGg6ICIgPDwgRl9OQU1F 
IDw8IGVuZGw7CgkJCQl9CgoJCQkJZm9yKGludCBpID0gMDsgaSA8ZlNpemU7IGkrKyl7CgkJCQkJ CQkJb2ZmX3QgcmVzID0gOjpsc2VlayhmZCwgaSwgU0VFS19TRVQpOwoJCQkJCQkJCWlmIChyZXMg IT0gaSkgewoJCQkJCQkJCQkJCQljZXJyIDw8ICJzZWVrIG9wIGZhaWxlZCByZXM9IiA8PCByZXMg PDwgIiBvZmZzZXQ9IiA8PCBpIDw8IGVuZGw7CgkJCQkJCQkJfQoJCQkJCQkJCXJlcyA9IDo6d3Jp dGUoZmQsYnVmZiwxICk7CgkJCQkJCQkJaWYgKHJlcyAhPSAxKXsKCQkJCQkJCQkJCQkJY2VyciA8 PCAicmVzPSIgPDwgcmVzIDw8ICIgd3JpdGUgZXJyb3I9IiA8PCBzdHJlcnJvcihlcnJubykgPDwg c3RkOjplbmRsOwoJCQkJCQkJCX0KCgkJCQl9CgoJCQkJaW50IHJlc19jbG9zZSA9IDo6Y2xvc2Uo ZmQpOyAKCQkJCWlmIChyZXNfY2xvc2UgPT0gLTEpewoJCQkJCQkJCWNlcnIgPDwgImNsb3NlIGVy cm9yPSIgPDwgc3RyZXJyb3IoZXJybm8pIDw8IHN0ZDo6ZW5kbDsKCQkJCX0KCgkJCQlleGl0KDAp Owp9Cgo= --_002_C6A64D82E3A5D24B949315CFBC1FA1AD072A34414FDEWDFECCR01wd_ Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline ------------------------------------------------------------------------------ Download Intel® Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. 
http://p.sf.net/sfu/intel-sw-dev --_002_C6A64D82E3A5D24B949315CFBC1FA1AD072A34414FDEWDFECCR01wd_ Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ Ceph-devel mailing list Ceph-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/ceph-devel --_002_C6A64D82E3A5D24B949315CFBC1FA1AD072A34414FDEWDFECCR01wd_-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: Yehuda Sadeh Weinraub Subject: Re: Write operation is stuck Date: Tue, 23 Feb 2010 10:11:14 -0800 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1466363504195151363==" Return-path: In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ceph-devel-bounces@lists.sourceforge.net To: "Talyansky, Roman" Cc: Sage Weil , "ceph-devel@lists.sourceforge.net" List-Id: ceph-devel.vger.kernel.org --===============1466363504195151363== Content-Type: multipart/alternative; boundary=001636b14b85b6f4cb048048795f --001636b14b85b6f4cb048048795f Content-Type: text/plain; charset=ISO-8859-1 On Tue, Feb 23, 2010 at 6:11 AM, Talyansky, Roman wrote: > Hi Sage, > > As you advised us, we switched to the release 0.19 of ceph and ran into > another bug in the ceph client. When writing to a file with the O_SYNC flag, > "0" is always returned although the data is written to disk. > This poses a problem in our benchmark which uses the return value as number > of bytes written. Also it seems that such behavior infringes the POSIX > write() contract. > > Yeah, thanks. A fix was pushed to the unstable branch. 
We will probably start maintaining a stable version that will contain such fixes, but you can apply this in the mean time:

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 2c4ae44..88932c9 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -807,7 +807,7 @@ static ssize_t ceph_aio_write(struct kiocb *iocb, const struct iovec *iov,
        struct ceph_osd_client *osdc = &ceph_client(inode->i_sb)->osdc;
        loff_t endoff = pos + iov->iov_len;
        int got = 0;
-       int ret;
+       int ret, err;

        if (ceph_snap(inode) != CEPH_NOSNAP)
                return -EROFS;
@@ -838,9 +838,12 @@ retry_snap:

                if ((ret >= 0 || ret == -EIOCBQUEUED) &&
                    ((file->f_flags & O_SYNC) || IS_SYNC(file->f_mapping->host)
-                    || ceph_osdmap_flag(osdc->osdmap, CEPH_OSDMAP_NEARFULL)))
-                       ret = vfs_fsync_range(file, file->f_path.dentry,
+                    || ceph_osdmap_flag(osdc->osdmap, CEPH_OSDMAP_NEARFULL))) {
+                       err = vfs_fsync_range(file, file->f_path.dentry,
                                              pos, pos + ret - 1, 1);
+                       if (err < 0)
+                               ret = err;
+               }
        }
        if (ret >= 0) {
                spin_lock(&inode->i_lock);

Yehuda

--001636b14b85b6f4cb048048795f
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable


--001636b14b85b6f4cb048048795f--
--===============1466363504195151363==--

From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Talyansky, Roman"
Subject: Re: Write operation is stuck
Date: Wed, 24 Feb 2010 14:34:23 +0100
Message-ID:
References:
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============1446135999712393956=="
Return-path:
In-Reply-To:
Content-Language: en-US
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: ceph-devel-bounces@lists.sourceforge.net
To: Yehuda Sadeh Weinraub
Cc: Sage Weil , "ceph-devel@lists.sourceforge.net"
List-Id: ceph-devel.vger.kernel.org

--===============1446135999712393956==
Content-Language: en-US
Content-Type: multipart/alternative; boundary="_000_C6A64D82E3A5D24B949315CFBC1FA1AD072A3B5200DEWDFECCR01wd_"

--_000_C6A64D82E3A5D24B949315CFBC1FA1AD072A3B5200DEWDFECCR01wd_
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

Hi Yehuda,

Thanks for the info on the fix. I'll incorporate it into the code and rerun the experiments.
It also seems that the code at that location became a bit more complex - new #if occurred:

#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 32)

And consequently the code under #else should be fixed as well.

Thanks,

Roman

From: Yehuda Sadeh Weinraub [mailto:yehudasa@gmail.com]
Sent: Tuesday, February 23, 2010 8:11 PM
To: Talyansky, Roman
Cc: Sage Weil; ceph-devel@lists.sourceforge.net
Subject: Re: [ceph-devel] Write operation is stuck

On Tue, Feb 23, 2010 at 6:11 AM, Talyansky, Roman wrote:
Hi Sage,

As you advised us, we switched to the release 0.19 of ceph and ran into another bug in the ceph client. When writing to a file with the O_SYNC flag, "0" is always returned although the data is written to disk.
This poses a problem in our benchmark which uses the return value as number of bytes written. Also it seems that such behavior infringes the POSIX write() contract.

Yeah, thanks. A fix was pushed to the unstable branch. We will probably start maintaining a stable version that will contain such fixes, but you can apply this in the mean time:

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 2c4ae44..88932c9 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -807,7 +807,7 @@ static ssize_t ceph_aio_write(struct kiocb *iocb, const struct iovec *iov,
        struct ceph_osd_client *osdc = &ceph_client(inode->i_sb)->osdc;
        loff_t endoff = pos + iov->iov_len;
        int got = 0;
-       int ret;
+       int ret, err;

        if (ceph_snap(inode) != CEPH_NOSNAP)
                return -EROFS;
@@ -838,9 +838,12 @@ retry_snap:

                if ((ret >= 0 || ret == -EIOCBQUEUED) &&
                    ((file->f_flags & O_SYNC) || IS_SYNC(file->f_mapping->host)
-                    || ceph_osdmap_flag(osdc->osdmap, CEPH_OSDMAP_NEARFULL)))
-                       ret = vfs_fsync_range(file, file->f_path.dentry,
+                    || ceph_osdmap_flag(osdc->osdmap, CEPH_OSDMAP_NEARFULL))) {
+                       err = vfs_fsync_range(file, file->f_path.dentry,
                                              pos, pos + ret - 1, 1);
+                       if (err < 0)
+                               ret = err;
+               }
        }
        if (ret >= 0) {
                spin_lock(&inode->i_lock);

Yehuda
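
For reference, the effect of the quoted patch can be modelled in user space. Before the fix, the result of vfs_fsync_range() (0 on success) was assigned straight to ret, which is why write() reported 0 bytes; after it, the byte count survives unless the sync fails. The sketch below only illustrates that pattern and is not kernel code: merge_old and merge_fixed are hypothetical helper names, their arguments stand in for the values the kernel function would see, and the -EIOCBQUEUED case from the original condition is left out for brevity.

```c
#include <errno.h>
#include <sys/types.h>

/*
 * Hypothetical stand-ins for the two versions of the logic in the patch.
 * `written` plays the role of `ret` after the write itself, `sync_err`
 * the role of the vfs_fsync_range() result (0 on success, negative errno
 * on failure). Simplification: -EIOCBQUEUED is not modelled.
 */

/* Pre-patch behaviour: the sync result overwrites the byte count. */
ssize_t merge_old(ssize_t written, int sync_err)
{
    if (written >= 0)
        written = sync_err;   /* a successful sync turns the count into 0 */
    return written;
}

/* Patched behaviour: only a failed sync replaces the byte count. */
ssize_t merge_fixed(ssize_t written, int sync_err)
{
    if (written >= 0 && sync_err < 0)
        written = sync_err;
    return written;
}
```

For a 1-byte synchronous write followed by a successful sync, merge_old(1, 0) yields 0 while merge_fixed(1, 0) yields 1; a failing sync such as -EIO is passed through by both.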
--_000_C6A64D82E3A5D24B949315CFBC1FA1AD072A3B5200DEWDFECCR01wd_--
--===============1446135999712393956==--

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Sage Weil
Subject: Re: Write operation is stuck
Date: Wed, 24 Feb 2010 06:56:13 -0800 (PST)
Message-ID:
References:
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: ceph-devel-bounces@lists.sourceforge.net
To: "Talyansky, Roman"
Cc: "ceph-devel@lists.sourceforge.net"
List-Id: ceph-devel.vger.kernel.org

On Wed, 24 Feb 2010, Talyansky, Roman wrote:
> Hi Yehuda,
>
> Thanks for the info on the fix. I'll incorporate it into the code and rerun the experiments.
> It also seems that the code at that location became a bit more complex - new #if occurred:
>
> #if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 32)
>
> And consequently the code under #else should be fixed as well.

Yeah. I pushed this fix (and another fix that comes up when there's >1 mds) to the stable 'master' branch of ceph-client.git and ceph-client-standalone.git.
The 'master-backport' branch of ceph-client-standalone.git has the backport #ifdefs (and builds back to 2.6.28 or so).

Thanks!
sage

>
> Thanks,
>
> Roman
>
> From: Yehuda Sadeh Weinraub [mailto:yehudasa@gmail.com]
> Sent: Tuesday, February 23, 2010 8:11 PM
> To: Talyansky, Roman
> Cc: Sage Weil; ceph-devel@lists.sourceforge.net
> Subject: Re: [ceph-devel] Write operation is stuck
>
>
> On Tue, Feb 23, 2010 at 6:11 AM, Talyansky, Roman wrote:
> Hi Sage,
>
> As you advised us, we switched to the release 0.19 of ceph and ran into another bug in the ceph client. When writing to a file with the O_SYNC flag, "0" is always returned although the data is written to disk.
> This poses a problem in our benchmark which uses the return value as number of bytes written. Also it seems that such behavior infringes the POSIX write() contract.
>
> Yeah, thanks. A fix was pushed to the unstable branch. We will probably start maintaining a stable version that will contain such fixes, but you can apply this in the mean time:
>
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index 2c4ae44..88932c9 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -807,7 +807,7 @@ static ssize_t ceph_aio_write(struct kiocb *iocb, const struct iovec *iov,
>         struct ceph_osd_client *osdc = &ceph_client(inode->i_sb)->osdc;
>         loff_t endoff = pos + iov->iov_len;
>         int got = 0;
> -       int ret;
> +       int ret, err;
>
>         if (ceph_snap(inode) != CEPH_NOSNAP)
>                 return -EROFS;
> @@ -838,9 +838,12 @@ retry_snap:
>
>                 if ((ret >= 0 || ret == -EIOCBQUEUED) &&
>                     ((file->f_flags & O_SYNC) || IS_SYNC(file->f_mapping->host)
> -                    || ceph_osdmap_flag(osdc->osdmap, CEPH_OSDMAP_NEARFULL)))
> -                       ret = vfs_fsync_range(file, file->f_path.dentry,
> +                    || ceph_osdmap_flag(osdc->osdmap, CEPH_OSDMAP_NEARFULL))) {
> +                       err = vfs_fsync_range(file, file->f_path.dentry,
>                                               pos, pos + ret - 1, 1);
> +                       if (err < 0)
> +                               ret = err;
> +               }
>         }
>         if (ret >= 0) {
>                 spin_lock(&inode->i_lock);
>
>
>
> Yehuda
>
From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Talyansky, Roman"
Subject: Re: Write operation is stuck
Date: Wed, 24 Feb 2010 17:42:20 +0100
Message-ID:
References:
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To:
Content-Language: en-US
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: ceph-devel-bounces@lists.sourceforge.net
To: Sage Weil
Cc: "ceph-devel@lists.sourceforge.net"
List-Id: ceph-devel.vger.kernel.org

Hi Sage,

Besides the bug with the return value of write operation, the system hangs in a write operation.
You can access the trace files of ceph servers at
https://sapmats-de.sap-ag.de/download/download.cgi?id=DHTK24DYMIH8MJEYGMO5GT1GKDLBR9IJDKFAA3R0C4D1JGTYVW

The traces were collected without "debug mds = 20" in the configuration file. I still have the hang system available. I also run the systems to regenerate the hang with "debug mds = 20" defined.

It would be great if we could have a chat to resolve the hang in about 4 hours.

Thanks,
Roman

-----Original Message-----
From: Sage Weil [mailto:sage@newdream.net]
Sent: Wednesday, February 24, 2010 4:56 PM
To: Talyansky, Roman
Cc: Yehuda Sadeh Weinraub; ceph-devel@lists.sourceforge.net
Subject: Re: [ceph-devel] Write operation is stuck

On Wed, 24 Feb 2010, Talyansky, Roman wrote:
> Hi Yehuda,
>
> Thanks for the info on the fix. I'll incorporate it into the code and rerun the experiments.
> It also seems that the code at that location became a bit more complex - new #if occurred:
>
> #if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 32)
>
> And consequently the code under #else should be fixed as well.

Yeah. I pushed this fix (and another fix that comes up when there's >1 mds) to the stable 'master' branch of ceph-client.git and ceph-client-standalone.git. The 'master-backport' branch of ceph-client-standalone.git has the backport #ifdefs (and builds back to 2.6.28 or so).

Thanks!
sage

>
> Thanks,
>
> Roman
>
> From: Yehuda Sadeh Weinraub [mailto:yehudasa@gmail.com]
> Sent: Tuesday, February 23, 2010 8:11 PM
> To: Talyansky, Roman
> Cc: Sage Weil; ceph-devel@lists.sourceforge.net
> Subject: Re: [ceph-devel] Write operation is stuck
>
>
> On Tue, Feb 23, 2010 at 6:11 AM, Talyansky, Roman wrote:
> Hi Sage,
>
> As you advised us, we switched to the release 0.19 of ceph and ran into another bug in the ceph client. When writing to a file with the O_SYNC flag, "0" is always returned although the data is written to disk.
> This poses a problem in our benchmark which uses the return value as number of bytes written. Also it seems that such behavior infringes the POSIX write() contract.
>
> Yeah, thanks. A fix was pushed to the unstable branch.
We will probably start maintaining a stable version that will contain such fixes, but you can apply this in the mean time:
>
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index 2c4ae44..88932c9 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -807,7 +807,7 @@ static ssize_t ceph_aio_write(struct kiocb *iocb, const struct iovec *iov,
>         struct ceph_osd_client *osdc = &ceph_client(inode->i_sb)->osdc;
>         loff_t endoff = pos + iov->iov_len;
>         int got = 0;
> -       int ret;
> +       int ret, err;
>
>         if (ceph_snap(inode) != CEPH_NOSNAP)
>                 return -EROFS;
> @@ -838,9 +838,12 @@ retry_snap:
>
>                 if ((ret >= 0 || ret == -EIOCBQUEUED) &&
>                     ((file->f_flags & O_SYNC) || IS_SYNC(file->f_mapping->host)
> -                    || ceph_osdmap_flag(osdc->osdmap, CEPH_OSDMAP_NEARFULL)))
> -                       ret = vfs_fsync_range(file, file->f_path.dentry,
> +                    || ceph_osdmap_flag(osdc->osdmap, CEPH_OSDMAP_NEARFULL))) {
> +                       err = vfs_fsync_range(file, file->f_path.dentry,
>                                               pos, pos + ret - 1, 1);
> +                       if (err < 0)
> +                               ret = err;
> +               }
>         }
>         if (ret >= 0) {
>                 spin_lock(&inode->i_lock);
>
>
>
> Yehuda
>

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Sage Weil
Subject: Re: Write operation is stuck
Date: Wed, 24 Feb 2010 10:43:34 -0800 (PST)
Message-ID:
References:
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: ceph-devel-bounces@lists.sourceforge.net
To: "Talyansky, Roman"
Cc: "ceph-devel@lists.sourceforge.net"
List-Id: ceph-devel.vger.kernel.org

Hi Roman,

On Wed, 24 Feb 2010, Talyansky, Roman wrote:
> Hi Sage,
>
> Besides the bug with the return value of write operation, the system hangs in a write operation.
> You can access the trace files of ceph servers at
> https://sapmats-de.sap-ag.de/download/download.cgi?id=DHTK24DYMIH8MJEYGMO5GT1GKDLBR9IJDKFAA3R0C4D1JGTYVW

The .tar.gz appears to be corrupt (gunzip complains)... :(

> The traces were collected without "debug mds = 20" in the configuration
> file. I still have the hang system available. I also run the systems to
> regenerate the hang with "debug mds = 20" defined.
>
> It would be great if we could have a chat to resolve the hang in about 4
> hours.

Sounds good. We should just be back from lunch, and will be in #ceph on irc.oftc.net, or on jabber!

sage

From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Talyansky, Roman"
Subject: Re: Write operation is stuck
Date: Thu, 25 Feb 2010 00:21:53 +0100
Message-ID:
References:
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To:
Content-Language: en-US
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: ceph-devel-bounces@lists.sourceforge.net
To: Sage Weil
Cc: "ceph-devel@lists.sourceforge.net"
List-Id: ceph-devel.vger.kernel.org

Hi Sage,

That's right the file was corrupted. Currently I have extremely slow network. So probably I'll open access to the non-corrupted file tomorrow.
Meanwhile the system with higher trace level also hangs and I'll be able to send you more informative traces.

Thanks,
Roman

-----Original Message-----
From: Sage Weil [mailto:sage@newdream.net]
Sent: Wednesday, February 24, 2010 8:44 PM
To: Talyansky, Roman
Cc: ceph-devel@lists.sourceforge.net
Subject: Re: [ceph-devel] Write operation is stuck

Hi Roman,

On Wed, 24 Feb 2010, Talyansky, Roman wrote:
> Hi Sage,
>
> Besides the bug with the return value of write operation, the system hangs in a write operation.
> You can access the trace files of ceph servers at
> https://sapmats-de.sap-ag.de/download/download.cgi?id=DHTK24DYMIH8MJEYGMO5GT1GKDLBR9IJDKFAA3R0C4D1JGTYVW

The .tar.gz appears to be corrupt (gunzip complains)... :(

> The traces were collected without "debug mds = 20" in the configuration
> file. I still have the hang system available. I also run the systems to
> regenerate the hang with "debug mds = 20" defined.
>
> It would be great if we could have a chat to resolve the hang in about 4
> hours.

Sounds good. We should just be back from lunch, and will be in #ceph on irc.oftc.net, or on jabber!

sage

From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Talyansky, Roman"
Subject: Re: Write operation is stuck
Date: Thu, 25 Feb 2010 11:07:08 +0100
Message-ID:
References:
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To:
Content-Language: en-US
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: ceph-devel-bounces@lists.sourceforge.net
To: Sage Weil
Cc: "ceph-devel@lists.sourceforge.net"
List-Id: ceph-devel.vger.kernel.org

Hi Sage,

The file with the new traces can be found at
https://sapmats-de.sap-ag.de/download/download.cgi?id=A90NAAZ5KZG8IXQP7Z9Z084IKX6HF47GE2OEEPT740RBRSGJNO

Thanks,
Roman

-----Original Message-----
From: Sage Weil [mailto:sage@newdream.net]
Sent: Wednesday, February 24, 2010 8:44 PM
To: Talyansky, Roman
Cc: ceph-devel@lists.sourceforge.net
Subject: Re: [ceph-devel] Write operation is stuck

Hi Roman,

On Wed, 24 Feb 2010, Talyansky, Roman wrote:
> Hi Sage,
>
> Besides the bug with the return value of write operation, the system hangs in a write operation.
> You can access the trace files of ceph servers at
> https://sapmats-de.sap-ag.de/download/download.cgi?id=DHTK24DYMIH8MJEYGMO5GT1GKDLBR9IJDKFAA3R0C4D1JGTYVW

The .tar.gz appears to be corrupt (gunzip complains)... :(

> The traces were collected without "debug mds = 20" in the configuration
> file. I still have the hang system available. I also run the systems to
> regenerate the hang with "debug mds = 20" defined.
>
> It would be great if we could have a chat to resolve the hang in about 4
> hours.

Sounds good. We should just be back from lunch, and will be in #ceph on irc.oftc.net, or on jabber!
sage

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Bogdan Lobodzinski
Subject: Write operation is stuck
Date: Fri, 27 Aug 2010 12:18:29 +0000 (UTC)
Message-ID:
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Return-path:
Received: from lo.gmane.org ([80.91.229.12]:32912 "EHLO lo.gmane.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752524Ab0H0MZG (ORCPT ); Fri, 27 Aug 2010 08:25:06 -0400
Received: from list by lo.gmane.org with local (Exim 4.69) (envelope-from ) id 1OoxzU-0008Km-Kh for ceph-devel@vger.kernel.org; Fri, 27 Aug 2010 14:25:04 +0200
Received: from h1bombeiros.desy.de ([131.169.60.116]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Fri, 27 Aug 2010 14:25:04 +0200
Received: from bogdan by h1bombeiros.desy.de with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Fri, 27 Aug 2010 14:25:04 +0200
Sender: ceph-devel-owner@vger.kernel.org
List-ID:
To: ceph-devel@vger.kernel.org

Hello,

working with ceph on my test configuration
(3 nodes Ubuntu 10.04.1 LTS, Linux 2.6.32-24-generic-pae #41-Ubuntu SMP)
After starting
svn co https://root.cern.ch/svn/root/trunk root

on the /ceph directory, the command becomes stuck, and also:
root 5303 0.0 0.0 0 0 ? D Aug26 0:00 [kjournald]
root 30181 0.0 0.0 6972 2056 pts/1 D+ 13:46 0:00 /usr//bin/cosd -i 2 -c /etc/ceph/ceph.conf

any mount or unmount also goes into state D.
This is permanent behaviour of ceph once the command is started.

dmesg shows:
-------------
[99048.567704] ------------[ cut here ]------------
[99048.568767] kernel BUG at /build/buildd/linux-2.6.32/fs/ext3/balloc.c:1384!
[99048.568767] invalid opcode: 0000 [#1] SMP
[99048.568767] last sysfs file: /sys/devices/pci0000:00/0000:00:00.0/device
[99048.596652] Modules linked in: nfs lockd nfs_acl auth_rpcgss sunrpc ceph crc32c libcrc32c openafs(P) fbcon tileblit font bitblit softcursor vga vgastate mptctl radeon ttm drm_kms_helper drm bnx2 psmouse i5000_edac edac_core agpgart serio_raw i5k_amb i2c_algo_bit shpchp dell_wmi dcdbas usbhid mptsas mptscsih mptbase scsi_transport_sas
[99048.596652]
[99048.596652] Pid: 6258, comm: cosd Tainted: P (2.6.32-24-generic-pae #41-Ubuntu) PowerEdge 1950
[99048.596652] EIP: 0060:[] EFLAGS: 00210296 CPU: 3
[99048.596652] EIP is at ext3_try_to_allocate_with_rsv+0x1cd/0x2b0
[99048.596652] EAX: 00000027 EBX: f6dd5480 ECX: fffe48f7 EDX: 00000000
[99048.596652] ESI: 02147fff EDI: f625e200 EBP: f5ccbc54 ESP: f5ccbc14
[99048.596652] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[99048.596652] Process cosd (pid: 6258, ti=f5cca000 task=f6263300 task.ti=f5cca000)
[99048.596652] Stack:
[99048.596652] 00000428 f14f1bc0 c026cc88 00001000 00000007 f1a80e9c f6dd5494 02147fff
[99048.596652] <0> f70d89c0 02140000 00000428 f625d800 00000001 00000428 f1058500 00000000
[99048.596652] <0> f5ccbcc8 c026e048 f14f1bc0 ffffffff f6dd5480 f5ccbcb0 f5ccbcb4 f5ccbc90
[99048.596652] Call Trace:
[99048.596652] [] ? read_block_bitmap+0x48/0x160
[99048.596652] [] ? ext3_new_blocks+0x228/0x6c0
[99048.596652] [] ? mb_cache_entry_find_first+0x67/0x80
[99048.596652] [] ? ext3_new_block+0x25/0x30
[99048.596652] [] ? ext3_xattr_block_set+0x554/0x670
[99048.596652] [] ? ext3_xattr_set_entry+0x29/0x350
[99048.596652] [] ? ext3_xattr_set_handle+0x2cb/0x3e0
[99048.596652] [] ? ext3_xattr_set+0x75/0xc0
[99048.596652] [] ? ext3_xattr_user_set+0x76/0x80
[99048.596652] [] ? generic_setxattr+0x9c/0xb0
[99048.596652] [] ? generic_setxattr+0x0/0xb0
[99048.596652] [] ? __vfs_setxattr_noperm+0x44/0x160
[99048.596652] [] ? cap_inode_setxattr+0x2c/0x60
[99048.596652] [] ? vfs_setxattr+0x91/0xa0
[99048.596652] [] ? setxattr+0xb8/0x110
[99048.596652] [] ? __link_path_walk+0x632/0xca0
[99048.596652] [] ? enqueue_task_fair+0x39/0x80
[99048.596652] [] ? mntput_no_expire+0x1f/0xe0
[99048.596652] [] ? mntput_no_expire+0x1f/0xe0
[99048.596652] [] ? path_put+0x25/0x30
[99048.596652] [] ? putname+0x2b/0x40
[99048.596652] [] ? user_path_at+0x4a/0x80
[99048.596652] [] ? sys_futex+0x72/0x120
[99048.596652] [] ? sys_setxattr+0x83/0x90
[99048.596652] [] ? sysenter_do_call+0x12/0x28
[99048.596652] Code: 83 3a ff ff ff 90 39 45 dc c7 45 0c ff ff ff ff 0f 83 32 ff ff ff 8b 87 84 01 00 00 ba ba c6 5c c0 05 d0 00 00 00 e8 73 f1 ff<0f> 0b eb fe 8b 45 ec 89 55 d4 05 cc 00 00 00 89 45 ec e8 fc 53
[99048.596652] EIP: [] ext3_try_to_allocate_with_rsv+0x1cd/0x2b0 SS:ESP 0068:f5ccbc14
[99049.044090] ---[ end trace 35860103963ee444 ]---
h1farm184#
--------------------

my ceph.conf is:
-------
[global]
        pid file = /var/run/ceph/$name.pid
        debug ms = 1
        keyring = /etc/ceph/keyring.bin

; monitors
[mon]
        ;Directory for monitor files
        mon data = /x02/mon$id
        debug mon = 20
        debug paxos = 20
        mon lease wiggle room = 0.5

[mon0]
        host = h1farm182
        mon addr = xxx.xxx.xx.116:6789

[mon1]
        host = h1farm183
        mon addr = xxx.xxx.xx.117:6789

; metadata servers
[mds]
        debug mds = 20
        mds log max segments = 2
        keyring = /etc/ceph/keyring.$name

[mds0]
        host = h1farm182

[mds1]
        host = h1farm183

[osd]
        sudo = true
        osd data = /x02/osd$id
        osd journal = /x02/osd$id/journal
        osd journal size = 100
        keyring = /etc/ceph/keyring.$name
        debug osd = 20
        debug journal = 20
        debug filestore = 20
        ;osd journal size = 100

[osd0]
        host = h1farm182

[osd1]
        host = h1farm183

[osd2]
        host = h1farm184
-------

Any idea how to improve the situation ?
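
A quick check that may help narrow this down: the trace dies inside ext3's block allocator while cosd is setting an extended attribute, so it is worth confirming which filesystem backs the "osd data" directory (/x02/osd$id in the configuration). The following is a minimal sketch, assuming Linux and statfs(2); fs_family and report_fs are hypothetical helper names, and the magic numbers are the values found in <linux/magic.h>.

```c
#include <stdio.h>
#include <sys/vfs.h>   /* statfs(2); Linux-specific header */

/* Superblock magic numbers, as listed in <linux/magic.h>. */
#define EXT_FAMILY_MAGIC 0xEF53UL      /* ext2, ext3 and ext4 share this value */
#define BTRFS_MAGIC      0x9123683EUL

/* Hypothetical helper: map a superblock magic to a filesystem family name. */
const char *fs_family(unsigned long magic)
{
    switch (magic & 0xFFFFFFFFUL) {    /* ignore any sign extension of f_type */
    case EXT_FAMILY_MAGIC:
        return "ext2/ext3/ext4";
    case BTRFS_MAGIC:
        return "btrfs";
    default:
        return "other";
    }
}

/* Hypothetical helper: print the family for `path`; returns 0, or -1 if statfs fails. */
int report_fs(const char *path)
{
    struct statfs st;

    if (statfs(path, &st) != 0) {
        perror(path);
        return -1;
    }
    printf("%s: %s\n", path, fs_family((unsigned long)st.f_type));
    return 0;
}
```

Calling report_fs() on the OSD data directory of the affected node (for example /x02/osd2, assuming $id expands to 2 for the cosd -i 2 process shown in the process list) would show whether that store sits on an ext filesystem. The magic value cannot tell ext3 from ext2 or ext4, so the mount table is still needed for the exact type.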
From mboxrd@z Thu Jan 1 00:00:00 1970
From: Wido den Hollander
Subject: Re: Write operation is stuck
Date: Fri, 27 Aug 2010 17:42:36 +0200
Message-ID: <1282923756.2548.137.camel@wido-laptop.pcextreme.nl>
References:
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Return-path:
Received: from smtp02.mail.pcextreme.nl ([109.72.87.138]:51330 "EHLO smtp02.mail.pcextreme.nl" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753016Ab0H0Pmi (ORCPT ); Fri, 27 Aug 2010 11:42:38 -0400
In-Reply-To:
Sender: ceph-devel-owner@vger.kernel.org
List-ID:
To: Bogdan Lobodzinski
Cc: ceph-devel@vger.kernel.org

Hi Bogdan,

Are you running your OSD data on ext3? It seems that you are hitting an ext3 bug.

Could you try switching to btrfs? I suggest this because ext is not yet fully supported.

Wido

On Fri, 2010-08-27 at 12:18 +0000, Bogdan Lobodzinski wrote:
> Hello,
>
> working with ceph on my test configuration
> (3 nodes Ubuntu 10.04.1 LTS, Linux 2.6.32-24-generic-pae #41-Ubuntu SMP)
> After starting
> svn co https://root.cern.ch/svn/root/trunk root
>
> on the /ceph directory, the command become stuck, and also:
> root 5303 0.0 0.0 0 0 ? D Aug26 0:00 [kjournald]
> root 30181 0.0 0.0 6972 2056 pts/1 D+ 13:46 0:00 /usr//bin/cosd
> -i 2 -c /etc/ceph/ceph.conf
>
> any mount, unmount are going also to the state D.
> This is a permanent behaviour of the ceph if the command is started.
>
> dmesg shows:
> -------------
> [99048.567704] ------------[ cut here ]------------
> [99048.568767] kernel BUG at
> /build/buildd/linux-2.6.32/fs/ext3/balloc.c:1384!
> [...]
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Sage Weil
Subject: Re: Write operation is stuck
Date: Fri, 27 Aug 2010 09:09:03 -0700 (PDT)
Message-ID:
References:
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Return-path:
Received: from cobra.newdream.net ([66.33.216.30]:57368 "EHLO cobra.newdream.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752829Ab0H0QIQ (ORCPT ); Fri, 27 Aug 2010 12:08:16 -0400
In-Reply-To:
Sender: ceph-devel-owner@vger.kernel.org
List-ID:
To: Bogdan Lobodzinski
Cc: ceph-devel@vger.kernel.org

Hi Bogdan,

This is a bug in the ext3 xattr code. It seems to be gone in 2.6.34 and later. Or, you can switch to btrfs!

sage

On Fri, 27 Aug 2010, Bogdan Lobodzinski wrote:

> Hello,
>
> working with ceph on my test configuration
> (3 nodes Ubuntu 10.04.1 LTS, Linux 2.6.32-24-generic-pae #41-Ubuntu SMP)
> After starting
> svn co https://root.cern.ch/svn/root/trunk root
>
> on the /ceph directory, the command become stuck, and also:
> root 5303 0.0 0.0 0 0 ? D Aug26 0:00 [kjournald]
> root 30181 0.0 0.0 6972 2056 pts/1 D+ 13:46 0:00 /usr//bin/cosd
> -i 2 -c /etc/ceph/ceph.conf
>
> any mount, unmount are going also to the state D.
> This is a permanent behaviour of the ceph if the command is started.
>
> dmesg shows:
> -------------
> [99048.567704] ------------[ cut here ]------------
> [99048.568767] kernel BUG at
> /build/buildd/linux-2.6.32/fs/ext3/balloc.c:1384!
> [...]
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Bogdan Lobodzinski
Subject: Re: Write operation is stuck
Date: Mon, 30 Aug 2010 17:32:24 +0200 (CEST)
Message-ID:
References:
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Return-path:
Received: from smtp-out-2.desy.de ([131.169.56.85]:43780 "EHLO smtp-out-2.desy.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755620Ab0H3P5u (ORCPT ); Mon, 30 Aug 2010 11:57:50 -0400
Received: from smtp-map-2.desy.de (smtp-map-2.desy.de [131.169.56.67]) by smtp-out-2.desy.de (DESY_OUT_1) with ESMTP id 6CB68C34 for ; Mon, 30 Aug 2010 17:32:24 +0200 (MEST)
Received: from adserv71.win.desy.de (adserv71.win.desy.de [131.169.97.57]) by smtp-map-2.desy.de (DESY_MAP_2) with ESMTP id 6296EC2E for ; Mon, 30 Aug 2010 17:32:24 +0200 (MEST)
In-Reply-To:
Sender: ceph-devel-owner@vger.kernel.org
List-ID:
To: Sage Weil
Cc: ceph-devel@vger.kernel.org

Hello Sage,

I moved to the kernel 2.6.35, keeping ext3 filesystem.
After executing the same command:
svn co https://root.cern.ch/svn/root/trunk root

System is again dead. The command and kjournald are stuck
bogdan 8539 0.9 0.6 31168 22040 pts/0 DL+ 16:44 0:21 svn co https://root.cern.ch/svn/root/trunk root
root 802 0.0 0.0 0 0 ? D 12:59 0:01 [kjournald]

Looks like the bug is not fixed, dmesg shows:
---------
[14325.304068] kernel BUG at /build/buildd/linux-maverick-2.6.35/fs/ext3/balloc.c:1385!
[14325.304191] invalid opcode: 0000 [#1] SMP
[14325.304263] last sysfs file: /sys/devices/pci0000:00/0000:00:00.0/device
[14325.304266] Modules linked in: nfs lockd fscache nfs_acl auth_rpcgss sunrpc ceph crc32c libcrc32c radeon ttm drm_kms_helper drm mptctl psmouse agpgart i5000_edac usbhid hid edac_core i2c_algo_bit bnx2 i5k_amb dcdbas shpchp serio_raw mptsas mptscsih mptbase scsi_transport_sas
[14325.304266]
[14325.304266] Pid: 8391, comm: cosd Not tainted 2.6.35-14-generic #20~lucid2-Ubuntu 0DT097/PowerEdge 1950
[14325.304266] EIP: 0060:[] EFLAGS: 00210286 CPU: 1
[14325.304266] EIP is at ext3_try_to_allocate_with_rsv+0x1cd/0x2b0
[14325.304266] EAX: 00000027 EBX: c8641440 ECX: c07d7cfc EDX: 00000000
[14325.304266] ESI: 007b7fff EDI: f640fa00 EBP: f5823c50 ESP: f5823c10
[14325.304266] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[14325.304266] Process cosd (pid: 8391, ti=f5822000 task=f6b7bf70 task.ti=f5822000)
[14325.304266] Stack:
[14325.304266] 000000f6 c6c11930 c0273a58 00001000 f62e549c 00000007 c8641454 007b7fff
[14325.304266] <0> f6f7e420 007b0000 000000f6 f640de00 00000001 000000f6 c4063ec0 00000000
[14325.304266] <0> f5823cc0 c0274daf c6c11930 ffffffff c8641440 f5823ca8 f5823cac c0256017
[14325.304266] Call Trace:
[14325.304266] [] ? read_block_bitmap+0x48/0x160
[14325.304266] [] ? ext3_new_blocks+0x1ff/0x610
[14325.304266] [] ? mb_cache_entry_find_first+0x67/0x80
[14325.304266] [] ? ext3_new_block+0x25/0x30
[14325.304266] [] ? ext3_xattr_block_set+0x481/0x550
[14325.304266] [] ? ext3_xattr_set_entry+0x20/0x2f0
[14325.304266] [] ? ext3_xattr_set_handle+0x31b/0x400
[14325.304266] [] ? ext3_xattr_set+0x75/0xc0
[14325.304266] [] ? ext3_xattr_user_set+0x74/0x80
[14325.304266] [] ? generic_setxattr+0x9b/0xb0
[14325.304266] [] ? generic_setxattr+0x0/0xb0
[14325.304266] [] ? __vfs_setxattr_noperm+0x44/0x150
[14325.304266] [] ? cap_inode_setxattr+0x2c/0x60
[14325.304266] [] ? vfs_setxattr+0x91/0xa0
[14325.304266] [] ? setxattr+0xb8/0x110
[14325.304266] [] ? path_to_nameidata+0x1e/0x50
[14325.304266] [] ? link_path_walk+0x412/0x890
[14325.304266] [] ? enqueue_task_fair+0x39/0x80
[14325.304266] [] ? mntput_no_expire+0x1f/0xd0
[14325.304266] [] ? mntput_no_expire+0x1f/0xd0
[14325.304266] [] ? putname+0x2b/0x40
[14325.304266] [] ? user_path_at+0x4a/0x80
[14325.304266] [] ? sys_futex+0x72/0x120
[14325.304266] [] ? sys_setxattr+0x83/0x90
[14325.304266] [] ? syscall_call+0x7/0xb
[14325.304266] [] ? cache_add_dev+0x73/0x195
[14325.304266] Code: 83 3a ff ff ff 90 39 45 dc c7 45 0c ff ff ff ff 0f 83 32 ff ff ff 8b 87 80 01 00 00 ba 5a 7e 5e c0 05 d0 00 00 00 e8 83 f1 ff ff <0f> 0b eb fe 8b 45 ec 89 55 d4 05 cc 00 00 00 89 45 ec e8 ec 4b
[14325.304266] EIP: [] ext3_try_to_allocate_with_rsv+0x1cd/0x2b0 SS:ESP 0068:f5823c10
[14325.326777] ---[ end trace 53e0b3b55af7a83c ]---
[14384.001261] ceph: mds0 caps stale
[14413.616132] ceph: tid 33594 timed out on osd2, will reset osd
[14628.992279] ceph: mds0 hung
---------

as a next step I will try to use btrfs.

Cheers,

Bogdan

On Fri, 27 Aug 2010, Sage Weil wrote:

> Hi Bogdan,
>
> This is a bug in the ext3 xattr code. It seems to be gone in 2.6.34 and
> later. Or, you can switch to btrfs!
>
> sage
>
>
> On Fri, 27 Aug 2010, Bogdan Lobodzinski wrote:
>
>> Hello,
>>
>> working with ceph on my test configuration
>> (3 nodes Ubuntu 10.04.1 LTS, Linux 2.6.32-24-generic-pae #41-Ubuntu SMP)
>> After starting
>> svn co https://root.cern.ch/svn/root/trunk root
>>
>> on the /ceph directory, the command become stuck, and also:
>> root 5303 0.0 0.0 0 0 ? D Aug26 0:00 [kjournald]
>> root 30181 0.0 0.0 6972 2056 pts/1 D+ 13:46 0:00 /usr//bin/cosd
>> -i 2 -c /etc/ceph/ceph.conf
>>
>> any mount, unmount are going also to the state D.
>> This is a permanent behaviour of the ceph if the command is started.
>>
>> dmesg shows:
>> -------------
>> [99048.567704] ------------[ cut here ]------------
>> [99048.568767] kernel BUG at
>> /build/buildd/linux-2.6.32/fs/ext3/balloc.c:1384!
>> [...]
>>
>> Any
idea how to improve the situation ?
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>>
>

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Sage Weil
Subject: Re: Write operation is stuck
Date: Mon, 30 Aug 2010 12:39:11 -0700 (PDT)
Message-ID:
References:
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Return-path:
Received: from cobra.newdream.net ([66.33.216.30]:40336 "EHLO cobra.newdream.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751543Ab0H3TiQ (ORCPT ); Mon, 30 Aug 2010 15:38:16 -0400
In-Reply-To:
Sender: ceph-devel-owner@vger.kernel.org
List-ID:
To: Bogdan Lobodzinski
Cc: ceph-devel@vger.kernel.org

On Mon, 30 Aug 2010, Bogdan Lobodzinski wrote:
>
> Hello Sage,
>
> I moved to the kernel 2.6.35, keeping ext3 filesystem.
> After executing the same command:
> svn co https://root.cern.ch/svn/root/trunk root
>
> System is again dead. The command and kjournald are stuck
> bogdan 8539 0.9 0.6 31168 22040 pts/0 DL+ 16:44 0:21 svn co
> https://root.cern.ch/svn/root/trunk root
> root 802 0.0 0.0 0 0 ? D 12:59 0:01 [kjournald]

Hmm. Have you tried ext4?

I stopped seeing this on my own machine with recent kernels, but it looks like it isn't in fact fixed. This should be reported to the ext4 list.

Are you running ceph via vstart.sh or a custom ceph.conf?

sage

>
> Looks like the bug is not fixed, dmesg shows:
> ---------
> [14325.304068] kernel BUG at
> /build/buildd/linux-maverick-2.6.35/fs/ext3/balloc.c:1385!
> [...]
> > > > > > -- > > > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > > > the body of a message to majordomo@vger.kernel.org > > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > > > > > > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > > From mboxrd@z Thu Jan 1 00:00:00 1970 From: Bogdan Lobodzinski Subject: Re: Write operation is stuck Date: Tue, 31 Aug 2010 09:56:43 +0200 (CEST) Message-ID: References: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Return-path: Received: from smtp-out-1.desy.de ([131.169.56.84]:47742 "EHLO smtp-out-1.desy.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756897Ab0HaH4p (ORCPT ); Tue, 31 Aug 2010 03:56:45 -0400 Received: from smtp-map-1.desy.de (smtp-map-1.desy.de [131.169.56.66]) by smtp-out-1.desy.de (DESY_OUT_1) with ESMTP id F13A7173C for ; Tue, 31 Aug 2010 09:56:43 +0200 (MEST) Received: from adserv71.win.desy.de (adserv71.win.desy.de [131.169.97.57]) by smtp-map-1.desy.de (DESY_MAP_1) with ESMTP id E573D13E92 for ; Tue, 31 Aug 2010 09:56:43 +0200 (MEST) In-Reply-To: Sender: ceph-devel-owner@vger.kernel.org List-ID: To: Sage Weil Cc: ceph-devel@vger.kernel.org Hello Sage, On Mon, 30 Aug 2010, Sage Weil wrote: > On Mon, 30 Aug 2010, Bogdan Lobodzinski wrote: >> >> Hello Sage, >> >> I moved to the kernel 2.6.35, keeping ext3 filesystem. >> After executing teh same command: >> svn co https://root.cern.ch/svn/root/trunk root >> >> System is again dead. The command and kjournald are stuck >> bogdan 8539 0.9 0.6 31168 22040 pts/0 DL+ 16:44 0:21 svn co >> https://root.cern.ch/svn/root/trunk root >> root 802 0.0 0.0 0 0 ? D 12:59 0:01 [kjournald] > > Hmm. Have you tried ext4? > > I stopped seeing this on my own machine with recent kernels, but it looks > like it isn't in fact fixed. 
This should be reported to the ext4 list. > Are you running ceph via vstart.sh or a custom ceph.conf? I am using vstart.sh from the source tarball ceph-0.21.tar.gz (http://ceph.newdream.net/download/), which I compiled myself, and the client from git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client-standalone.git Cheers, Bogdan > > sage > >> >> Looks like the bug is not fixed, dmesg shows: >> --------- >> [14325.304068] kernel BUG at >> /build/buildd/linux-maverick-2.6.35/fs/ext3/balloc.c:1385! >> [14325.304191] invalid opcode: 0000 [#1] SMP >> [14325.304263] last sysfs file: /sys/devices/pci0000:00/0000:00:00.0/device >> [14325.304266] Modules linked in: nfs lockd fscache nfs_acl auth_rpcgss sunrpc >> ceph crc32c libcrc32c radeon ttm drm_kms_helper drm mptctl psmouse agpgart >> i5000_edac usbhid hid edac_core i2c_algo_bit bnx2 i5k_amb dcdbas shpchp >> serio_raw mptsas mptscsih mptbase scsi_transport_sas >> [14325.304266] >> [14325.304266] Pid: 8391, comm: cosd Not tainted 2.6.35-14-generic >> #20~lucid2-Ubuntu 0DT097/PowerEdge 1950 >> [14325.304266] EIP: 0060:[] EFLAGS: 00210286 CPU: 1 >> [14325.304266] EIP is at ext3_try_to_allocate_with_rsv+0x1cd/0x2b0 >> [14325.304266] EAX: 00000027 EBX: c8641440 ECX: c07d7cfc EDX: 00000000 >> [14325.304266] ESI: 007b7fff EDI: f640fa00 EBP: f5823c50 ESP: f5823c10 >> [14325.304266] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 >> [14325.304266] Process cosd (pid: 8391, ti=f5822000 task=f6b7bf70 >> task.ti=f5822000) >> [14325.304266] Stack: >> [14325.304266] 000000f6 c6c11930 c0273a58 00001000 f62e549c 00000007 c8641454 >> 007b7fff >> [14325.304266] <0> f6f7e420 007b0000 000000f6 f640de00 00000001 000000f6 >> c4063ec0 00000000 >> [14325.304266] <0> f5823cc0 c0274daf c6c11930 ffffffff c8641440 f5823ca8 >> f5823cac c0256017 >> [14325.304266] Call Trace: >> [14325.304266] [] ? read_block_bitmap+0x48/0x160 >> [14325.304266] [] ? ext3_new_blocks+0x1ff/0x610 >> [14325.304266] [] ? 
mb_cache_entry_find_first+0x67/0x80 >> [14325.304266] [] ? ext3_new_block+0x25/0x30 >> [14325.304266] [] ? ext3_xattr_block_set+0x481/0x550 >> [14325.304266] [] ? ext3_xattr_set_entry+0x20/0x2f0 >> [14325.304266] [] ? ext3_xattr_set_handle+0x31b/0x400 >> [14325.304266] [] ? ext3_xattr_set+0x75/0xc0 >> [14325.304266] [] ? ext3_xattr_user_set+0x74/0x80 >> [14325.304266] [] ? generic_setxattr+0x9b/0xb0 >> [14325.304266] [] ? generic_setxattr+0x0/0xb0 >> [14325.304266] [] ? __vfs_setxattr_noperm+0x44/0x150 >> [14325.304266] [] ? cap_inode_setxattr+0x2c/0x60 >> [14325.304266] [] ? vfs_setxattr+0x91/0xa0 >> [14325.304266] [] ? setxattr+0xb8/0x110 >> [14325.304266] [] ? path_to_nameidata+0x1e/0x50 >> [14325.304266] [] ? link_path_walk+0x412/0x890 >> [14325.304266] [] ? enqueue_task_fair+0x39/0x80 >> [14325.304266] [] ? mntput_no_expire+0x1f/0xd0 >> [14325.304266] [] ? mntput_no_expire+0x1f/0xd0 >> [14325.304266] [] ? putname+0x2b/0x40 >> [14325.304266] [] ? user_path_at+0x4a/0x80 >> [14325.304266] [] ? sys_futex+0x72/0x120 >> [14325.304266] [] ? sys_setxattr+0x83/0x90 >> [14325.304266] [] ? syscall_call+0x7/0xb >> [14325.304266] [] ? cache_add_dev+0x73/0x195 >> [14325.304266] Code: 83 3a ff ff ff 90 39 45 dc c7 45 0c ff ff ff ff 0f 83 32 >> ff ff ff 8b 87 80 01 00 00 ba 5a 7e 5e c0 05 d0 00 00 00 e8 83 f1 ff ff <0f> >> 0b eb fe 8b 45 ec 89 55 d4 05 cc 00 00 00 89 45 ec e8 ec 4b >> [14325.304266] EIP: [] ext3_try_to_allocate_with_rsv+0x1cd/0x2b0 >> SS:ESP 0068:f5823c10 >> [14325.326777] ---[ end trace 53e0b3b55af7a83c ]--- >> [14384.001261] ceph: mds0 caps stale >> [14413.616132] ceph: tid 33594 timed out on osd2, will reset osd >> [14628.992279] ceph: mds0 hung >> --------- >> >> as a next step I wil try to use btrfs . >> >> Cheers, >> >> Bogdan >> >> >> On Fri, 27 Aug 2010, Sage Weil wrote: >> >>> Hi Bogdan, >>> >>> This is a bug in the ext3 xattr code. It seems to be gone in 2.6.34 and >>> later. Or, you can switch to btrfs! 
>>> >>> sage >>> >>> >>> On Fri, 27 Aug 2010, Bogdan Lobodzinski wrote: >>> >>>> Hello, >>>> >>>> working with ceph on my test configuration >>>> (3 nodes Ubuntu 10.04.1 LTS, Linux 2.6.32-24-generic-pae #41-Ubuntu SMP) >>>> After starting >>>> svn co https://root.cern.ch/svn/root/trunk root >>>> >>>> on the /ceph directory, the command become stuck, and also: >>>> root 5303 0.0 0.0 0 0 ? D Aug26 0:00 >>>> [kjournald] >>>> root 30181 0.0 0.0 6972 2056 pts/1 D+ 13:46 0:00 >>>> /usr//bin/cosd >>>> -i 2 -c /etc/ceph/ceph.conf >>>> >>>> any mount, unmount are going also to the state D. >>>> This is a permanennt behaviour of the ceph if the command is started. >>>> >>>> dmesg shows: >>>> ------------- >>>> [99048.567704] ------------[ cut here ]------------ >>>> [99048.568767] kernel BUG at >>>> /build/buildd/linux-2.6.32/fs/ext3/balloc.c:1384! >>>> [99048.568767] invalid opcode: 0000 [#1] SMP >>>> [99048.568767] last sysfs file: >>>> /sys/devices/pci0000:00/0000:00:00.0/device >>>> [99048.596652] Modules linked in: nfs lockd nfs_acl auth_rpcgss sunrpc >>>> ceph >>>> crc32c libcrc32c openafs(P) fbcon tileblit font bitblit softcursor vga >>>> vgastate mptctl radeon ttm drm_kms_helper drm bnx2 psmouse i5000_edac >>>> edac_core agpgart serio_raw i5k_amb i2c_algo_bit shpchp dell_wmi dcdbas >>>> usbhid mptsas mptscsih mptbase scsi_transport_sas >>>> [99048.596652] >>>> [99048.596652] Pid: 6258, comm: cosd Tainted: P >>>> (2.6.32-24-generic-pae #41-Ubuntu) PowerEdge 1950 >>>> [99048.596652] EIP: 0060:[] EFLAGS: 00210296 CPU: 3 >>>> [99048.596652] EIP is at ext3_try_to_allocate_with_rsv+0x1cd/0x2b0 >>>> [99048.596652] EAX: 00000027 EBX: f6dd5480 ECX: fffe48f7 EDX: 00000000 >>>> [99048.596652] ESI: 02147fff EDI: f625e200 EBP: f5ccbc54 ESP: f5ccbc14 >>>> [99048.596652] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 >>>> [99048.596652] Process cosd (pid: 6258, ti=f5cca000 task=f6263300 >>>> task.ti=f5cca000) >>>> [99048.596652] Stack: >>>> [99048.596652] 00000428 f14f1bc0 c026cc88 
00001000 00000007 f1a80e9c >>>> f6dd5494 02147fff >>>> [99048.596652] <0> f70d89c0 02140000 00000428 f625d800 00000001 00000428 >>>> f1058500 00000000 >>>> [99048.596652] <0> f5ccbcc8 c026e048 f14f1bc0 ffffffff f6dd5480 f5ccbcb0 >>>> f5ccbcb4 f5ccbc90 >>>> [99048.596652] Call Trace: >>>> [99048.596652] [] ? read_block_bitmap+0x48/0x160 >>>> [99048.596652] [] ? ext3_new_blocks+0x228/0x6c0 >>>> [99048.596652] [] ? mb_cache_entry_find_first+0x67/0x80 >>>> [99048.596652] [] ? ext3_new_block+0x25/0x30 >>>> [99048.596652] [] ? ext3_xattr_block_set+0x554/0x670 >>>> [99048.596652] [] ? ext3_xattr_set_entry+0x29/0x350 >>>> [99048.596652] [] ? ext3_xattr_set_handle+0x2cb/0x3e0 >>>> [99048.596652] [] ? ext3_xattr_set+0x75/0xc0 >>>> [99048.596652] [] ? ext3_xattr_user_set+0x76/0x80 >>>> [99048.596652] [] ? generic_setxattr+0x9c/0xb0 >>>> [99048.596652] [] ? generic_setxattr+0x0/0xb0 >>>> [99048.596652] [] ? __vfs_setxattr_noperm+0x44/0x160 >>>> [99048.596652] [] ? cap_inode_setxattr+0x2c/0x60 >>>> [99048.596652] [] ? vfs_setxattr+0x91/0xa0 >>>> [99048.596652] [] ? setxattr+0xb8/0x110 >>>> [99048.596652] [] ? __link_path_walk+0x632/0xca0 >>>> [99048.596652] [] ? enqueue_task_fair+0x39/0x80 >>>> [99048.596652] [] ? mntput_no_expire+0x1f/0xe0 >>>> [99048.596652] [] ? mntput_no_expire+0x1f/0xe0 >>>> [99048.596652] [] ? path_put+0x25/0x30 >>>> [99048.596652] [] ? putname+0x2b/0x40 >>>> [99048.596652] [] ? user_path_at+0x4a/0x80 >>>> [99048.596652] [] ? sys_futex+0x72/0x120 >>>> [99048.596652] [] ? sys_setxattr+0x83/0x90 >>>> [99048.596652] [] ? 
sysenter_do_call+0x12/0x28 >>>> [99048.596652] Code: 83 3a ff ff ff 90 39 45 dc c7 45 0c ff ff ff ff 0f 83 >>>> 32 ff ff ff 8b 87 84 01 00 00 ba ba c6 5c c0 05 d0 00 00 00 e8 73 f1 >>>> ff<0f> >>>> 0b eb fe 8b 45 ec 89 55 d4 05 cc 00 00 00 89 45 ec e8 fc 53 >>>> [99048.596652] EIP: [] ext3_try_to_allocate_with_rsv+0x1cd/0x2b0 >>>> SS:ESP 0068:f5ccbc14 >>>> [99049.044090] ---[ end trace 35860103963ee444 ]--- >>>> h1farm184# >>>> -------------------- >>>> >>>> my ceph.conf is: >>>> ------- >>>> [global] >>>> pid file = /var/run/ceph/$name.pid >>>> debug ms = 1 >>>> keyring = /etc/ceph/keyring.bin >>>> ; monitors >>>> [mon] >>>> ;Directory for monitor files >>>> mon data = /x02/mon$id >>>> debug mon = 20 >>>> debug paxos = 20 >>>> mon lease wiggle room = 0.5 >>>> >>>> [mon0] >>>> host = h1farm182 >>>> mon addr = xxx.xxx.xx.116:6789 >>>> [mon1] >>>> host = h1farm183 >>>> mon addr = xxx.xxx.xx.117:6789 >>>> ; metadata servers >>>> [mds] >>>> debug mds = 20 >>>> mds log max segments = 2 >>>> keyring = /etc/ceph/keyring.$name >>>> [mds0] >>>> host = h1farm182 >>>> [mds1] >>>> host = h1farm183 >>>> [osd] >>>> sudo = true >>>> osd data = /x02/osd$id >>>> osd journal = /x02/osd$id/journal >>>> osd journal size = 100 >>>> keyring = /etc/ceph/keyring.$name >>>> debug osd = 20 >>>> debug journal = 20 >>>> debug filestore = 20 >>>> ;osd journal size = 100 >>>> [osd0] >>>> host = h1farm182 >>>> [osd1] >>>> host = h1farm183 >>>> [osd2] >>>> host = h1farm184 >>>> >>>> ------- >>>> >>>> Any idea how to improve the situation ? 
>>>> >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>>> the body of a message to majordomo@vger.kernel.org >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>> >>>> >>> >> -- >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> > From mboxrd@z Thu Jan 1 00:00:00 1970 From: Bogdan Lobodzinski Subject: Re: Write operation is stuck Date: Wed, 1 Sep 2010 17:21:14 +0200 (CEST) Message-ID: References: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Return-path: Received: from smtp-out-1.desy.de ([131.169.56.84]:53881 "EHLO smtp-out-1.desy.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755649Ab0IAPVR (ORCPT ); Wed, 1 Sep 2010 11:21:17 -0400 Received: from smtp-map-1.desy.de (smtp-map-1.desy.de [131.169.56.66]) by smtp-out-1.desy.de (DESY_OUT_1) with ESMTP id E15C11776 for ; Wed, 1 Sep 2010 17:21:15 +0200 (MEST) Received: from adserv70.win.desy.de (adserv70.win.desy.de [131.169.97.56]) by smtp-map-1.desy.de (DESY_MAP_1) with ESMTP id D6DD913E9B for ; Wed, 1 Sep 2010 17:21:15 +0200 (MEST) In-Reply-To: Sender: ceph-devel-owner@vger.kernel.org List-ID: To: Sage Weil Cc: ceph-devel@vger.kernel.org Hello Sage, after replacing ext3 with btrfs, my ceph test-bed survived my test command: svn co https://root.cern.ch/svn/root/trunk root I didn't try ext4. However, I made a few changes to my initial ceph.conf. Could you please check whether this configuration is reasonable? Is it correct to set the "osd journal" location as done below? 
My new ceph.conf: ----------- [global] pid file = /var/run/ceph/$name.pid debug ms = 1 keyring = /etc/ceph/keyring.bin [mon] mon data = /x01/mon$id debug mon = 20 debug paxos = 20 mon lease wiggle room = 0.5 [mon0] host = h1farm182 mon addr = xxx.xxx.xxx.116:6789 [mon1] host = h1farm183 mon addr = xxx.xxx.xxx.117:6789 [mds] debug mds = 10 mds log max segments = 2 keyring = /etc/ceph/keyring.$name [mds0] host = h1farm182 [mds1] host = h1farm183 [osd] sudo = true keyring = /etc/ceph/keyring.$name osd data = /x02/osd$id osd journal = /x02/osd$id/journal osd journal size = 100 debug osd = 20 debug journal = 20 debug filestore = 20 [osd0] host = h1farm183 btrfs devs = /dev/sdb1 [osd1] host = h1farm184 btrfs devs = /dev/sdb1 ----------- Thank you for help, Cheers, Bogdan On Tue, 31 Aug 2010, Bogdan Lobodzinski wrote: > > Hello Sage, > > On Mon, 30 Aug 2010, Sage Weil wrote: > >> On Mon, 30 Aug 2010, Bogdan Lobodzinski wrote: >>> >>> Hello Sage, >>> >>> I moved to the kernel 2.6.35, keeping ext3 filesystem. >>> After executing teh same command: >>> svn co https://root.cern.ch/svn/root/trunk root >>> >>> System is again dead. The command and kjournald are stuck >>> bogdan 8539 0.9 0.6 31168 22040 pts/0 DL+ 16:44 0:21 svn co >>> https://root.cern.ch/svn/root/trunk root >>> root 802 0.0 0.0 0 0 ? D 12:59 0:01 [kjournald] >> >> Hmm. Have you tried ext4? >> >> I stopped seeing this on my own machine with recent kernels, but it looks >> like it isn't in fact fixed. This should be reported to the ext4 list. >> Are you running ceph via vstart.sh or a custom ceph.conf? 
> I am using vstart.sh taken from compiled by me source tarball > ceph-0.21.tar.gz (http://ceph.newdream.net/download/) > and the client from > git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client-standalone.git > > Cheers, > > Bogdan > >> >> sage >> >>> >>> Looks like the bug is not fixed, dmesg shows: >>> --------- >>> [14325.304068] kernel BUG at >>> /build/buildd/linux-maverick-2.6.35/fs/ext3/balloc.c:1385! >>> [14325.304191] invalid opcode: 0000 [#1] SMP >>> [14325.304263] last sysfs file: >>> /sys/devices/pci0000:00/0000:00:00.0/device >>> [14325.304266] Modules linked in: nfs lockd fscache nfs_acl auth_rpcgss >>> sunrpc >>> ceph crc32c libcrc32c radeon ttm drm_kms_helper drm mptctl psmouse agpgart >>> i5000_edac usbhid hid edac_core i2c_algo_bit bnx2 i5k_amb dcdbas shpchp >>> serio_raw mptsas mptscsih mptbase scsi_transport_sas >>> [14325.304266] >>> [14325.304266] Pid: 8391, comm: cosd Not tainted 2.6.35-14-generic >>> #20~lucid2-Ubuntu 0DT097/PowerEdge 1950 >>> [14325.304266] EIP: 0060:[] EFLAGS: 00210286 CPU: 1 >>> [14325.304266] EIP is at ext3_try_to_allocate_with_rsv+0x1cd/0x2b0 >>> [14325.304266] EAX: 00000027 EBX: c8641440 ECX: c07d7cfc EDX: 00000000 >>> [14325.304266] ESI: 007b7fff EDI: f640fa00 EBP: f5823c50 ESP: f5823c10 >>> [14325.304266] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 >>> [14325.304266] Process cosd (pid: 8391, ti=f5822000 task=f6b7bf70 >>> task.ti=f5822000) >>> [14325.304266] Stack: >>> [14325.304266] 000000f6 c6c11930 c0273a58 00001000 f62e549c 00000007 >>> c8641454 >>> 007b7fff >>> [14325.304266] <0> f6f7e420 007b0000 000000f6 f640de00 00000001 000000f6 >>> c4063ec0 00000000 >>> [14325.304266] <0> f5823cc0 c0274daf c6c11930 ffffffff c8641440 f5823ca8 >>> f5823cac c0256017 >>> [14325.304266] Call Trace: >>> [14325.304266] [] ? read_block_bitmap+0x48/0x160 >>> [14325.304266] [] ? ext3_new_blocks+0x1ff/0x610 >>> [14325.304266] [] ? mb_cache_entry_find_first+0x67/0x80 >>> [14325.304266] [] ? 
ext3_new_block+0x25/0x30 >>> [14325.304266] [] ? ext3_xattr_block_set+0x481/0x550 >>> [14325.304266] [] ? ext3_xattr_set_entry+0x20/0x2f0 >>> [14325.304266] [] ? ext3_xattr_set_handle+0x31b/0x400 >>> [14325.304266] [] ? ext3_xattr_set+0x75/0xc0 >>> [14325.304266] [] ? ext3_xattr_user_set+0x74/0x80 >>> [14325.304266] [] ? generic_setxattr+0x9b/0xb0 >>> [14325.304266] [] ? generic_setxattr+0x0/0xb0 >>> [14325.304266] [] ? __vfs_setxattr_noperm+0x44/0x150 >>> [14325.304266] [] ? cap_inode_setxattr+0x2c/0x60 >>> [14325.304266] [] ? vfs_setxattr+0x91/0xa0 >>> [14325.304266] [] ? setxattr+0xb8/0x110 >>> [14325.304266] [] ? path_to_nameidata+0x1e/0x50 >>> [14325.304266] [] ? link_path_walk+0x412/0x890 >>> [14325.304266] [] ? enqueue_task_fair+0x39/0x80 >>> [14325.304266] [] ? mntput_no_expire+0x1f/0xd0 >>> [14325.304266] [] ? mntput_no_expire+0x1f/0xd0 >>> [14325.304266] [] ? putname+0x2b/0x40 >>> [14325.304266] [] ? user_path_at+0x4a/0x80 >>> [14325.304266] [] ? sys_futex+0x72/0x120 >>> [14325.304266] [] ? sys_setxattr+0x83/0x90 >>> [14325.304266] [] ? syscall_call+0x7/0xb >>> [14325.304266] [] ? cache_add_dev+0x73/0x195 >>> [14325.304266] Code: 83 3a ff ff ff 90 39 45 dc c7 45 0c ff ff ff ff 0f 83 >>> 32 >>> ff ff ff 8b 87 80 01 00 00 ba 5a 7e 5e c0 05 d0 00 00 00 e8 83 f1 ff ff >>> <0f> >>> 0b eb fe 8b 45 ec 89 55 d4 05 cc 00 00 00 89 45 ec e8 ec 4b >>> [14325.304266] EIP: [] ext3_try_to_allocate_with_rsv+0x1cd/0x2b0 >>> SS:ESP 0068:f5823c10 >>> [14325.326777] ---[ end trace 53e0b3b55af7a83c ]--- >>> [14384.001261] ceph: mds0 caps stale >>> [14413.616132] ceph: tid 33594 timed out on osd2, will reset osd >>> [14628.992279] ceph: mds0 hung >>> --------- >>> >>> as a next step I wil try to use btrfs . >>> >>> Cheers, >>> >>> Bogdan >>> >>> >>> On Fri, 27 Aug 2010, Sage Weil wrote: >>> >>>> Hi Bogdan, >>>> >>>> This is a bug in the ext3 xattr code. It seems to be gone in 2.6.34 and >>>> later. Or, you can switch to btrfs! 
>>>> >>>> sage >>>> >>>> >>>> On Fri, 27 Aug 2010, Bogdan Lobodzinski wrote: >>>> >>>>> Hello, >>>>> >>>>> working with ceph on my test configuration >>>>> (3 nodes Ubuntu 10.04.1 LTS, Linux 2.6.32-24-generic-pae #41-Ubuntu SMP) >>>>> After starting >>>>> svn co https://root.cern.ch/svn/root/trunk root >>>>> >>>>> on the /ceph directory, the command become stuck, and also: >>>>> root 5303 0.0 0.0 0 0 ? D Aug26 0:00 >>>>> [kjournald] >>>>> root 30181 0.0 0.0 6972 2056 pts/1 D+ 13:46 0:00 >>>>> /usr//bin/cosd >>>>> -i 2 -c /etc/ceph/ceph.conf >>>>> >>>>> any mount, unmount are going also to the state D. >>>>> This is a permanennt behaviour of the ceph if the command is started. >>>>> >>>>> dmesg shows: >>>>> ------------- >>>>> [99048.567704] ------------[ cut here ]------------ >>>>> [99048.568767] kernel BUG at >>>>> /build/buildd/linux-2.6.32/fs/ext3/balloc.c:1384! >>>>> [99048.568767] invalid opcode: 0000 [#1] SMP >>>>> [99048.568767] last sysfs file: >>>>> /sys/devices/pci0000:00/0000:00:00.0/device >>>>> [99048.596652] Modules linked in: nfs lockd nfs_acl auth_rpcgss sunrpc >>>>> ceph >>>>> crc32c libcrc32c openafs(P) fbcon tileblit font bitblit softcursor vga >>>>> vgastate mptctl radeon ttm drm_kms_helper drm bnx2 psmouse i5000_edac >>>>> edac_core agpgart serio_raw i5k_amb i2c_algo_bit shpchp dell_wmi dcdbas >>>>> usbhid mptsas mptscsih mptbase scsi_transport_sas >>>>> [99048.596652] >>>>> [99048.596652] Pid: 6258, comm: cosd Tainted: P >>>>> (2.6.32-24-generic-pae #41-Ubuntu) PowerEdge 1950 >>>>> [99048.596652] EIP: 0060:[] EFLAGS: 00210296 CPU: 3 >>>>> [99048.596652] EIP is at ext3_try_to_allocate_with_rsv+0x1cd/0x2b0 >>>>> [99048.596652] EAX: 00000027 EBX: f6dd5480 ECX: fffe48f7 EDX: 00000000 >>>>> [99048.596652] ESI: 02147fff EDI: f625e200 EBP: f5ccbc54 ESP: f5ccbc14 >>>>> [99048.596652] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 >>>>> [99048.596652] Process cosd (pid: 6258, ti=f5cca000 task=f6263300 >>>>> task.ti=f5cca000) >>>>> [99048.596652] Stack: 
>>>>> [99048.596652] 00000428 f14f1bc0 c026cc88 00001000 00000007 f1a80e9c >>>>> f6dd5494 02147fff >>>>> [99048.596652] <0> f70d89c0 02140000 00000428 f625d800 00000001 00000428 >>>>> f1058500 00000000 >>>>> [99048.596652] <0> f5ccbcc8 c026e048 f14f1bc0 ffffffff f6dd5480 f5ccbcb0 >>>>> f5ccbcb4 f5ccbc90 >>>>> [99048.596652] Call Trace: >>>>> [99048.596652] [] ? read_block_bitmap+0x48/0x160 >>>>> [99048.596652] [] ? ext3_new_blocks+0x228/0x6c0 >>>>> [99048.596652] [] ? mb_cache_entry_find_first+0x67/0x80 >>>>> [99048.596652] [] ? ext3_new_block+0x25/0x30 >>>>> [99048.596652] [] ? ext3_xattr_block_set+0x554/0x670 >>>>> [99048.596652] [] ? ext3_xattr_set_entry+0x29/0x350 >>>>> [99048.596652] [] ? ext3_xattr_set_handle+0x2cb/0x3e0 >>>>> [99048.596652] [] ? ext3_xattr_set+0x75/0xc0 >>>>> [99048.596652] [] ? ext3_xattr_user_set+0x76/0x80 >>>>> [99048.596652] [] ? generic_setxattr+0x9c/0xb0 >>>>> [99048.596652] [] ? generic_setxattr+0x0/0xb0 >>>>> [99048.596652] [] ? __vfs_setxattr_noperm+0x44/0x160 >>>>> [99048.596652] [] ? cap_inode_setxattr+0x2c/0x60 >>>>> [99048.596652] [] ? vfs_setxattr+0x91/0xa0 >>>>> [99048.596652] [] ? setxattr+0xb8/0x110 >>>>> [99048.596652] [] ? __link_path_walk+0x632/0xca0 >>>>> [99048.596652] [] ? enqueue_task_fair+0x39/0x80 >>>>> [99048.596652] [] ? mntput_no_expire+0x1f/0xe0 >>>>> [99048.596652] [] ? mntput_no_expire+0x1f/0xe0 >>>>> [99048.596652] [] ? path_put+0x25/0x30 >>>>> [99048.596652] [] ? putname+0x2b/0x40 >>>>> [99048.596652] [] ? user_path_at+0x4a/0x80 >>>>> [99048.596652] [] ? sys_futex+0x72/0x120 >>>>> [99048.596652] [] ? sys_setxattr+0x83/0x90 >>>>> [99048.596652] [] ? 
sysenter_do_call+0x12/0x28 >>>>> [99048.596652] Code: 83 3a ff ff ff 90 39 45 dc c7 45 0c ff ff ff ff 0f >>>>> 83 >>>>> 32 ff ff ff 8b 87 84 01 00 00 ba ba c6 5c c0 05 d0 00 00 00 e8 73 f1 >>>>> ff<0f> >>>>> 0b eb fe 8b 45 ec 89 55 d4 05 cc 00 00 00 89 45 ec e8 fc 53 >>>>> [99048.596652] EIP: [] >>>>> ext3_try_to_allocate_with_rsv+0x1cd/0x2b0 >>>>> SS:ESP 0068:f5ccbc14 >>>>> [99049.044090] ---[ end trace 35860103963ee444 ]--- >>>>> h1farm184# >>>>> -------------------- >>>>> >>>>> my ceph.conf is: >>>>> ------- >>>>> [global] >>>>> pid file = /var/run/ceph/$name.pid >>>>> debug ms = 1 >>>>> keyring = /etc/ceph/keyring.bin >>>>> ; monitors >>>>> [mon] >>>>> ;Directory for monitor files >>>>> mon data = /x02/mon$id >>>>> debug mon = 20 >>>>> debug paxos = 20 >>>>> mon lease wiggle room = 0.5 >>>>> >>>>> [mon0] >>>>> host = h1farm182 >>>>> mon addr = xxx.xxx.xx.116:6789 >>>>> [mon1] >>>>> host = h1farm183 >>>>> mon addr = xxx.xxx.xx.117:6789 >>>>> ; metadata servers >>>>> [mds] >>>>> debug mds = 20 >>>>> mds log max segments = 2 >>>>> keyring = /etc/ceph/keyring.$name >>>>> [mds0] >>>>> host = h1farm182 >>>>> [mds1] >>>>> host = h1farm183 >>>>> [osd] >>>>> sudo = true >>>>> osd data = /x02/osd$id >>>>> osd journal = /x02/osd$id/journal >>>>> osd journal size = 100 >>>>> keyring = /etc/ceph/keyring.$name >>>>> debug osd = 20 >>>>> debug journal = 20 >>>>> debug filestore = 20 >>>>> ;osd journal size = 100 >>>>> [osd0] >>>>> host = h1farm182 >>>>> [osd1] >>>>> host = h1farm183 >>>>> [osd2] >>>>> host = h1farm184 >>>>> >>>>> ------- >>>>> >>>>> Any idea how to improve the situation ? 
>>>>> >>>>> -- >>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>>>> the body of a message to majordomo@vger.kernel.org >>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>> >>>>> >>>> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> >>> >> > From mboxrd@z Thu Jan 1 00:00:00 1970 From: Wido den Hollander Subject: Re: Write operation is stuck Date: Wed, 01 Sep 2010 21:29:52 +0200 Message-ID: <1283369392.3894.8.camel@wido-laptop.pcextreme.nl> References: Mime-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit Return-path: Received: from smtp01.mail.pcextreme.nl ([109.72.87.137]:49805 "EHLO smtp01.mail.pcextreme.nl" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752768Ab0IATaA (ORCPT ); Wed, 1 Sep 2010 15:30:00 -0400 In-Reply-To: Sender: ceph-devel-owner@vger.kernel.org List-ID: To: Bogdan Lobodzinski Cc: Sage Weil , ceph-devel@vger.kernel.org Hi Bogdan, Yes, you can place your journal in a file; that is no problem. Performance-wise, you might want to use a block device (or partition) on a device other than the one where your data is stored. Wido On Wed, 2010-09-01 at 17:21 +0200, Bogdan Lobodzinski wrote: > Hello Sage, > > after replacing ext3 with btrfs, my ceph test-bed survived my test command: > svn co https://root.cern.ch/svn/root/trunk root > > I didn't try ext4. > > However, I made a few changes to my initial ceph.conf. > Could you please check whether this configuration is reasonable? > Is it correct to set the "osd journal" location as done below? 
> > My new ceph.conf: > ----------- > [global] > pid file = /var/run/ceph/$name.pid > debug ms = 1 > keyring = /etc/ceph/keyring.bin > [mon] > mon data = /x01/mon$id > debug mon = 20 > debug paxos = 20 > mon lease wiggle room = 0.5 > [mon0] > host = h1farm182 > mon addr = xxx.xxx.xxx.116:6789 > [mon1] > host = h1farm183 > mon addr = xxx.xxx.xxx.117:6789 > [mds] > debug mds = 10 > mds log max segments = 2 > keyring = /etc/ceph/keyring.$name > [mds0] > host = h1farm182 > [mds1] > host = h1farm183 > [osd] > sudo = true > keyring = /etc/ceph/keyring.$name > osd data = /x02/osd$id > osd journal = /x02/osd$id/journal > osd journal size = 100 > debug osd = 20 > debug journal = 20 > debug filestore = 20 > [osd0] > host = h1farm183 > btrfs devs = /dev/sdb1 > [osd1] > host = h1farm184 > btrfs devs = /dev/sdb1 > ----------- > > Thank you for help, > > Cheers, > > Bogdan > > > On Tue, 31 Aug 2010, Bogdan Lobodzinski wrote: > > > > > Hello Sage, > > > > On Mon, 30 Aug 2010, Sage Weil wrote: > > > >> On Mon, 30 Aug 2010, Bogdan Lobodzinski wrote: > >>> > >>> Hello Sage, > >>> > >>> I moved to the kernel 2.6.35, keeping ext3 filesystem. > >>> After executing teh same command: > >>> svn co https://root.cern.ch/svn/root/trunk root > >>> > >>> System is again dead. The command and kjournald are stuck > >>> bogdan 8539 0.9 0.6 31168 22040 pts/0 DL+ 16:44 0:21 svn co > >>> https://root.cern.ch/svn/root/trunk root > >>> root 802 0.0 0.0 0 0 ? D 12:59 0:01 [kjournald] > >> > >> Hmm. Have you tried ext4? > >> > >> I stopped seeing this on my own machine with recent kernels, but it looks > >> like it isn't in fact fixed. This should be reported to the ext4 list. > >> Are you running ceph via vstart.sh or a custom ceph.conf? 
> > I am using vstart.sh taken from compiled by me source tarball > > ceph-0.21.tar.gz (http://ceph.newdream.net/download/) > > and the client from > > git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client-standalone.git > > > > Cheers, > > > > Bogdan > > > >> > >> sage > >> > >>> > >>> Looks like the bug is not fixed, dmesg shows: > >>> --------- > >>> [14325.304068] kernel BUG at > >>> /build/buildd/linux-maverick-2.6.35/fs/ext3/balloc.c:1385! > >>> [14325.304191] invalid opcode: 0000 [#1] SMP > >>> [14325.304263] last sysfs file: > >>> /sys/devices/pci0000:00/0000:00:00.0/device > >>> [14325.304266] Modules linked in: nfs lockd fscache nfs_acl auth_rpcgss > >>> sunrpc > >>> ceph crc32c libcrc32c radeon ttm drm_kms_helper drm mptctl psmouse agpgart > >>> i5000_edac usbhid hid edac_core i2c_algo_bit bnx2 i5k_amb dcdbas shpchp > >>> serio_raw mptsas mptscsih mptbase scsi_transport_sas > >>> [14325.304266] > >>> [14325.304266] Pid: 8391, comm: cosd Not tainted 2.6.35-14-generic > >>> #20~lucid2-Ubuntu 0DT097/PowerEdge 1950 > >>> [14325.304266] EIP: 0060:[] EFLAGS: 00210286 CPU: 1 > >>> [14325.304266] EIP is at ext3_try_to_allocate_with_rsv+0x1cd/0x2b0 > >>> [14325.304266] EAX: 00000027 EBX: c8641440 ECX: c07d7cfc EDX: 00000000 > >>> [14325.304266] ESI: 007b7fff EDI: f640fa00 EBP: f5823c50 ESP: f5823c10 > >>> [14325.304266] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 > >>> [14325.304266] Process cosd (pid: 8391, ti=f5822000 task=f6b7bf70 > >>> task.ti=f5822000) > >>> [14325.304266] Stack: > >>> [14325.304266] 000000f6 c6c11930 c0273a58 00001000 f62e549c 00000007 > >>> c8641454 > >>> 007b7fff > >>> [14325.304266] <0> f6f7e420 007b0000 000000f6 f640de00 00000001 000000f6 > >>> c4063ec0 00000000 > >>> [14325.304266] <0> f5823cc0 c0274daf c6c11930 ffffffff c8641440 f5823ca8 > >>> f5823cac c0256017 > >>> [14325.304266] Call Trace: > >>> [14325.304266] [] ? read_block_bitmap+0x48/0x160 > >>> [14325.304266] [] ? 
ext3_new_blocks+0x1ff/0x610 > >>> [14325.304266] [] ? mb_cache_entry_find_first+0x67/0x80 > >>> [14325.304266] [] ? ext3_new_block+0x25/0x30 > >>> [14325.304266] [] ? ext3_xattr_block_set+0x481/0x550 > >>> [14325.304266] [] ? ext3_xattr_set_entry+0x20/0x2f0 > >>> [14325.304266] [] ? ext3_xattr_set_handle+0x31b/0x400 > >>> [14325.304266] [] ? ext3_xattr_set+0x75/0xc0 > >>> [14325.304266] [] ? ext3_xattr_user_set+0x74/0x80 > >>> [14325.304266] [] ? generic_setxattr+0x9b/0xb0 > >>> [14325.304266] [] ? generic_setxattr+0x0/0xb0 > >>> [14325.304266] [] ? __vfs_setxattr_noperm+0x44/0x150 > >>> [14325.304266] [] ? cap_inode_setxattr+0x2c/0x60 > >>> [14325.304266] [] ? vfs_setxattr+0x91/0xa0 > >>> [14325.304266] [] ? setxattr+0xb8/0x110 > >>> [14325.304266] [] ? path_to_nameidata+0x1e/0x50 > >>> [14325.304266] [] ? link_path_walk+0x412/0x890 > >>> [14325.304266] [] ? enqueue_task_fair+0x39/0x80 > >>> [14325.304266] [] ? mntput_no_expire+0x1f/0xd0 > >>> [14325.304266] [] ? mntput_no_expire+0x1f/0xd0 > >>> [14325.304266] [] ? putname+0x2b/0x40 > >>> [14325.304266] [] ? user_path_at+0x4a/0x80 > >>> [14325.304266] [] ? sys_futex+0x72/0x120 > >>> [14325.304266] [] ? sys_setxattr+0x83/0x90 > >>> [14325.304266] [] ? syscall_call+0x7/0xb > >>> [14325.304266] [] ? cache_add_dev+0x73/0x195 > >>> [14325.304266] Code: 83 3a ff ff ff 90 39 45 dc c7 45 0c ff ff ff ff 0f 83 > >>> 32 > >>> ff ff ff 8b 87 80 01 00 00 ba 5a 7e 5e c0 05 d0 00 00 00 e8 83 f1 ff ff > >>> <0f> > >>> 0b eb fe 8b 45 ec 89 55 d4 05 cc 00 00 00 89 45 ec e8 ec 4b > >>> [14325.304266] EIP: [] ext3_try_to_allocate_with_rsv+0x1cd/0x2b0 > >>> SS:ESP 0068:f5823c10 > >>> [14325.326777] ---[ end trace 53e0b3b55af7a83c ]--- > >>> [14384.001261] ceph: mds0 caps stale > >>> [14413.616132] ceph: tid 33594 timed out on osd2, will reset osd > >>> [14628.992279] ceph: mds0 hung > >>> --------- > >>> > >>> as a next step I wil try to use btrfs . 
> >>> > >>> Cheers, > >>> > >>> Bogdan > >>> > >>> > >>> On Fri, 27 Aug 2010, Sage Weil wrote: > >>> > >>>> Hi Bogdan, > >>>> > >>>> This is a bug in the ext3 xattr code. It seems to be gone in 2.6.34 and > >>>> later. Or, you can switch to btrfs! > >>>> > >>>> sage > >>>> > >>>> > >>>> On Fri, 27 Aug 2010, Bogdan Lobodzinski wrote: > >>>> > >>>>> Hello, > >>>>> > >>>>> working with ceph on my test configuration > >>>>> (3 nodes Ubuntu 10.04.1 LTS, Linux 2.6.32-24-generic-pae #41-Ubuntu SMP) > >>>>> After starting > >>>>> svn co https://root.cern.ch/svn/root/trunk root > >>>>> > >>>>> on the /ceph directory, the command become stuck, and also: > >>>>> root 5303 0.0 0.0 0 0 ? D Aug26 0:00 > >>>>> [kjournald] > >>>>> root 30181 0.0 0.0 6972 2056 pts/1 D+ 13:46 0:00 > >>>>> /usr//bin/cosd > >>>>> -i 2 -c /etc/ceph/ceph.conf > >>>>> > >>>>> any mount, unmount are going also to the state D. > >>>>> This is a permanennt behaviour of the ceph if the command is started. > >>>>> > >>>>> dmesg shows: > >>>>> ------------- > >>>>> [99048.567704] ------------[ cut here ]------------ > >>>>> [99048.568767] kernel BUG at > >>>>> /build/buildd/linux-2.6.32/fs/ext3/balloc.c:1384! 
> >>>>> [99048.568767] invalid opcode: 0000 [#1] SMP > >>>>> [99048.568767] last sysfs file: > >>>>> /sys/devices/pci0000:00/0000:00:00.0/device > >>>>> [99048.596652] Modules linked in: nfs lockd nfs_acl auth_rpcgss sunrpc > >>>>> ceph > >>>>> crc32c libcrc32c openafs(P) fbcon tileblit font bitblit softcursor vga > >>>>> vgastate mptctl radeon ttm drm_kms_helper drm bnx2 psmouse i5000_edac > >>>>> edac_core agpgart serio_raw i5k_amb i2c_algo_bit shpchp dell_wmi dcdbas > >>>>> usbhid mptsas mptscsih mptbase scsi_transport_sas > >>>>> [99048.596652] > >>>>> [99048.596652] Pid: 6258, comm: cosd Tainted: P > >>>>> (2.6.32-24-generic-pae #41-Ubuntu) PowerEdge 1950 > >>>>> [99048.596652] EIP: 0060:[] EFLAGS: 00210296 CPU: 3 > >>>>> [99048.596652] EIP is at ext3_try_to_allocate_with_rsv+0x1cd/0x2b0 > >>>>> [99048.596652] EAX: 00000027 EBX: f6dd5480 ECX: fffe48f7 EDX: 00000000 > >>>>> [99048.596652] ESI: 02147fff EDI: f625e200 EBP: f5ccbc54 ESP: f5ccbc14 > >>>>> [99048.596652] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 > >>>>> [99048.596652] Process cosd (pid: 6258, ti=f5cca000 task=f6263300 > >>>>> task.ti=f5cca000) > >>>>> [99048.596652] Stack: > >>>>> [99048.596652] 00000428 f14f1bc0 c026cc88 00001000 00000007 f1a80e9c > >>>>> f6dd5494 02147fff > >>>>> [99048.596652] <0> f70d89c0 02140000 00000428 f625d800 00000001 00000428 > >>>>> f1058500 00000000 > >>>>> [99048.596652] <0> f5ccbcc8 c026e048 f14f1bc0 ffffffff f6dd5480 f5ccbcb0 > >>>>> f5ccbcb4 f5ccbc90 > >>>>> [99048.596652] Call Trace: > >>>>> [99048.596652] [] ? read_block_bitmap+0x48/0x160 > >>>>> [99048.596652] [] ? ext3_new_blocks+0x228/0x6c0 > >>>>> [99048.596652] [] ? mb_cache_entry_find_first+0x67/0x80 > >>>>> [99048.596652] [] ? ext3_new_block+0x25/0x30 > >>>>> [99048.596652] [] ? ext3_xattr_block_set+0x554/0x670 > >>>>> [99048.596652] [] ? ext3_xattr_set_entry+0x29/0x350 > >>>>> [99048.596652] [] ? ext3_xattr_set_handle+0x2cb/0x3e0 > >>>>> [99048.596652] [] ? 
ext3_xattr_set+0x75/0xc0 > >>>>> [99048.596652] [] ? ext3_xattr_user_set+0x76/0x80 > >>>>> [99048.596652] [] ? generic_setxattr+0x9c/0xb0 > >>>>> [99048.596652] [] ? generic_setxattr+0x0/0xb0 > >>>>> [99048.596652] [] ? __vfs_setxattr_noperm+0x44/0x160 > >>>>> [99048.596652] [] ? cap_inode_setxattr+0x2c/0x60 > >>>>> [99048.596652] [] ? vfs_setxattr+0x91/0xa0 > >>>>> [99048.596652] [] ? setxattr+0xb8/0x110 > >>>>> [99048.596652] [] ? __link_path_walk+0x632/0xca0 > >>>>> [99048.596652] [] ? enqueue_task_fair+0x39/0x80 > >>>>> [99048.596652] [] ? mntput_no_expire+0x1f/0xe0 > >>>>> [99048.596652] [] ? mntput_no_expire+0x1f/0xe0 > >>>>> [99048.596652] [] ? path_put+0x25/0x30 > >>>>> [99048.596652] [] ? putname+0x2b/0x40 > >>>>> [99048.596652] [] ? user_path_at+0x4a/0x80 > >>>>> [99048.596652] [] ? sys_futex+0x72/0x120 > >>>>> [99048.596652] [] ? sys_setxattr+0x83/0x90 > >>>>> [99048.596652] [] ? sysenter_do_call+0x12/0x28 > >>>>> [99048.596652] Code: 83 3a ff ff ff 90 39 45 dc c7 45 0c ff ff ff ff 0f > >>>>> 83 > >>>>> 32 ff ff ff 8b 87 84 01 00 00 ba ba c6 5c c0 05 d0 00 00 00 e8 73 f1 > >>>>> ff<0f> > >>>>> 0b eb fe 8b 45 ec 89 55 d4 05 cc 00 00 00 89 45 ec e8 fc 53 > >>>>> [99048.596652] EIP: [] > >>>>> ext3_try_to_allocate_with_rsv+0x1cd/0x2b0 > >>>>> SS:ESP 0068:f5ccbc14 > >>>>> [99049.044090] ---[ end trace 35860103963ee444 ]--- > >>>>> h1farm184# > >>>>> -------------------- > >>>>> > >>>>> my ceph.conf is: > >>>>> ------- > >>>>> [global] > >>>>> pid file = /var/run/ceph/$name.pid > >>>>> debug ms = 1 > >>>>> keyring = /etc/ceph/keyring.bin > >>>>> ; monitors > >>>>> [mon] > >>>>> ;Directory for monitor files > >>>>> mon data = /x02/mon$id > >>>>> debug mon = 20 > >>>>> debug paxos = 20 > >>>>> mon lease wiggle room = 0.5 > >>>>> > >>>>> [mon0] > >>>>> host = h1farm182 > >>>>> mon addr = xxx.xxx.xx.116:6789 > >>>>> [mon1] > >>>>> host = h1farm183 > >>>>> mon addr = xxx.xxx.xx.117:6789 > >>>>> ; metadata servers > >>>>> [mds] > >>>>> debug mds = 20 > >>>>> mds 
log max segments = 2 > >>>>> keyring = /etc/ceph/keyring.$name > >>>>> [mds0] > >>>>> host = h1farm182 > >>>>> [mds1] > >>>>> host = h1farm183 > >>>>> [osd] > >>>>> sudo = true > >>>>> osd data = /x02/osd$id > >>>>> osd journal = /x02/osd$id/journal > >>>>> osd journal size = 100 > >>>>> keyring = /etc/ceph/keyring.$name > >>>>> debug osd = 20 > >>>>> debug journal = 20 > >>>>> debug filestore = 20 > >>>>> ;osd journal size = 100 > >>>>> [osd0] > >>>>> host = h1farm182 > >>>>> [osd1] > >>>>> host = h1farm183 > >>>>> [osd2] > >>>>> host = h1farm184 > >>>>> > >>>>> ------- > >>>>> > >>>>> Any idea how to improve the situation ? > >>>>> > >>>>> -- > >>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > >>>>> the body of a message to majordomo@vger.kernel.org > >>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html > >>>>> > >>>>> > >>>> > >>> -- > >>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > >>> the body of a message to majordomo@vger.kernel.org > >>> More majordomo info at http://vger.kernel.org/majordomo-info.html > >>> > >>> > >> > > > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html From mboxrd@z Thu Jan 1 00:00:00 1970 From: Bogdan Lobodzinski Subject: Re: Write operation is stuck Date: Fri, 3 Sep 2010 17:02:30 +0200 (CEST) Message-ID: References: <1283369392.3894.8.camel@wido-laptop.pcextreme.nl> Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Return-path: Received: from smtp-out-3.desy.de ([131.169.56.86]:63356 "EHLO smtp-out-3.desy.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755596Ab0ICPCc (ORCPT ); Fri, 3 Sep 2010 11:02:32 -0400 Received: from smtp-map-3.desy.de (smtp-map-3.desy.de [131.169.56.68]) by smtp-out-3.desy.de (DESY_OUT_3) with ESMTP id 2EA5C1036 for ; Fri, 3 Sep 2010 17:02:31 
+0200 (MEST) Received: from adserv70.win.desy.de (adserv70.win.desy.de [131.169.97.56]) by smtp-map-3.desy.de (DESY_MAP_3) with ESMTP id 10F401026 for ; Fri, 3 Sep 2010 17:02:31 +0200 (MEST) In-Reply-To: <1283369392.3894.8.camel@wido-laptop.pcextreme.nl> Sender: ceph-devel-owner@vger.kernel.org List-ID: To: Wido den Hollander Cc: Sage Weil , ceph-devel@vger.kernel.org Hello all, let me continue reporting my troubles; the title can stay the same. As I wrote, my ceph configuration survived my critical test svn co https://root.cern.ch/svn/root/trunk root and then suddenly, during the night, at 5 o'clock, ceph became stuck again, without any user activity and with no work at all in the /ceph directory. The node is running as mds1, mon1, osd0 The system log file reports (the problem starts with the entry: "Sep 2 05:44:42 h1farm183 kernel: [72426.976029] ceph: mds0 caps stale" ): -------- Sep 1 12:40:38 h1farm183 kernel: [10983.398458] Btrfs loaded Sep 1 12:44:25 h1farm183 kernel: [11210.109913] ceph: loaded (mon/mds/osd proto 15/32/24, osdmap 5/5 5/5) Sep 1 13:08:25 h1farm183 kernel: [12650.255052] device fsid 754ae49f827ffac4-290543ed0a3b19a1 devid 1 transid 7 /dev/sdb 1 Sep 1 14:25:06 h1farm183 kernel: [17251.100851] RPC: Registered udp transport module. Sep 1 14:25:06 h1farm183 kernel: [17251.100854] RPC: Registered tcp transport module. Sep 1 14:25:06 h1farm183 kernel: [17251.100855] RPC: Registered tcp NFSv4.1 backchannel transport module. Sep 1 14:25:20 h1farm183 kernel: [17265.404967] device fsid 754ae49f827ffac4-290543ed0a3b19a1 devid 1 transid 7 /dev/sdb 1 Sep 1 14:25:20 h1farm183 kernel: [17265.562870] udev: starting version 151 Sep 1 14:25:26 h1farm183 kernel: [17271.752817] device fsid 754ae49f827ffac4-290543ed0a3b19a1 devid 1 transid 7 /dev/sdb 1 ...
Sep 1 16:41:51 h1farm183 kernel: [25456.385184] device fsid 4940eafa1c110ce7-c14b44192348589f devid 1 transid 12 /dev/sdb1 Sep 1 16:42:21 h1farm183 kernel: [25486.297025] ceph: client4100 fsid 4ea08089-acf1-b738-6f72-96c3ed029b71 Sep 1 16:42:21 h1farm183 kernel: [25486.297169] ceph: mon0 131.169.74.116:6789 session established Sep 2 02:37:54 h1farm183 rsyslogd: [origin software="rsyslogd" swVersion="4.2.0" x-pid="863" x-info="http://www.rsyslog.com"] rsyslogd was HUPed, type 'lightweight'. Sep 2 05:44:42 h1farm183 kernel: [72426.976029] ceph: mds0 caps stale Sep 2 05:44:57 h1farm183 kernel: [72441.976037] ceph: mds0 caps stale Sep 2 05:45:27 h1farm183 kernel: [72472.066320] ceph: mds0 reconnect start Sep 2 05:45:27 h1farm183 kernel: [72472.069681] Modules linked in: nfs lockd nfs_acl auth_rpcgss sunrpc ceph btrfs zlib_deflate crc32c libcrc32c ppdev lp parport openafs(P) ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp fbcon tileblit font bitblit softcursor vga16fb vgastate radeon ttm mptctl drm_kms_helper bnx2 drm usbhid i5000_edac hid dell_wmi shpchp edac_core agpgart i2c_algo_bit i5k_amb dcdbas psmouse serio_raw mptsas mptscsih mptbase scsi_transport_sas [last unloaded: kvm] Sep 2 05:45:27 h1farm183 kernel: [72472.072332] Sep 2 05:45:27 h1farm183 kernel: [72472.072332] Pid: 6184, comm: ceph-msgr/1 Tainted: P (2.6.32-24-generic-pae #42-Ubuntu) PowerEdge 1950 Sep 2 05:45:27 h1farm183 kernel: [72472.072332] EIP: 0060:[] EFLAGS: 00010246 CPU: 1 Sep 2 05:45:27 h1farm183 kernel: [72472.072332] EIP is at kunmap_high+0x97/0xa0 Sep 2 05:45:27 h1farm183 kernel: [72472.072332] EAX: 00000000 EBX: f5d17000 ECX: c0916848 EDX: 00000292 Sep 2 05:45:27 h1farm183 kernel: [72472.072332] ESI: c17ee940 EDI: f5d18000 EBP: f5fb3c6c ESP: f5fb3c64 Sep 2 05:45:27 h1farm183 kernel: [72472.072332] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 Sep 2 05:45:27 h1farm183 kernel: 
[72472.072332] c07d9280 f50b10a0 f5fb3c74 c0138307 f5fb3c98 f9ad7d54 00000000 f5fb3cbc Sep 2 05:45:27 h1farm183 kernel: [72472.072332] <0> 00000038 0000002b eaee1018 ee4bcd70 00000000 f5fb3d14 f9ada09d 00000000 Sep 2 05:45:27 h1farm183 kernel: [72472.072332] <0> eaee108c 0000005c f60bab40 eaee0e00 ee788440 f50b10a0 00000a21 00000000 Sep 2 05:45:27 h1farm183 kernel: [72472.072332] [] ? kunmap+0x57/0x60 Sep 2 05:45:27 h1farm183 kernel: [72472.072332] [] ? ceph_pagelist_append+0x54/0x110 [ceph] Sep 2 05:45:27 h1farm183 kernel: [72472.072332] [] ? encode_caps_cb+0x16d/0x1f0 [ceph] Sep 2 05:45:27 h1farm183 kernel: [72472.072332] [] ? iterate_session_caps+0xa0/0x170 [ceph] Sep 2 05:45:27 h1farm183 kernel: [72472.072332] [] ? encode_caps_cb+0x0/0x1f0 [ceph] Sep 2 05:45:27 h1farm183 kernel: [72472.072332] [] ? send_mds_reconnect+0x23f/0x3b0 [ceph] Sep 2 05:45:27 h1farm183 kernel: [72472.072332] [] ? ceph_mdsc_handle_map+0x224/0x380 [ceph] Sep 2 05:45:27 h1farm183 kernel: [72472.072332] [] ? dispatch+0x8e/0x430 [ceph] Sep 2 05:45:27 h1farm183 kernel: [72472.072332] [] ? con_work+0x1cf6/0x1ed0 [ceph] Sep 2 05:45:27 h1farm183 kernel: [72472.072332] [] ? __switch_to+0xcd/0x180 Sep 2 05:45:27 h1farm183 kernel: [72472.072332] [] ? finish_task_switch+0x43/0xc0 Sep 2 05:45:27 h1farm183 kernel: [72472.072332] [] ? schedule+0x44c/0x840 Sep 2 05:45:27 h1farm183 kernel: [72472.072332] [] ? run_workqueue+0x8e/0x150 Sep 2 05:45:27 h1farm183 kernel: [72472.072332] [] ? con_work+0x0/0x1ed0 [ceph] Sep 2 05:45:27 h1farm183 kernel: [72472.072332] [] ? worker_thread+0x84/0xe0 Sep 2 05:45:27 h1farm183 kernel: [72472.072332] [] ? autoremove_wake_function+0x0/0x50 Sep 2 05:45:27 h1farm183 kernel: [72472.072332] [] ? worker_thread+0x0/0xe0 Sep 2 05:45:27 h1farm183 kernel: [72472.072332] [] ? kthread+0x74/0x80 Sep 2 05:45:27 h1farm183 kernel: [72472.072332] [] ? kthread+0x0/0x80 Sep 2 05:45:27 h1farm183 kernel: [72472.072332] [] ? 
kernel_thread_helper+0x7/0x10 Sep 2 05:45:27 h1farm183 kernel: [72472.304298] ---[ end trace 47e346731d47774d ]--- --- my mds1.log from the node shows: -------- 10.09.02_05:45:15.001538 b5168b70 mds-1.0 beacon_send up:standby seq 11751 (currently up:standby) 10.09.02_05:45:15.001555 b5168b70 -- 131.169.74.117:6800/3679 --> mon1 131.169.74.117:6789/0 -- mdsbeacon(4099/1 up:stand 10.09.02_05:45:19.001663 b5168b70 mds-1.0 beacon_send up:standby seq 11752 (currently up:standby) 10.09.02_05:45:19.001681 b5168b70 -- 131.169.74.117:6800/3679 --> mon1 131.169.74.117:6789/0 -- mdsbeacon(4099/1 up:stand 10.09.02_05:45:19.128037 b5168b70 mds-1.0 last tick was 80.001470 > 5 seconds ago, laggy_until 0.000000, setting laggy f 10.09.02_05:45:19.795620 b636cb70 -- 131.169.74.117:6800/3679 <== mon1 131.169.74.117:6789/0 12055 ==== mdsmap(e 6) v1 == 10.09.02_05:45:19.795669 b636cb70 mds-1.0 handle_mds_map epoch 6 from mon1 10.09.02_05:45:19.795697 b636cb70 mds-1.0 my compat compat={},rocompat={},incompat={1=base v0.20} 10.09.02_05:45:19.795708 b636cb70 mds-1.0 mdsmap compat compat={},rocompat={},incompat={1=base v0.20} 10.09.02_05:45:19.795715 b636cb70 mds0.0 map says i am 131.169.74.117:6800/3679 mds0 state up:replay 10.09.02_05:45:19.795803 b636cb70 mds0.2 handle_mds_map i am now mds0.2 10.09.02_05:45:19.795812 b636cb70 mds0.2 handle_mds_map state change up:standby --> up:replay 10.09.02_05:45:19.795818 b636cb70 mds0.2 replay_start 10.09.02_05:45:19.795825 b636cb70 mds0.2 now replay. 
my recovery peers are 10.09.02_05:45:19.795835 b636cb70 mds0.cache set_recovery_set 10.09.02_05:45:19.795856 b636cb70 mds0.2 boot_start 1: opening inotable 10.09.02_05:45:19.795866 b636cb70 mds0.inotable: load 10.09.02_05:45:19.795912 b636cb70 -- 131.169.74.117:6800/3679 --> mon1 131.169.74.117:6789/0 -- mon_subscribe({mdsmap=7+, 10.09.02_05:45:19.795940 b636cb70 mds0.2 boot_start 1: opening sessionmap 10.09.02_05:45:19.795951 b636cb70 mds0.sessionmap load 10.09.02_05:45:19.795975 b636cb70 mds0.2 boot_start 1: opening anchor table 10.09.02_05:45:19.795982 b636cb70 mds0.anchortable: load 10.09.02_05:45:19.795998 b636cb70 mds0.2 boot_start 1: opening snap table 10.09.02_05:45:19.796015 b636cb70 mds0.snaptable: load 10.09.02_05:45:19.796030 b636cb70 mds0.2 boot_start 1: opening mds log 10.09.02_05:45:19.796041 b636cb70 mds0.log open discovering log bounds 10.09.02_05:45:19.796082 b636cb70 mds0.cache handle_mds_failure mds0 10.09.02_05:45:19.796093 b636cb70 mds0.cache handle_mds_failure mds0 : recovery peers are 10.09.02_05:45:19.796101 b636cb70 mds0.cache wants_resolve 10.09.02_05:45:19.796107 b636cb70 mds0.cache got_resolve 10.09.02_05:45:19.796112 b636cb70 mds0.cache rejoin_sent 10.09.02_05:45:19.796117 b636cb70 mds0.cache rejoin_gather 10.09.02_05:45:19.796123 b636cb70 mds0.cache rejoin_ack_gather 10.09.02_05:45:19.796133 b636cb70 mds0.migrator handle_mds_failure_or_stop mds0 10.09.02_05:45:19.796164 b636cb70 mds0.cache show_subtrees - no subtrees 10.09.02_05:45:19.796177 b636cb70 mds0.bal check_targets have need want 10.09.02_05:45:19.796195 b636cb70 mds0.bal rebalance done 10.09.02_05:45:19.796201 b636cb70 mds0.cache show_subtrees - no subtrees 10.09.02_05:45:19.798127 b636cb70 -- 131.169.74.117:6800/3679 <== mon1 131.169.74.117:6789/0 12056 ==== osd_map(1,5) v1 = 10.09.02_05:45:19.798152 b636cb70 mds0.2 laggy, deferring osd_map(1,5) v1 10.09.02_05:45:19.798165 b636cb70 -- 131.169.74.117:6800/3679 <== mon1 131.169.74.117:6789/0 12057 ==== mon_subscribe_ack 
10.09.02_05:45:19.984913 b636cb70 -- 131.169.74.117:6800/3679 <== mon1 131.169.74.117:6789/0 12058 ==== mdsbeacon(4099/1 10.09.02_05:45:19.984951 b636cb70 mds0.2 handle_mds_beacon up:boot seq 2 dne 10.09.02_05:45:19.985185 b636cb70 -- 131.169.74.117:6800/3679 <== mon1 131.169.74.117:6789/0 12059 ==== mdsbeacon(4099/1 10.09.02_05:45:19.985210 b636cb70 mds0.2 handle_mds_beacon up:standby seq 11730 rtt 88.986215 10.09.02_05:45:19.985245 b5168b70 mds0.2 beacon_kill last_acked_stamp 10.09.02_05:43:50.998994, setting laggy flag. 10.09.02_05:45:19.985293 b636cb70 -- 131.169.74.117:6800/3679 <== mon1 131.169.74.117:6789/0 12060 ==== mdsbeacon(4099/1 10.09.02_05:45:19.985320 b636cb70 mds0.2 handle_mds_beacon up:standby seq 11731 rtt 84.986197 -------- The node was completely stuck. Do you know what the reason could be? Any hints on how to change the configuration are welcome. Cheers, Bogdan On Wed, 1 Sep 2010, Wido den Hollander wrote: > Hi Bogdan, > > Yes, you can place your journal on a file, that is no problem. > > Performance-wise you might want to use a block device (or partition) on > a device other than the one your data is on. > > Wido > > On Wed, 2010-09-01 at 17:21 +0200, Bogdan Lobodzinski wrote: >> Hello Sage, >> >> after replacing ext3 with btrfs, my ceph test-bed survived my test command: >> svn co https://root.cern.ch/svn/root/trunk root >> >> I didn't try ext4. >> >> However, I made a few changes to my initial ceph.conf. >> Could you please check whether such a configuration is reasonable? >> Is it correct to use the "osd journal" location as it is done below?
>> >> My new ceph.conf: >> ----------- >> [global] >> pid file = /var/run/ceph/$name.pid >> debug ms = 1 >> keyring = /etc/ceph/keyring.bin >> [mon] >> mon data = /x01/mon$id >> debug mon = 20 >> debug paxos = 20 >> mon lease wiggle room = 0.5 >> [mon0] >> host = h1farm182 >> mon addr = xxx.xxx.xxx.116:6789 >> [mon1] >> host = h1farm183 >> mon addr = xxx.xxx.xxx.117:6789 >> [mds] >> debug mds = 10 >> mds log max segments = 2 >> keyring = /etc/ceph/keyring.$name >> [mds0] >> host = h1farm182 >> [mds1] >> host = h1farm183 >> [osd] >> sudo = true >> keyring = /etc/ceph/keyring.$name >> osd data = /x02/osd$id >> osd journal = /x02/osd$id/journal >> osd journal size = 100 >> debug osd = 20 >> debug journal = 20 >> debug filestore = 20 >> [osd0] >> host = h1farm183 >> btrfs devs = /dev/sdb1 >> [osd1] >> host = h1farm184 >> btrfs devs = /dev/sdb1 >> ----------- >> >> Thank you for your help, >> >> Cheers, >> >> Bogdan >> >> >> On Tue, 31 Aug 2010, Bogdan Lobodzinski wrote: >> >>> >>> Hello Sage, >>> >>> On Mon, 30 Aug 2010, Sage Weil wrote: >>> >>>> On Mon, 30 Aug 2010, Bogdan Lobodzinski wrote: >>>>> >>>>> Hello Sage, >>>>> >>>>> I moved to kernel 2.6.35, keeping the ext3 filesystem. >>>>> After executing the same command: >>>>> svn co https://root.cern.ch/svn/root/trunk root >>>>> >>>>> The system is dead again. The command and kjournald are stuck >>>>> bogdan 8539 0.9 0.6 31168 22040 pts/0 DL+ 16:44 0:21 svn co >>>>> https://root.cern.ch/svn/root/trunk root >>>>> root 802 0.0 0.0 0 0 ? D 12:59 0:01 [kjournald] >>>> >>>> Hmm. Have you tried ext4? >>>> >>>> I stopped seeing this on my own machine with recent kernels, but it looks >>>> like it isn't in fact fixed. This should be reported to the ext4 list. >>>> Are you running ceph via vstart.sh or a custom ceph.conf?
>>> I am using vstart.sh taken from compiled by me source tarball >>> ceph-0.21.tar.gz (http://ceph.newdream.net/download/) >>> and the client from >>> git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client-standalone.git >>> >>> Cheers, >>> >>> Bogdan >>> >>>> >>>> sage >>>> >>>>> >>>>> Looks like the bug is not fixed, dmesg shows: >>>>> --------- >>>>> [14325.304068] kernel BUG at >>>>> /build/buildd/linux-maverick-2.6.35/fs/ext3/balloc.c:1385! >>>>> [14325.304191] invalid opcode: 0000 [#1] SMP >>>>> [14325.304263] last sysfs file: >>>>> /sys/devices/pci0000:00/0000:00:00.0/device >>>>> [14325.304266] Modules linked in: nfs lockd fscache nfs_acl auth_rpcgss >>>>> sunrpc >>>>> ceph crc32c libcrc32c radeon ttm drm_kms_helper drm mptctl psmouse agpgart >>>>> i5000_edac usbhid hid edac_core i2c_algo_bit bnx2 i5k_amb dcdbas shpchp >>>>> serio_raw mptsas mptscsih mptbase scsi_transport_sas >>>>> [14325.304266] >>>>> [14325.304266] Pid: 8391, comm: cosd Not tainted 2.6.35-14-generic >>>>> #20~lucid2-Ubuntu 0DT097/PowerEdge 1950 >>>>> [14325.304266] EIP: 0060:[] EFLAGS: 00210286 CPU: 1 >>>>> [14325.304266] EIP is at ext3_try_to_allocate_with_rsv+0x1cd/0x2b0 >>>>> [14325.304266] EAX: 00000027 EBX: c8641440 ECX: c07d7cfc EDX: 00000000 >>>>> [14325.304266] ESI: 007b7fff EDI: f640fa00 EBP: f5823c50 ESP: f5823c10 >>>>> [14325.304266] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 >>>>> [14325.304266] Process cosd (pid: 8391, ti=f5822000 task=f6b7bf70 >>>>> task.ti=f5822000) >>>>> [14325.304266] Stack: >>>>> [14325.304266] 000000f6 c6c11930 c0273a58 00001000 f62e549c 00000007 >>>>> c8641454 >>>>> 007b7fff >>>>> [14325.304266] <0> f6f7e420 007b0000 000000f6 f640de00 00000001 000000f6 >>>>> c4063ec0 00000000 >>>>> [14325.304266] <0> f5823cc0 c0274daf c6c11930 ffffffff c8641440 f5823ca8 >>>>> f5823cac c0256017 >>>>> [14325.304266] Call Trace: >>>>> [14325.304266] [] ? read_block_bitmap+0x48/0x160 >>>>> [14325.304266] [] ? 
ext3_new_blocks+0x1ff/0x610 >>>>> [14325.304266] [] ? mb_cache_entry_find_first+0x67/0x80 >>>>> [14325.304266] [] ? ext3_new_block+0x25/0x30 >>>>> [14325.304266] [] ? ext3_xattr_block_set+0x481/0x550 >>>>> [14325.304266] [] ? ext3_xattr_set_entry+0x20/0x2f0 >>>>> [14325.304266] [] ? ext3_xattr_set_handle+0x31b/0x400 >>>>> [14325.304266] [] ? ext3_xattr_set+0x75/0xc0 >>>>> [14325.304266] [] ? ext3_xattr_user_set+0x74/0x80 >>>>> [14325.304266] [] ? generic_setxattr+0x9b/0xb0 >>>>> [14325.304266] [] ? generic_setxattr+0x0/0xb0 >>>>> [14325.304266] [] ? __vfs_setxattr_noperm+0x44/0x150 >>>>> [14325.304266] [] ? cap_inode_setxattr+0x2c/0x60 >>>>> [14325.304266] [] ? vfs_setxattr+0x91/0xa0 >>>>> [14325.304266] [] ? setxattr+0xb8/0x110 >>>>> [14325.304266] [] ? path_to_nameidata+0x1e/0x50 >>>>> [14325.304266] [] ? link_path_walk+0x412/0x890 >>>>> [14325.304266] [] ? enqueue_task_fair+0x39/0x80 >>>>> [14325.304266] [] ? mntput_no_expire+0x1f/0xd0 >>>>> [14325.304266] [] ? mntput_no_expire+0x1f/0xd0 >>>>> [14325.304266] [] ? putname+0x2b/0x40 >>>>> [14325.304266] [] ? user_path_at+0x4a/0x80 >>>>> [14325.304266] [] ? sys_futex+0x72/0x120 >>>>> [14325.304266] [] ? sys_setxattr+0x83/0x90 >>>>> [14325.304266] [] ? syscall_call+0x7/0xb >>>>> [14325.304266] [] ? cache_add_dev+0x73/0x195 >>>>> [14325.304266] Code: 83 3a ff ff ff 90 39 45 dc c7 45 0c ff ff ff ff 0f 83 >>>>> 32 >>>>> ff ff ff 8b 87 80 01 00 00 ba 5a 7e 5e c0 05 d0 00 00 00 e8 83 f1 ff ff >>>>> <0f> >>>>> 0b eb fe 8b 45 ec 89 55 d4 05 cc 00 00 00 89 45 ec e8 ec 4b >>>>> [14325.304266] EIP: [] ext3_try_to_allocate_with_rsv+0x1cd/0x2b0 >>>>> SS:ESP 0068:f5823c10 >>>>> [14325.326777] ---[ end trace 53e0b3b55af7a83c ]--- >>>>> [14384.001261] ceph: mds0 caps stale >>>>> [14413.616132] ceph: tid 33594 timed out on osd2, will reset osd >>>>> [14628.992279] ceph: mds0 hung >>>>> --------- >>>>> >>>>> As a next step I will try btrfs.
>>>>> >>>>> Cheers, >>>>> >>>>> Bogdan >>>>> >>>>> >>>>> On Fri, 27 Aug 2010, Sage Weil wrote: >>>>> >>>>>> Hi Bogdan, >>>>>> >>>>>> This is a bug in the ext3 xattr code. It seems to be gone in 2.6.34 and >>>>>> later. Or, you can switch to btrfs! >>>>>> >>>>>> sage >>>>>> >>>>>> >>>>>> On Fri, 27 Aug 2010, Bogdan Lobodzinski wrote: >>>>>> >>>>>>> Hello, >>>>>>> >>>>>>> I am working with ceph on my test configuration >>>>>>> (3 nodes, Ubuntu 10.04.1 LTS, Linux 2.6.32-24-generic-pae #41-Ubuntu SMP). >>>>>>> After starting >>>>>>> svn co https://root.cern.ch/svn/root/trunk root >>>>>>> >>>>>>> in the /ceph directory, the command becomes stuck, and so do: >>>>>>> root 5303 0.0 0.0 0 0 ? D Aug26 0:00 >>>>>>> [kjournald] >>>>>>> root 30181 0.0 0.0 6972 2056 pts/1 D+ 13:46 0:00 >>>>>>> /usr//bin/cosd >>>>>>> -i 2 -c /etc/ceph/ceph.conf >>>>>>> >>>>>>> Any mount or unmount also goes into state D. >>>>>>> This is the permanent behaviour of ceph whenever the command is started. >>>>>>> >>>>>>> dmesg shows: >>>>>>> ------------- >>>>>>> [99048.567704] ------------[ cut here ]------------ >>>>>>> [99048.568767] kernel BUG at >>>>>>> /build/buildd/linux-2.6.32/fs/ext3/balloc.c:1384!
>>>>>>> [99048.568767] invalid opcode: 0000 [#1] SMP >>>>>>> [99048.568767] last sysfs file: >>>>>>> /sys/devices/pci0000:00/0000:00:00.0/device >>>>>>> [99048.596652] Modules linked in: nfs lockd nfs_acl auth_rpcgss sunrpc >>>>>>> ceph >>>>>>> crc32c libcrc32c openafs(P) fbcon tileblit font bitblit softcursor vga >>>>>>> vgastate mptctl radeon ttm drm_kms_helper drm bnx2 psmouse i5000_edac >>>>>>> edac_core agpgart serio_raw i5k_amb i2c_algo_bit shpchp dell_wmi dcdbas >>>>>>> usbhid mptsas mptscsih mptbase scsi_transport_sas >>>>>>> [99048.596652] >>>>>>> [99048.596652] Pid: 6258, comm: cosd Tainted: P >>>>>>> (2.6.32-24-generic-pae #41-Ubuntu) PowerEdge 1950 >>>>>>> [99048.596652] EIP: 0060:[] EFLAGS: 00210296 CPU: 3 >>>>>>> [99048.596652] EIP is at ext3_try_to_allocate_with_rsv+0x1cd/0x2b0 >>>>>>> [99048.596652] EAX: 00000027 EBX: f6dd5480 ECX: fffe48f7 EDX: 00000000 >>>>>>> [99048.596652] ESI: 02147fff EDI: f625e200 EBP: f5ccbc54 ESP: f5ccbc14 >>>>>>> [99048.596652] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 >>>>>>> [99048.596652] Process cosd (pid: 6258, ti=f5cca000 task=f6263300 >>>>>>> task.ti=f5cca000) >>>>>>> [99048.596652] Stack: >>>>>>> [99048.596652] 00000428 f14f1bc0 c026cc88 00001000 00000007 f1a80e9c >>>>>>> f6dd5494 02147fff >>>>>>> [99048.596652] <0> f70d89c0 02140000 00000428 f625d800 00000001 00000428 >>>>>>> f1058500 00000000 >>>>>>> [99048.596652] <0> f5ccbcc8 c026e048 f14f1bc0 ffffffff f6dd5480 f5ccbcb0 >>>>>>> f5ccbcb4 f5ccbc90 >>>>>>> [99048.596652] Call Trace: >>>>>>> [99048.596652] [] ? read_block_bitmap+0x48/0x160 >>>>>>> [99048.596652] [] ? ext3_new_blocks+0x228/0x6c0 >>>>>>> [99048.596652] [] ? mb_cache_entry_find_first+0x67/0x80 >>>>>>> [99048.596652] [] ? ext3_new_block+0x25/0x30 >>>>>>> [99048.596652] [] ? ext3_xattr_block_set+0x554/0x670 >>>>>>> [99048.596652] [] ? ext3_xattr_set_entry+0x29/0x350 >>>>>>> [99048.596652] [] ? ext3_xattr_set_handle+0x2cb/0x3e0 >>>>>>> [99048.596652] [] ? 
ext3_xattr_set+0x75/0xc0 >>>>>>> [99048.596652] [] ? ext3_xattr_user_set+0x76/0x80 >>>>>>> [99048.596652] [] ? generic_setxattr+0x9c/0xb0 >>>>>>> [99048.596652] [] ? generic_setxattr+0x0/0xb0 >>>>>>> [99048.596652] [] ? __vfs_setxattr_noperm+0x44/0x160 >>>>>>> [99048.596652] [] ? cap_inode_setxattr+0x2c/0x60 >>>>>>> [99048.596652] [] ? vfs_setxattr+0x91/0xa0 >>>>>>> [99048.596652] [] ? setxattr+0xb8/0x110 >>>>>>> [99048.596652] [] ? __link_path_walk+0x632/0xca0 >>>>>>> [99048.596652] [] ? enqueue_task_fair+0x39/0x80 >>>>>>> [99048.596652] [] ? mntput_no_expire+0x1f/0xe0 >>>>>>> [99048.596652] [] ? mntput_no_expire+0x1f/0xe0 >>>>>>> [99048.596652] [] ? path_put+0x25/0x30 >>>>>>> [99048.596652] [] ? putname+0x2b/0x40 >>>>>>> [99048.596652] [] ? user_path_at+0x4a/0x80 >>>>>>> [99048.596652] [] ? sys_futex+0x72/0x120 >>>>>>> [99048.596652] [] ? sys_setxattr+0x83/0x90 >>>>>>> [99048.596652] [] ? sysenter_do_call+0x12/0x28 >>>>>>> [99048.596652] Code: 83 3a ff ff ff 90 39 45 dc c7 45 0c ff ff ff ff 0f >>>>>>> 83 >>>>>>> 32 ff ff ff 8b 87 84 01 00 00 ba ba c6 5c c0 05 d0 00 00 00 e8 73 f1 >>>>>>> ff<0f> >>>>>>> 0b eb fe 8b 45 ec 89 55 d4 05 cc 00 00 00 89 45 ec e8 fc 53 >>>>>>> [99048.596652] EIP: [] >>>>>>> ext3_try_to_allocate_with_rsv+0x1cd/0x2b0 >>>>>>> SS:ESP 0068:f5ccbc14 >>>>>>> [99049.044090] ---[ end trace 35860103963ee444 ]--- >>>>>>> h1farm184# >>>>>>> -------------------- >>>>>>> >>>>>>> my ceph.conf is: >>>>>>> ------- >>>>>>> [global] >>>>>>> pid file = /var/run/ceph/$name.pid >>>>>>> debug ms = 1 >>>>>>> keyring = /etc/ceph/keyring.bin >>>>>>> ; monitors >>>>>>> [mon] >>>>>>> ;Directory for monitor files >>>>>>> mon data = /x02/mon$id >>>>>>> debug mon = 20 >>>>>>> debug paxos = 20 >>>>>>> mon lease wiggle room = 0.5 >>>>>>> >>>>>>> [mon0] >>>>>>> host = h1farm182 >>>>>>> mon addr = xxx.xxx.xx.116:6789 >>>>>>> [mon1] >>>>>>> host = h1farm183 >>>>>>> mon addr = xxx.xxx.xx.117:6789 >>>>>>> ; metadata servers >>>>>>> [mds] >>>>>>> debug mds = 20 >>>>>>> mds 
log max segments = 2 >>>>>>> keyring = /etc/ceph/keyring.$name >>>>>>> [mds0] >>>>>>> host = h1farm182 >>>>>>> [mds1] >>>>>>> host = h1farm183 >>>>>>> [osd] >>>>>>> sudo = true >>>>>>> osd data = /x02/osd$id >>>>>>> osd journal = /x02/osd$id/journal >>>>>>> osd journal size = 100 >>>>>>> keyring = /etc/ceph/keyring.$name >>>>>>> debug osd = 20 >>>>>>> debug journal = 20 >>>>>>> debug filestore = 20 >>>>>>> ;osd journal size = 100 >>>>>>> [osd0] >>>>>>> host = h1farm182 >>>>>>> [osd1] >>>>>>> host = h1farm183 >>>>>>> [osd2] >>>>>>> host = h1farm184 >>>>>>> >>>>>>> ------- >>>>>>> >>>>>>> Any idea how to improve the situation ? >>>>>>> >>>>>>> -- >>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>>>>>> the body of a message to majordomo@vger.kernel.org >>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>>> >>>>>>> >>>>>> >>>>> -- >>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>>>> the body of a message to majordomo@vger.kernel.org >>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>> >>>>> >>>> >>> >> -- >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > From mboxrd@z Thu Jan 1 00:00:00 1970 From: Yehuda Sadeh Weinraub Subject: Re: Write operation is stuck Date: Fri, 3 Sep 2010 10:10:19 -0700 Message-ID: References: <1283369392.3894.8.camel@wido-laptop.pcextreme.nl> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: QUOTED-PRINTABLE Return-path: Received: from mail-ew0-f46.google.com ([209.85.215.46]:51021 "EHLO mail-ew0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755505Ab0ICRKV convert rfc822-to-8bit (ORCPT ); Fri, 3 Sep 2010 13:10:21 -0400 Received: by ewy23 with SMTP id 23so1275310ewy.19 for ; Fri, 03 Sep 2010 10:10:20 -0700 (PDT) 
In-Reply-To: Sender: ceph-devel-owner@vger.kernel.org List-ID: To: Bogdan Lobodzinski Cc: Wido den Hollander , Sage Weil , ceph-devel@vger.kernel.org On Fri, Sep 3, 2010 at 8:02 AM, Bogdan Lobodzinski wrote: > > Hello all, > > let me continue my troubles, the title can stay the same. > As I wrote, my ceph configuration survived my critical test > svn co https://root.cern.ch/svn/root/trunk root > and suddenly, during the night, at 5 o'clock ceph became stuck again - without any kind of user activity, no work at all with /ceph directory. > The node is running as > mds1, mon1, osd0 > > System log file reports (the problem starts with entry: > "Sep 2 05:44:42 h1farm183 kernel: [72426.976029] ceph: mds0 caps stale" ): > -------- > Sep 1 12:40:38 h1farm183 kernel: [10983.398458] Btrfs loaded > Sep 1 12:44:25 h1farm183 kernel: [11210.109913] ceph: loaded (mon/mds/osd proto 15/32/24, osdmap 5/5 5/5) > Sep 1 13:08:25 h1farm183 kernel: [12650.255052] device fsid 754ae49f827ffac4-290543ed0a3b19a1 devid 1 transid 7 /dev/sdb > 1 > Sep 1 14:25:06 h1farm183 kernel: [17251.100851] RPC: Registered udp transport module. > Sep 1 14:25:06 h1farm183 kernel: [17251.100854] RPC: Registered tcp transport module. > Sep 1 14:25:06 h1farm183 kernel: [17251.100855] RPC: Registered tcp NFSv4.1 backchannel transport module. > Sep 1 14:25:20 h1farm183 kernel: [17265.404967] device fsid 754ae49f827ffac4-290543ed0a3b19a1 devid 1 transid 7 /dev/sdb > 1 > Sep 1 14:25:20 h1farm183 kernel: [17265.562870] udev: starting version 151 > Sep 1 14:25:26 h1farm183 kernel: [17271.752817] device fsid 754ae49f827ffac4-290543ed0a3b19a1 devid 1 transid 7 /dev/sdb > 1 > ...
> Sep 1 16:41:51 h1farm183 kernel: [25456.385184] device fsid 4940eafa1c110ce7-c14b44192348589f devid 1 transid 12 /dev/sdb1 > Sep 1 16:42:21 h1farm183 kernel: [25486.297025] ceph: client4100 fsid 4ea08089-acf1-b738-6f72-96c3ed029b71 > Sep 1 16:42:21 h1farm183 kernel: [25486.297169] ceph: mon0 131.169.74.116:6789 session established > Sep 2 02:37:54 h1farm183 rsyslogd: [origin software="rsyslogd" swVersion="4.2.0" x-pid="863" x-info="http://www.rsyslog.com"] rsyslogd was HUPed, type 'lightweight'. > Sep 2 05:44:42 h1farm183 kernel: [72426.976029] ceph: mds0 caps stale > Sep 2 05:44:57 h1farm183 kernel: [72441.976037] ceph: mds0 caps stale > Sep 2 05:45:27 h1farm183 kernel: [72472.066320] ceph: mds0 reconnect start > Sep 2 05:45:27 h1farm183 kernel: [72472.069681] Modules linked in: nfs lockd nfs_acl auth_rpcgss sunrpc ceph btrfs zlib_deflate crc32c libcrc32c ppdev lp parport openafs(P) ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp fbcon tileblit font bitblit softcursor vga16fb vgastate radeon ttm mptctl drm_kms_helper bnx2 drm usbhid i5000_edac hid dell_wmi shpchp edac_core agpgart i2c_algo_bit i5k_amb dcdbas psmouse serio_raw mptsas mptscsih mptbase scsi_transport_sas [last unloaded: kvm] > Sep 2 05:45:27 h1farm183 kernel: [72472.072332] > Sep 2 05:45:27 h1farm183 kernel: [72472.072332] Pid: 6184, comm: ceph-msgr/1 Tainted: P (2.6.32-24-generic-pae #42-Ubuntu) PowerEdge 1950 > Sep 2 05:45:27 h1farm183 kernel: [72472.072332] EIP: 0060:[] EFLAGS: 00010246 CPU: 1 > Sep 2 05:45:27 h1farm183 kernel: [72472.072332] EIP is at kunmap_high+0x97/0xa0 > Sep 2 05:45:27 h1farm183 kernel: [72472.072332] EAX: 00000000 EBX: f5d17000 ECX: c0916848 EDX: 00000292 > Sep 2 05:45:27 h1farm183 kernel: [72472.072332] ESI: c17ee940 EDI: f5d18000 EBP: f5fb3c6c ESP: f5fb3c64
> Sep 2 05:45:27 h1farm183 kernel: [72472.072332] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 > Sep 2 05:45:27 h1farm183 kernel: [72472.072332] c07d9280 f50b10a0 f5fb3c74 c0138307 f5fb3c98 f9ad7d54 00000000 f5fb3cbc > Sep 2 05:45:27 h1farm183 kernel: [72472.072332] <0> 00000038 0000002b eaee1018 ee4bcd70 00000000 f5fb3d14 f9ada09d 00000000 > Sep 2 05:45:27 h1farm183 kernel: [72472.072332] <0> eaee108c 0000005c f60bab40 eaee0e00 ee788440 f50b10a0 00000a21 00000000 > Sep 2 05:45:27 h1farm183 kernel: [72472.072332] [] ? kunmap+0x57/0x60 > Sep 2 05:45:27 h1farm183 kernel: [72472.072332] [] ? ceph_pagelist_append+0x54/0x110 [ceph] > Sep 2 05:45:27 h1farm183 kernel: [72472.072332] [] ? encode_caps_cb+0x16d/0x1f0 [ceph] > Sep 2 05:45:27 h1farm183 kernel: [72472.072332] [] ? iterate_session_caps+0xa0/0x170 [ceph] > Sep 2 05:45:27 h1farm183 kernel: [72472.072332] [] ? encode_caps_cb+0x0/0x1f0 [ceph] > Sep 2 05:45:27 h1farm183 kernel: [72472.072332] [] ? send_mds_reconnect+0x23f/0x3b0 [ceph] > Sep 2 05:45:27 h1farm183 kernel: [72472.072332] [] ? ceph_mdsc_handle_map+0x224/0x380 [ceph] > Sep 2 05:45:27 h1farm183 kernel: [72472.072332] [] ? dispatch+0x8e/0x430 [ceph] > Sep 2 05:45:27 h1farm183 kernel: [72472.072332] [] ? con_work+0x1cf6/0x1ed0 [ceph] > Sep 2 05:45:27 h1farm183 kernel: [72472.072332] [] ? __switch_to+0xcd/0x180 > Sep 2 05:45:27 h1farm183 kernel: [72472.072332] [] ? finish_task_switch+0x43/0xc0 > Sep 2 05:45:27 h1farm183 kernel: [72472.072332] [] ? schedule+0x44c/0x840 > Sep 2 05:45:27 h1farm183 kernel: [72472.072332] [] ? run_workqueue+0x8e/0x150 > Sep 2 05:45:27 h1farm183 kernel: [72472.072332] [] ? con_work+0x0/0x1ed0 [ceph] > Sep 2 05:45:27 h1farm183 kernel: [72472.072332] [] ? worker_thread+0x84/0xe0 > Sep 2 05:45:27 h1farm183 kernel: [72472.072332] [] ?
autoremove_wake_function+0x0/0x50 > Sep 2 05:45:27 h1farm183 kernel: [72472.072332] [] ? worker_thread+0x0/0xe0 > Sep 2 05:45:27 h1farm183 kernel: [72472.072332] [] ? kthread+0x74/0x80 > Sep 2 05:45:27 h1farm183 kernel: [72472.072332] [] ? kthread+0x0/0x80 > Sep 2 05:45:27 h1farm183 kernel: [72472.072332] [] ? kernel_thread_helper+0x7/0x10 > Sep 2 05:45:27 h1farm183 kernel: [72472.304298] ---[ end trace 47e346731d47774d ]--- Is that all the info? Some information about what triggered this trace is missing. > -------- > > The node was stuck at all. > Do you know what can be a reason ? What client version are you using? This looks like a 32-bit-related client issue that we haven't hit, as we mostly run 64-bit. Does this client have more than 4GB of memory? If not, you can try running it on a non-PAE kernel. Yehuda -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html From mboxrd@z Thu Jan 1 00:00:00 1970 From: Yehuda Sadeh Weinraub Subject: Re: Write operation is stuck Date: Fri, 3 Sep 2010 12:20:21 -0700 Message-ID: References: <1283369392.3894.8.camel@wido-laptop.pcextreme.nl> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: QUOTED-PRINTABLE Return-path: Received: from mail-wy0-f174.google.com ([74.125.82.174]:47988 "EHLO mail-wy0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753120Ab0ICTUY convert rfc822-to-8bit (ORCPT ); Fri, 3 Sep 2010 15:20:24 -0400 Received: by wyf22 with SMTP id 22so190914wyf.19 for ; Fri, 03 Sep 2010 12:20:22 -0700 (PDT) In-Reply-To: Sender: ceph-devel-owner@vger.kernel.org List-ID: To: Bogdan Lobodzinski Cc: Wido den Hollander , Sage Weil , ceph-devel@vger.kernel.org > On Fri, Sep 3, 2010 at 8:02 AM, Bogdan Lobodzinski wrote: >> >> Hello all, >> >> let me continue my troubles, the title can stay the same.
>> As I wrote, my ceph configuration survived my critical test
>> svn co https://root.cern.ch/svn/root/trunk root
>> and suddenly, during the night, at 5 o'clock, ceph became stuck again - without any kind of user activity, no work at all with the /ceph directory.
>> The node is running as
>> mds1, mon1, osd0
>>
>> The system log file reports (the problem starts with the entry
>> "Sep  2 05:44:42 h1farm183 kernel: [72426.976029] ceph: mds0 caps stale"):
>> --------
>> Sep  1 12:40:38 h1farm183 kernel: [10983.398458] Btrfs loaded
>> Sep  1 12:44:25 h1farm183 kernel: [11210.109913] ceph: loaded (mon/mds/osd proto 15/32/24, osdmap 5/5 5/5)
>> Sep  1 13:08:25 h1farm183 kernel: [12650.255052] device fsid 754ae49f827ffac4-290543ed0a3b19a1 devid 1 transid 7 /dev/sdb1
>> Sep  1 14:25:06 h1farm183 kernel: [17251.100851] RPC: Registered udp transport module.
>> Sep  1 14:25:06 h1farm183 kernel: [17251.100854] RPC: Registered tcp transport module.
>> Sep  1 14:25:06 h1farm183 kernel: [17251.100855] RPC: Registered tcp NFSv4.1 backchannel transport module.
>> Sep  1 14:25:20 h1farm183 kernel: [17265.404967] device fsid 754ae49f827ffac4-290543ed0a3b19a1 devid 1 transid 7 /dev/sdb1
>> Sep  1 14:25:20 h1farm183 kernel: [17265.562870] udev: starting version 151
>> Sep  1 14:25:26 h1farm183 kernel: [17271.752817] device fsid 754ae49f827ffac4-290543ed0a3b19a1 devid 1 transid 7 /dev/sdb1
>> ...
>> Sep  1 16:41:51 h1farm183 kernel: [25456.385184] device fsid 4940eafa1c110ce7-c14b44192348589f devid 1 transid 12 /dev/sdb1
>> Sep  1 16:42:21 h1farm183 kernel: [25486.297025] ceph: client4100 fsid 4ea08089-acf1-b738-6f72-96c3ed029b71
>> Sep  1 16:42:21 h1farm183 kernel: [25486.297169] ceph: mon0 131.169.74.116:6789 session established
>> Sep  2 02:37:54 h1farm183 rsyslogd: [origin software="rsyslogd" swVersion="4.2.0" x-pid="863" x-info="http://www.rsyslog.com"] rsyslogd was HUPed, type 'lightweight'.
>> Sep  2 05:44:42 h1farm183 kernel: [72426.976029] ceph: mds0 caps stale
>> Sep  2 05:44:57 h1farm183 kernel: [72441.976037] ceph: mds0 caps stale
>> Sep  2 05:45:27 h1farm183 kernel: [72472.066320] ceph: mds0 reconnect start
>> Sep  2 05:45:27 h1farm183 kernel: [72472.069681] Modules linked in: nfs lockd nfs_acl auth_rpcgss sunrpc ceph btrfs zlib_deflate crc32c libcrc32c ppdev lp parport openafs(P) ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp fbcon tileblit font bitblit softcursor vga16fb vgastate radeon ttm mptctl drm_kms_helper bnx2 drm usbhid i5000_edac hid dell_wmi shpchp edac_core agpgart i2c_algo_bit i5k_amb dcdbas psmouse serio_raw mptsas mptscsih mptbase scsi_transport_sas [last unloaded: kvm]
>> Sep  2 05:45:27 h1farm183 kernel: [72472.072332]
>> Sep  2 05:45:27 h1farm183 kernel: [72472.072332] Pid: 6184, comm: ceph-msgr/1 Tainted: P           (2.6.32-24-generic-pae #42-Ubuntu) PowerEdge 1950
>> Sep  2 05:45:27 h1farm183 kernel: [72472.072332] EIP: 0060:[] EFLAGS: 00010246 CPU: 1
>> Sep  2 05:45:27 h1farm183 kernel: [72472.072332] EIP is at kunmap_high+0x97/0xa0
>> Sep  2 05:45:27 h1farm183 kernel: [72472.072332] EAX: 00000000 EBX: f5d17000 ECX: c0916848 EDX: 00000292
>> Sep  2 05:45:27 h1farm183 kernel: [72472.072332] ESI: c17ee940 EDI: f5d18000 EBP: f5fb3c6c ESP: f5fb3c64
>> Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
>> Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  c07d9280 f50b10a0 f5fb3c74 c0138307 f5fb3c98 f9ad7d54 00000000 f5fb3cbc
>> Sep  2 05:45:27 h1farm183 kernel: [72472.072332] <0> 00000038 0000002b eaee1018 ee4bcd70 00000000 f5fb3d14 f9ada09d 00000000
>> Sep  2 05:45:27 h1farm183 kernel: [72472.072332] <0> eaee108c 0000005c f60bab40 eaee0e00 ee788440 f50b10a0 00000a21 00000000
>> Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  [] ? kunmap+0x57/0x60
>> Sep  2 05:45:27 h1farm183 kernel: [72472.072332]  [] ? ceph_pagelist_append+0x54/0x110 [ceph]
...
>> The node was completely stuck.
>> Do you know what the reason could be?

Maybe the following patch fixes it? I'll push a fix to the unstable
branch, let me know if it works for you.

Thanks,
Yehuda

diff --git a/fs/ceph/pagelist.c b/fs/ceph/pagelist.c
index b6859f4..46a368b 100644
--- a/fs/ceph/pagelist.c
+++ b/fs/ceph/pagelist.c
@@ -5,10 +5,18 @@
 #include "pagelist.h"
 
+static void ceph_pagelist_unmap_tail(struct ceph_pagelist *pl)
+{
+	struct page *page = list_entry(pl->head.prev, struct page,
+				       lru);
+	kunmap(page);
+}
+
 int ceph_pagelist_release(struct ceph_pagelist *pl)
 {
 	if (pl->mapped_tail)
-		kunmap(pl->mapped_tail);
+		ceph_pagelist_unmap_tail(pl);
+
 	while (!list_empty(&pl->head)) {
 		struct page *page = list_first_entry(&pl->head, struct page,
 						     lru);
@@ -26,7 +34,7 @@ static int ceph_pagelist_addpage(struct ceph_pagelist *pl)
 	pl->room += PAGE_SIZE;
 	list_add_tail(&page->lru, &pl->head);
 	if (pl->mapped_tail)
-		kunmap(pl->mapped_tail);
+		ceph_pagelist_unmap_tail(pl);
 	pl->mapped_tail = kmap(page);
 	return 0;
 }
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html