From: "Yan, Zheng"
Subject: Re: [PATCH 04/39] mds: make sure table request id unique
Date: Thu, 21 Mar 2013 16:07:34 +0800
Message-ID: <514ABFC6.3080100@intel.com>
References: <1363531902-24909-1-git-send-email-zheng.z.yan@intel.com> <1363531902-24909-5-git-send-email-zheng.z.yan@intel.com> <51494EF6.6040607@intel.com> <51495BEC.9000802@intel.com> <971E9C644F3C4AD9A4BFA042FC238F34@inktank.com>
In-Reply-To: <971E9C644F3C4AD9A4BFA042FC238F34@inktank.com>
To: Greg Farnum
Cc: Sage Weil, ceph-devel@vger.kernel.org

On 03/21/2013 02:31 AM, Greg Farnum wrote:
> On Tuesday, March 19, 2013 at 11:49 PM, Yan, Zheng wrote:
>> On 03/20/2013 02:15 PM, Sage Weil wrote:
>>> On Wed, 20 Mar 2013, Yan, Zheng wrote:
>>>> On 03/20/2013 07:09 AM, Greg Farnum wrote:
>>>>> Hmm, this is definitely narrowing the race (probably enough to never hit it), but it's not actually eliminating it (if the restart happens after 4 billion requests?). More importantly this kind of symptom makes me worry that we might be papering over more serious issues with colliding states in the Table on restart.
>>>>> I don't have the MDSTable semantics in my head so I'll need to look into this later unless somebody else volunteers to do so?
>>>>
>>>> Not just 4 billion requests, MDS restart has several stage, mdsmap epoch
>>>> increases for each stage. I don't think there are any more colliding
>>>> states in the table. The table client/server use two phase commit. it's
>>>> similar to client request that involves multiple MDS.
>>>> the reqid is analogy to client request id. The difference is client request ID is
>>>> unique because new client always get an unique session id.
>>>
>>> Each time a tid is consumed (at least for an update) it is journaled in
>>> the EMetaBlob::table_tids list, right? So we could actually take a max
>>> from journal replay and pick up where we left off? That seems like the
>>> cleanest.
>>>
>>> I'm not too worried about 2^32 tids, I guess, but it would be nicer to
>>> avoid that possibility.
>>
>> Can we re-use the client request ID as table client request ID ?
>>
>> Regards
>> Yan, Zheng
>
> Not sure what you're referring to here; do you mean the ID of the filesystem client request which prompted the update? I don't think that would work as client requests actually require two parts to be unique (the client GUID and the request seq number), and I'm pretty sure a single client request can spawn multiple Table updates.
>

You are right, the client request ID does not work.

> As I look over this more, it sure looks to me as if the effect of the code we have (when non-broken) is to rollback every non-committed request by an MDS which restarted; the only time it can handle the TableServer's "agree" with a different response is if the MDS was incorrectly marked out by the map. Am I parsing this correctly, Sage? Given that, and without having looked at the code more broadly, I think we want to add some sort of implicit or explicit handshake letting each of them know if the MDS actually disappeared. We use the process/address nonce to accomplish this in other places…
> -Greg
>

The table server sends an 'agree' message to the table client after a 'prepare' entry is safely logged. The table server re-sends the 'agree' message in two cases: when the table client restarts, and when the table server itself restarts.
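A minimal sketch of that exchange, written as a toy C++ model rather than Ceph code: ToyTableServer, ToyTableClient, handle_prepare, resend_agrees and claims are hypothetical names, not the MDSTableServer/MDSTableClient API, and journaling is reduced to a std::map. It assumes a re-sent 'agree' is matched to a local request by reqid alone and that a restarted client resets its reqid counter to zero, which is the situation the patch is meant to prevent.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model for illustration; not Ceph's table classes.
using ReqId = std::uint64_t;

struct ToyTableServer {
  // Prepared but not yet committed updates, keyed by (client rank, reqid).
  // The real server journals a prepare entry before replying 'agree'.
  std::map<std::pair<int, ReqId>, std::string> pending;

  // Phase 1: record the prepare; the 'agree' reply echoes the reqid.
  // A second prepare with the same (client, reqid) overwrites the first.
  ReqId handle_prepare(int client, ReqId reqid, const std::string& op) {
    pending[{client, reqid}] = op;
    return reqid;
  }

  // After a restart, re-send 'agree' for every pending prepare of this
  // client so it can say whether it still wants the update.
  std::vector<ReqId> resend_agrees(int client) const {
    std::vector<ReqId> agrees;
    for (const auto& entry : pending)
      if (entry.first.first == client) agrees.push_back(entry.first.second);
    return agrees;
  }
};

struct ToyTableClient {
  ReqId last_reqid = 0;               // assumed to restart from 0
  std::map<ReqId, std::string> mine;  // prepares this incarnation issued

  ReqId prepare(ToyTableServer& server, int rank, const std::string& op) {
    ReqId id = ++last_reqid;
    mine[id] = op;
    return server.handle_prepare(rank, id, op);
  }

  // A re-sent 'agree' is matched to a local request purely by reqid.
  bool claims(ReqId reqid) const { return mine.count(reqid) != 0; }
};
```

In the toy, a client that prepares reqid 1, crashes, restarts and prepares again reuses reqid 1: the server's stale entry is overwritten and the re-sent 'agree' is indistinguishable from one for the new request.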
The purpose of re-sending the 'agree' message is to check whether the table client still wants to keep the update preparation (the table client might have crashed before submitting the update). The purpose of the reqid is to associate a table update preparation request with the server's 'agree' reply message. The problem here is that the table client does not make sure the reqid is unique across restarts. If you are still worried about the 2^32 reqid limit, setting the reqid to a randomized 64-bit value should be safe enough.

Thanks
Yan, Zheng
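For comparison, a hedged sketch of the three ways of seeding the reqid counter that come up in this thread. All names (seed_from_epoch, seed_from_replay, seed_random, ReqIdAllocator) are hypothetical. seed_from_epoch assumes, from the 4-billion-request and mdsmap-epoch remarks rather than from the patch itself, that the patch puts the mdsmap epoch in the high 32 bits; seed_from_replay assumes the largest previously used id can be recovered during journal replay, as Sage suggests for EMetaBlob::table_tids; seed_random draws the randomized 64-bit value from std::random_device.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <random>
#include <vector>

// Hypothetical helpers for illustration only; not Ceph functions.

// (a) Assumed shape of the patch: mdsmap epoch in the high 32 bits,
// a per-run counter in the low 32 bits.
std::uint64_t seed_from_epoch(std::uint32_t epoch) {
  return static_cast<std::uint64_t>(epoch) << 32;
}

// (b) Sage's idea: resume after the largest id seen during journal replay.
// Only ids that were actually journaled can be recovered this way.
std::uint64_t seed_from_replay(const std::vector<std::uint64_t>& replayed) {
  std::uint64_t max_id = 0;
  for (std::uint64_t id : replayed) max_id = std::max(max_id, id);
  return max_id;
}

// (c) Randomized 64-bit starting point built from two 32-bit draws.
std::uint64_t seed_random() {
  std::random_device rd;
  std::uint64_t hi = rd();
  std::uint64_t lo = rd();
  return ((hi & 0xffffffffu) << 32) | (lo & 0xffffffffu);
}

// Hands out reqids strictly after the chosen seed.
struct ReqIdAllocator {
  std::uint64_t last;
  explicit ReqIdAllocator(std::uint64_t seed) : last(seed) {}
  std::uint64_t next() {
    // Wrapping around would start reusing ids from zero.
    assert(last != std::numeric_limits<std::uint64_t>::max());
    return ++last;
  }
};
```

Under these assumptions the epoch scheme collides only if one epoch hands out 2^32 ids (seed_from_epoch(5) + 2^32 equals seed_from_epoch(6)), a random 64-bit seed makes a collision unlikely rather than impossible, and resuming from replay is deterministic only if every consumed id is journaled.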