* operate one file in multi clients with libceph
From: Simon Tian @ 2011-05-17 13:00 UTC
  To: ceph-devel

Hi folks,

       I write and read a file in client A, opened with
ceph_open(test_path, O_RDWR|O_CREAT, 0), and read the same file in
client B, opened with ceph_open(test_path, O_RDONLY, 0).  Clients A
and B run on different hosts.
       After a while, client A throws an exception. This exception
appears every time.
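
A minimal sketch of the pattern, for reference (a sketch only: libceph
init/mount and error handling are omitted, and ceph_read/ceph_close
are assumed to take (fd, buf, size, offset) and (fd) respectively):

#define _GNU_SOURCE
#include <fcntl.h>      /* O_RDWR, O_RDONLY, O_CREAT */
#include <string.h>     /* memset */
#include <sys/types.h>  /* loff_t */
#include "libceph.h"

/* client A (host 1): create the file and keep writing 512-byte records */
static void writer(const char *test_path)
{
    int fd = ceph_open(test_path, O_RDWR|O_CREAT, 0);
    char buf[512];
    memset(buf, 0xff, sizeof(buf));
    for (loff_t off = 0; ; off += 512)
        if (ceph_write(fd, buf, sizeof(buf), off) < 0)
            break;
    ceph_close(fd);
}

/* client B (host 2): read the same path, opened read-only */
static void reader(const char *test_path)
{
    int fd = ceph_open(test_path, O_RDONLY, 0);
    char buf[512];
    loff_t off = 0;
    while (ceph_read(fd, buf, sizeof(buf), off) > 0)
        off += 512;
    ceph_close(fd);
}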

       The back trace in ceph is:

(gdb) bt
#0  0x000000367fa30265 in raise () from /lib64/libc.so.6
#1  0x000000367fa31d10 in abort () from /lib64/libc.so.6
#2  0x0000003682ebec44 in __gnu_cxx::__verbose_terminate_handler() ()
from /usr/lib64/libstdc++.so.6
#3  0x0000003682ebcdb6 in ?? () from /usr/lib64/libstdc++.so.6
#4  0x0000003682ebcde3 in std::terminate() () from /usr/lib64/libstdc++.so.6
#5  0x0000003682ebceca in __cxa_throw () from /usr/lib64/libstdc++.so.6
#6  0x00007ffff7c51a78 in ceph::__ceph_assert_fail
(assertion=0x7ffff7ce0a1c "r == 0", file=0x7ffff7ce0a02
"common/Mutex.h", line=118,
    func=0x7ffff7ce0ca0 "void Mutex::Lock(bool)") at common/assert.cc:86
#7  0x00007ffff7b1ee1c in Mutex::Lock (this=0x6293f0,
no_lockdep=false) at common/Mutex.h:118
#8  0x00007ffff7b395f4 in Client::sync_write_commit (this=0x629090,
in=0x7ffff0001b50) at client/Client.cc:4979
#9  0x00007ffff7baf304 in C_Client_SyncCommit::finish
(this=0x7ffff3206300) at client/Client.cc:4973
#10 0x00007ffff7c951d5 in Objecter::handle_osd_op_reply
(this=0x62b420, m=0x632190) at osdc/Objecter.cc:806
#11 0x00007ffff7b56038 in Client::ms_dispatch (this=0x629090,
m=0x632190) at client/Client.cc:1414
#12 0x00007ffff7bcb01d in Messenger::ms_deliver_dispatch
(this=0x628350, m=0x632190) at msg/Messenger.h:98
#13 0x00007ffff7bb262b in SimpleMessenger::dispatch_entry
(this=0x628350) at msg/SimpleMessenger.cc:352
#14 0x00007ffff7b22641 in SimpleMessenger::DispatchThread::entry
(this=0x6287d8) at msg/SimpleMessenger.h:533
#15 0x00007ffff7b5aa28 in Thread::_entry_func (arg=0x6287d8) at
./common/Thread.h:41
#16 0x00000036802064a7 in start_thread () from /lib64/libpthread.so.0
#17 0x000000367fad3c2d in clone () from /lib64/libc.so.6

So is there any way to avoid this?


Thx!
Simon


* Re: operate one file in multi clients with libceph
From: Simon Tian @ 2011-05-18 14:40 UTC
  To: ceph-devel

Hi,

    Could anyone give me an answer?

   My application needs to open one file with two clients on different
hosts: one for read/write and one for read-only. The clients could be
built on libceph or librbd.
   I tried librbd, and the exception appears there too.


Thx very much! Really need your help...

Simon



2011/5/17 Simon Tian <aixt2006@gmail.com>:
> Hi folks,
>
> [...]


* Re: operate one file in multi clients with libceph
From: Brian Chrisman @ 2011-05-18 15:26 UTC
  To: Simon Tian; +Cc: ceph-devel

On Wed, May 18, 2011 at 7:40 AM, Simon Tian <aixt2006@gmail.com> wrote:
> Hi,
>
>    Could anyone give me an answer?
>
>   My application needs to open one file with two clients on different
> hosts: one for read/write and one for read-only. The clients could be
> built on libceph or librbd.
>   I tried librbd, and the exception appears there too.

The libceph API is not yet very well tested.
Can you simply mount the filesystem from both hosts using the kernel
client and have your application open/access the single file with the
standard system call interface?
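
For example, something like "mount -t ceph <monitor-addr>:/ /mnt/ceph"
on each host, and then plain POSIX I/O against the mounted file (the
path, file name, and mode below are just placeholders):

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    /* writer host; the reader host opens with O_RDONLY and uses pread() */
    int fd = open("/mnt/ceph/testfile", O_RDWR | O_CREAT, 0644);
    char buf[512] = {0};
    pwrite(fd, buf, sizeof(buf), 0);
    close(fd);
    return 0;
}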




* Re: operate one file in multi clients with libceph
From: Sage Weil @ 2011-05-18 16:01 UTC
  To: Simon Tian; +Cc: ceph-devel

Hi Simon,

On Wed, 18 May 2011, Simon Tian wrote:

> Hi,
> 
>     Could anyone give me an answer?
> 
>    My application needs to open one file with two clients on different
> hosts: one for read/write and one for read-only. The clients could be
> built on libceph or librbd.
>    I tried librbd, and the exception appears there too.

I opened a bug for this, http://tracker.newdream.net/issues/1097.

Can you try the patch below?  I think this may just be something that was 
missed in a locking rewrite a while back, in a rarely hit code path:

Thanks!
sage



diff --git a/src/client/Client.cc b/src/client/Client.cc
index 7f7fb08..bf0997a 100644
--- a/src/client/Client.cc
+++ b/src/client/Client.cc
@@ -4934,7 +4934,6 @@ public:
 
 void Client::sync_write_commit(Inode *in)
 {
-  client_lock.Lock();
   assert(unsafe_sync_write > 0);
   unsafe_sync_write--;
 
@@ -4947,8 +4946,6 @@ void Client::sync_write_commit(Inode *in)
   }
 
   put_inode(in);
-
-  client_lock.Unlock();
 }
 
 int Client::write(int fd, const char *buf, loff_t size, loff_t offset) 
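
For context, my reading of the back trace in the first message (not yet
verified against the tree) is that the completion already runs under
client_lock, so the extra non-recursive Lock() is what trips the
"r == 0" assert:

// call chain from the reported back trace:
//   Client::ms_dispatch()                  <- client_lock taken here
//     -> Objecter::handle_osd_op_reply()
//       -> C_Client_SyncCommit::finish()
//         -> Client::sync_write_commit()   <- Lock() again on the same
//                                             non-recursive mutex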



* Re: operate one file in multi clients with libceph
From: Simon Tian @ 2011-05-19  6:48 UTC
  To: Brian Chrisman; +Cc: ceph-devel

> The libceph API is not yet very well tested.
> Can you simply mount the filesystem from both hosts using the kernel
> client and have your application open/access the single file with the
> standard system call interface?


Hi Brian,

There are three reasons for using the libceph API:
1. My application can use a non-POSIX API, like libceph and librbd,
very conveniently and flexibly.
2. With the kernel client, performance drops because of the VFS
layer; I did some comparison tests two weeks ago.
3. If an error or exception occurs in the kernel client, the whole
machine may stop working.

Thx!
Simon


* Re: operate one file in multi clients with libceph
From: Simon Tian @ 2011-05-19  6:58 UTC
  To: Sage Weil; +Cc: ceph-devel

Hi Sage,

    I've tested this patch.
Out of 8 test runs, only 1 (the first) hit a thread assert failure,
as back trace 1 shows:
==========================  back trace 1   ==================================
(gdb) bt
#0  0x000000367fa30265 in raise () from /lib64/libc.so.6
#1  0x000000367fa31d10 in abort () from /lib64/libc.so.6
#2  0x0000003682ebec44 in __gnu_cxx::__verbose_terminate_handler() ()
from /usr/lib64/libstdc++.so.6
#3  0x0000003682ebcdb6 in ?? () from /usr/lib64/libstdc++.so.6
#4  0x0000003682ebcde3 in std::terminate() () from /usr/lib64/libstdc++.so.6
#5  0x0000003682ebceca in __cxa_throw () from /usr/lib64/libstdc++.so.6
#6  0x00007ffff7c51a54 in ceph::__ceph_assert_fail
(assertion=0x7ffff7ce31f1 "in->cap_refs[(32 << 8)] == 1",
    file=0x7ffff7ce1431 "client/Client.cc", line=2190,
func=0x7ffff7ce5fc0 "void Client::_flush(Inode*, Context*)")
    at common/assert.cc:86
#7  0x00007ffff7b39bb0 in Client::_flush (this=0x629090, in=0x630f50,
onfinish=0x7fffe8000b60) at client/Client.cc:2190
#8  0x00007ffff7b52426 in Client::handle_cap_grant (this=0x629090,
in=0x630f50, mds=0, cap=0x62ee50, m=0x631fc0)
    at client/Client.cc:2930
#9  0x00007ffff7b52d2a in Client::handle_caps (this=0x629090,
m=0x631fc0) at client/Client.cc:2711
#10 0x00007ffff7b560a0 in Client::ms_dispatch (this=0x629090,
m=0x631fc0) at client/Client.cc:1444
#11 0x00007ffff7bcaff9 in Messenger::ms_deliver_dispatch
(this=0x628350, m=0x631fc0) at msg/Messenger.h:98
#12 0x00007ffff7bb2607 in SimpleMessenger::dispatch_entry
(this=0x628350) at msg/SimpleMessenger.cc:352
#13 0x00007ffff7b22641 in SimpleMessenger::DispatchThread::entry
(this=0x6287d8) at msg/SimpleMessenger.h:533
#14 0x00007ffff7b5aa04 in Thread::_entry_func (arg=0x6287d8) at
./common/Thread.h:41
#15 0x00000036802064a7 in start_thread () from /lib64/libpthread.so.0
#16 0x000000367fad3c2d in clone () from /lib64/libc.so.6
============================================================

In addition, when writing data for a long time, the write thread will
hang on a pthread_cond_wait, as back trace 2 showed:
==========================  back trace 2  ==================================
Thread 10 (Thread 0x43f87940 (LWP 19986)):
#0  0x000000312be0ab99 in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00007f147a8e763b in Cond::Wait (this=0x43f86b00,
mutex=@0x66d790) at ./common/Cond.h:46
#2  0x00007f147a86630d in Client::wait_on_list (this=0x66d430,
ls=@0x674df0) at client/Client.cc:2140
#3  0x00007f147a8771ff in Client::get_caps (this=0x66d430,
in=0x6749d0, need=4096, want=8192, got=0x43f86e8c, endoff=235470848)
    at client/Client.cc:1827
#4  0x00007f147a886003 in Client::_write (this=0x66d430, f=0x671510,
offset=235470336, size=512,
    buf=0x6452b0 'ÿÿ' <repeats 33 times>, ".001", 'ÿÿ' <repeats 165
times>...) at client/Client.cc:5055
#5  0x00007f147a886d73 in Client::write (this=0x66d430, fd=10,
    buf=0x6452b0 'ÿÿ' <repeats 33 times>, ".001", 'ÿÿ' <repeats 165
times>..., size=512, offset=235470336) at client/Client.cc:5007
#6  0x00007f147a85822d in ceph_write (fd=10, buf=0x6452b0 'ÿÿ' <repeats
33 times>, ".001", 'ÿÿ' <repeats 165 times>..., size=512,
    offset=235470336) at libceph.cc:322
#7  0x000000000042fa27 in TdcCephImpl::AsyncWrite (this=0x647c30,
offset=235470336, length=512,
    buf=0x6452b0 'ÿÿ' <repeats 33 times>, ".001", 'ÿÿ' <repeats 165
times>..., queue=@0x644b60, io=0x6456a0)
    at /opt/tsk/tdc-tapdisk/td-connector2.0/common/tdc_ceph_impl.cpp:191
#8  0x000000000042ffc5 in TdcCephImpl::AsyncProcess (this=0x647c30,
io=0x6456a0, queue=@0x644b60, netfd=5)
    at /opt/tsk/tdc-tapdisk/td-connector2.0/common/tdc_ceph_impl.cpp:268
#9  0x00000000004192be in SubmitRequest_Intra (arg=0x644b00) at
/opt/tsk/tdc-tapdisk/td-connector2.0/td_connector_server.cpp:390
#10 0x000000312be064a7 in start_thread () from /lib64/libpthread.so.0
#11 0x000000312b6d3c2d in clone () from /lib64/libc.so.6
============================================================

So it seems not entirely safe with the patch.

So I applied this patch:
diff --git a/src/client/Client.cc b/src/client/Client.cc
index 7f7fb08..bf0997a 100644
--- a/src/client/Client.cc
+++ b/src/client/Client.cc
@@ -4934,7 +4934,6 @@ public:

 void Client::sync_write_commit(Inode *in)
 {
-  client_lock.Lock();
+ int r = client_lock.Lock();
  assert(unsafe_sync_write > 0);
  unsafe_sync_write--;

@@ -4947,8 +4946,6 @@ void Client::sync_write_commit(Inode *in)
  }

  put_inode(in);
+ if (r)
  client_lock.Unlock();
 }

 int Client::write(int fd, const char *buf, loff_t size, loff_t offset)


After some testing, back trace 1 didn't appear again, but back trace 2
still does. Hmm, still not safe.

Thx!
Simon



2011/5/19 Sage Weil <sage@newdream.net>:
> Hi Simon,
>
> [...]
>


* Re: operate one file in multi clients with libceph
From: Sage Weil @ 2011-05-19 17:11 UTC
  To: Simon Tian; +Cc: ceph-devel


On Thu, 19 May 2011, Simon Tian wrote:
> Hi Sage,
> 
>     I've tested this patch.
> Out of 8 test runs, only 1 (the first) hit a thread assert failure,
> as back trace 1 shows:

Do you have a simple reproducer for this?

sage




* Re: operate one file in multi clients with libceph
From: Sage Weil @ 2011-05-19 17:20 UTC
  To: Simon Tian; +Cc: ceph-devel


On Thu, 19 May 2011, Sage Weil wrote:
> On Thu, 19 May 2011, Simon Tian wrote:
> > Hi Sage,
> > 
> >     I've tested this patch.
> > Out of 8 test runs, only 1 (the first) hit a thread assert failure,
> > as back trace 1 shows:
> 
> Do you have a simple reproducer for this?

Nevermind, I'm easily triggering this with two cfuse mounts.

sage




* Re: operate one file in multi clients with libceph
From: Sage Weil @ 2011-05-19 21:45 UTC
  To: Simon Tian; +Cc: ceph-devel


On Thu, 19 May 2011, Sage Weil wrote:
> On Thu, 19 May 2011, Sage Weil wrote:
> > On Thu, 19 May 2011, Simon Tian wrote:
> > > Hi Sage,
> > > 
> > >     I've tested this patch.
> > > Out of 8 test runs, only 1 (the first) hit a thread assert failure,
> > > as back trace 1 shows:
> > 
> > Do you have a simple reproducer for this?
> 
> Nevermind, I'm easily triggering this with two cfuse mounts.

Hi Simon,

I just pushed a series of fixes for this code to the master branch; they 
fix both of these crashes in my testing.  Can you try it out?

Thanks!
sage





* Re: operate one file in multi clients with libceph
From: Simon Tian @ 2011-05-20  8:56 UTC
  To: Sage Weil; +Cc: ceph-devel

OK, I'll test the branch and report the results in the next few days.

Thanks very much!

Simon

2011/5/20 Sage Weil <sage@newdream.net>:
> On Thu, 19 May 2011, Sage Weil wrote:
>> On Thu, 19 May 2011, Sage Weil wrote:
>> > On Thu, 19 May 2011, Simon Tian wrote:
>> > > Hi Sage,
>> > >
>> > >     I've tested this patch.
>> > > Out of 8 test runs, only 1 (the first) hit a thread assert failure,
>> > > as back trace 1 shows:
>> >
>> > Do you have a simple reproducer for this?
>>
>> Nevermind, I'm easily triggering this with two cfuse mounts.
>
> Hi Simon,
>
> I just pushed a series of fixes for this code to the master branch; they
> fix both of these crashes in my testing.  Can you try it out?
>
> Thanks!
> sage



Thread overview: 10 messages
2011-05-17 13:00 operate one file in multi clients with libceph Simon Tian
2011-05-18 14:40 ` Simon Tian
2011-05-18 15:26   ` Brian Chrisman
2011-05-19  6:48     ` Simon Tian
2011-05-18 16:01   ` Sage Weil
2011-05-19  6:58     ` Simon Tian
2011-05-19 17:11       ` Sage Weil
2011-05-19 17:20         ` Sage Weil
2011-05-19 21:45           ` Sage Weil
2011-05-20  8:56             ` Simon Tian
