* RE: Deadlock in ceph journal
       [not found] <755F6B91B3BE364F9BCA11EA3F9E0C6F2646AE7B@SACMBXIP02.sdcorp.global.sandisk.com>
@ 2014-08-20  3:55 ` Sage Weil
  2014-08-20  4:38   ` Somnath Roy
  2014-08-20  4:49   ` Somnath Roy
  0 siblings, 2 replies; 22+ messages in thread
From: Sage Weil @ 2014-08-20  3:55 UTC (permalink / raw)
  To: Somnath Roy
  Cc: Samuel Just (sam.just@inktank.com),
	ceph-devel, Mark Kirkwood, jianpeng.ma


[Copying ceph-devel, dropping ceph-users]

Yeah, that looks like a bug.  I pushed wip-filejournal, which reapplies 
Jianpeng's original patch along with this one.  I'm not certain about the 
other suggested fix, but I'm hoping this one explains the strange 
behavior Jianpeng and Mark have seen.

sage


On Wed, 20 Aug 2014, Somnath Roy wrote:
> 
> I think this is the issue:
> 
>  
> 
> http://tracker.ceph.com/issues/9073
> 
>  
> 
> Thanks & Regards
> 
> Somnath
> 
>  
> 
> From: Somnath Roy
> Sent: Tuesday, August 19, 2014 6:25 PM
> To: Sage Weil (sage@inktank.com); Samuel Just (sam.just@inktank.com)
> Cc: ceph-users@lists.ceph.com
> Subject: Deadlock in ceph journal
> 
>  
> 
> Hi Sage/Sam,
> 
> During our testing we found a potential deadlock scenario in the filestore
> journal code base. This happens for two reasons:
> 
>  
> 
> 1.       The code does not signal aio_cond from check_aio_completion() when
> seq = 0.
> 
> 2.       The following change in write_thread_entry() allows the very first
> header write to go out with seq = 0:
> 
>                if (writeq.empty() && !must_write_header) {
> 
>  
> 
>  
> 
> Now, during ceph-deploy activate, this is what happens:
> 
>  
> 
> 1. The very first header write, with seq = 0, is issued and is waiting for
> aio completion, so aio_num = 1.
> 
> 2. The superblock write comes in, enters the while (aio_num > 0) block of
> write_thread_entry(), and waits on aio_cond.
> 
> 3. The seq = 0 aio completes but does not set completed_something = true,
> so aio_cond is never signaled.
> 
> 4. write_thread_entry() deadlocks.
> 
>  
> 
> This is a timing problem: it does not occur if the header write completes
> before the superblock write arrives, and it happens only with a block
> journal device (where aio is enabled).
> 
>  
> 
> Here is the log snippet we are getting.
> 
>  
> 
> 2014-08-19 12:59:10.029363 7f60fa33b700 10 journal write_thread_entry start
> 
> 2014-08-19 12:59:10.029395 7f60fa33b700 20 journal prepare_multi_write
> queue_pos now 4096
> 
> 2014-08-19 12:59:10.029427 7f60fa33b700 15 journal do_aio_write writing
> 4096~0 + header
> 
> 2014-08-19 12:59:10.029439 7f60fa33b700 20 journal write_aio_bl 0~4096 seq 0
> 
> 2014-08-19 12:59:10.029442 7f60f9339700 10 journal write_finish_thread_entry
> enter
> 
> 2014-08-19 12:59:10.029466 7f60fa33b700 20 journal write_aio_bl .. 0~4096 in
> 1
> 
> 2014-08-19 12:59:10.029498 7f60fa33b700 20 journal write_aio_bl 4096~0 seq 0
> 
> 2014-08-19 12:59:10.029505 7f60fa33b700  5 journal put_throttle finished 0
> ops and 0 bytes, now 0 ops and 0 bytes
> 
> 2014-08-19 12:59:10.029510 7f60fa33b700 20 journal write_thread_entry going
> to sleep
> 
> 2014-08-19 12:59:10.029538 7f60ff178800 10 journal journal_start
> 
> 2014-08-19 12:59:10.029566 7f60f9339700 20 journal write_finish_thread_entry
> waiting for aio(s)
> 
> 2014-08-19 12:59:10.029726 7f60ff178800 15
> filestore(/var/lib/ceph/tmp/mnt.NKfs2R) read
> meta/23c2fcde/osd_superblock/0//-1 0~0
> 
> 2014-08-19 12:59:10.029793 7f60ff178800 -1
> filestore(/var/lib/ceph/tmp/mnt.NKfs2R) could not find
> 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
> 
> 2014-08-19 12:59:10.029815 7f60ff178800 10
> filestore(/var/lib/ceph/tmp/mnt.NKfs2R)
> FileStore::read(meta/23c2fcde/osd_superblock/0//-1) open error: (2) No such
> file or directory
> 
> 2014-08-19 12:59:10.029892 7f60ff178800  5
> filestore(/var/lib/ceph/tmp/mnt.NKfs2R) queue_transactions new osr(default
> 0x42ea9f0)/0x42ea9f0
> 
> 2014-08-19 12:59:10.029922 7f60ff178800 10 journal op_submit_start 2
> 
> 2014-08-19 12:59:10.030009 7f60ff178800  5
> filestore(/var/lib/ceph/tmp/mnt.NKfs2R) queue_transactions (writeahead) 2
> 0x7fff6e817080
> 
> 2014-08-19 12:59:10.030028 7f60ff178800 10 journal op_journal_transactions 2
> 0x7fff6e817080
> 
> 2014-08-19 12:59:10.030039 7f60ff178800  5 journal submit_entry seq 2 len
> 505 (0x42a76f0)
> 
> 2014-08-19 12:59:10.030065 7f60fa33b700 20 journal write_thread_entry woke
> up
> 
> 2014-08-19 12:59:10.030070 7f60fa33b700 20 journal write_thread_entry aio
> throttle: aio num 1 bytes 4096 ... exp 2 min_new 4 ... pending 0
> 
> 2014-08-19 12:59:10.030076 7f60fa33b700 20 journal write_thread_entry
> deferring until more aios complete: 1 aios with 4096 bytes needs 4 bytes to
> start a new aio (currently 0 pending)
> 
> 2014-08-19 12:59:10.030084 7f60ff178800 10 journal op_submit_finish 2
> 
> 2014-08-19 12:59:10.030389 7f60f9339700 10 journal write_finish_thread_entry
> aio 0~4096 done
> 
> 2014-08-19 12:59:10.030402 7f60f9339700 20 journal check_aio_completion
> 
> 2014-08-19 12:59:10.030406 7f60f9339700 20 journal check_aio_completion
> completed seq 0 0~4096
> 
> 2014-08-19 12:59:10.030412 7f60f9339700 20 journal write_finish_thread_entry
> sleeping
> 
> 2014-08-19 12:59:15.026609 7f60fab3c700 20
> filestore(/var/lib/ceph/tmp/mnt.NKfs2R) sync_entry woke after 5.000459
> 
> 2014-08-19 12:59:15.026659 7f60fab3c700 10 journal commit_start
> max_applied_seq 1, open_ops 0
> 
> 2014-08-19 12:59:15.026665 7f60fab3c700 10 journal commit_start blocked, all
> open_ops have completed
> 
> 2014-08-19 12:59:15.026670 7f60fab3c700 10 journal commit_start nothing to
> do
> 
> 2014-08-19 12:59:15.026676 7f60fab3c700 10 journal commit_start
> 
> 2014-08-19 12:59:15.026691 7f60fab3c700 20
> filestore(/var/lib/ceph/tmp/mnt.NKfs2R) sync_entry waiting for max_interval
> 5.000000
> 
> 2014-08-19 12:59:20.026826 7f60fab3c700 20
> filestore(/var/lib/ceph/tmp/mnt.NKfs2R) sync_entry woke after 5.000135
> 
> 2014-08-19 12:59:20.026870 7f60fab3c700 10 journal commit_start
> max_applied_seq 1, open_ops 0
> 
> 2014-08-19 12:59:20.026876 7f60fab3c700 10 journal commit_start blocked, all
> open_ops have completed
> 
> 2014-08-19 12:59:20.026879 7f60fab3c700 10 journal commit_start nothing to
> do
> 
> 2014-08-19 12:59:20.026891 7f60fab3c700 10 journal commit_start
> 
>  
> 
>  
> 
> Could you please confirm this is a valid defect?
> 
>  
> 
> If so, could signaling aio_cond even when seq = 0 be the solution?
> 
>  
> 
> Please let me know if there is any workaround for this when deploying with
> ceph-deploy. Will ceph-deploy accept a file path as the journal?
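On the workaround question: in FileStore-era releases the journal path can, as far as I know, point at a plain file instead of a block device, and aio is used only for block devices, which would sidestep the code path described above. A possible ceph.conf sketch (verify the option names and behavior against your release's documentation; this is an assumption, not a confirmed workaround):

```ini
[osd]
; A file path here makes FileJournal use buffered file I/O rather than
; the block-device-only aio path where the deadlock was observed.
osd journal = /var/lib/ceph/osd/$cluster-$id/journal
osd journal size = 1024   ; journal size in MB, needed for file journals
```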
> 
>  
> 
> Thanks & Regards
> 
> Somnath
> 
> 
> ____________________________________________________________________________
> 
> PLEASE NOTE: The information contained in this electronic mail message is
> intended only for the use of the designated recipient(s) named above. If the
> reader of this message is not the intended recipient, you are hereby
> notified that you have received this message in error and that any review,
> dissemination, distribution, or copying of this message is strictly
> prohibited. If you have received this communication in error, please notify
> the sender by telephone or e-mail (as shown above) immediately and destroy
> any and all copies of this message in your possession (whether hard copies
> or electronically stored copies).
> 
> 
> 
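The wait/signal handshake described in the report can be modeled with a small self-contained sketch. The names (aio_num, aio_cond, check_aio_completion, write thread) are taken from the report; everything else is an assumption, not the actual Ceph code. It shows the proposed fix: signal the condition variable on every aio completion, including seq = 0, so the waiting writer always wakes.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Minimal model of the FileJournal aio handshake (assumed structure).
struct JournalModel {
  std::mutex aio_lock;
  std::condition_variable aio_cond;
  int aio_num = 1;  // one aio in flight: the seq = 0 header write

  // Completion path.  The buggy version signaled only when a seq > 0 entry
  // completed (completed_something == true), so the seq = 0 header
  // completion was swallowed.  The fix sketched here: always wake the
  // writer once an aio finishes, regardless of seq.
  void check_aio_completion(uint64_t seq) {
    std::lock_guard<std::mutex> l(aio_lock);
    --aio_num;
    (void)seq;             // with the fix, seq no longer gates the signal
    aio_cond.notify_all();
  }

  // Writer path: blocks until all in-flight aios drain.  Without the
  // unconditional signal above, this wait never returns after the
  // seq = 0 header write, which is the reported deadlock.
  void wait_for_aios() {
    std::unique_lock<std::mutex> l(aio_lock);
    while (aio_num > 0)
      aio_cond.wait(l);
  }
};
```

The predicate-guarded wait makes the model race-free: if the completion lands before the writer starts waiting, the `while (aio_num > 0)` check simply falls through.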


* RE: Deadlock in ceph journal
  2014-08-20  3:55 ` Deadlock in ceph journal Sage Weil
@ 2014-08-20  4:38   ` Somnath Roy
  2014-08-20  4:50     ` Sage Weil
  2014-08-20  4:58     ` Mark Kirkwood
  2014-08-20  4:49   ` Somnath Roy
  1 sibling, 2 replies; 22+ messages in thread
From: Somnath Roy @ 2014-08-20  4:38 UTC (permalink / raw)
  To: Sage Weil
  Cc: Samuel Just (sam.just@inktank.com),
	ceph-devel, Mark Kirkwood, jianpeng.ma

Thanks, Sage!
So, the latest master should have the fix, right?

Regards
Somnath




* RE: Deadlock in ceph journal
  2014-08-20  3:55 ` Deadlock in ceph journal Sage Weil
  2014-08-20  4:38   ` Somnath Roy
@ 2014-08-20  4:49   ` Somnath Roy
  1 sibling, 0 replies; 22+ messages in thread
From: Somnath Roy @ 2014-08-20  4:49 UTC (permalink / raw)
  To: Sage Weil
  Cc: Samuel Just (sam.just@inktank.com),
	ceph-devel, Mark Kirkwood, jianpeng.ma

I got it; it's not yet in master.

https://github.com/ceph/ceph/commit/b40ddc5dcf95b4849706314b34e72b607629773f

Sorry for the confusion.

Thanks & Regards
Somnath




* RE: Deadlock in ceph journal
  2014-08-20  4:38   ` Somnath Roy
@ 2014-08-20  4:50     ` Sage Weil
  2014-08-20  4:52       ` Somnath Roy
  2014-08-20  4:58     ` Mark Kirkwood
  1 sibling, 1 reply; 22+ messages in thread
From: Sage Weil @ 2014-08-20  4:50 UTC (permalink / raw)
  To: Somnath Roy
  Cc: Samuel Just (sam.just@inktank.com),
	ceph-devel, Mark Kirkwood, jianpeng.ma

On Wed, 20 Aug 2014, Somnath Roy wrote:
> Thanks Sage !
> So, the latest master should have the fix, right?

The original patch that caused the regression is reverted, but we'd like 
to reapply it once we sort out the issues.  wip-filejournal has the 
offending patch and your fix, but I'm eager to hear whether Jianpeng and 
Mark can confirm it's complete and correct or whether there is still a 
problem.

sage

> 
> Regards
> Somnath
> 
> -----Original Message-----
> From: Sage Weil [mailto:sweil@redhat.com]
> Sent: Tuesday, August 19, 2014 8:55 PM
> To: Somnath Roy
> Cc: Samuel Just (sam.just@inktank.com); ceph-devel@vger.kernel.org; Mark Kirkwood; jianpeng.ma@intel.com
> Subject: RE: Deadlock in ceph journal
> 
> [Copying ceph-devel, dropping ceph-users]
> 
> Yeah, that looks like a bug.  I pushed wip-filejournal that reapplies Jianpeng's original patch and this one.  I'm not certain about last other suggested fix, though, but I'm hoping that this fix explains the strange behavior Jianpeng and Mark have seen?
> 
> sage
> 
> 
> On Wed, 20 Aug 2014, Somnath Roy wrote:
> >
> > I think this is the issue..
> >
> >
> >
> > http://tracker.ceph.com/issues/9073
> >
> >
> >
> > Thanks & Regards
> >
> > Somnath
> >
> >
> >
> > From: Somnath Roy
> > Sent: Tuesday, August 19, 2014 6:25 PM
> > To: Sage Weil (sage@inktank.com); Samuel Just (sam.just@inktank.com)
> > Cc: ceph-users@lists.ceph.com
> > Subject: Deadlock in ceph journal
> >
> >
> >
> > Hi Sage/Sam,
> >
> > During our testing we found a potential deadlock scenario in the
> > filestore journal code base. This is happening because of two reason.
> >
> >
> >
> > 1.       This is because code is not signaling aio_cond from
> > check_aio_completion() in case seq = 0
> >
> > 2.       Following changes in the write_thread_entry() is allowing a
> > very first header write with seq = 0.
> >
> >                if (writeq.empty() && !must_write_header) {
> >
> >
> >
> >
> >
> > Now, during ceph-deploy activate this is what happening.
> >
> >
> >
> > 1. The very first write of header with seq = 0 issued and it is
> > waiting for aio completion. So, aio_num = 1.
> >
> > 2. superblock write came in and got into while (aio_num > 0) block of
> > write_thread_entry() and waiting on the aio_cond
> >
> > 3. The seq = 0 aio completed but not setting completed_something =
> > true and as a result aio_cond is not signaled.
> >
> > 4. write_thread_entry() is getting into deadlock.
> >
> >
> >
> > This is a timing problem and if header write is returned before
> > superblock write this will not happen and will be happening in case of
> > block journal device only (aio is enabled).
> >
> >
> >
> > Here is the log snippet we are getting.
> >
> >
> >
> > 2014-08-19 12:59:10.029363 7f60fa33b700 10 journal write_thread_entry
> > start
> >
> > 2014-08-19 12:59:10.029395 7f60fa33b700 20 journal prepare_multi_write
> > queue_pos now 4096
> >
> > 2014-08-19 12:59:10.029427 7f60fa33b700 15 journal do_aio_write
> > writing
> > 4096~0 + header
> >
> > 2014-08-19 12:59:10.029439 7f60fa33b700 20 journal write_aio_bl 0~4096
> > seq 0
> >
> > 2014-08-19 12:59:10.029442 7f60f9339700 10 journal
> > write_finish_thread_entry enter
> >
> > 2014-08-19 12:59:10.029466 7f60fa33b700 20 journal write_aio_bl ..
> > 0~4096 in
> > 1
> >
> > 2014-08-19 12:59:10.029498 7f60fa33b700 20 journal write_aio_bl 4096~0
> > seq 0
> >
> > 2014-08-19 12:59:10.029505 7f60fa33b700  5 journal put_throttle
> > finished 0 ops and 0 bytes, now 0 ops and 0 bytes
> >
> > 2014-08-19 12:59:10.029510 7f60fa33b700 20 journal write_thread_entry
> > going to sleep
> >
> > 2014-08-19 12:59:10.029538 7f60ff178800 10 journal journal_start
> >
> > 2014-08-19 12:59:10.029566 7f60f9339700 20 journal
> > write_finish_thread_entry waiting for aio(s)
> >
> > 2014-08-19 12:59:10.029726 7f60ff178800 15
> > filestore(/var/lib/ceph/tmp/mnt.NKfs2R) read
> > meta/23c2fcde/osd_superblock/0//-1 0~0
> >
> > 2014-08-19 12:59:10.029793 7f60ff178800 -1
> > filestore(/var/lib/ceph/tmp/mnt.NKfs2R) could not find
> > 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
> >
> > 2014-08-19 12:59:10.029815 7f60ff178800 10
> > filestore(/var/lib/ceph/tmp/mnt.NKfs2R)
> > FileStore::read(meta/23c2fcde/osd_superblock/0//-1) open error: (2) No
> > such file or directory
> >
> > 2014-08-19 12:59:10.029892 7f60ff178800  5
> > filestore(/var/lib/ceph/tmp/mnt.NKfs2R) queue_transactions new
> > osr(default
> > 0x42ea9f0)/0x42ea9f0
> >
> > 2014-08-19 12:59:10.029922 7f60ff178800 10 journal op_submit_start 2
> >
> > 2014-08-19 12:59:10.030009 7f60ff178800  5
> > filestore(/var/lib/ceph/tmp/mnt.NKfs2R) queue_transactions
> > (writeahead) 2
> > 0x7fff6e817080
> >
> > 2014-08-19 12:59:10.030028 7f60ff178800 10 journal
> > op_journal_transactions 2
> > 0x7fff6e817080
> >
> > 2014-08-19 12:59:10.030039 7f60ff178800  5 journal submit_entry seq 2 len 505 (0x42a76f0)
> >
> > 2014-08-19 12:59:10.030065 7f60fa33b700 20 journal write_thread_entry woke up
> >
> > 2014-08-19 12:59:10.030070 7f60fa33b700 20 journal write_thread_entry aio throttle: aio num 1 bytes 4096 ... exp 2 min_new 4 ... pending 0
> >
> > 2014-08-19 12:59:10.030076 7f60fa33b700 20 journal write_thread_entry deferring until more aios complete: 1 aios with 4096 bytes needs 4 bytes to start a new aio (currently 0 pending)
> >
> > 2014-08-19 12:59:10.030084 7f60ff178800 10 journal op_submit_finish 2
> >
> > 2014-08-19 12:59:10.030389 7f60f9339700 10 journal write_finish_thread_entry aio 0~4096 done
> >
> > 2014-08-19 12:59:10.030402 7f60f9339700 20 journal check_aio_completion
> >
> > 2014-08-19 12:59:10.030406 7f60f9339700 20 journal check_aio_completion completed seq 0 0~4096
> >
> > 2014-08-19 12:59:10.030412 7f60f9339700 20 journal write_finish_thread_entry sleeping
> >
> > 2014-08-19 12:59:15.026609 7f60fab3c700 20 filestore(/var/lib/ceph/tmp/mnt.NKfs2R) sync_entry woke after 5.000459
> >
> > 2014-08-19 12:59:15.026659 7f60fab3c700 10 journal commit_start max_applied_seq 1, open_ops 0
> >
> > 2014-08-19 12:59:15.026665 7f60fab3c700 10 journal commit_start blocked, all open_ops have completed
> >
> > 2014-08-19 12:59:15.026670 7f60fab3c700 10 journal commit_start nothing to do
> >
> > 2014-08-19 12:59:15.026676 7f60fab3c700 10 journal commit_start
> >
> > 2014-08-19 12:59:15.026691 7f60fab3c700 20 filestore(/var/lib/ceph/tmp/mnt.NKfs2R) sync_entry waiting for max_interval 5.000000
> >
> > 2014-08-19 12:59:20.026826 7f60fab3c700 20 filestore(/var/lib/ceph/tmp/mnt.NKfs2R) sync_entry woke after 5.000135
> >
> > 2014-08-19 12:59:20.026870 7f60fab3c700 10 journal commit_start max_applied_seq 1, open_ops 0
> >
> > 2014-08-19 12:59:20.026876 7f60fab3c700 10 journal commit_start blocked, all open_ops have completed
> >
> > 2014-08-19 12:59:20.026879 7f60fab3c700 10 journal commit_start nothing to do
> >
> > 2014-08-19 12:59:20.026891 7f60fab3c700 10 journal commit_start
> >
> >
> >
> >
> > Could you please confirm that this is a valid defect?
> >
> >
> >
> > If so, could signaling aio_cond in the seq = 0 case be the solution?
> >
> >
> >
> > Please let me know if there is any potential workaround for this when
> > deploying with ceph-deploy. Will ceph-deploy accept a file path as the journal?
> >
> >
> >
> > Thanks & Regards
> >
> > Somnath
> >
> >
> > ______________________________________________________________________
> > ______
> >
> > PLEASE NOTE: The information contained in this electronic mail message
> > is intended only for the use of the designated recipient(s) named
> > above. If the reader of this message is not the intended recipient,
> > you are hereby notified that you have received this message in error
> > and that any review, dissemination, distribution, or copying of this
> > message is strictly prohibited. If you have received this
> > communication in error, please notify the sender by telephone or
> > e-mail (as shown above) immediately and destroy any and all copies of
> > this message in your possession (whether hard copies or electronically stored copies).
> >
> >
> >
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: Deadlock in ceph journal
  2014-08-20  4:50     ` Sage Weil
@ 2014-08-20  4:52       ` Somnath Roy
  2014-08-20 15:33         ` Sage Weil
  0 siblings, 1 reply; 22+ messages in thread
From: Somnath Roy @ 2014-08-20  4:52 UTC (permalink / raw)
  To: Sage Weil
  Cc: Samuel Just (sam.just@inktank.com),
	ceph-devel, Mark Kirkwood, jianpeng.ma

I will also take the patch and test it out.

Thanks & Regards
Somnath

-----Original Message-----
From: Sage Weil [mailto:sweil@redhat.com] 
Sent: Tuesday, August 19, 2014 9:51 PM
To: Somnath Roy
Cc: Samuel Just (sam.just@inktank.com); ceph-devel@vger.kernel.org; Mark Kirkwood; jianpeng.ma@intel.com
Subject: RE: Deadlock in ceph journal

On Wed, 20 Aug 2014, Somnath Roy wrote:
> Thanks Sage !
> So, the latest master should have the fix, right ?

The original patch that caused the regression is reverted, but we'd like to reapply it if we sort out the issues.  wip-filejournal has the offending patch and your fix, but I'm eager to hear whether Jianpeng and Mark can confirm it's complete/correct or if there is still a problem.

sage

> 
> Regards
> Somnath
> 
> -----Original Message-----
> From: Sage Weil [mailto:sweil@redhat.com]
> Sent: Tuesday, August 19, 2014 8:55 PM
> To: Somnath Roy
> Cc: Samuel Just (sam.just@inktank.com); ceph-devel@vger.kernel.org; 
> Mark Kirkwood; jianpeng.ma@intel.com
> Subject: RE: Deadlock in ceph journal
> 
> [Copying ceph-devel, dropping ceph-users]
> 
> Yeah, that looks like a bug.  I pushed wip-filejournal, which reapplies Jianpeng's original patch and this one.  I'm not certain about the other suggested fix, but I'm hoping this one explains the strange behavior Jianpeng and Mark have seen.
> 
> sage

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Deadlock in ceph journal
  2014-08-20  4:38   ` Somnath Roy
  2014-08-20  4:50     ` Sage Weil
@ 2014-08-20  4:58     ` Mark Kirkwood
  2014-08-20  5:04       ` Mark Kirkwood
  1 sibling, 1 reply; 22+ messages in thread
From: Mark Kirkwood @ 2014-08-20  4:58 UTC (permalink / raw)
  To: Somnath Roy, Sage Weil
  Cc: Samuel Just (sam.just@inktank.com), ceph-devel, jianpeng.ma

Not yet,

If you have to use master, either revert commit 
4eb18dd487da4cb621dcbecfc475fc0871b356ac or apply the patch that fixes 
the hang, mentioned here: https://github.com/ceph/ceph/pull/2185

Otherwise you could use the wip-filejournal branch which Sage has just 
added!

Cheers

Mark


On 20/08/14 16:38, Somnath Roy wrote:
> Thanks Sage !
> So, the latest master should have the fix, right ?
>



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Deadlock in ceph journal
  2014-08-20  4:58     ` Mark Kirkwood
@ 2014-08-20  5:04       ` Mark Kirkwood
  0 siblings, 0 replies; 22+ messages in thread
From: Mark Kirkwood @ 2014-08-20  5:04 UTC (permalink / raw)
  To: Somnath Roy, Sage Weil
  Cc: Samuel Just (sam.just@inktank.com), ceph-devel, jianpeng.ma

Sorry, I see that Sage has reverted it.

On 20/08/14 16:58, Mark Kirkwood wrote:
> Not yet,
>
> If you have to use master either revert commit
> 4eb18dd487da4cb621dcbecfc475fc0871b356ac or apply the patch for fixing
> the hang mentioned here https://github.com/ceph/ceph/pull/2185
>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: Deadlock in ceph journal
  2014-08-20  4:52       ` Somnath Roy
@ 2014-08-20 15:33         ` Sage Weil
  2014-08-21  1:54           ` Ma, Jianpeng
  0 siblings, 1 reply; 22+ messages in thread
From: Sage Weil @ 2014-08-20 15:33 UTC (permalink / raw)
  To: Somnath Roy
  Cc: Samuel Just (sam.just@inktank.com),
	ceph-devel, Mark Kirkwood, jianpeng.ma

I suspect what is really needed is a drain_aio() function that will wait 
for all pending aio ops to complete on shutdown.  What happens to those 
IOs if the process exits while they are in flight is probably undefined; 
we should just avoid doing that.

sage


On Wed, 20 Aug 2014, Somnath Roy wrote:

> I will also take the patch and test it out.
> 
> Thanks & Regards
> Somnath

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: Deadlock in ceph journal
  2014-08-20 15:33         ` Sage Weil
@ 2014-08-21  1:54           ` Ma, Jianpeng
  2014-08-21  3:52             ` Sage Weil
  0 siblings, 1 reply; 22+ messages in thread
From: Ma, Jianpeng @ 2014-08-21  1:54 UTC (permalink / raw)
  To: Sage Weil, Somnath Roy
  Cc: Samuel Just (sam.just@inktank.com), ceph-devel, Mark Kirkwood

Yes. For io_submit, we probably must pair it with io_getevents; otherwise the result is undefined.
If stop_write == true, we don't use aio. How about this way?

Jianpeng

> -----Original Message-----
> From: Sage Weil [mailto:sweil@redhat.com]
> Sent: Wednesday, August 20, 2014 11:34 PM
> To: Somnath Roy
> Cc: Samuel Just (sam.just@inktank.com); ceph-devel@vger.kernel.org; Mark
> Kirkwood; Ma, Jianpeng
> Subject: RE: Deadlock in ceph journal
> 
> I suspect what is really needed is a drain_aio() function that will wait for all
> pending aio ops to complete on shutdown.  What happens to those IOs if the
> process exists while they are in flight is probably undefined; we should just
> avoid doing that.
> 
> sage
> 
> 
> On Wed, 20 Aug 2014, Somnath Roy wrote:
> 
> > I will also take the patch and test it out.
> >
> > Thanks & Regards
> > Somnath
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > Could you please confirm this as a valid defect ?
> > > >
> > > >
> > > >
> > > > If so, sending a signal on aio_cond in case of seq = 0, could be
> > > > the solution ?
> > > >
> > > >
> > > >
> > > > Please let me know if there is any potential workaround for this
> > > > while deploying with ceph-deploy. Will ceph-deploy accept file path as
> journal ?
> > > >
> > > >
> > > >
> > > > Thanks & Regards
> > > >
> > > > Somnath
> > > >
> > > >
> > > >
> > > > ________________________________
> > > >
> > > > PLEASE NOTE: The information contained in this electronic mail
> > > > message is intended only for the use of the designated
> > > > recipient(s) named above. If the reader of this message is not the
> > > > intended recipient, you are hereby notified that you have received
> > > > this message in error and that any review, dissemination,
> > > > distribution, or copying of this message is strictly prohibited.
> > > > If you have received this communication in error, please notify
> > > > the sender by telephone or e-mail (as shown above) immediately and
> > > > destroy any and all copies of this message in your possession (whether
> hard copies or electronically stored copies).
> > > >
> > > >
> > > >
> > >
> > >
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> > > in the body of a message to majordomo@vger.kernel.org More majordomo
> > > info at  http://vger.kernel.org/majordomo-info.html
> > >
> > >
> >
> >

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: Deadlock in ceph journal
  2014-08-21  1:54           ` Ma, Jianpeng
@ 2014-08-21  3:52             ` Sage Weil
  2014-08-21  7:30               ` Ma, Jianpeng
  0 siblings, 1 reply; 22+ messages in thread
From: Sage Weil @ 2014-08-21  3:52 UTC (permalink / raw)
  To: Ma, Jianpeng
  Cc: Somnath Roy, Samuel Just (sam.just@inktank.com),
	ceph-devel, Mark Kirkwood

On Thu, 21 Aug 2014, Ma, Jianpeng wrote:
> Yes, maybe for io_submit it must use io_getevents; otherwise the result is undefined.
> If stop_write == true, we don't use aio. How about this way?

That seems reasonable, now that I understand why it doesn't work the 
other way.  Do you mind resending your original patch with a comment 
in the code to that effect?  ("do sync write since we don't wait for 
aio completions for header-only writes during shutdown")
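The fallback being agreed on here can be sketched in a few lines. This is an editorial illustration only, not FileJournal's actual code: the struct and member names (`stop_write`, `aio_enabled`, the counters) are assumptions standing in for the real implementation.

```cpp
#include <cassert>

// Editorial sketch of the shutdown write path under discussion.
// Once stop_write is set, no thread will call io_getevents() again,
// so any further io_submit() would never be reaped -- fall back to a
// synchronous write for the final header-only update.
struct JournalSketch {
    bool aio_enabled = true;
    bool stop_write  = false;   // set by the shutdown path
    int  aio_writes  = 0;       // counters stand in for real I/O
    int  sync_writes = 0;

    void write_header() {
        if (aio_enabled && !stop_write) {
            ++aio_writes;       // normal path: queue an aio
        } else {
            // do sync write since we don't wait for aio completions
            // for header-only writes during shutdown
            ++sync_writes;
        }
    }
};
```

The point of the requested comment is exactly the `else` branch: during shutdown there is no completion thread left to wait on, so the header write must be synchronous.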

sage


> 
> Jianpeng
> 
> > -----Original Message-----
> > From: Sage Weil [mailto:sweil@redhat.com]
> > Sent: Wednesday, August 20, 2014 11:34 PM
> > To: Somnath Roy
> > Cc: Samuel Just (sam.just@inktank.com); ceph-devel@vger.kernel.org; Mark
> > Kirkwood; Ma, Jianpeng
> > Subject: RE: Deadlock in ceph journal
> > 
> > I suspect what is really needed is a drain_aio() function that will wait for all
> > pending aio ops to complete on shutdown.  What happens to those IOs if the
> > process exits while they are in flight is probably undefined; we should just
> > avoid doing that.
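The drain_aio() idea quoted above amounts to waiting on the in-flight aio counter before tearing anything down. A minimal self-contained sketch follows; `std::condition_variable` stands in for the pthread primitives FileJournal actually uses, and the member names (`aio_num`, `aio_lock`, `aio_cond`) mirror its members only by assumption — this is not the real implementation.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Editorial sketch: shutdown blocks in drain_aio() until the completion
// thread has reaped every in-flight aio. Signaling unconditionally on
// completion (even for the seq == 0 header-only write) is what avoids
// the deadlock reported earlier in this thread.
struct AioDrain {
    std::mutex aio_lock;
    std::condition_variable aio_cond;
    int aio_num = 0;

    void submit() {
        std::lock_guard<std::mutex> l(aio_lock);
        ++aio_num;
    }
    void complete() {
        std::lock_guard<std::mutex> l(aio_lock);
        --aio_num;
        aio_cond.notify_all();          // signal regardless of seq
    }
    void drain_aio() {                  // called on shutdown
        std::unique_lock<std::mutex> l(aio_lock);
        aio_cond.wait(l, [this] { return aio_num == 0; });
    }
};
```

With this shape, shutdown never races the completion thread: `drain_aio()` returns only after every submitted aio has been accounted for.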
> > 
> > sage
> > 
> > 
> > On Wed, 20 Aug 2014, Somnath Roy wrote:
> > 
> > > I will also take the patch and test it out.
> > >
> > > Thanks & Regards
> > > Somnath
> > >
> > > -----Original Message-----
> > > From: Sage Weil [mailto:sweil@redhat.com]
> > > Sent: Tuesday, August 19, 2014 9:51 PM
> > > To: Somnath Roy
> > > Cc: Samuel Just (sam.just@inktank.com); ceph-devel@vger.kernel.org;
> > > Mark Kirkwood; jianpeng.ma@intel.com
> > > Subject: RE: Deadlock in ceph journal
> > >
> > > On Wed, 20 Aug 2014, Somnath Roy wrote:
> > > > Thanks Sage !
> > > > So, the latest master should have the fix, right ?
> > >
> > > The original patch that caused the regression is reverted, but we'd like to
> > reapply it if we sort out the issues.  wip-filejournal has the offending patch and
> > your fix.. but I'm eager to hear if Jianpeng and Mark can confirm it's
> > complete/correct or if there is still a problem.
> > >
> > > sage
> > >
> > > >
> > > > Regards
> > > > Somnath
> > > >
> > > > -----Original Message-----
> > > > From: Sage Weil [mailto:sweil@redhat.com]
> > > > Sent: Tuesday, August 19, 2014 8:55 PM
> > > > To: Somnath Roy
> > > > Cc: Samuel Just (sam.just@inktank.com); ceph-devel@vger.kernel.org;
> > > > Mark Kirkwood; jianpeng.ma@intel.com
> > > > Subject: RE: Deadlock in ceph journal
> > > >
> > > > [Copying ceph-devel, dropping ceph-users]
> > > >
> > > > Yeah, that looks like a bug.  I pushed wip-filejournal that reapplies
> > Jianpeng's original patch and this one.  I'm not certain about last other
> > suggested fix, though, but I'm hoping that this fix explains the strange behavior
> > Jianpeng and Mark have seen?
> > > >
> > > > sage
> > > >
> > > >
> > > > On Wed, 20 Aug 2014, Somnath Roy wrote:
> > > > >
> > > > > I think this is the issue..
> > > > >
> > > > >
> > > > >
> > > > > http://tracker.ceph.com/issues/9073
> > > > >
> > > > >
> > > > >
> > > > > Thanks & Regards
> > > > >
> > > > > Somnath
> > > > >
> > > > >
> > > > >
> > > > > From: Somnath Roy
> > > > > Sent: Tuesday, August 19, 2014 6:25 PM
> > > > > To: Sage Weil (sage@inktank.com); Samuel Just
> > > > > (sam.just@inktank.com)
> > > > > Cc: ceph-users@lists.ceph.com
> > > > > Subject: Deadlock in ceph journal
> > > > >
> > > > >
> > > > >
> > > > > Hi Sage/Sam,
> > > > >
> > > > > During our testing we found a potential deadlock scenario in the
> > > > > filestore journal code base. This is happening because of two reasons.
> > > > >
> > > > >
> > > > >
> > > > > 1.       This is because code is not signaling aio_cond from
> > > > > check_aio_completion() in case seq = 0
> > > > >
> > > > > 2.       Following changes in the write_thread_entry() is allowing a
> > > > > very first header write with seq = 0.
> > > > >
> > > > >                if (writeq.empty() && !must_write_header) {
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > Now, during ceph-deploy activate this is what is happening.
> > > > >
> > > > >
> > > > >
> > > > > 1. The very first write of header with seq = 0 issued and it is
> > > > > waiting for aio completion. So, aio_num = 1.
> > > > >
> > > > > 2. superblock write came in and got into while (aio_num > 0) block
> > > > > of
> > > > > write_thread_entry() and waiting on the aio_cond
> > > > >
> > > > > 3. The seq = 0 aio completed but not setting completed_something =
> > > > > true and as a result aio_cond is not signaled.
> > > > >
> > > > > 4. write_thread_entry() is getting into deadlock.
> > > > >
> > > > >
> > > > >
> > > > > This is a timing problem and if header write is returned before
> > > > > superblock write this will not happen and will be happening in
> > > > > case of block journal device only (aio is enabled).
> > > > >
> > > > >
> > > > >
> > > > > Here is the log snippet we are getting.
> > > > >
> > > > >
> > > > >
> > > > > 2014-08-19 12:59:10.029363 7f60fa33b700 10 journal
> > > > > write_thread_entry start
> > > > >
> > > > > 2014-08-19 12:59:10.029395 7f60fa33b700 20 journal
> > > > > prepare_multi_write queue_pos now 4096
> > > > >
> > > > > 2014-08-19 12:59:10.029427 7f60fa33b700 15 journal do_aio_write
> > > > > writing
> > > > > 4096~0 + header
> > > > >
> > > > > 2014-08-19 12:59:10.029439 7f60fa33b700 20 journal write_aio_bl
> > > > > 0~4096 seq 0
> > > > >
> > > > > 2014-08-19 12:59:10.029442 7f60f9339700 10 journal
> > > > > write_finish_thread_entry enter
> > > > >
> > > > > 2014-08-19 12:59:10.029466 7f60fa33b700 20 journal write_aio_bl ..
> > > > > 0~4096 in
> > > > > 1
> > > > >
> > > > > 2014-08-19 12:59:10.029498 7f60fa33b700 20 journal write_aio_bl
> > > > > 4096~0 seq 0
> > > > >
> > > > > 2014-08-19 12:59:10.029505 7f60fa33b700  5 journal put_throttle
> > > > > finished 0 ops and 0 bytes, now 0 ops and 0 bytes
> > > > >
> > > > > 2014-08-19 12:59:10.029510 7f60fa33b700 20 journal
> > > > > write_thread_entry going to sleep
> > > > >
> > > > > 2014-08-19 12:59:10.029538 7f60ff178800 10 journal journal_start
> > > > >
> > > > > 2014-08-19 12:59:10.029566 7f60f9339700 20 journal
> > > > > write_finish_thread_entry waiting for aio(s)
> > > > >
> > > > > 2014-08-19 12:59:10.029726 7f60ff178800 15
> > > > > filestore(/var/lib/ceph/tmp/mnt.NKfs2R) read
> > > > > meta/23c2fcde/osd_superblock/0//-1 0~0
> > > > >
> > > > > 2014-08-19 12:59:10.029793 7f60ff178800 -1
> > > > > filestore(/var/lib/ceph/tmp/mnt.NKfs2R) could not find
> > > > > 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or
> > > > > directory
> > > > >
> > > > > 2014-08-19 12:59:10.029815 7f60ff178800 10
> > > > > filestore(/var/lib/ceph/tmp/mnt.NKfs2R)
> > > > > FileStore::read(meta/23c2fcde/osd_superblock/0//-1) open error:
> > > > > (2) No such file or directory
> > > > >
> > > > > 2014-08-19 12:59:10.029892 7f60ff178800  5
> > > > > filestore(/var/lib/ceph/tmp/mnt.NKfs2R) queue_transactions new
> > > > > osr(default
> > > > > 0x42ea9f0)/0x42ea9f0
> > > > >
> > > > > 2014-08-19 12:59:10.029922 7f60ff178800 10 journal op_submit_start
> > > > > 2
> > > > >
> > > > > 2014-08-19 12:59:10.030009 7f60ff178800  5
> > > > > filestore(/var/lib/ceph/tmp/mnt.NKfs2R) queue_transactions
> > > > > (writeahead) 2
> > > > > 0x7fff6e817080
> > > > >
> > > > > 2014-08-19 12:59:10.030028 7f60ff178800 10 journal
> > > > > op_journal_transactions 2
> > > > > 0x7fff6e817080
> > > > >
> > > > > 2014-08-19 12:59:10.030039 7f60ff178800  5 journal submit_entry
> > > > > seq
> > > > > 2 len
> > > > > 505 (0x42a76f0)
> > > > >
> > > > > 2014-08-19 12:59:10.030065 7f60fa33b700 20 journal
> > > > > write_thread_entry woke up
> > > > >
> > > > > 2014-08-19 12:59:10.030070 7f60fa33b700 20 journal
> > > > > write_thread_entry aio
> > > > > throttle: aio num 1 bytes 4096 ... exp 2 min_new 4 ... pending 0
> > > > >
> > > > > 2014-08-19 12:59:10.030076 7f60fa33b700 20 journal
> > > > > write_thread_entry deferring until more aios complete: 1 aios with
> > > > > 4096 bytes needs 4 bytes to start a new aio (currently 0 pending)
> > > > >
> > > > > 2014-08-19 12:59:10.030084 7f60ff178800 10 journal
> > > > > op_submit_finish
> > > > > 2
> > > > >
> > > > > 2014-08-19 12:59:10.030389 7f60f9339700 10 journal
> > > > > write_finish_thread_entry aio 0~4096 done
> > > > >
> > > > > 2014-08-19 12:59:10.030402 7f60f9339700 20 journal
> > > > > check_aio_completion
> > > > >
> > > > > 2014-08-19 12:59:10.030406 7f60f9339700 20 journal
> > > > > check_aio_completion completed seq 0 0~4096
> > > > >
> > > > > 2014-08-19 12:59:10.030412 7f60f9339700 20 journal
> > > > > write_finish_thread_entry sleeping
> > > > >
> > > > > 2014-08-19 12:59:15.026609 7f60fab3c700 20
> > > > > filestore(/var/lib/ceph/tmp/mnt.NKfs2R) sync_entry woke after
> > > > > 5.000459
> > > > >
> > > > > 2014-08-19 12:59:15.026659 7f60fab3c700 10 journal commit_start
> > > > > max_applied_seq 1, open_ops 0
> > > > >
> > > > > 2014-08-19 12:59:15.026665 7f60fab3c700 10 journal commit_start
> > > > > blocked, all open_ops have completed
> > > > >
> > > > > 2014-08-19 12:59:15.026670 7f60fab3c700 10 journal commit_start
> > > > > nothing to do
> > > > >
> > > > > 2014-08-19 12:59:15.026676 7f60fab3c700 10 journal commit_start
> > > > >
> > > > > 2014-08-19 12:59:15.026691 7f60fab3c700 20
> > > > > filestore(/var/lib/ceph/tmp/mnt.NKfs2R) sync_entry waiting for
> > > > > max_interval
> > > > > 5.000000
> > > > >
> > > > > 2014-08-19 12:59:20.026826 7f60fab3c700 20
> > > > > filestore(/var/lib/ceph/tmp/mnt.NKfs2R) sync_entry woke after
> > > > > 5.000135
> > > > >
> > > > > 2014-08-19 12:59:20.026870 7f60fab3c700 10 journal commit_start
> > > > > max_applied_seq 1, open_ops 0
> > > > >
> > > > > 2014-08-19 12:59:20.026876 7f60fab3c700 10 journal commit_start
> > > > > blocked, all open_ops have completed
> > > > >
> > > > > 2014-08-19 12:59:20.026879 7f60fab3c700 10 journal commit_start
> > > > > nothing to do
> > > > >
> > > > > 2014-08-19 12:59:20.026891 7f60fab3c700 10 journal commit_start
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > Could you please confirm this as a valid defect ?
> > > > >
> > > > >
> > > > >
> > > > > If so, sending a signal on aio_cond in case of seq = 0, could be
> > > > > the solution ?
> > > > >
> > > > >
> > > > >
> > > > > Please let me know if there is any potential workaround for this
> > > > > while deploying with ceph-deploy. Will ceph-deploy accept file path as
> > journal ?
> > > > >
> > > > >
> > > > >
> > > > > Thanks & Regards
> > > > >
> > > > > Somnath
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > >
> > > >
> > >
> > >
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: Deadlock in ceph journal
  2014-08-21  3:52             ` Sage Weil
@ 2014-08-21  7:30               ` Ma, Jianpeng
  2014-08-21  8:17                 ` Mark Kirkwood
  2014-08-21 15:23                 ` Sage Weil
  0 siblings, 2 replies; 22+ messages in thread
From: Ma, Jianpeng @ 2014-08-21  7:30 UTC (permalink / raw)
  To: Sage Weil
  Cc: Somnath Roy, Samuel Just (sam.just@inktank.com),
	ceph-devel, Mark Kirkwood

Hi Sage,
   The pull request: https://github.com/ceph/ceph/pull/2296.

Mark,
   After Sage merges this into wip-filejournal, can you test again? I think at present only you can do this work!

Thanks!
Jianpeng

> -----Original Message-----
> From: Sage Weil [mailto:sweil@redhat.com]
> Sent: Thursday, August 21, 2014 11:53 AM
> To: Ma, Jianpeng
> Cc: Somnath Roy; Samuel Just (sam.just@inktank.com);
> ceph-devel@vger.kernel.org; Mark Kirkwood
> Subject: RE: Deadlock in ceph journal
> 
> On Thu, 21 Aug 2014, Ma, Jianpeng wrote:
> > > Yes, maybe for io_submit it must use io_getevents; otherwise the result is
> > > undefined.
> > If stop_write == true, we don't use aio. How about this way?
> 
> That seems reasonable, now that I understand why it doesn't work the other
> way.  Do you mind resending your original patch with a comment in the code
> to that effect?  ("do sync write since we don't wait for aio completions for
> header-only writes during shutdown")
> 
> sage
> 
> 
> >
> > Jianpeng
> >
> > > -----Original Message-----
> > > From: Sage Weil [mailto:sweil@redhat.com]
> > > Sent: Wednesday, August 20, 2014 11:34 PM
> > > To: Somnath Roy
> > > Cc: Samuel Just (sam.just@inktank.com); ceph-devel@vger.kernel.org;
> > > Mark Kirkwood; Ma, Jianpeng
> > > Subject: RE: Deadlock in ceph journal
> > >
> > > I suspect what is really needed is a drain_aio() function that will
> > > wait for all pending aio ops to complete on shutdown.  What happens
> > > to those IOs if the process exits while they are in flight is
> > > probably undefined; we should just avoid doing that.
> > >
> > > sage
> > >
> > >
> > > On Wed, 20 Aug 2014, Somnath Roy wrote:
> > >
> > > > I will also take the patch and test it out.
> > > >
> > > > Thanks & Regards
> > > > Somnath
> > > >
> > > > -----Original Message-----
> > > > From: Sage Weil [mailto:sweil@redhat.com]
> > > > Sent: Tuesday, August 19, 2014 9:51 PM
> > > > To: Somnath Roy
> > > > Cc: Samuel Just (sam.just@inktank.com);
> > > > ceph-devel@vger.kernel.org; Mark Kirkwood; jianpeng.ma@intel.com
> > > > Subject: RE: Deadlock in ceph journal
> > > >
> > > > On Wed, 20 Aug 2014, Somnath Roy wrote:
> > > > > Thanks Sage !
> > > > > So, the latest master should have the fix, right ?
> > > >
> > > > The original patch that caused the regression is reverted, but
> > > > we'd like to
> > > reapply it if we sort out the issues.  wip-filejournal has the
> > > offending patch and your fix.. but I'm eager to hear if Jianpeng and
> > > Mark can confirm it's complete/correct or if there is still a problem.
> > > >
> > > > sage
> > > >
> > > > >
> > > > > Regards
> > > > > Somnath
> > > > >
> > > > > -----Original Message-----
> > > > > From: Sage Weil [mailto:sweil@redhat.com]
> > > > > Sent: Tuesday, August 19, 2014 8:55 PM
> > > > > To: Somnath Roy
> > > > > Cc: Samuel Just (sam.just@inktank.com);
> > > > > ceph-devel@vger.kernel.org; Mark Kirkwood; jianpeng.ma@intel.com
> > > > > Subject: RE: Deadlock in ceph journal
> > > > >
> > > > > [Copying ceph-devel, dropping ceph-users]
> > > > >
> > > > > Yeah, that looks like a bug.  I pushed wip-filejournal that
> > > > > reapplies
> > > Jianpeng's original patch and this one.  I'm not certain about last
> > > other suggested fix, though, but I'm hoping that this fix explains
> > > the strange behavior Jianpeng and Mark have seen?
> > > > >
> > > > > sage
> > > > >
> > > > >
> > > > > On Wed, 20 Aug 2014, Somnath Roy wrote:
> > > > > >
> > > > > > I think this is the issue..
> > > > > >
> > > > > >
> > > > > >
> > > > > > http://tracker.ceph.com/issues/9073
> > > > > >
> > > > > >
> > > > > >
> > > > > > Thanks & Regards
> > > > > >
> > > > > > Somnath
> > > > > >
> > > > > >
> > > > > >
> > > > > > From: Somnath Roy
> > > > > > Sent: Tuesday, August 19, 2014 6:25 PM
> > > > > > To: Sage Weil (sage@inktank.com); Samuel Just
> > > > > > (sam.just@inktank.com)
> > > > > > Cc: ceph-users@lists.ceph.com
> > > > > > Subject: Deadlock in ceph journal
> > > > > >
> > > > > >
> > > > > >
> > > > > > Hi Sage/Sam,
> > > > > >
> > > > > > During our testing we found a potential deadlock scenario in
> > > > > > the filestore journal code base. This is happening because of two
> > > > > > reasons.
> > > > > >
> > > > > >
> > > > > >
> > > > > > 1.       This is because code is not signaling aio_cond from
> > > > > > check_aio_completion() in case seq = 0
> > > > > >
> > > > > > 2.       Following changes in the write_thread_entry() is allowing a
> > > > > > very first header write with seq = 0.
> > > > > >
> > > > > >                if (writeq.empty() && !must_write_header) {
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > Now, during ceph-deploy activate this is what is happening.
> > > > > >
> > > > > >
> > > > > >
> > > > > > 1. The very first write of header with seq = 0 issued and it
> > > > > > is waiting for aio completion. So, aio_num = 1.
> > > > > >
> > > > > > 2. superblock write came in and got into while (aio_num > 0)
> > > > > > block of
> > > > > > write_thread_entry() and waiting on the aio_cond
> > > > > >
> > > > > > 3. The seq = 0 aio completed but not setting
> > > > > > completed_something = true and as a result aio_cond is not signaled.
> > > > > >
> > > > > > 4. write_thread_entry() is getting into deadlock.
> > > > > >
> > > > > >
> > > > > >
> > > > > > This is a timing problem and if header write is returned
> > > > > > before superblock write this will not happen and will be
> > > > > > happening in case of block journal device only (aio is enabled).
> > > > > >
> > > > > >
> > > > > >
> > > > > > Here is the log snippet we are getting.
> > > > > >
> > > > > >
> > > > > >
> > > > > > 2014-08-19 12:59:10.029363 7f60fa33b700 10 journal
> > > > > > write_thread_entry start
> > > > > >
> > > > > > 2014-08-19 12:59:10.029395 7f60fa33b700 20 journal
> > > > > > prepare_multi_write queue_pos now 4096
> > > > > >
> > > > > > 2014-08-19 12:59:10.029427 7f60fa33b700 15 journal
> > > > > > do_aio_write writing
> > > > > > 4096~0 + header
> > > > > >
> > > > > > 2014-08-19 12:59:10.029439 7f60fa33b700 20 journal
> > > > > > write_aio_bl
> > > > > > 0~4096 seq 0
> > > > > >
> > > > > > 2014-08-19 12:59:10.029442 7f60f9339700 10 journal
> > > > > > write_finish_thread_entry enter
> > > > > >
> > > > > > 2014-08-19 12:59:10.029466 7f60fa33b700 20 journal write_aio_bl ..
> > > > > > 0~4096 in
> > > > > > 1
> > > > > >
> > > > > > 2014-08-19 12:59:10.029498 7f60fa33b700 20 journal
> > > > > > write_aio_bl
> > > > > > 4096~0 seq 0
> > > > > >
> > > > > > 2014-08-19 12:59:10.029505 7f60fa33b700  5 journal
> > > > > > put_throttle finished 0 ops and 0 bytes, now 0 ops and 0 bytes
> > > > > >
> > > > > > 2014-08-19 12:59:10.029510 7f60fa33b700 20 journal
> > > > > > write_thread_entry going to sleep
> > > > > >
> > > > > > 2014-08-19 12:59:10.029538 7f60ff178800 10 journal
> > > > > > journal_start
> > > > > >
> > > > > > 2014-08-19 12:59:10.029566 7f60f9339700 20 journal
> > > > > > write_finish_thread_entry waiting for aio(s)
> > > > > >
> > > > > > 2014-08-19 12:59:10.029726 7f60ff178800 15
> > > > > > filestore(/var/lib/ceph/tmp/mnt.NKfs2R) read
> > > > > > meta/23c2fcde/osd_superblock/0//-1 0~0
> > > > > >
> > > > > > 2014-08-19 12:59:10.029793 7f60ff178800 -1
> > > > > > filestore(/var/lib/ceph/tmp/mnt.NKfs2R) could not find
> > > > > > 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or
> > > > > > directory
> > > > > >
> > > > > > 2014-08-19 12:59:10.029815 7f60ff178800 10
> > > > > > filestore(/var/lib/ceph/tmp/mnt.NKfs2R)
> > > > > > FileStore::read(meta/23c2fcde/osd_superblock/0//-1) open error:
> > > > > > (2) No such file or directory
> > > > > >
> > > > > > 2014-08-19 12:59:10.029892 7f60ff178800  5
> > > > > > filestore(/var/lib/ceph/tmp/mnt.NKfs2R) queue_transactions new
> > > > > > osr(default
> > > > > > 0x42ea9f0)/0x42ea9f0
> > > > > >
> > > > > > 2014-08-19 12:59:10.029922 7f60ff178800 10 journal
> > > > > > op_submit_start
> > > > > > 2
> > > > > >
> > > > > > 2014-08-19 12:59:10.030009 7f60ff178800  5
> > > > > > filestore(/var/lib/ceph/tmp/mnt.NKfs2R) queue_transactions
> > > > > > (writeahead) 2
> > > > > > 0x7fff6e817080
> > > > > >
> > > > > > 2014-08-19 12:59:10.030028 7f60ff178800 10 journal
> > > > > > op_journal_transactions 2
> > > > > > 0x7fff6e817080
> > > > > >
> > > > > > 2014-08-19 12:59:10.030039 7f60ff178800  5 journal
> > > > > > submit_entry seq
> > > > > > 2 len
> > > > > > 505 (0x42a76f0)
> > > > > >
> > > > > > 2014-08-19 12:59:10.030065 7f60fa33b700 20 journal
> > > > > > write_thread_entry woke up
> > > > > >
> > > > > > 2014-08-19 12:59:10.030070 7f60fa33b700 20 journal
> > > > > > write_thread_entry aio
> > > > > > throttle: aio num 1 bytes 4096 ... exp 2 min_new 4 ... pending
> > > > > > 0
> > > > > >
> > > > > > 2014-08-19 12:59:10.030076 7f60fa33b700 20 journal
> > > > > > write_thread_entry deferring until more aios complete: 1 aios
> > > > > > with
> > > > > > 4096 bytes needs 4 bytes to start a new aio (currently 0
> > > > > > pending)
> > > > > >
> > > > > > 2014-08-19 12:59:10.030084 7f60ff178800 10 journal
> > > > > > op_submit_finish
> > > > > > 2
> > > > > >
> > > > > > 2014-08-19 12:59:10.030389 7f60f9339700 10 journal
> > > > > > write_finish_thread_entry aio 0~4096 done
> > > > > >
> > > > > > 2014-08-19 12:59:10.030402 7f60f9339700 20 journal
> > > > > > check_aio_completion
> > > > > >
> > > > > > 2014-08-19 12:59:10.030406 7f60f9339700 20 journal
> > > > > > check_aio_completion completed seq 0 0~4096
> > > > > >
> > > > > > 2014-08-19 12:59:10.030412 7f60f9339700 20 journal
> > > > > > write_finish_thread_entry sleeping
> > > > > >
> > > > > > 2014-08-19 12:59:15.026609 7f60fab3c700 20
> > > > > > filestore(/var/lib/ceph/tmp/mnt.NKfs2R) sync_entry woke after
> > > > > > 5.000459
> > > > > >
> > > > > > 2014-08-19 12:59:15.026659 7f60fab3c700 10 journal
> > > > > > commit_start max_applied_seq 1, open_ops 0
> > > > > >
> > > > > > 2014-08-19 12:59:15.026665 7f60fab3c700 10 journal
> > > > > > commit_start blocked, all open_ops have completed
> > > > > >
> > > > > > 2014-08-19 12:59:15.026670 7f60fab3c700 10 journal
> > > > > > commit_start nothing to do
> > > > > >
> > > > > > 2014-08-19 12:59:15.026676 7f60fab3c700 10 journal
> > > > > > commit_start
> > > > > >
> > > > > > 2014-08-19 12:59:15.026691 7f60fab3c700 20
> > > > > > filestore(/var/lib/ceph/tmp/mnt.NKfs2R) sync_entry waiting for
> > > > > > max_interval
> > > > > > 5.000000
> > > > > >
> > > > > > 2014-08-19 12:59:20.026826 7f60fab3c700 20
> > > > > > filestore(/var/lib/ceph/tmp/mnt.NKfs2R) sync_entry woke after
> > > > > > 5.000135
> > > > > >
> > > > > > 2014-08-19 12:59:20.026870 7f60fab3c700 10 journal
> > > > > > commit_start max_applied_seq 1, open_ops 0
> > > > > >
> > > > > > 2014-08-19 12:59:20.026876 7f60fab3c700 10 journal
> > > > > > commit_start blocked, all open_ops have completed
> > > > > >
> > > > > > 2014-08-19 12:59:20.026879 7f60fab3c700 10 journal
> > > > > > commit_start nothing to do
> > > > > >
> > > > > > 2014-08-19 12:59:20.026891 7f60fab3c700 10 journal
> > > > > > commit_start
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > Could you please confirm this as a valid defect ?
> > > > > >
> > > > > >
> > > > > >
> > > > > > If so, sending a signal on aio_cond in case of seq = 0, could
> > > > > > be the solution ?
> > > > > >
> > > > > >
> > > > > >
> > > > > > Please let me know if there is any potential workaround for
> > > > > > this while deploying with ceph-deploy. Will ceph-deploy accept
> > > > > > file path as
> > > journal ?
> > > > > >
> > > > > >
> > > > > >
> > > > > > Thanks & Regards
> > > > > >
> > > > > > Somnath
> > > > > >
> > > > > >
> > > > > >
> > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> >
> >

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Deadlock in ceph journal
  2014-08-21  7:30               ` Ma, Jianpeng
@ 2014-08-21  8:17                 ` Mark Kirkwood
  2014-08-21 15:23                 ` Sage Weil
  1 sibling, 0 replies; 22+ messages in thread
From: Mark Kirkwood @ 2014-08-21  8:17 UTC (permalink / raw)
  To: Ma, Jianpeng, Sage Weil
  Cc: Somnath Roy, Samuel Just (sam.just@inktank.com), ceph-devel

Will do.

On 21/08/14 19:30, Ma, Jianpeng wrote:
> Mark
>     After Sage merges this into wip-filejournal, can you test again? I think at present only you can do this work!
>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: Deadlock in ceph journal
  2014-08-21  7:30               ` Ma, Jianpeng
  2014-08-21  8:17                 ` Mark Kirkwood
@ 2014-08-21 15:23                 ` Sage Weil
  2014-08-22  0:45                   ` Mark Kirkwood
  1 sibling, 1 reply; 22+ messages in thread
From: Sage Weil @ 2014-08-21 15:23 UTC (permalink / raw)
  To: Ma, Jianpeng
  Cc: Somnath Roy, Samuel Just (sam.just@inktank.com),
	ceph-devel, Mark Kirkwood

I've pushed the patch to wip-filejournal.  Mark, can you test please?

Thanks!
sage


On Thu, 21 Aug 2014, Ma, Jianpeng wrote:

> Hi sage,
>    The pull request: https://github.com/ceph/ceph/pull/2296.
> 
> Mark 
>    After Sage merges this into wip-filejournal, can you test again? I think at present only you can do this work!
> 
> Thanks!
> Jianpeng 
> 
> > -----Original Message-----
> > From: Sage Weil [mailto:sweil@redhat.com]
> > Sent: Thursday, August 21, 2014 11:53 AM
> > To: Ma, Jianpeng
> > Cc: Somnath Roy; Samuel Just (sam.just@inktank.com);
> > ceph-devel@vger.kernel.org; Mark Kirkwood
> > Subject: RE: Deadlock in ceph journal
> > 
> > On Thu, 21 Aug 2014, Ma, Jianpeng wrote:
> > > Yes.  Maybe for io_submit, we must use io_getevents; otherwise the result is
> > undefined.
> > > If stop_write == true, we don't use aio. How about this way?
> > 
> > That seems reasonable, now that I understand why it doesn't work the other
> > way.  Do you mind resending your original patch with a comment in the code
> > to that effect?  ("do sync write since we don't wait for aio completions for
> > header-only writes during shutdown")
> > 
> > sage
> > 
> > 
> > >
> > > Jianpeng
> > >
> > > > -----Original Message-----
> > > > From: Sage Weil [mailto:sweil@redhat.com]
> > > > Sent: Wednesday, August 20, 2014 11:34 PM
> > > > To: Somnath Roy
> > > > Cc: Samuel Just (sam.just@inktank.com); ceph-devel@vger.kernel.org;
> > > > Mark Kirkwood; Ma, Jianpeng
> > > > Subject: RE: Deadlock in ceph journal
> > > >
> > > > I suspect what is really needed is a drain_aio() function that will
> > > > wait for all pending aio ops to complete on shutdown.  What happens
> > > > to those IOs if the process exists while they are in flight is
> > > > probably undefined; we should just avoid doing that.
> > > >
> > > > sage
> > > >
> > > >
> > > > On Wed, 20 Aug 2014, Somnath Roy wrote:
> > > >
> > > > > I will also take the patch and test it out.
> > > > >
> > > > > Thanks & Regards
> > > > > Somnath
> > > > >
> > > > > -----Original Message-----
> > > > > From: Sage Weil [mailto:sweil@redhat.com]
> > > > > Sent: Tuesday, August 19, 2014 9:51 PM
> > > > > To: Somnath Roy
> > > > > Cc: Samuel Just (sam.just@inktank.com);
> > > > > ceph-devel@vger.kernel.org; Mark Kirkwood; jianpeng.ma@intel.com
> > > > > Subject: RE: Deadlock in ceph journal
> > > > >
> > > > > On Wed, 20 Aug 2014, Somnath Roy wrote:
> > > > > > Thanks Sage !
> > > > > > So, the latest master should have the fix, right ?
> > > > >
> > > > > The original patch that caused the regression is reverted, but
> > > > > we'd like to
> > > > reapply it if we sort out the issues.  wip-filejournal has the
> > > > offending patch and your fix.. but I'm eager to hear if Jianpeng and
> > > > Mark can confirm it's complete/correct or if there is still a problem.
> > > > >
> > > > > sage
> > > > >
> > > > > >
> > > > > > Regards
> > > > > > Somnath
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From: Sage Weil [mailto:sweil@redhat.com]
> > > > > > Sent: Tuesday, August 19, 2014 8:55 PM
> > > > > > To: Somnath Roy
> > > > > > Cc: Samuel Just (sam.just@inktank.com);
> > > > > > ceph-devel@vger.kernel.org; Mark Kirkwood; jianpeng.ma@intel.com
> > > > > > Subject: RE: Deadlock in ceph journal
> > > > > >
> > > > > > [Copying ceph-devel, dropping ceph-users]
> > > > > >
> > > > > > Yeah, that looks like a bug.  I pushed wip-filejournal that
> > > > > > reapplies
> > > > Jianpeng's original patch and this one.  I'm not certain about the
> > > > other suggested fix, though, but I'm hoping that this fix explains
> > > > the strange behavior Jianpeng and Mark have seen?
> > > > > >
> > > > > > sage
> > > > > >
> > > > > >
> > > > > > On Wed, 20 Aug 2014, Somnath Roy wrote:
> > > > > > >
> > > > > > > I think this is the issue..
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > http://tracker.ceph.com/issues/9073
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Thanks & Regards
> > > > > > >
> > > > > > > Somnath
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > From: Somnath Roy
> > > > > > > Sent: Tuesday, August 19, 2014 6:25 PM
> > > > > > > To: Sage Weil (sage@inktank.com); Samuel Just
> > > > > > > (sam.just@inktank.com)
> > > > > > > Cc: ceph-users@lists.ceph.com
> > > > > > > Subject: Deadlock in ceph journal
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Hi Sage/Sam,
> > > > > > >
> > > > > > > During our testing we found a potential deadlock scenario in
> > > > > > > the filestore journal code base. This is happening because of two
> > reasons.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > 1.       The code is not signaling aio_cond from
> > > > > > > check_aio_completion() when seq = 0.
> > > > > > >
> > > > > > > 2.       The following change in write_thread_entry() allows the
> > > > > > > very first header write with seq = 0:
> > > > > > >
> > > > > > >                if (writeq.empty() && !must_write_header) {
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Now, during ceph-deploy activate, this is what is happening.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > 1. The very first write of header with seq = 0 issued and it
> > > > > > > is waiting for aio completion. So, aio_num = 1.
> > > > > > >
> > > > > > > 2. superblock write came in and got into while (aio_num > 0)
> > > > > > > block of
> > > > > > > write_thread_entry() and waiting on the aio_cond
> > > > > > >
> > > > > > > 3. The seq = 0 aio completed but not setting
> > > > > > > completed_something = true and as a result aio_cond is not signaled.
> > > > > > >
> > > > > > > 4. write_thread_entry() is getting into deadlock.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > This is a timing problem: if the header write completes
> > > > > > > before the superblock write, this will not happen. It occurs
> > > > > > > only with a block journal device (i.e., when aio is enabled).
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Here is the log snippet we are getting.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > 2014-08-19 12:59:10.029363 7f60fa33b700 10 journal
> > > > > > > write_thread_entry start
> > > > > > >
> > > > > > > 2014-08-19 12:59:10.029395 7f60fa33b700 20 journal
> > > > > > > prepare_multi_write queue_pos now 4096
> > > > > > >
> > > > > > > 2014-08-19 12:59:10.029427 7f60fa33b700 15 journal
> > > > > > > do_aio_write writing
> > > > > > > 4096~0 + header
> > > > > > >
> > > > > > > 2014-08-19 12:59:10.029439 7f60fa33b700 20 journal
> > > > > > > write_aio_bl
> > > > > > > 0~4096 seq 0
> > > > > > >
> > > > > > > 2014-08-19 12:59:10.029442 7f60f9339700 10 journal
> > > > > > > write_finish_thread_entry enter
> > > > > > >
> > > > > > > 2014-08-19 12:59:10.029466 7f60fa33b700 20 journal write_aio_bl ..
> > > > > > > 0~4096 in
> > > > > > > 1
> > > > > > >
> > > > > > > 2014-08-19 12:59:10.029498 7f60fa33b700 20 journal
> > > > > > > write_aio_bl
> > > > > > > 4096~0 seq 0
> > > > > > >
> > > > > > > 2014-08-19 12:59:10.029505 7f60fa33b700  5 journal
> > > > > > > put_throttle finished 0 ops and 0 bytes, now 0 ops and 0 bytes
> > > > > > >
> > > > > > > 2014-08-19 12:59:10.029510 7f60fa33b700 20 journal
> > > > > > > write_thread_entry going to sleep
> > > > > > >
> > > > > > > 2014-08-19 12:59:10.029538 7f60ff178800 10 journal
> > > > > > > journal_start
> > > > > > >
> > > > > > > 2014-08-19 12:59:10.029566 7f60f9339700 20 journal
> > > > > > > write_finish_thread_entry waiting for aio(s)
> > > > > > >
> > > > > > > 2014-08-19 12:59:10.029726 7f60ff178800 15
> > > > > > > filestore(/var/lib/ceph/tmp/mnt.NKfs2R) read
> > > > > > > meta/23c2fcde/osd_superblock/0//-1 0~0
> > > > > > >
> > > > > > > 2014-08-19 12:59:10.029793 7f60ff178800 -1
> > > > > > > filestore(/var/lib/ceph/tmp/mnt.NKfs2R) could not find
> > > > > > > 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or
> > > > > > > directory
> > > > > > >
> > > > > > > 2014-08-19 12:59:10.029815 7f60ff178800 10
> > > > > > > filestore(/var/lib/ceph/tmp/mnt.NKfs2R)
> > > > > > > FileStore::read(meta/23c2fcde/osd_superblock/0//-1) open error:
> > > > > > > (2) No such file or directory
> > > > > > >
> > > > > > > 2014-08-19 12:59:10.029892 7f60ff178800  5
> > > > > > > filestore(/var/lib/ceph/tmp/mnt.NKfs2R) queue_transactions new
> > > > > > > osr(default
> > > > > > > 0x42ea9f0)/0x42ea9f0
> > > > > > >
> > > > > > > 2014-08-19 12:59:10.029922 7f60ff178800 10 journal
> > > > > > > op_submit_start
> > > > > > > 2
> > > > > > >
> > > > > > > 2014-08-19 12:59:10.030009 7f60ff178800  5
> > > > > > > filestore(/var/lib/ceph/tmp/mnt.NKfs2R) queue_transactions
> > > > > > > (writeahead) 2
> > > > > > > 0x7fff6e817080
> > > > > > >
> > > > > > > 2014-08-19 12:59:10.030028 7f60ff178800 10 journal
> > > > > > > op_journal_transactions 2
> > > > > > > 0x7fff6e817080
> > > > > > >
> > > > > > > 2014-08-19 12:59:10.030039 7f60ff178800  5 journal
> > > > > > > submit_entry seq
> > > > > > > 2 len
> > > > > > > 505 (0x42a76f0)
> > > > > > >
> > > > > > > 2014-08-19 12:59:10.030065 7f60fa33b700 20 journal
> > > > > > > write_thread_entry woke up
> > > > > > >
> > > > > > > 2014-08-19 12:59:10.030070 7f60fa33b700 20 journal
> > > > > > > write_thread_entry aio
> > > > > > > throttle: aio num 1 bytes 4096 ... exp 2 min_new 4 ... pending
> > > > > > > 0
> > > > > > >
> > > > > > > 2014-08-19 12:59:10.030076 7f60fa33b700 20 journal
> > > > > > > write_thread_entry deferring until more aios complete: 1 aios
> > > > > > > with
> > > > > > > 4096 bytes needs 4 bytes to start a new aio (currently 0
> > > > > > > pending)
> > > > > > >
> > > > > > > 2014-08-19 12:59:10.030084 7f60ff178800 10 journal
> > > > > > > op_submit_finish
> > > > > > > 2
> > > > > > >
> > > > > > > 2014-08-19 12:59:10.030389 7f60f9339700 10 journal
> > > > > > > write_finish_thread_entry aio 0~4096 done
> > > > > > >
> > > > > > > 2014-08-19 12:59:10.030402 7f60f9339700 20 journal
> > > > > > > check_aio_completion
> > > > > > >
> > > > > > > 2014-08-19 12:59:10.030406 7f60f9339700 20 journal
> > > > > > > check_aio_completion completed seq 0 0~4096
> > > > > > >
> > > > > > > 2014-08-19 12:59:10.030412 7f60f9339700 20 journal
> > > > > > > write_finish_thread_entry sleeping
> > > > > > >
> > > > > > > 2014-08-19 12:59:15.026609 7f60fab3c700 20
> > > > > > > filestore(/var/lib/ceph/tmp/mnt.NKfs2R) sync_entry woke after
> > > > > > > 5.000459
> > > > > > >
> > > > > > > 2014-08-19 12:59:15.026659 7f60fab3c700 10 journal
> > > > > > > commit_start max_applied_seq 1, open_ops 0
> > > > > > >
> > > > > > > 2014-08-19 12:59:15.026665 7f60fab3c700 10 journal
> > > > > > > commit_start blocked, all open_ops have completed
> > > > > > >
> > > > > > > 2014-08-19 12:59:15.026670 7f60fab3c700 10 journal
> > > > > > > commit_start nothing to do
> > > > > > >
> > > > > > > 2014-08-19 12:59:15.026676 7f60fab3c700 10 journal
> > > > > > > commit_start
> > > > > > >
> > > > > > > 2014-08-19 12:59:15.026691 7f60fab3c700 20
> > > > > > > filestore(/var/lib/ceph/tmp/mnt.NKfs2R) sync_entry waiting for
> > > > > > > max_interval
> > > > > > > 5.000000
> > > > > > >
> > > > > > > 2014-08-19 12:59:20.026826 7f60fab3c700 20
> > > > > > > filestore(/var/lib/ceph/tmp/mnt.NKfs2R) sync_entry woke after
> > > > > > > 5.000135
> > > > > > >
> > > > > > > 2014-08-19 12:59:20.026870 7f60fab3c700 10 journal
> > > > > > > commit_start max_applied_seq 1, open_ops 0
> > > > > > >
> > > > > > > 2014-08-19 12:59:20.026876 7f60fab3c700 10 journal
> > > > > > > commit_start blocked, all open_ops have completed
> > > > > > >
> > > > > > > 2014-08-19 12:59:20.026879 7f60fab3c700 10 journal
> > > > > > > commit_start nothing to do
> > > > > > >
> > > > > > > 2014-08-19 12:59:20.026891 7f60fab3c700 10 journal
> > > > > > > commit_start
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Could you please confirm that this is a valid defect?
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > If so, could sending a signal on aio_cond when seq = 0
> > > > > > > be the solution?
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Please let me know if there is any potential workaround for
> > > > > > > this while deploying with ceph-deploy. Will ceph-deploy accept
> > > > > > > a file path as the
> > > > journal?
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Thanks & Regards
> > > > > > >
> > > > > > > Somnath
> > > > > > >
> > > > > > >
> > > > > > >
> > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > >
> > >
> > >
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Deadlock in ceph journal
  2014-08-21 15:23                 ` Sage Weil
@ 2014-08-22  0:45                   ` Mark Kirkwood
  2014-08-22  0:49                     ` Sage Weil
  0 siblings, 1 reply; 22+ messages in thread
From: Mark Kirkwood @ 2014-08-22  0:45 UTC (permalink / raw)
  To: Sage Weil, Ma, Jianpeng
  Cc: Somnath Roy, Samuel Just (sam.just@inktank.com), ceph-devel

On 22/08/14 03:23, Sage Weil wrote:
> I've pushed the patch to wip-filejournal.  Mark, can you test please?
>

I've tested wip-filejournal and looks good (25 test runs, good journal 
header each time).

Cheers

Mark

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Deadlock in ceph journal
  2014-08-22  0:45                   ` Mark Kirkwood
@ 2014-08-22  0:49                     ` Sage Weil
  2014-08-22 22:18                       ` Mark Kirkwood
  0 siblings, 1 reply; 22+ messages in thread
From: Sage Weil @ 2014-08-22  0:49 UTC (permalink / raw)
  To: Mark Kirkwood
  Cc: Ma, Jianpeng, Somnath Roy, Samuel Just (sam.just@inktank.com),
	ceph-devel

On Fri, 22 Aug 2014, Mark Kirkwood wrote:
> On 22/08/14 03:23, Sage Weil wrote:
> > I've pushed the patch to wip-filejournal.  Mark, can you test please?
> > 
> 
> I've tested wip-filejournal and looks good (25 test runs, good journal header
> each time).

Thanks!  Merged.

sage

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Deadlock in ceph journal
  2014-08-22  0:49                     ` Sage Weil
@ 2014-08-22 22:18                       ` Mark Kirkwood
  2014-08-22 22:22                         ` Somnath Roy
  0 siblings, 1 reply; 22+ messages in thread
From: Mark Kirkwood @ 2014-08-22 22:18 UTC (permalink / raw)
  To: Sage Weil
  Cc: Ma, Jianpeng, Somnath Roy, Samuel Just (sam.just@inktank.com),
	ceph-devel

On 22/08/14 12:49, Sage Weil wrote:
> On Fri, 22 Aug 2014, Mark Kirkwood wrote:
>> On 22/08/14 03:23, Sage Weil wrote:
>>> I've pushed the patch to wip-filejournal.  Mark, can you test please?
>>>
>>
>> I've tested wip-filejournal and looks good (25 test runs, good journal header
>> each time).
>
> Thanks!  Merged.
>

Excellent.

One thing that does still concern me - if I understand what is happening 
here correctly: we write to the journal using aio until we want to stop 
doing writes (presumably just before closing it), then use normal io to write at 
that point.

Given that we appear to be using direct io whenever we use aio, does 
this mean we end up mixing direct and buffered io to the journal [1] (or 
is the normal, i.e. non-aio, write still using direct io)?

Cheers

Mark

[1] which I understand is bad...

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: Deadlock in ceph journal
  2014-08-22 22:18                       ` Mark Kirkwood
@ 2014-08-22 22:22                         ` Somnath Roy
  2014-08-22 22:32                           ` Sage Weil
  2014-08-22 22:38                           ` Mark Kirkwood
  0 siblings, 2 replies; 22+ messages in thread
From: Somnath Roy @ 2014-08-22 22:22 UTC (permalink / raw)
  To: Mark Kirkwood, Sage Weil
  Cc: Ma, Jianpeng, Samuel Just (sam.just@inktank.com), ceph-devel

I think it is using direct io for non-aio mode as well.

Thanks & Regards
Somnath

-----Original Message-----
From: Mark Kirkwood [mailto:mark.kirkwood@catalyst.net.nz]
Sent: Friday, August 22, 2014 3:19 PM
To: Sage Weil
Cc: Ma, Jianpeng; Somnath Roy; Samuel Just (sam.just@inktank.com); ceph-devel@vger.kernel.org
Subject: Re: Deadlock in ceph journal

On 22/08/14 12:49, Sage Weil wrote:
> On Fri, 22 Aug 2014, Mark Kirkwood wrote:
>> On 22/08/14 03:23, Sage Weil wrote:
>>> I've pushed the patch to wip-filejournal.  Mark, can you test please?
>>>
>>
>> I've tested wip-filejournal and looks good (25 test runs, good
>> journal header each time).
>
> Thanks!  Merged.
>

Excellent.

One thing that does still concern me - if I understand what is happening here correctly: we write to the journal using aio until we want to stop doing writes (presumably pre closing it), then use normal io to write at that point.

Given that we appear to be using direct io whenever we use aio, does this mean we end up mixing direct and buffered io to the journal [1] (or is the normal i.e non aio write still using direct io)?

Cheers

Mark

[1] which I understand is bad...



^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: Deadlock in ceph journal
  2014-08-22 22:22                         ` Somnath Roy
@ 2014-08-22 22:32                           ` Sage Weil
  2014-08-22 22:38                           ` Mark Kirkwood
  1 sibling, 0 replies; 22+ messages in thread
From: Sage Weil @ 2014-08-22 22:32 UTC (permalink / raw)
  To: Somnath Roy
  Cc: Mark Kirkwood, Ma, Jianpeng, Samuel Just (sam.just@inktank.com),
	ceph-devel

On Fri, 22 Aug 2014, Somnath Roy wrote:
> I think it is using direct io for non-aio mode as well.

Right.  aio is always direct io (in our case at least).

sage

> 
> Thanks & Regards
> Somnath
> 
> -----Original Message-----
> From: Mark Kirkwood [mailto:mark.kirkwood@catalyst.net.nz]
> Sent: Friday, August 22, 2014 3:19 PM
> To: Sage Weil
> Cc: Ma, Jianpeng; Somnath Roy; Samuel Just (sam.just@inktank.com); ceph-devel@vger.kernel.org
> Subject: Re: Deadlock in ceph journal
> 
> On 22/08/14 12:49, Sage Weil wrote:
> > On Fri, 22 Aug 2014, Mark Kirkwood wrote:
> >> On 22/08/14 03:23, Sage Weil wrote:
> >>> I've pushed the patch to wip-filejournal.  Mark, can you test please?
> >>>
> >>
> >> I've tested wip-filejournal and looks good (25 test runs, good
> >> journal header each time).
> >
> > Thanks!  Merged.
> >
> 
> Excellent.
> 
> One thing that does still concern me - if I understand what is happening here correctly: we write to the journal using aio until we want to stop doing writes (presumably pre closing it), then use normal io to write at that point.
> 
> Given that we appear to be using direct io whenever we use aio, does this mean we end up mixing direct and buffered io to the journal [1] (or is the normal i.e non aio write still using direct io)?
> 
> Cheers
> 
> Mark
> 
> [1] which I understand is bad...
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Deadlock in ceph journal
  2014-08-22 22:22                         ` Somnath Roy
  2014-08-22 22:32                           ` Sage Weil
@ 2014-08-22 22:38                           ` Mark Kirkwood
  2014-08-25  1:03                             ` Ma, Jianpeng
  1 sibling, 1 reply; 22+ messages in thread
From: Mark Kirkwood @ 2014-08-22 22:38 UTC (permalink / raw)
  To: Somnath Roy, Sage Weil
  Cc: Ma, Jianpeng, Samuel Just (sam.just@inktank.com), ceph-devel

On 23/08/14 10:22, Somnath Roy wrote:
> I think it is using direct io for non-aio mode as well.
>
> Thanks & Regards
> Somnath
>

> One thing that does still concern me - if I understand what is happening here correctly: we write to the journal using aio until we want to stop doing writes (presumably pre closing it), then use normal io to write at that point.
>
> Given that we appear to be using direct io whenever we use aio, does this mean we end up mixing direct and buffered io to the journal [1] (or is the normal i.e non aio write still using direct io)?
>
>

Thanks Somnath,

I think you are correct (I missed the bit in FileJournal::_open that 
seems to cover this):


   if (forwrite) {
     flags = O_RDWR;
     if (directio)
       flags |= O_DIRECT | O_DSYNC;


i.e. the journal is opened with O_DIRECT, so all writes (async or not) will 
be direct.

Cheers

Mark

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: Deadlock in ceph journal
  2014-08-22 22:38                           ` Mark Kirkwood
@ 2014-08-25  1:03                             ` Ma, Jianpeng
  2014-08-25  2:14                               ` Sage Weil
  0 siblings, 1 reply; 22+ messages in thread
From: Ma, Jianpeng @ 2014-08-25  1:03 UTC (permalink / raw)
  To: Mark Kirkwood, Somnath Roy, Sage Weil
  Cc: Samuel Just (sam.just@inktank.com), ceph-devel

Hi all,
   Over the weekend, I read the kernel code covering aio & directio. close() doesn't wait for aio to complete,
but fsync() will wait for all aio to complete. 
   Mark tested this patch (which uses fsync() in write_thread_entry) and the result looks good.
   I want to revert the patch that avoids aio when closing the journal, and use fsync() instead. That makes the code simpler.
   How about this?

Thanks!
Jianpeng

> ceph-devel@vger.kernel.org
> Subject: Re: Deadlock in ceph journal
> 
> On 23/08/14 10:22, Somnath Roy wrote:
> > I think it is using direct io for non-aio mode as well.
> >
> > Thanks & Regards
> > Somnath
> >
> 
> > One thing that does still concern me - if I understand what is happening here
> correctly: we write to the journal using aio until we want to stop doing writes
> (presumably pre closing it), then use normal io to write at that point.
> >
> > Given that we appear to be using direct io whenever we use aio, does this
> mean we end up mixing direct and buffered io to the journal [1] (or is the
> normal i.e non aio write still using direct io)?
> >
> >
> 
> Thanks Somnath,
> 
> I think you are correct (I missed the bit in FileJournal::_open that seems to
> cover this):
> 
> 
>    if (forwrite) {
>      flags = O_RDWR;
>      if (directio)
>        flags |= O_DIRECT | O_DSYNC;
> 
> 
> i.e the journal is opened with DIRECT, so all writes (async or not) will
> be direct.
> 
> Cheers
> 
> Mark

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: Deadlock in ceph journal
  2014-08-25  1:03                             ` Ma, Jianpeng
@ 2014-08-25  2:14                               ` Sage Weil
  2014-08-25  3:32                                 ` Ma, Jianpeng
  0 siblings, 1 reply; 22+ messages in thread
From: Sage Weil @ 2014-08-25  2:14 UTC (permalink / raw)
  To: Ma, Jianpeng
  Cc: Mark Kirkwood, Somnath Roy, Samuel Just (sam.just@inktank.com),
	ceph-devel

Sounds good. Can you send a patch?

sage


On Mon, 25 Aug 2014, Ma, Jianpeng wrote:

> Hi all,
>    At weekend, I read the kernel code about aio & direction. For close(), it don't wait aio to complete.
> But for fsync(), it will wait all aio to complete. 
>    Mark used this patch(which using fsync() on write_thread_entry) and the result is looks good.
>    I want to revert the patch which don't use aio when closing journal. And using fsync(). It make the code simple.
>    How about this?
> 
> Thanks!
> Jianpeng
> 
> > ceph-devel@vger.kernel.org
> > Subject: Re: Deadlock in ceph journal
> > 
> > On 23/08/14 10:22, Somnath Roy wrote:
> > > I think it is using direct io for non-aio mode as well.
> > >
> > > Thanks & Regards
> > > Somnath
> > >
> > 
> > > One thing that does still concern me - if I understand what is happening here
> > correctly: we write to the journal using aio until we want to stop doing writes
> > (presumably pre closing it), then use normal io to write at that point.
> > >
> > > Given that we appear to be using direct io whenever we use aio, does this
> > mean we end up mixing direct and buffered io to the journal [1] (or is the
> > normal i.e non aio write still using direct io)?
> > >
> > >
> > 
> > Thanks Somnath,
> > 
> > I think you are correct (I missed the bit in FileJournal::_open that seems to
> > cover this):
> > 
> > 
> >    if (forwrite) {
> >      flags = O_RDWR;
> >      if (directio)
> >        flags |= O_DIRECT | O_DSYNC;
> > 
> > 
> > i.e the journal is opened with DIRECT, so all writes (async or not) will
> > be direct.
> > 
> > Cheers
> > 
> > Mark
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: Deadlock in ceph journal
  2014-08-25  2:14                               ` Sage Weil
@ 2014-08-25  3:32                                 ` Ma, Jianpeng
  0 siblings, 0 replies; 22+ messages in thread
From: Ma, Jianpeng @ 2014-08-25  3:32 UTC (permalink / raw)
  To: Sage Weil
  Cc: Mark Kirkwood, Somnath Roy, Samuel Just (sam.just@inktank.com),
	ceph-devel

[-- Attachment #1: Type: text/plain, Size: 2616 bytes --]

The attachment is the patch.

Jianpeng

> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org
> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Sage Weil
> Sent: Monday, August 25, 2014 10:15 AM
> To: Ma, Jianpeng
> Cc: Mark Kirkwood; Somnath Roy; Samuel Just (sam.just@inktank.com);
> ceph-devel@vger.kernel.org
> Subject: RE: Deadlock in ceph journal
> 
> Sounds good. Can you send a patch?
> 
> sage
> 
> 
> On Mon, 25 Aug 2014, Ma, Jianpeng wrote:
> 
> > Hi all,
> >    At weekend, I read the kernel code about aio & direction. For close(), it
> don't wait aio to complete.
> > But for fsync(), it will wait all aio to complete.
> >    Mark used this patch(which using fsync() on write_thread_entry) and the
> result is looks good.
> >    I want to revert the patch which don't use aio when closing journal. And
> using fsync(). It make the code simple.
> >    How about this?
> >
> > Thanks!
> > Jianpeng
> >
> > > ceph-devel@vger.kernel.org
> > > Subject: Re: Deadlock in ceph journal
> > >
> > > On 23/08/14 10:22, Somnath Roy wrote:
> > > > I think it is using direct io for non-aio mode as well.
> > > >
> > > > Thanks & Regards
> > > > Somnath
> > > >
> > >
> > > > One thing that does still concern me - if I understand what is
> > > > happening here
> > > correctly: we write to the journal using aio until we want to stop
> > > doing writes (presumably pre closing it), then use normal io to write at that
> point.
> > > >
> > > > Given that we appear to be using direct io whenever we use aio,
> > > > does this
> > > mean we end up mixing direct and buffered io to the journal [1] (or
> > > is the normal i.e non aio write still using direct io)?
> > > >
> > > >
> > >
> > > Thanks Somnath,
> > >
> > > I think you are correct (I missed the bit in FileJournal::_open that
> > > seems to cover this):
> > >
> > >
> > >    if (forwrite) {
> > >      flags = O_RDWR;
> > >      if (directio)
> > >        flags |= O_DIRECT | O_DSYNC;
> > >
> > >
> > > i.e the journal is opened with DIRECT, so all writes (async or not)
> > > will be direct.
> > >
> > > Cheers
> > >
> > > Mark
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> > in the body of a message to majordomo@vger.kernel.org More majordomo
> > info at  http://vger.kernel.org/majordomo-info.html
> >
> >

[-- Attachment #2: 0001-os-FileJournal-Before-write_thread_entry-exit-using-.patch --]
[-- Type: application/octet-stream, Size: 1700 bytes --]

From 56a3267fe398aa782af717460332ea9360d73729 Mon Sep 17 00:00:00 2001
From: Ma Jianpeng <jianpeng.ma@intel.com>
Date: Mon, 25 Aug 2014 11:20:48 +0800
Subject: [PATCH] os/FileJournal: Before write_thread_entry exit, using fsync()
 to make sure all data to disk.

The aim of commit e870fd08ce was to make sure all aio data reaches disk
when closing the journal. fsync() achieves the same effect: for
aio+directio, after io_submit() the content sits in the block device's
request queue, and fsync() sends a flush request that waits for all
requests in that queue to complete.

Signed-off-by: Ma Jianpeng <jianpeng.ma@intel.com>
---
 src/os/FileJournal.cc | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/src/os/FileJournal.cc b/src/os/FileJournal.cc
index d6ddc7f..1f3a09b 100644
--- a/src/os/FileJournal.cc
+++ b/src/os/FileJournal.cc
@@ -1125,9 +1125,7 @@ void FileJournal::write_thread_entry()
     }
     
 #ifdef HAVE_LIBAIO
-    //We hope write_finish_thread_entry return until the last aios complete
-    //when set write_stop. But it can't. So don't use aio mode when shutdown.
-    if (aio && !write_stop) {
+    if (aio) {
       Mutex::Locker locker(aio_lock);
       // should we back off to limit aios in flight?  try to do this
       // adaptively so that we submit larger aios once we have lots of
@@ -1178,7 +1176,7 @@ void FileJournal::write_thread_entry()
     }
 
 #ifdef HAVE_LIBAIO
-    if (aio && !write_stop)
+    if (aio)
       do_aio_write(bl);
     else
       do_write(bl);
@@ -1188,6 +1186,7 @@ void FileJournal::write_thread_entry()
     put_throttle(orig_ops, orig_bytes);
   }
 
+  fsync(fd);
   dout(10) << "write_thread_entry finish" << dendl;
 }
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2014-08-25  3:32 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <755F6B91B3BE364F9BCA11EA3F9E0C6F2646AE7B@SACMBXIP02.sdcorp.global.sandisk.com>
2014-08-20  3:55 ` Deadlock in ceph journal Sage Weil
2014-08-20  4:38   ` Somnath Roy
2014-08-20  4:50     ` Sage Weil
2014-08-20  4:52       ` Somnath Roy
2014-08-20 15:33         ` Sage Weil
2014-08-21  1:54           ` Ma, Jianpeng
2014-08-21  3:52             ` Sage Weil
2014-08-21  7:30               ` Ma, Jianpeng
2014-08-21  8:17                 ` Mark Kirkwood
2014-08-21 15:23                 ` Sage Weil
2014-08-22  0:45                   ` Mark Kirkwood
2014-08-22  0:49                     ` Sage Weil
2014-08-22 22:18                       ` Mark Kirkwood
2014-08-22 22:22                         ` Somnath Roy
2014-08-22 22:32                           ` Sage Weil
2014-08-22 22:38                           ` Mark Kirkwood
2014-08-25  1:03                             ` Ma, Jianpeng
2014-08-25  2:14                               ` Sage Weil
2014-08-25  3:32                                 ` Ma, Jianpeng
2014-08-20  4:58     ` Mark Kirkwood
2014-08-20  5:04       ` Mark Kirkwood
2014-08-20  4:49   ` Somnath Roy
