* is mdadm RAID1 disk full sync
@ 2015-03-21 11:01 lingli tang
  2015-03-22  3:20 ` NeilBrown
  0 siblings, 1 reply; 12+ messages in thread
From: lingli tang @ 2015-03-21 11:01 UTC (permalink / raw)
  To: linux-raid

I am new to mdadm. I have a question that I could not answer from the
documentation or from Google over the last 10 days.

The question is: are writes to an mdadm RAID1 fully synchronous? For
example, I have two disks (sdb and sdc) in a RAID1 array (/dev/md127).
If I submit an I/O to the array (md127), does it return when both
disks have committed it successfully, or as soon as just one of the
disks has committed it?

I have tested with xfs and ext4 mounted with the sync option, and the
two disks seem to differ by many commits after the server is rebooted.
Does that mean mdadm returns success as soon as one of the disks has
committed the write?


* Re: is mdadm RAID1 disk full sync
  2015-03-21 11:01 is mdadm RAID1 disk full sync lingli tang
@ 2015-03-22  3:20 ` NeilBrown
  2015-03-22  5:00   ` lingli tang
  0 siblings, 1 reply; 12+ messages in thread
From: NeilBrown @ 2015-03-22  3:20 UTC (permalink / raw)
  To: lingli tang; +Cc: linux-raid

On Sat, 21 Mar 2015 19:01:54 +0800 lingli tang <tanglingli001@gmail.com>
wrote:

> I am new to mdadm. I have a question that I could not answer from the
> documentation or from Google over the last 10 days.
>
> The question is: are writes to an mdadm RAID1 fully synchronous? For
> example, I have two disks (sdb and sdc) in a RAID1 array (/dev/md127).
> If I submit an I/O to the array (md127), does it return when both
> disks have committed it successfully, or as soon as just one of the
> disks has committed it?

The write request will not return until it has been submitted to, and
returned by, all working devices.

>
> I have tested with xfs and ext4 mounted with the sync option, and the
> two disks seem to differ by many commits after the server is rebooted.
> Does that mean mdadm returns success as soon as one of the disks has
> committed the write?

That certainly shouldn't happen.  I would need more details of the experiment
that you performed.

NeilBrown


> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



* Re: is mdadm RAID1 disk full sync
  2015-03-22  3:20 ` NeilBrown
@ 2015-03-22  5:00   ` lingli tang
  2015-03-22  5:38     ` NeilBrown
  2015-03-22  7:28     ` Adam Goryachev
  0 siblings, 2 replies; 12+ messages in thread
From: lingli tang @ 2015-03-22  5:00 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

Thanks for the reply.

I have created a RAID1 array from two Fusion-io PCIe flash disks:

  mdadm --create /dev/md/master --name=master --level=1 --raid-devices=2 \
      /dev/fioa2 /dev/mapper/mpathc

/dev/fioa2 is a local disk on server A, and /dev/mapper/mpathc is an
iSCSI disk exported from server B and attached on server A.

After that we ran mkfs.ext4 on /dev/md/master and mounted it on /data1
with the 'sync' option; the MySQL binlog is stored on it.
To avoid losing binlog data we have set sync_binlog=1, so every SQL
commit calls fsync() to flush to disk.

According to your description, if we reboot server A, the data on the
two disks (which are on different servers) should be the same.
But after server A restarted, we assembled each disk on its own
server, and the data differed: the disk on server B had lost more than
one SQL commit.

I checked this by running strace on mysqld on server A.
I found a SQL commit and an fsync() on the binlog file handle on
server A, but that SQL statement cannot be found on the disk assembled
on server B.

I also tested with two SAS disks; server B still lost more than one
SQL commit.



* Re: is mdadm RAID1 disk full sync
  2015-03-22  5:00   ` lingli tang
@ 2015-03-22  5:38     ` NeilBrown
  2015-03-22 11:31       ` lingli tang
  2015-03-22  7:28     ` Adam Goryachev
  1 sibling, 1 reply; 12+ messages in thread
From: NeilBrown @ 2015-03-22  5:38 UTC (permalink / raw)
  To: lingli tang; +Cc: linux-raid

On Sun, 22 Mar 2015 13:00:54 +0800 lingli tang <tanglingli001@gmail.com>
wrote:

> Thanks for the reply.
>
> I have created a RAID1 array from two Fusion-io PCIe flash disks:
>
>   mdadm --create /dev/md/master --name=master --level=1 --raid-devices=2 \
>       /dev/fioa2 /dev/mapper/mpathc
>
> /dev/fioa2 is a local disk on server A, and /dev/mapper/mpathc is an
> iSCSI disk exported from server B and attached on server A.
>
> After that we ran mkfs.ext4 on /dev/md/master and mounted it on /data1
> with the 'sync' option; the MySQL binlog is stored on it.
> To avoid losing binlog data we have set sync_binlog=1, so every SQL
> commit calls fsync() to flush to disk.
>
> According to your description, if we reboot server A, the data on the
> two disks (which are on different servers) should be the same.
> But after server A restarted, we assembled each disk on its own
> server, and the data differed: the disk on server B had lost more than
> one SQL commit.

What exactly do you mean by "reboot"?
Is this a clean shutdown, or do you remove the power or something like that?

If you remove the power, then it is very possible that some requests will
have been submitted to one device but not the other.
If you have a clean shutdown, then the two devices should be identical.

NeilBrown



* Re: is mdadm RAID1 disk full sync
  2015-03-22  5:00   ` lingli tang
  2015-03-22  5:38     ` NeilBrown
@ 2015-03-22  7:28     ` Adam Goryachev
  2015-03-22 12:29       ` lingli tang
  1 sibling, 1 reply; 12+ messages in thread
From: Adam Goryachev @ 2015-03-22  7:28 UTC (permalink / raw)
  To: lingli tang; +Cc: linux-raid



On 22/03/2015 16:00, lingli tang wrote:
> Thanks for the reply.
>
> I have created a RAID1 array from two Fusion-io PCIe flash disks:
>
>   mdadm --create /dev/md/master --name=master --level=1 --raid-devices=2 \
>       /dev/fioa2 /dev/mapper/mpathc
>
> /dev/fioa2 is a local disk on server A, and /dev/mapper/mpathc is an
> iSCSI disk exported from server B and attached on server A.
>
> After that we ran mkfs.ext4 on /dev/md/master and mounted it on /data1
> with the 'sync' option; the MySQL binlog is stored on it.
> To avoid losing binlog data we have set sync_binlog=1, so every SQL
> commit calls fsync() to flush to disk.
>
> According to your description, if we reboot server A, the data on the
> two disks (which are on different servers) should be the same.
> But after server A restarted, we assembled each disk on its own
> server, and the data differed: the disk on server B had lost more than
> one SQL commit.
>
> I checked this by running strace on mysqld on server A.
> I found a SQL commit and an fsync() on the binlog file handle on
> server A, but that SQL statement cannot be found on the disk assembled
> on server B.
>
> I also tested with two SAS disks; server B still lost more than one
> SQL commit.
Sounds like you might be better using something like DRBD (www.drbd.org) 
which has different modes, one of which will do what you are asking (not 
respond until both systems have confirmed the data is written to the 
local disk).

In your current case, even if md is correctly writing to both underlying 
'devices' you have multiple layers under one of the devices, so you 
should confirm that *all* of those layers are properly passing through 
the data without any caching/etc.

Regards,
Adam


* Re: is mdadm RAID1 disk full sync
  2015-03-22  5:38     ` NeilBrown
@ 2015-03-22 11:31       ` lingli tang
  2015-03-23  2:52         ` NeilBrown
  0 siblings, 1 reply; 12+ messages in thread
From: lingli tang @ 2015-03-22 11:31 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

Yes, I just issued 'reboot' on server A.
But I am curious why 'some' requests would be lost on the other server.
Under a fully synchronous strategy, shouldn't at most one request (the
last I/O committed) be lost?


* Re: is mdadm RAID1 disk full sync
  2015-03-22  7:28     ` Adam Goryachev
@ 2015-03-22 12:29       ` lingli tang
  2015-03-22 12:51         ` Adam Goryachev
  0 siblings, 1 reply; 12+ messages in thread
From: lingli tang @ 2015-03-22 12:29 UTC (permalink / raw)
  To: Adam Goryachev; +Cc: linux-raid

Thanks very much.
I will try DRBD later, but I want to figure this out first.

I exported the disk using tgtd and attached it on the other server
using iscsiadm, over InfiniBand with the iSER protocol.
Does iSCSI/iSER have any cache?



* Re: is mdadm RAID1 disk full sync
  2015-03-22 12:29       ` lingli tang
@ 2015-03-22 12:51         ` Adam Goryachev
  2015-03-23  8:34           ` lingli tang
  0 siblings, 1 reply; 12+ messages in thread
From: Adam Goryachev @ 2015-03-22 12:51 UTC (permalink / raw)
  To: lingli tang; +Cc: linux-raid



On 22/03/2015 23:29, lingli tang wrote:
> Thanks very much.
> I will try DRBD later, but I want to figure this out first.
>
> I exported the disk using tgtd and attached it on the other server
> using iscsiadm, over InfiniBand with the iSER protocol.
> Does iSCSI/iSER have any cache?
Can you test that by removing the local disk from the MD array, or 
changing your test so that writes go directly to the remote device? Then run 
the test, shut down, and check the remote disk to see if it has all the 
expected data, or still only some of the expected data. This will remove 
MD as a suspect. Continue to try and get "closer" to the remote until 
you can find the culprit. You might also use tcpdump or similar to sniff 
the network, which will tell you if the expected data is being sent to 
the remote (and when).
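
A minimal sketch of such a test, assuming GNU dd and a file on a
filesystem backed by the device under test. The function names and the
marker format are made up for illustration and are not from this thread:

```shell
#!/bin/sh
# Append numbered marker records to a file, forcing each record to stable
# storage before the next one is written.
# $1 = target file (hypothetical path), $2 = number of records to write.
write_markers() {
    target=$1
    count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        # oflag=append with conv=notrunc appends instead of truncating;
        # conv=fsync makes dd call fsync() on the output before it exits.
        printf 'marker %08d\n' "$i" |
            dd of="$target" oflag=append conv=notrunc,fsync 2>/dev/null || return 1
        i=$((i + 1))
    done
}

# Print the last marker stored in a file, for comparison between copies.
last_marker() {
    tail -n 1 "$1"
}
```

Run write_markers against each layer in turn (remote device only, then MD
on top of it), shut down, and compare last_marker with the last record
the writer reported as complete. A gap of more than one record suggests
that the layer under test acknowledged writes it had not stored.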

Sorry, I don't know anywhere near enough to comment on things like 
infiniband/iser, but these are the steps I would look into. Hope that it 
is helpful.

PS, I do use DRBD, and iSCSI, and it has been working well in my 
environment for the last year or so, I have no commercial 
interest/benefit from you using it, just a happy customer.

Regards,
Adam

* Re: is mdadm RAID1 disk full sync
  2015-03-22 11:31       ` lingli tang
@ 2015-03-23  2:52         ` NeilBrown
  0 siblings, 0 replies; 12+ messages in thread
From: NeilBrown @ 2015-03-23  2:52 UTC (permalink / raw)
  To: lingli tang; +Cc: linux-raid

On Sun, 22 Mar 2015 19:31:51 +0800 lingli tang <tanglingli001@gmail.com>
wrote:

> Yes, I just issued 'reboot' on server A.

I really cannot explain that then.
If it is a completely clean "reboot" (not "reboot -f -n" for example),
then both devices should end up identical even if you weren't calling "fsync"
all the time.


> But I am curious why 'some' requests would be lost on the other server.
> Under a fully synchronous strategy, shouldn't at most one request (the
> last I/O committed) be lost?

Anything written since the last 'sync' or 'fsync' could be different.  How
many individual requests that is might depend on the filesystem.  But you
certainly shouldn't see much difference, even if there is a crash.

Sorry I can't help more.

NeilBrown




* Re: is mdadm RAID1 disk full sync
  2015-03-22 12:51         ` Adam Goryachev
@ 2015-03-23  8:34           ` lingli tang
  2015-03-23 12:57             ` Adam Goryachev
  0 siblings, 1 reply; 12+ messages in thread
From: lingli tang @ 2015-03-23  8:34 UTC (permalink / raw)
  To: Adam Goryachev; +Cc: linux-raid

I have run each of the following tests several times:
1. MySQL binlog written only to the remote disk (without mdadm RAID):
no binlog entries were lost.
2. MySQL binlog written to a RAID1 containing only the remote disk (no
local disk): no binlog entries were lost.
In both of these cases MySQL returns an error immediately, with the
message "Error writing file
'/home/mysql/data/mysqldata1/binlog/mysql-bin.000001' (Errcode: 5
- Input/output error)".

But when the MySQL binlog is on a RAID1 of the local and remote disks,
the test program, which commits to MySQL continuously, keeps running
for 3 seconds after the server is rebooted and then hangs in
mysql_query(). The error message also differs from the cases above:
"Lost connection to MySQL server during query".

Could it be that iSCSI exits before md, so that MySQL continues to
write the binlog to a degraded RAID1 that has only the local disk,
the remote disk having just been removed from the array?
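
A rough way to check this is to watch whether the array loses a member
before md stops. The sketch below assumes the usual /proc/mdstat layout,
in which the status field shows "_" for a missing member (for example
[U_]); the function name is hypothetical:

```shell
#!/bin/sh
# Read /proc/mdstat-formatted text on stdin and print the name of every
# md array whose member status field contains "_" (a missing member).
degraded_arrays() {
    awk '
        /^md/ { name = $1 }                  # remember the array named on this line
        /\[[U_]*_[U_]*\]/ { print name }     # status such as [U_] or [_U]
    '
}

# Possible use during shutdown, logging to the local disk
# (the log path is an assumption):
#   while sleep 1; do date; degraded_arrays < /proc/mdstat; done >> /root/md-watch.log
```

If the array name shows up in the log before the machine goes down, md
was running on the local disk alone for that period.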

I will try to test it.
Thanks very much.



* Re: is mdadm RAID1 disk full sync
  2015-03-23  8:34           ` lingli tang
@ 2015-03-23 12:57             ` Adam Goryachev
  2015-03-24  2:09               ` lingli tang
  0 siblings, 1 reply; 12+ messages in thread
From: Adam Goryachev @ 2015-03-23 12:57 UTC (permalink / raw)
  To: lingli tang; +Cc: linux-raid


On 23/03/2015 19:34, lingli tang wrote:
> I have run each of the following tests several times:
> 1. MySQL binlog written only to the remote disk (without mdadm RAID):
> no binlog entries were lost.
> 2. MySQL binlog written to a RAID1 containing only the remote disk (no
> local disk): no binlog entries were lost.
> In both of these cases MySQL returns an error immediately, with the
> message "Error writing file
> '/home/mysql/data/mysqldata1/binlog/mysql-bin.000001' (Errcode: 5
> - Input/output error)".
>
> But when the MySQL binlog is on a RAID1 of the local and remote disks,
> the test program, which commits to MySQL continuously, keeps running
> for 3 seconds after the server is rebooted and then hangs in
> mysql_query(). The error message also differs from the cases above:
> "Lost connection to MySQL server during query".
>
> Could it be that iSCSI exits before md, so that MySQL continues to
> write the binlog to a degraded RAID1 that has only the local disk,
> the remote disk having just been removed from the array?
>
> I will try to test it.
> Thanks very much.
>
Silly question, which machine are you sending the shutdown command to?

If you are doing this on the remote disk machine, then obviously it may
not have received all of the data yet, and therefore may have lost some
data, even if it is a clean reboot.

Equally, as mentioned, if you shut down the remote disk before MD shuts
down (or shut down the network prior to MD), then you have the same
problem. You should check the MD status of each member disk to see if
they think the other disk failed prior to MD being shut down, and what
the event counter of each disk is. You should see the local disk
reporting the remote disk as failed, and the local disk should have a
higher event count.
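
One way to compare the event counters, as a rough sketch. It assumes that
`mdadm --examine` prints a line of the form "Events : N" with an integer
N (as it does for version-1.x metadata, as far as I know) and that it is
run as root against each member; the helper names are hypothetical:

```shell
#!/bin/sh
# Read `mdadm --examine` output on stdin and print the value of the
# first "Events" line.
events_from_examine() {
    awk -F':' '/^[[:space:]]*Events[[:space:]]*:/ {
        gsub(/[[:space:]]/, "", $2)   # strip padding around the number
        print $2
        exit
    }'
}

# Compare two integer event counts and say which member is newer.
compare_events() {
    if [ "$1" -gt "$2" ]; then
        echo "first is newer"
    elif [ "$1" -lt "$2" ]; then
        echo "second is newer"
    else
        echo "equal"
    fi
}

# Possible use, with the member names from this thread:
#   a=$(mdadm --examine /dev/fioa2 | events_from_examine)
#   b=$(mdadm --examine /dev/mapper/mpathc | events_from_examine)
#   compare_events "$a" "$b"
```

If the local member reports the higher count, md kept updating it after
the remote member had dropped out of the array.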

Regards,
Adam
>>>>> this sql can not find in assembled disk on server B.
>>>>>
>>>>> I also test it with two SAS disk, Server B still has more than one sql
>>>>> commit lost.
>>>> Sounds like you might be better using something like DRBD (www.drbd.org)
>>>> which has different modes, one of which will do what you are asking (not
>>>> respond until both systems have confirmed the data is written to the
>>>> local
>>>> disk).
>>>>
>>>> In your current case, even if md is correctly writing to both underlying
>>>> 'devices' you have multiple layers under one of the devices, so you
>>>> should
>>>> confirm that *all* of those layers are properly passing through the data
>>>> without any caching/etc.
>>>>
>>>> Regards,
>>>> Adam
>>



* Re: is mdadm RAID1 disk full sync
  2015-03-23 12:57             ` Adam Goryachev
@ 2015-03-24  2:09               ` lingli tang
  0 siblings, 0 replies; 12+ messages in thread
From: lingli tang @ 2015-03-24  2:09 UTC (permalink / raw)
  To: Adam Goryachev; +Cc: linux-raid

No, I did not shut down the remote machine.
I only shut down the machine that holds the RAID1 array (local disk plus
remote disk), not the remote machine.

And I think I have found the root cause, after testing with iscsi/iscsid
removed from chkconfig.

The root cause is:
when I issue the 'reboot' command, Linux shuts down user processes and
kernel modules step by step. Because iscsi is registered in chkconfig,
its session is logged out before mdadm shuts down. As a result, the
array runs degraded for a short time (1-3 seconds) before mdadm is shut
down. During this time mysql writes the binlog to a degraded array that
contains only the local disk (the remote disk was kicked out when iscsi
logged out). So the remote disk loses 1-3 seconds of binlog from the
rebooted machine.
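
One way to observe that window might be to poll /proc/mdstat while the
reboot runs. A minimal sketch, assuming the usual /proc/mdstat layout in
which a missing member shows up as '_' in the [UU] status field;
degraded_arrays is a hypothetical helper name:

```shell
# Hypothetical helper: print the name of every md array whose member
# status field (e.g. [U_]) contains '_', meaning a member is missing.
# Reads /proc/mdstat by default, or the file given as the first argument.
degraded_arrays() {
    awk '/^md/ { name = $1 }
         /\[[U_]*_[U_]*\]/ { print name }' "${1:-/proc/mdstat}"
}

# Only query the live system where /proc/mdstat exists.
if [ -r /proc/mdstat ]; then
    degraded_arrays
fi
```

Running this in a loop from a console during shutdown should list the
array as soon as the iscsi session is logged out.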

I have removed iscsi/iscsid from chkconfig with:
chkconfig --del iscsi
chkconfig --del iscsid
and found no data loss on the remote disk.

NeilBrown & Adam
Thanks very much for your help




end of thread, other threads:[~2015-03-24  2:09 UTC | newest]

Thread overview: 12+ messages
2015-03-21 11:01 is mdadm RAID1 disk full sync lingli tang
2015-03-22  3:20 ` NeilBrown
2015-03-22  5:00   ` lingli tang
2015-03-22  5:38     ` NeilBrown
2015-03-22 11:31       ` lingli tang
2015-03-23  2:52         ` NeilBrown
2015-03-22  7:28     ` Adam Goryachev
2015-03-22 12:29       ` lingli tang
2015-03-22 12:51         ` Adam Goryachev
2015-03-23  8:34           ` lingli tang
2015-03-23 12:57             ` Adam Goryachev
2015-03-24  2:09               ` lingli tang
