* Re: dm-integrity: integrity protection device-mapper target
@ 2013-01-17  4:54 Mikulas Patocka
  2013-01-18 21:43 ` Kasatkin, Dmitry
  0 siblings, 1 reply; 15+ messages in thread
From: Mikulas Patocka @ 2013-01-17  4:54 UTC (permalink / raw)
  To: Dmitry Kasatkin
  Cc: Alasdair G. Kergon, dm-devel, alan.cox, linux-fsdevel, akpm, Milan Broz

Hi Dmitry

I looked at dm-integrity. The major problem is that if a crash happens when 
the data were written and the checksum wasn't, the block has an incorrect 
checksum and can't be read again.

How is this integrity target going to be used? Will you use it in an 
environment where destroying data on a crash doesn't matter? (Can you 
describe such an environment?)

It could possibly be used with ext3 or ext4 with data=journal mode - in 
this mode, the filesystem writes everything to journal and overwrites data 
and metadata with copy from journal on reboot, so it wouldn't matter if a 
block that was being written is unreadable after the reboot. But even with 
data=journal there are still some corner cases where metadata are 
overwritten without journaling (for example fsck or tune2fs utilities) - 
and if a crash happens, it could make metadata unreadable.

The big problem is that this "make blocks unreadable on crash" behavior 
cannot be easily fixed; fixing it means a complete redesign.



Some minor comments about the code:

DM_INT_STATS: there are existing i/o counters in dm, you can use them

"static DEFINE_MUTEX(mutex);
static LIST_HEAD(dmi_list);
static int sync_mode;
static struct dm_int_notifier;" - you can have a per-device reboot notifier 
and then you don't have to use global variables.

"loff_t" - use sector_t instead. On 32-bit machines, sector_t can be 
32-bit or 64-bit (according to user's selection), but loff_t is always 
64-bit. loff_t should be generally used for indexing bytes within a file, 
sector_t for indexing sectors.
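
To illustrate the distinction (just a sketch; the helper name is made up,
not from your patch):

#include <linux/types.h>

/*
 * sector_t for sector indexes (it may be 32-bit on 32-bit kernels),
 * loff_t only where a byte offset is really meant.
 */
static inline loff_t dm_int_sector_to_bytes(sector_t sector)
{
	return (loff_t)sector << 9;	/* 512-byte sectors -> byte offset */
}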

"struct dm_int->mutex" - unused variable
"struct dm_int->delay", DM_INT_FLUSH_DELAY - unused variable and macro

"struct kmem_cache *io_cache, mempool_t *io_pool" - use per-bio data 
instead (so you can remove the cache and mempool and simplify the code).
Per-bio data were committed to 3.8-rc1, see commits 
c0820cf5ad09522bdd9ff68e84841a09c9f339d8, 
39cf0ed27ec70626e416c2f4780ea0449d405941, 
e42c3f914da79102c54a7002329a086790c15327, 
42bc954f2a4525c9034667dedc9bd1c342208013
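
For illustration only - the structure and function below are made up, not
taken from your patch - the new API looks roughly like this:

#include <linux/device-mapper.h>
#include <linux/workqueue.h>

/* hypothetical per-bio context replacing the kmem_cache/mempool pair */
struct dm_int_io {
	struct dm_int *dm_int;
	struct work_struct work;
	int error;
};

static int dm_int_ctr(struct dm_target *ti, unsigned int argc, char **argv)
{
	/* ... parse arguments, allocate struct dm_int, open the device ... */

	/* dm core reserves this much space alongside every bio it clones */
	ti->per_bio_data_size = sizeof(struct dm_int_io);
	return 0;
}

In the map function, dm_per_bio_data(bio, sizeof(struct dm_int_io)) then
returns that area, so no allocation is needed and nothing can fail there.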

"struct dm_int->count" - this in-flight i/o count is not needed. Device 
mapper makes sure that when the device is suspended or unloaded, there are 
no bios in flight, so you don't have to duplicate this logic in the target 
driver. "struct dm_int->count" is used in dm_int_sync, dm_int_sync is 
called from ioctl BLKFLSBUF and dm_int_postsuspend.
- for BLKFLSBUF, it doesn't guarantee that in-progress io gets flushed, so 
you don't have to wait for it, doing just dm_bufio_write_dirty_buffers 
would be ok.
- for dm_int_postsuspend, dm guarantees that there is no io in progress at 
this point, so you don't have to wait for dm_int->count
- so you can remove "dm_int->count" and "dm_int->wait"

dm_bufio_prefetch can be called directly from dm_int_map routine (and 
maybe it should be called from there so that prefetch overlaps with 
workqueue processing)
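
Continuing the sketch above (still hypothetical names; dm_int_hmac_block()
stands for whatever maps a data sector to the metadata block holding its
HMAC, and dm_int->bufio/wq are assumed fields):

static int dm_int_map(struct dm_target *ti, struct bio *bio)
{
	struct dm_int *dm_int = ti->private;
	struct dm_int_io *io = dm_per_bio_data(bio, sizeof(struct dm_int_io));

	io->dm_int = dm_int;
	io->error = 0;

	/* non-blocking read-ahead of the metadata block; returns immediately */
	dm_bufio_prefetch(dm_int->bufio,
			  dm_int_hmac_block(dm_int, bio->bi_sector), 1);

	queue_work(dm_int->wq, &io->work);
	return DM_MAPIO_SUBMITTED;
}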

dm_int_ctr at various places jumps to err label without setting ti->error. 
This results in message "table: 254:0: integrity: Unknown error".

It reports multiple messages when verification fails:
ERROR: HMACs do not match
ERROR: size is not zero: 4096
ERROR: io done: -5
And there is an extra empty line after each error in the log (you shouldn't 
use '\n' in DMERR and DMWARN because these macros already append it).
I think just one message could be sufficient (maybe you can also add the 
block number of the failed block).
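
Something like this would do (a sketch only; the field names are assumed):

/* one message per failed block; DMERR appends the newline itself */
static void dm_int_report_mismatch(struct dm_int *dm_int, sector_t block)
{
	DMERR("%s: HMAC mismatch at block %llu",
	      dm_int->dev->name, (unsigned long long)block);
}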

Mikulas


* Re: dm-integrity: integrity protection device-mapper target
  2013-01-17  4:54 dm-integrity: integrity protection device-mapper target Mikulas Patocka
@ 2013-01-18 21:43 ` Kasatkin, Dmitry
  2013-01-18 23:16   ` Alasdair G Kergon
  2013-01-23  1:29   ` Mikulas Patocka
  0 siblings, 2 replies; 15+ messages in thread
From: Kasatkin, Dmitry @ 2013-01-18 21:43 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Alasdair G. Kergon, dm-devel, alan.cox, linux-fsdevel, akpm, Milan Broz

Hi Mikulas,

Thanks for looking into it.

On Thu, Jan 17, 2013 at 6:54 AM, Mikulas Patocka <mpatocka@redhat.com> wrote:
> Hi Dmitry
>
> I looked at dm-integrity. The major problem is that if crash happens when
> data were written and checksum wasn't, the block has incorrect checksum
> and can't be read again.
>

This is how it works.
That is the purpose of integrity protection - do not allow "bad" content
to be loaded and used.

But even with encryption it might happen that some blocks have been
updated and some not.
Even if reading the blocks succeeds, the content can be a mix of old
and new blocks.

The patch I sent out is missing one feature that I have not pushed yet.
In the case of non-matching blocks, it just zeros the blocks and returns
no error (zero-on-mismatch).
Writing to the block replaces the HMAC.
It works quite nicely: mkfs and fsck are able to read and write/fix the
filesystem.


> How is this integrity target going to be used? Will you use it in an
> environment where destroying data on crash doesn't matter? (can you
> describe such environment?)
>

We are looking at the possibility of using it in an LSM-based
environment, where we do not want an attacker to be able to make offline
modifications to the filesystem and tamper with the TCB-related stuff.


> It could possibly be used with ext3 or ext4 with data=journal mode - in
> this mode, the filesystem writes everything to journal and overwrites data
> and metadata with copy from journal on reboot, so it wouldn't matter if a
> block that was being written is unreadable after the reboot. But even with
> data=journal there are still some corner cases where metadata are
> overwritten without journaling (for example fsck or tune2fs utilities) -
> and if a crash happens, it could make metadata unreadable.
>

In a normal environment, if fsck crashes, it might corrupt the file system
in the same way.
zero-on-mismatch keeps the block device accessible/fixable for fsck.


> The big problem is that this "make blocks unreadable on crash" behavior
> cannot be easily fixed, fixing it means complete redesign.
>
>

zero-on-mismatch helps here...

>
> Some minor comments about the code:
>
> DM_INT_STATS: there are existing i/o counters in dm, you can use them
>
> "static DEFINE_MUTEX(mutex);
> static LIST_HEAD(dmi_list);
> static int sync_mode;
> static struct dm_int_notifier;" - you can have per-device reboot notifier
> and then you don't have to use global variables
>

But why would that be better?

> "loff_t" - use sector_t instead. On 32-bit machines, sector_t can be
> 32-bit or 64-bit (according to user's selection), but loff_t is always
> 64-bit. loff_t should be generally used for indexing bytes within a file,
> sector_t for indexing sectors.

OK.

>
> "struct dm_int->mutex" - unused variable
> "struct dm_int->delay", DM_INT_FLUSH_DELAY - unused variable and macro
>

Leftovers from when I switched from my own metadata reading to dm_bufio.

> "struct kmem_cache *io_cache, mempool_t *io_pool" - use per-bio data
> instead (so you can remove the cache and mempool and simplify the code).
> Per-bio data were committed to 3.8-rc1, see commits
> c0820cf5ad09522bdd9ff68e84841a09c9f339d8,
> 39cf0ed27ec70626e416c2f4780ea0449d405941,
> e42c3f914da79102c54a7002329a086790c15327,
> 42bc954f2a4525c9034667dedc9bd1c342208013
>

I saw the patch on dm-devel two or three months ago.
I did not know it had made it upstream yet.
Will do.

> "struct dm_int->count" - this in-flight i/o count is not needed. Device
> mapper makes sure that when the device is suspended or unloaded, there are
> no bios in flight, so you don't have to duplicate this logic in the target
> driver. "struct dm_int->count" is used in dm_int_sync, dm_int_sync is
> called from ioctl BLKFLSBUF and dm_int_postsuspend.
> - for BLKFLSBUF, it doesn't guarantee that in-progress io gets flushed, so
> you don't have to wait for it, doing just dm_bufio_write_dirty_buffers
> would be ok.
> - for dm_int_postsuspend, dm guarantees that there is no io in progress at
> this point, so you don't have to wait for dm_int->count
> - so you can remove "dm_int->count" and "dm_int->wait"
>

The purpose of dm_int->count was to ensure not only that there are no data
blocks in flight, but also that no metadata writes are pending.
It was necessary before I switched to dm_bufio.
But yes, with a fresh look it seems it is not needed if
dm_bufio_write_dirty_buffers() returns only after the metadata is on the
storage...

> dm_bufio_prefetch can be called directly from dm_int_map routine (and
> maybe it should be called from there so that prefetch overlaps with
> workqueue processing)
>

Will check.

> dm_int_ctr at various places jumps to err label without setting ti->error.
> This results in message "table: 254:0: integrity: Unknown error".
>

ok.

> It reports multiple messages when verification fails:
> ERROR: HMACs do not match
> ERROR: size is not zero: 4096
> ERROR: io done: -5
> And there is extra empty line after each error in the log (you shouldn't
> use '\n' in DMERR and DMWARN because there macros already append it).
> I think just one message could be sufficient (maybe you can also add the
> block number of the failed block).
>

right.

> Mikulas

Thanks for looking...
Will come back with fixes.

Dmitry


* Re: dm-integrity: integrity protection device-mapper target
  2013-01-18 21:43 ` Kasatkin, Dmitry
@ 2013-01-18 23:16   ` Alasdair G Kergon
  2013-01-18 23:58     ` Kasatkin, Dmitry
                       ` (2 more replies)
  2013-01-23  1:29   ` Mikulas Patocka
  1 sibling, 3 replies; 15+ messages in thread
From: Alasdair G Kergon @ 2013-01-18 23:16 UTC (permalink / raw)
  To: Kasatkin, Dmitry
  Cc: Mikulas Patocka, Alasdair G. Kergon, dm-devel, alan.cox,
	linux-fsdevel, akpm, Milan Broz

On Fri, Jan 18, 2013 at 11:43:34PM +0200, Kasatkin, Dmitry wrote:
> This patch I sent out has one missing feature what I have not pushed yet.
> In the case of non-matching blocks, it just zeros blocks and returns
> no error (zero-on-mismatch).
> Writing to the block replaces the hmac.
> It works quite nicely. mkfs and fsck is able to read and write/fix the
> filesystem.
> In normal environment, if fsck crashes, it might corrupt file system
> in the same way.
> zero-on-mismatch makes block device still accessible/fixable for fsck.
 
I'm afraid I don't buy that.

We can hardly call this "integrity" if it's designed to lose some of
your data when the machine crashes - and worse - it doesn't tell you
what you lost, but just gives you blocks of zeroes instead!

I think a redesign is needed before this goes upstream.

Alasdair



* Re: dm-integrity: integrity protection device-mapper target
  2013-01-18 23:16   ` Alasdair G Kergon
@ 2013-01-18 23:58     ` Kasatkin, Dmitry
  2013-01-21 13:51         ` Alan Cox
  2013-01-21 10:37     ` Kasatkin, Dmitry
  2013-01-21 10:38     ` Kasatkin, Dmitry
  2 siblings, 1 reply; 15+ messages in thread
From: Kasatkin, Dmitry @ 2013-01-18 23:58 UTC (permalink / raw)
  To: Alasdair G Kergon
  Cc: alan.cox, dm-devel, Mikulas Patocka, linux-fsdevel, akpm, Milan Broz



On Jan 19, 2013 1:16 AM, "Alasdair G Kergon" <agk@redhat.com> wrote:
>
> On Fri, Jan 18, 2013 at 11:43:34PM +0200, Kasatkin, Dmitry wrote:
> > This patch I sent out has one missing feature what I have not pushed yet.
> > In the case of non-matching blocks, it just zeros blocks and returns
> > no error (zero-on-mismatch).
> > Writing to the block replaces the hmac.
> > It works quite nicely. mkfs and fsck is able to read and write/fix the
> > filesystem.
> > In normal environment, if fsck crashes, it might corrupt file system
> > in the same way.
> > zero-on-mismatch makes block device still accessible/fixable for fsck.
>
> I'm afraid I don't buy that.
>
> We can hardly call this "integrity" if it's designed to lose some of
> your data when the machine crashes - and worse - it doesn't tell you
> what you lost, but just gives you blocks of zeroes instead!
>
> I think a redesign is needed before this goes upstream.
>
> Alasdair
>

Do not look at it as integrity from a reliability point of view; this might
be the wrong name for the target.
The purpose is to provide integrity from a security point of view, so
modified blocks are not available and an attacker cannot put in arbitrary
content.
This is not meant to provide reliability.
The default is to return an error, but zeroing might be desirable. It works
as needed for the purpose, and data=journal covers the case where
reliability is required.

- Dmitry





* Re: dm-integrity: integrity protection device-mapper target
  2013-01-18 23:16   ` Alasdair G Kergon
  2013-01-18 23:58     ` Kasatkin, Dmitry
@ 2013-01-21 10:37     ` Kasatkin, Dmitry
  2013-01-21 10:38     ` Kasatkin, Dmitry
  2 siblings, 0 replies; 15+ messages in thread
From: Kasatkin, Dmitry @ 2013-01-21 10:37 UTC (permalink / raw)
  To: Kasatkin, Dmitry, Mikulas Patocka, Alasdair G. Kergon, dm-devel,
	alan.cox, linux-fsdevel, akpm, Milan Broz

On Sat, Jan 19, 2013 at 1:16 AM, Alasdair G Kergon <agk@redhat.com> wrote:
> On Fri, Jan 18, 2013 at 11:43:34PM +0200, Kasatkin, Dmitry wrote:
>> This patch I sent out has one missing feature what I have not pushed yet.
>> In the case of non-matching blocks, it just zeros blocks and returns
>> no error (zero-on-mismatch).
>> Writing to the block replaces the hmac.
>> It works quite nicely. mkfs and fsck is able to read and write/fix the
>> filesystem.
>> In normal environment, if fsck crashes, it might corrupt file system
>> in the same way.
>> zero-on-mismatch makes block device still accessible/fixable for fsck.
>
> I'm afraid I don't buy that.
>
> We can hardly call this "integrity" if it's designed to lose some of
> your data when the machine crashes - and worse - it doesn't tell you
> what you lost, but just gives you blocks of zeroes instead!
>
> I think a redesign is needed before this goes upstream.
>
> Alasdair
>

Sorry for the repost, but the previous mail was sent from mobile in HTML
format and did not go through...

Do not look at it as integrity from a reliability point of view; this
might be the wrong name for the target.
The purpose is to provide integrity from a security point of view, so
modified blocks are not available.
It is to prevent an attacker from putting in arbitrary content; it is not,
in fact, meant to provide reliability.
The default is to return an error, but returning zeroed data might be a
better option.
It works as needed for the purpose, and data=journal covers the case where
reliability is required.

- Dmitry



* Re: dm-integrity: integrity protection device-mapper target
  2013-01-18 23:58     ` Kasatkin, Dmitry
@ 2013-01-21 13:51         ` Alan Cox
  0 siblings, 0 replies; 15+ messages in thread
From: Alan Cox @ 2013-01-21 13:51 UTC (permalink / raw)
  To: Kasatkin, Dmitry
  Cc: Alasdair G Kergon, dm-devel, akpm, linux-fsdevel,
	Mikulas Patocka, Milan Broz

> This is not to provide reliability.
> Default is to return error. But zero might be desirable. 

There are cases where being able to force blocks to all zeros is itself
a security attack and it seems an odd behaviour. Error on read/allow
write seems sensible enough.


* Re: dm-integrity: integrity protection device-mapper target
  2013-01-18 21:43 ` Kasatkin, Dmitry
  2013-01-18 23:16   ` Alasdair G Kergon
@ 2013-01-23  1:29   ` Mikulas Patocka
  2013-01-23  6:09     ` [dm-devel] " Will Drewry
                       ` (2 more replies)
  1 sibling, 3 replies; 15+ messages in thread
From: Mikulas Patocka @ 2013-01-23  1:29 UTC (permalink / raw)
  To: Kasatkin, Dmitry
  Cc: Alasdair G. Kergon, dm-devel, alan.cox, linux-fsdevel, akpm, Milan Broz



On Fri, 18 Jan 2013, Kasatkin, Dmitry wrote:

> Hi Mikulas,
> 
> Thanks for looking into it.
> 
> On Thu, Jan 17, 2013 at 6:54 AM, Mikulas Patocka <mpatocka@redhat.com> wrote:
> > Hi Dmitry
> >
> > I looked at dm-integrity. The major problem is that if crash happens when
> > data were written and checksum wasn't, the block has incorrect checksum
> > and can't be read again.
> >
> 
> This is how it works.
> This is a purpose of integrity protection - do not allow "bad" content
> to load and use.
> 
> But even with encryption it might happen that some blocks have been
> updated and some not.
> Even if  reading the blocks succeeds, the content can be a mess from
> old and new blocks.

dm-crypt encrypts each 512-byte sector individually, so (assuming that 
there is no disk with sector size <512 bytes), it can't result in random 
data. You read either new data or old data.

> This patch I sent out has one missing feature what I have not pushed yet.
> In the case of none-matching blocks, it just zeros blocks and returns
> no error (zero-on-mismatch).
> Writing to the block replaces the hmac.
> It works quite nicely. mkfs and fsck is able to read and write/fix the
> filesystem.

But it causes silent data corruption for the user. So it's worse than 
returning an error.

> > How is this integrity target going to be used? Will you use it in an
> > environment where destroying data on crash doesn't matter? (can you
> > describe such environment?)
> >
> 
> We are looking for possibility to use it in LSM based environment,
> where we do not want
> attacker could make offline modification of the filesystem and modify
> the TCB related stuff.

What are the exact attack possibilities you are protecting against?

Can the attacker observe or modify the data while the system is running? (for 
example the data is accessed remotely over an unsecured network 
connection?) Or is it only protecting against modifications when the 
system is down?

Can the attacker modify the partition with hashes? - or do you store it in 
another place that is supposed to be secure?

What are you going to do if you get a failed checksum because of a crash?

> > It could possibly be used with ext3 or ext4 with data=journal mode - in
> > this mode, the filesystem writes everything to journal and overwrites data
> > and metadata with copy from journal on reboot, so it wouldn't matter if a
> > block that was being written is unreadable after the reboot. But even with
> > data=journal there are still some corner cases where metadata are
> > overwritten without journaling (for example fsck or tune2fs utilities) -
> > and if a crash happens, it could make metadata unreadable.
> >
> 
> In normal environment, if fsck crashes, it might corrupt file system
> in the same way.
> zero-on-mismatch makes block device still accessible/fixable for fsck.

The problem is that it amplifies filesystem damage. For example, suppose 
that fsck is modifying an inode. You get a crash and on the next reboot not 
just one inode, but the whole block of inodes is unreadable (or replaced 
with zeros). Fsck "fixes" it, but the user loses more files.


I am thinking about possibly rewriting it so that it has two hashes per 
sector so that if either old or new data is read, at least one hash 
matches and it won't result in data corruption.
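
Roughly this on the read side (only a sketch with made-up names; the write
side would have to alternate slots and order the metadata and data writes):

#include <linux/types.h>
#include <linux/string.h>
#include <linux/errno.h>

/*
 * Two checksum slots per sector: accept the data if it matches either the
 * old or the new checksum, so a crash between the data write and the
 * checksum write no longer makes the sector unreadable.
 */
static int dm_int_verify_sector(const u8 *slot_old, const u8 *slot_new,
				const u8 *calculated, unsigned int csum_len)
{
	if (!memcmp(calculated, slot_old, csum_len))
		return 0;
	if (!memcmp(calculated, slot_new, csum_len))
		return 0;
	return -EIO;	/* neither slot matches: genuine corruption */
}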

Mikulas


* Re: [dm-devel] dm-integrity: integrity protection device-mapper target
  2013-01-23  1:29   ` Mikulas Patocka
@ 2013-01-23  6:09     ` Will Drewry
  2013-01-23 10:20       ` Kasatkin, Dmitry
  2013-01-23  9:15     ` Spelic
  2013-01-23 10:19       ` Kasatkin, Dmitry
  2 siblings, 1 reply; 15+ messages in thread
From: Will Drewry @ 2013-01-23  6:09 UTC (permalink / raw)
  To: device-mapper development, Kasatkin, Dmitry
  Cc: alan.cox, linux-fsdevel, akpm, Alasdair G. Kergon, Milan Broz

On Tue, Jan 22, 2013 at 5:29 PM, Mikulas Patocka <mpatocka@redhat.com> wrote:
>
>
> On Fri, 18 Jan 2013, Kasatkin, Dmitry wrote:
>
>> Hi Mikulas,
>>
>> Thanks for looking into it.
>>
>> On Thu, Jan 17, 2013 at 6:54 AM, Mikulas Patocka <mpatocka@redhat.com> wrote:
>> > Hi Dmitry
>> >
>> > I looked at dm-integrity. The major problem is that if crash happens when
>> > data were written and checksum wasn't, the block has incorrect checksum
>> > and can't be read again.
>> >
>>
>> This is how it works.
>> This is a purpose of integrity protection - do not allow "bad" content
>> to load and use.

With respect to the use of "integrity", you may want to consider
something like dm-integrity-hmac to disambiguate from the BIO
integrity naming.  It's why I proposed the somewhat obtuse "verity"
name for the other data-integrity target.

>> But even with encryption it might happen that some blocks have been
>> updated and some not.
>> Even if  reading the blocks succeeds, the content can be a mess from
>> old and new blocks.
>
> dm-crypt encrypts each 512-byte sector individually, so (assuming that
> there is no disk with sector size <512 bytes), it can't result in random
> data. You read either new data or old data.
>
>> This patch I sent out has one missing feature what I have not pushed yet.
>> In the case of none-matching blocks, it just zeros blocks and returns
>> no error (zero-on-mismatch).
>> Writing to the block replaces the hmac.
>> It works quite nicely. mkfs and fsck is able to read and write/fix the
>> filesystem.
>
> But it causes silent data corruption for the user. So it's worse than
> returning an error.
>
>> > How is this integrity target going to be used? Will you use it in an
>> > environment where destroying data on crash doesn't matter? (can you
>> > describe such environment?)
>> >
>>
>> We are looking for possibility to use it in LSM based environment,
>> where we do not want
>> attacker could make offline modification of the filesystem and modify
>> the TCB related stuff.
>
> What are the exact attach attack possibilities you are protecting against?
>
> Can the attacker observe or modify the data while system is running? (for
> example the data is accessed remotely over an unsecured network
> connection?) Or is it only protecting against modifications when the
> system is down?
>
> Can the attacker modify the partition with hashes? - or do you store it in
> another place that is supposed to be secure?

Given that HMACs are being used to authenticate blocks, I'd assume,
until corrected, that the HMACs aren't required to be on secure
storage.  To that end, it seems like there is a distinct risk that an
attacker could use old data blocks and old HMACs to construct an
"authentic" dm-integrity target that doesn't match anything the
user/TPM ever saw in aggregate before.  Perhaps I missed something
when I skimmed the code, but it doesn't seem trivial to version the
data or bind them to a large enough group of adjacent blocks without
paying more computational costs (like using a Merkle tree with an
HMAC'd root node). Technically, all the blocks would still be
authentic, but the ordering in time and space wouldn't be. I'd love to
know what ideas you have for that, or if that sort of attack is out of
scope?  For ordering in space, inclusion of the sector index in the
HMAC might help.

thanks!
will

> What are you going to do if you get failed checksum because of a crash?
>
>> > It could possibly be used with ext3 or ext4 with data=journal mode - in
>> > this mode, the filesystem writes everything to journal and overwrites data
>> > and metadata with copy from journal on reboot, so it wouldn't matter if a
>> > block that was being written is unreadable after the reboot. But even with
>> > data=journal there are still some corner cases where metadata are
>> > overwritten without journaling (for example fsck or tune2fs utilities) -
>> > and if a crash happens, it could make metadata unreadable.
>> >
>>
>> In normal environment, if fsck crashes, it might corrupt file system
>> in the same way.
>> zero-on-mismatch makes block device still accessible/fixable for fsck.
>
> The problem is that it apmplifies filesystem damage. For example, suppose
> that fsck is modifying an inode. You get a crash and on next reboot not
> just one inode, but the whole block of inodes is unreadable (or replaced
> with zeros). Fsck "fixes" it, but the user loses more files.
>
>
> I am thinking about possibly rewriting it so that it has two hashes per
> sector so that if either old or new data is read, at least one hash
> matches and it won't result in data corruption.
>
> Mikulas


* Re: dm-integrity: integrity protection device-mapper target
  2013-01-23  1:29   ` Mikulas Patocka
  2013-01-23  6:09     ` [dm-devel] " Will Drewry
@ 2013-01-23  9:15     ` Spelic
  2013-01-23 10:19       ` Kasatkin, Dmitry
  2 siblings, 0 replies; 15+ messages in thread
From: Spelic @ 2013-01-23  9:15 UTC (permalink / raw)
  To: device-mapper development

On 01/23/13 02:29, Mikulas Patocka wrote:
> I am thinking about possibly rewriting it so that it has two hashes 
> per sector so that if either old or new data is read, at least one 
> hash matches and it won't result in data corruption. Mikulas 

Seems like a great idea, but have you thought that for it to work the 
new hash block has to be written before the new data block?

A naive implementation would send a stream of flush requests and 
completely screw up the disk write cache...
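
Something like this per write, I mean - a sketch with made-up names, using
the dm-bufio calls mentioned earlier in the thread:

#include <linux/blkdev.h>
#include "dm-bufio.h"

/*
 * naive ordering: force the updated hash block out (with a cache flush)
 * before every data write - correct, but it turns each write into a
 * flush and defeats the disk write cache
 */
static void dm_int_write_ordered(struct dm_int *dm_int, struct bio *data_bio)
{
	/* the in-memory hash buffer for this block was updated earlier */

	dm_bufio_write_dirty_buffers(dm_int->bufio);	/* write the hash block */
	dm_bufio_issue_flush(dm_int->bufio);		/* flush the disk cache */

	generic_make_request(data_bio);		/* only now write the data */
}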

It would be great if Linux could use ordered SCSI commands (that would 
be a feature of the disk but only of SCSI ones AFAIK, correct me if I am 
wrong) or had threaded I/O commands (that could be a feature of the 
elevator methinks) to implement dependencies among write commands and 
not completely screw up the disk cache like above. Is there any plan to 
do that?


* Re: dm-integrity: integrity protection device-mapper target
  2013-01-23  1:29   ` Mikulas Patocka
@ 2013-01-23 10:19       ` Kasatkin, Dmitry
  2013-01-23  9:15     ` Spelic
  2013-01-23 10:19       ` Kasatkin, Dmitry
  2 siblings, 0 replies; 15+ messages in thread
From: Kasatkin, Dmitry @ 2013-01-23 10:19 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Alasdair G. Kergon, dm-devel, alan.cox, linux-fsdevel, akpm, Milan Broz

Hi,

On Wed, Jan 23, 2013 at 3:29 AM, Mikulas Patocka <mpatocka@redhat.com> wrote:
>
>
> On Fri, 18 Jan 2013, Kasatkin, Dmitry wrote:
>
>> Hi Mikulas,
>>
>> Thanks for looking into it.
>>
>> On Thu, Jan 17, 2013 at 6:54 AM, Mikulas Patocka <mpatocka@redhat.com> wrote:
>> > Hi Dmitry
>> >
>> > I looked at dm-integrity. The major problem is that if crash happens when
>> > data were written and checksum wasn't, the block has incorrect checksum
>> > and can't be read again.
>> >
>>
>> This is how it works.
>> This is a purpose of integrity protection - do not allow "bad" content
>> to load and use.
>>
>> But even with encryption it might happen that some blocks have been
>> updated and some not.
>> Even if  reading the blocks succeeds, the content can be a mess from
>> old and new blocks.
>
> dm-crypt encrypts each 512-byte sector individually, so (assuming that
> there is no disk with sector size <512 bytes), it can't result in random
> data. You read either new data or old data.

I did not express correctly what I wanted to say.
Basically, a file might consist of several sectors where some of them
have old content and some have new content.
That "combined" content is garbage...

>
>> This patch I sent out has one missing feature what I have not pushed yet.
>> In the case of none-matching blocks, it just zeros blocks and returns
>> no error (zero-on-mismatch).
>> Writing to the block replaces the hmac.
>> It works quite nicely. mkfs and fsck is able to read and write/fix the
>> filesystem.
>
> But it causes silent data corruption for the user. So it's worse than
> returning an error.

Agreed. Returning an error is good, as it was originally.

>
>> > How is this integrity target going to be used? Will you use it in an
>> > environment where destroying data on crash doesn't matter? (can you
>> > describe such environment?)
>> >
>>
>> We are looking for possibility to use it in LSM based environment,
>> where we do not want
>> attacker could make offline modification of the filesystem and modify
>> the TCB related stuff.
>
> What are the exact attach attack possibilities you are protecting against?
>

That is to protect against offline attacks only - when the system is down.
LSM is supposed to protect it when the system is running.

> Can the attacker observe or modify the data while system is running? (for
> example the data is accessed remotely over an unsecured network
> connection?) Or is it only protecting against modifications when the
> system is down?
>

Right.

> Can the attacker modify the partition with hashes? - or do you store it in
> another place that is supposed to be secure?
>

As Will also said in his follow-up email: those are not hashes but HMACs.
They are protected by the key and do not require secure storage.

> What are you going to do if you get failed checksum because of a crash?
>

Integrity verification failed - return an error.
There is no reason to run a modified /sbin/init.
I understand that it leaves the system unusable, but this is to prevent
running a compromised system.

>> > It could possibly be used with ext3 or ext4 with data=journal mode - in
>> > this mode, the filesystem writes everything to journal and overwrites data
>> > and metadata with copy from journal on reboot, so it wouldn't matter if a
>> > block that was being written is unreadable after the reboot. But even with
>> > data=journal there are still some corner cases where metadata are
>> > overwritten without journaling (for example fsck or tune2fs utilities) -
>> > and if a crash happens, it could make metadata unreadable.
>> >
>>
>> In normal environment, if fsck crashes, it might corrupt file system
>> in the same way.
>> zero-on-mismatch makes block device still accessible/fixable for fsck.
>
> The problem is that it apmplifies filesystem damage. For example, suppose
> that fsck is modifying an inode. You get a crash and on next reboot not
> just one inode, but the whole block of inodes is unreadable (or replaced
> with zeros). Fsck "fixes" it, but the user loses more files.
>

From a security perspective it is "unsafe" to run fsck.
The system has been compromised, and fixing it with fsck might result in
unexpected behavior.

>
> I am thinking about possibly rewriting it so that it has two hashes per
> sector so that if either old or new data is read, at least one hash
> matches and it won't result in data corruption.
>

That would be an improvement, but there are performance issues with such
an approach.
I was working on such a solution before dm_bufio appeared.
With my own integrity-data management I could know when a block of primary
integrity data had been written and could then issue the write for the
secondary integrity data.
Writes/updates of the integrity data blocks were handled independently of
each other, so it was easy to get the "mirror" block and issue a write
request.
With the bufio API I do not really see how the target could be notified
that a buffer has been written.
Maybe a completion callback, like bio_end_io, would be needed.
You might advise on how to achieve that.
But there are performance issues with that, and it could be the next step.

Thanks,

Dmitry

> Mikulas



* Re: [dm-devel] dm-integrity: integrity protection device-mapper target
  2013-01-23  6:09     ` [dm-devel] " Will Drewry
@ 2013-01-23 10:20       ` Kasatkin, Dmitry
  2013-01-28  1:43         ` Will Drewry
  0 siblings, 1 reply; 15+ messages in thread
From: Kasatkin, Dmitry @ 2013-01-23 10:20 UTC (permalink / raw)
  To: Will Drewry
  Cc: device-mapper development, alan.cox, linux-fsdevel, akpm,
	Alasdair G. Kergon, Milan Broz

On Wed, Jan 23, 2013 at 8:09 AM, Will Drewry <redpig@dataspill.org> wrote:
> On Tue, Jan 22, 2013 at 5:29 PM, Mikulas Patocka <mpatocka@redhat.com> wrote:
>>
>>
>> On Fri, 18 Jan 2013, Kasatkin, Dmitry wrote:
>>
>>> Hi Mikulas,
>>>
>>> Thanks for looking into it.
>>>
>>> On Thu, Jan 17, 2013 at 6:54 AM, Mikulas Patocka <mpatocka@redhat.com> wrote:
>>> > Hi Dmitry
>>> >
>>> > I looked at dm-integrity. The major problem is that if crash happens when
>>> > data were written and checksum wasn't, the block has incorrect checksum
>>> > and can't be read again.
>>> >
>>>
>>> This is how it works.
>>> This is a purpose of integrity protection - do not allow "bad" content
>>> to load and use.
>
> With respect to the use of "integrity", you may want to consider
> something like dm-integrity-hmac to disambiguate from the BIO
> integrity naming.  It's why I proposed the somewhat obtuse "verity"
> name for the other data-integrity target.
>
>>> But even with encryption it might happen that some blocks have been
>>> updated and some not.
>>> Even if  reading the blocks succeeds, the content can be a mess from
>>> old and new blocks.
>>
>> dm-crypt encrypts each 512-byte sector individually, so (assuming that
>> there is no disk with sector size <512 bytes), it can't result in random
>> data. You read either new data or old data.
>>
>>> This patch I sent out has one missing feature what I have not pushed yet.
>>> In the case of none-matching blocks, it just zeros blocks and returns
>>> no error (zero-on-mismatch).
>>> Writing to the block replaces the hmac.
>>> It works quite nicely. mkfs and fsck is able to read and write/fix the
>>> filesystem.
>>
>> But it causes silent data corruption for the user. So it's worse than
>> returning an error.
>>
>>> > How is this integrity target going to be used? Will you use it in an
>>> > environment where destroying data on crash doesn't matter? (can you
>>> > describe such environment?)
>>> >
>>>
>>> We are looking for possibility to use it in LSM based environment,
>>> where we do not want
>>> attacker could make offline modification of the filesystem and modify
>>> the TCB related stuff.
>>
>> What are the exact attach attack possibilities you are protecting against?
>>
>> Can the attacker observe or modify the data while system is running? (for
>> example the data is accessed remotely over an unsecured network
>> connection?) Or is it only protecting against modifications when the
>> system is down?
>>
>> Can the attacker modify the partition with hashes? - or do you store it in
>> another place that is supposed to be secure?
>
> Given that HMACs are being used to authenticate blocks, I'd assume,
> until corrected, that the HMACs aren't required to be on secure
> storage.  To that end, it seems like there is a distinct risk that an
> attacker could use old data blocks and old HMACs to construct an
> "authentic" dm-integrity target that doesn't match anything the
> user/TPM ever saw in aggregate before.  Perhaps I missed something
> when I skimmed the code, but it doesn't seem trivial to version the
> data or bind them to a large enough group of adjacent blocks without
> paying more computational costs (like using a Merkle tree with an
> HMAC'd root node). Technically, all the blocks would still be
> authentic, but the ordering in time and space wouldn't be. I'd love to
> know what ideas you have for that, or if that sort of attack is out of
> scope?  For ordering in space, inclusion of the sector index in the
> HMAC might help.
>

Hello,

Yes, you are right. It is all about computational and I/O costs.
"In time" is really hard to manage: the key is the same, so there is a
possibility to replace blocks with older blocks.

But this is the case with encryption as well, right?

"In space" is easier. As you said, the sector index might be used, as
with encryption.
Please have a look at dm_int_calc_hmac(); it already uses the offset in
the calculation:

	/* start the keyed hash and feed in the block's data digest... */
	err = crypto_shash_init(&desc.shash);
	if (!err)
		err = crypto_shash_update(&desc.shash, digest, size);
	/* ...then fold in the block offset, binding the HMAC to its location */
	if (!err)
		err = crypto_shash_finup(&desc.shash, (u8 *)&offset,
					  sizeof(offset), hmac);

Thanks,
Dmitry


> thanks!
> will
>
>> What are you going to do if you get failed checksum because of a crash?
>>
>>> > It could possibly be used with ext3 or ext4 with data=journal mode - in
>>> > this mode, the filesystem writes everything to journal and overwrites data
>>> > and metadata with copy from journal on reboot, so it wouldn't matter if a
>>> > block that was being written is unreadable after the reboot. But even with
>>> > data=journal there are still some corner cases where metadata are
>>> > overwritten without journaling (for example fsck or tune2fs utilities) -
>>> > and if a crash happens, it could make metadata unreadable.
>>> >
>>>
>>> In normal environment, if fsck crashes, it might corrupt file system
>>> in the same way.
>>> zero-on-mismatch makes block device still accessible/fixable for fsck.
>>
>> The problem is that it apmplifies filesystem damage. For example, suppose
>> that fsck is modifying an inode. You get a crash and on next reboot not
>> just one inode, but the whole block of inodes is unreadable (or replaced
>> with zeros). Fsck "fixes" it, but the user loses more files.
>>
>>
>> I am thinking about possibly rewriting it so that it has two hashes per
>> sector so that if either old or new data is read, at least one hash
>> matches and it won't result in data corruption.
>>
>> Mikulas


* Re: [dm-devel] dm-integrity: integrity protection device-mapper target
  2013-01-23 10:20       ` Kasatkin, Dmitry
@ 2013-01-28  1:43         ` Will Drewry
  0 siblings, 0 replies; 15+ messages in thread
From: Will Drewry @ 2013-01-28  1:43 UTC (permalink / raw)
  To: Kasatkin, Dmitry
  Cc: device-mapper development, alan.cox, linux-fsdevel, akpm,
	Alasdair G. Kergon, Milan Broz

On Wed, Jan 23, 2013 at 4:20 AM, Kasatkin, Dmitry
<dmitry.kasatkin@intel.com> wrote:
> On Wed, Jan 23, 2013 at 8:09 AM, Will Drewry <redpig@dataspill.org> wrote:
>> On Tue, Jan 22, 2013 at 5:29 PM, Mikulas Patocka <mpatocka@redhat.com> wrote:
>>> On Fri, 18 Jan 2013, Kasatkin, Dmitry wrote:
>>>> Hi Mikulas,
>>>>
>>>> Thanks for looking into it.
>>>>
>>>> On Thu, Jan 17, 2013 at 6:54 AM, Mikulas Patocka <mpatocka@redhat.com> wrote:
[snip]
>>> Can the attacker modify the partition with hashes? - or do you store it in
>>> another place that is supposed to be secure?
>>
>> Given that HMACs are being used to authenticate blocks, I'd assume,
>> until corrected, that the HMACs aren't required to be on secure
>> storage.  To that end, it seems like there is a distinct risk that an
>> attacker could use old data blocks and old HMACs to construct an
>> "authentic" dm-integrity target that doesn't match anything the
>> user/TPM ever saw in aggregate before.  Perhaps I missed something
>> when I skimmed the code, but it doesn't seem trivial to version the
>> data or bind them to a large enough group of adjacent blocks without
>> paying more computational costs (like using a Merkle tree with an
>> HMAC'd root node). Technically, all the blocks would still be
>> authentic, but the ordering in time and space wouldn't be. I'd love to
>> know what ideas you have for that, or if that sort of attack is out of
>> scope?  For ordering in space, inclusion of the sector index in the
>> HMAC might help.
>>
>
> Hello,
>
> Yes. You are right. All is about computational and IO costs.
> "In time" is really hard to manage. The key is the same and there is a
> possibility to
> replace the blocks with older block.
>
> But this is a case with encryption as well. right?

I'm not a crypto-expert, but it all depends on if blocks are
interrelated in any way (and thus the algorithm and configuration).
Of course, encryption and integrity are often separate for a reason :)

The 'in time' problem is tricky, but I guess this is a good reason to
extend dm-verity to also have a kernel-supported write mode.  It could
be used as an alternative to the hmac'd block approach (with a
TPM/whatever-signed root hash) to achieve both in-time and in-space
resilience.  Of course, the user would need to decide on the cost then
-- more I/O+CPU versus their risk tolerance.

> "in space" - it is easier. As you said sector index might be used like
> with encryption.
> Please have a look to dm_int_calc_hmac(). It uses already offset in
> calculations..
>
>         err = crypto_shash_init(&desc.shash);
>         if (!err)
>                 err = crypto_shash_update(&desc.shash, digest, size);
>         if (!err)
>                 err = crypto_shash_finup(&desc.shash, (u8 *)&offset,
>                                           sizeof(offset), hmac);

Nice - I misread that!  At least that keeps replays pinned to the same
block index for direct replays.

cheers!
will

