* OOM on OSDS with erasure coding
@ 2016-06-02  6:23 Sharath Gururaj
  2016-06-03 17:04 ` kefu chai
  0 siblings, 1 reply; 6+ messages in thread
From: Sharath Gururaj @ 2016-06-02  6:23 UTC (permalink / raw)
  To: ceph-devel

Hi All,

We are testing an erasure-coded Ceph cluster fronted by the RADOS gateway.
Recently, many OSDs have been going down due to out-of-memory kills.
Here are the details.

Description of the cluster:
====================
ceph version 0.94.2 (hammer)
32 hosts, 6 disks (osds) per host so 32*6 = 192 osds
17024 pgs, 15 pools, 107 TB data, 57616 kobjects
167 TB used, 508 TB / 675 TB available
erasure coding reed-solomon-van with k=10, m=5
We are using rgw as the client. Only the .rgw.buckets pool is erasure
coded; the rest of the rgw metadata/index pools are replicated with size=3

The Problem
==========
We ran a load test against this cluster. The load test simply writes
4.5 MB objects through a Locust test cluster.
We observed very low throughput, with disk IOPS saturated.
We reasoned that this is because the rgw stripe width is 4 MB,
which results in the OSDs splitting each stripe into 4 MB / k = 400 KB
chunks, which leads to random I/O behaviour.

To mitigate this, we changed the rgw stripe width to 40 MB (so that,
after chunking, the chunk sizes become 40 MB / k = 4 MB) and we modified
the load test to upload 40 MB objects.
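
For reference, the chunk-size arithmetic as a small sketch
(ec_chunk_size below is just an illustrative helper, not a Ceph API):

def ec_chunk_size(stripe_width_bytes, k):
    # Each EC stripe is split into k data chunks (plus m coding chunks
    # of the same size), so every OSD stores stripe_width / k bytes.
    return stripe_width_bytes / k

MB = 1024 * 1024
k = 10

# Old setup: 4 MB rgw stripes -> ~400 KB chunks per OSD (small, random-ish writes)
print(ec_chunk_size(4 * MB, k) / 1024)   # ~409.6 KB

# New setup: 40 MB rgw stripes -> 4 MB chunks per OSD
print(ec_chunk_size(40 * MB, k) / MB)    # 4.0 MB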

Now we observed a more serious problem.
A lot of OSDs across different hosts started getting killed by the OOM
killer. We saw that the memory usage of the OSDs was huge: ~10 GB per
OSD. For comparison, we have a different replicated cluster with a lot
more data, where OSD memory usage is ~600 MB.

At this point we stopped the load test and tried to restart the
individual OSDs.
Even without load, the OSD memory size grows to ~11 GB.

We ran the tcmalloc heap profiler against an OSD. Here is the graph
generated by google-pprof.
http://s33.postimg.org/5w48sr3an/mygif.gif
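
For anyone who wants to reproduce this kind of profile, a rough sketch
of the usual tcmalloc profiling steps (the OSD id, dump path and output
file below are placeholders, and the dump filename pattern may differ
depending on your log settings):

import subprocess
import time

OSD = "osd.2"                                         # placeholder OSD id
HEAP_DUMP = "/var/log/ceph/osd.2.profile.0001.heap"   # typical location, may differ
OUT_GIF = "/tmp/osd2-heap.gif"                        # placeholder output file

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Start tcmalloc heap profiling in the running OSD, let it run under
# load for a while, then write a heap dump next to the OSD log.
run(["ceph", "tell", OSD, "heap", "start_profiler"])
time.sleep(600)
run(["ceph", "tell", OSD, "heap", "dump"])
run(["ceph", "tell", OSD, "heap", "stop_profiler"])

# Render the dump with google-pprof against the ceph-osd binary.
with open(OUT_GIF, "wb") as out:
    subprocess.run(["google-pprof", "--gif", "/usr/bin/ceph-osd", HEAP_DUMP],
                   check=True, stdout=out)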


The graph seems to indicate that most of the memory is being allocated
by PGLog::readLog.
Is this expected behaviour? Is there a setting that allows us to
control this?

Please let us know further steps we can take to fix the problem.

Thanks
Sharath


* Re: OOM on OSDS with erasure coding
  2016-06-02  6:23 OOM on OSDS with erasure coding Sharath Gururaj
@ 2016-06-03 17:04 ` kefu chai
  2016-06-03 19:08   ` Sharath Gururaj
  0 siblings, 1 reply; 6+ messages in thread
From: kefu chai @ 2016-06-03 17:04 UTC (permalink / raw)
  To: Sharath Gururaj; +Cc: ceph-devel

On Thu, Jun 2, 2016 at 2:23 PM, Sharath Gururaj <sharath.g@flipkart.com> wrote:
> Hi All,
>
> We are testing an erasure coded ceph cluster fronted by rados gateway.
> Recently many osds are going down due to out-of-memory.
> Here are the details.
>
> Description of the cluster:
> ====================
> ceph version 0.94.2 (hammer)

Sharath, have you tried the latest hammer (v0.94.7)? Does it also have
this issue?

> 32 hosts, 6 disks (osds) per host so 32*6 = 192 osds
> 17024 pgs, 15 pools, 107 TB data, 57616 kobjects
> 167 TB used, 508 TB / 675 TB available
> erasure coding reed-solomon-van with k=10, m=5
> We are using rgw as client. erasure coded only for the .rgw.buckets pool
> rest of the rgw metadata/index pools are replicated with size=3
>
> The Problem
> ==========
> We ran a load test against this cluster. The load test simply writes
> 4.5 MB sized objects through a locust test cluster.
> We observed very low throughput with saturation on disk iops.
> We reasoned that this is because rgw stripe width is 4MB,
> which results in the osds splitting it into 4MB/k = 400kb chunks,
> which leads to random io behaviour.
>
> To mitigate this, we changed rgw stripe width to 40 MB (so that, after
> chunking, the object sizes become 40/k = 4MB) and we modified the load
> test to upload 40 MB objects.
>
> Now we observed a more serious problem.
> A lot of OSDs across different hosts started getting killed by OOM killer.
> We saw that the memory usage of OSDs were huge. ~10G per OSDs.
> For comparison, we have a different replicated cluster with a lot more
> data where OSD memory usage is ~600MB.
>
> At this point, we stopped the load test, and tried to restart the
> individual OSDs.
> Even without load, the OSD memory size grows to ~11G
>
> We ran the tcmalloc heap profiler against an OSD. Here is the graph
> generated by google-pprof.
> http://s33.postimg.org/5w48sr3an/mygif.gif
>
>
> The graph seems to indicate that most of the memory is being allocated
> by PGLog::readLog
> Is this expected behaviour? Is there some setting that allows us to
> control this?
>
> Please let us know further steps we can take to fix the problem.
>
> Thanks
> Sharath



-- 
Regards
Kefu Chai


* Re: OOM on OSDS with erasure coding
  2016-06-03 17:04 ` kefu chai
@ 2016-06-03 19:08   ` Sharath Gururaj
  2016-06-03 19:54     ` Samuel Just
  0 siblings, 1 reply; 6+ messages in thread
From: Sharath Gururaj @ 2016-06-03 19:08 UTC (permalink / raw)
  To: kefu chai; +Cc: ceph-devel

Hi Kefu,

I haven't tried the latest hammer.
Have any pg_log-related fixes been applied?

After a little more digging, our current suspicion is that the pg_log
is growing in proportion to the number of file chunks on each OSD.
Since we have k=10, m=5 (15 chunks in total for each RADOS object), the
memory usage has become quite high.

Could you tell us a little bit about the implementation and lifecycle
of the PGLog? When is it trimmed? Are there any settings to control how
much of it is kept in memory?

Thanks
Sharath

On Fri, Jun 3, 2016 at 10:34 PM, kefu chai <tchaikov@gmail.com> wrote:
> On Thu, Jun 2, 2016 at 2:23 PM, Sharath Gururaj <sharath.g@flipkart.com> wrote:
>> Hi All,
>>
>> We are testing an erasure coded ceph cluster fronted by rados gateway.
>> Recently many osds are going down due to out-of-memory.
>> Here are the details.
>>
>> Description of the cluster:
>> ====================
>> ceph version 0.94.2 (hammer)
>
> Sharath, have you tried the latest hammer (v0.94.7)? does it also have
> this issue?
>
>> 32 hosts, 6 disks (osds) per host so 32*6 = 192 osds
>> 17024 pgs, 15 pools, 107 TB data, 57616 kobjects
>> 167 TB used, 508 TB / 675 TB available
>> erasure coding reed-solomon-van with k=10, m=5
>> We are using rgw as client. erasure coded only for the .rgw.buckets pool
>> rest of the rgw metadata/index pools are replicated with size=3
>>
>> The Problem
>> ==========
>> We ran a load test against this cluster. The load test simply writes
>> 4.5 MB sized objects through a locust test cluster.
>> We observed very low throughput with saturation on disk iops.
>> We reasoned that this is because rgw stripe width is 4MB,
>> which results in the osds splitting it into 4MB/k = 400kb chunks,
>> which leads to random io behaviour.
>>
>> To mitigate this, we changed rgw stripe width to 40 MB (so that, after
>> chunking, the object sizes become 40/k = 4MB) and we modified the load
>> test to upload 40 MB objects.
>>
>> Now we observed a more serious problem.
>> A lot of OSDs across different hosts started getting killed by OOM killer.
>> We saw that the memory usage of OSDs were huge. ~10G per OSDs.
>> For comparison, we have a different replicated cluster with a lot more
>> data where OSD memory usage is ~600MB.
>>
>> At this point, we stopped the load test, and tried to restart the
>> individual OSDs.
>> Even without load, the OSD memory size grows to ~11G
>>
>> We ran the tcmalloc heap profiler against an OSD. Here is the graph
>> generated by google-pprof.
>> http://s33.postimg.org/5w48sr3an/mygif.gif
>>
>>
>> The graph seems to indicate that most of the memory is being allocated
>> by PGLog::readLog
>> Is this expected behaviour? Is there some setting that allows us to
>> control this?
>>
>> Please let us know further steps we can take to fix the problem.
>>
>> Thanks
>> Sharath
>
>
>
> --
> Regards
> Kefu Chai


* Re: OOM on OSDS with erasure coding
  2016-06-03 19:08   ` Sharath Gururaj
@ 2016-06-03 19:54     ` Samuel Just
  2016-06-03 20:10       ` Sharath Gururaj
  0 siblings, 1 reply; 6+ messages in thread
From: Samuel Just @ 2016-06-03 19:54 UTC (permalink / raw)
  To: Sharath Gururaj; +Cc: kefu chai, ceph-devel

Oh, actually, I think the problem is simply that you have 1.3k pg
shards/osd.  We suggest more like 200.  Indeed, the main user of
memory for a particular pg is the pg log, so it makes sense that that
is where most of the memory would be allocated.  You can probably live
with fewer entries/pg: try adjusting osd_min_pg_log_entries and
osd_max_pg_log_entries (defaults are 3000 and 10000) down by a factor
of 10.
-Sam
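
To see why ~1.3k shards with the default log lengths adds up to that
much memory, a rough back-of-envelope sketch (the per-entry size below
is an assumed figure for illustration, not a measured one):

pg_shards_per_osd = 17000 * 15 // 192       # ~1328 PG shards per OSD
ENTRY_BYTES = 800                           # assumed in-memory size of one pg_log entry

for entries in (3000, 10000):               # osd_min/max_pg_log_entries defaults
    gib = pg_shards_per_osd * entries * ENTRY_BYTES / 2**30
    print(f"{entries} entries/shard -> ~{gib:.1f} GiB per OSD")

# Prints roughly 3.0 and 9.9 GiB.  With the suggested 10x reduction
# (300 / 1000 entries) the same arithmetic lands around 0.3-1.0 GiB.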

On Fri, Jun 3, 2016 at 12:08 PM, Sharath Gururaj <sharath.g@flipkart.com> wrote:
> Hi Kefu,
>
> I haven't tried the latest hammer.
> Is there any pg_log related fixes that have been applied?
>
> After a little more digging, our current suspicion is that the pg_log
> is growing in proportion to the number of file chunks in each OSD.
> since we have k=10, m=5 (total 15 chunks for each rados object), the
> memory usage has become quite high.
>
> Could you tell a little bit about the implementation/lifecycle of the PGLog?
> When are they trimmed? Are there any settings to control how much of
> it is kept in memory?
>
> Thanks
> Sharath
>
> On Fri, Jun 3, 2016 at 10:34 PM, kefu chai <tchaikov@gmail.com> wrote:
>> On Thu, Jun 2, 2016 at 2:23 PM, Sharath Gururaj <sharath.g@flipkart.com> wrote:
>>> Hi All,
>>>
>>> We are testing an erasure coded ceph cluster fronted by rados gateway.
>>> Recently many osds are going down due to out-of-memory.
>>> Here are the details.
>>>
>>> Description of the cluster:
>>> ====================
>>> ceph version 0.94.2 (hammer)
>>
>> Sharath, have you tried the latest hammer (v0.94.7)? does it also have
>> this issue?
>>
>>> 32 hosts, 6 disks (osds) per host so 32*6 = 192 osds
>>> 17024 pgs, 15 pools, 107 TB data, 57616 kobjects
>>> 167 TB used, 508 TB / 675 TB available
>>> erasure coding reed-solomon-van with k=10, m=5
>>> We are using rgw as client. erasure coded only for the .rgw.buckets pool
>>> rest of the rgw metadata/index pools are replicated with size=3
>>>
>>> The Problem
>>> ==========
>>> We ran a load test against this cluster. The load test simply writes
>>> 4.5 MB sized objects through a locust test cluster.
>>> We observed very low throughput with saturation on disk iops.
>>> We reasoned that this is because rgw stripe width is 4MB,
>>> which results in the osds splitting it into 4MB/k = 400kb chunks,
>>> which leads to random io behaviour.
>>>
>>> To mitigate this, we changed rgw stripe width to 40 MB (so that, after
>>> chunking, the object sizes become 40/k = 4MB) and we modified the load
>>> test to upload 40 MB objects.
>>>
>>> Now we observed a more serious problem.
>>> A lot of OSDs across different hosts started getting killed by OOM killer.
>>> We saw that the memory usage of OSDs were huge. ~10G per OSDs.
>>> For comparison, we have a different replicated cluster with a lot more
>>> data where OSD memory usage is ~600MB.
>>>
>>> At this point, we stopped the load test, and tried to restart the
>>> individual OSDs.
>>> Even without load, the OSD memory size grows to ~11G
>>>
>>> We ran the tcmalloc heap profiler against an OSD. Here is the graph
>>> generated by google-pprof.
>>> http://s33.postimg.org/5w48sr3an/mygif.gif
>>>
>>>
>>> The graph seems to indicate that most of the memory is being allocated
>>> by PGLog::readLog
>>> Is this expected behaviour? Is there some setting that allows us to
>>> control this?
>>>
>>> Please let us know further steps we can take to fix the problem.
>>>
>>> Thanks
>>> Sharath
>>
>>
>>
>> --
>> Regards
>> Kefu Chai


* Re: OOM on OSDS with erasure coding
  2016-06-03 19:54     ` Samuel Just
@ 2016-06-03 20:10       ` Sharath Gururaj
  2016-06-03 20:45         ` Sage Weil
  0 siblings, 1 reply; 6+ messages in thread
From: Sharath Gururaj @ 2016-06-03 20:10 UTC (permalink / raw)
  To: Samuel Just; +Cc: kefu chai, ceph-devel

Thanks Samuel,

But how did you arrive at the number of 1.3k PG shards per OSD?
We have ~17k PGs and 192 OSDs, giving ~88 PGs per OSD.



On Sat, Jun 4, 2016 at 1:24 AM, Samuel Just <sjust@redhat.com> wrote:
> Oh, actually, I think the problem is simply that you have 1.3k pg
> shards/osd.  We suggest more like 200.  Indeed, the main user of
> memory for a particular pg is the pg log, so it makes sense that that
> is where most of the memory would be allocated.  You can probably live
> with fewer entries/pg: try adjusting osd_min_pg_log_entries and
> osd_max_pg_log_entries (defaults are 3000 and 10000) down by a factor
> of 10.
> -Sam
>
> On Fri, Jun 3, 2016 at 12:08 PM, Sharath Gururaj <sharath.g@flipkart.com> wrote:
>> Hi Kefu,
>>
>> I haven't tried the latest hammer.
>> Is there any pg_log related fixes that have been applied?
>>
>> After a little more digging, our current suspicion is that the pg_log
>> is growing in proportion to the number of file chunks in each OSD.
>> since we have k=10, m=5 (total 15 chunks for each rados object), the
>> memory usage has become quite high.
>>
>> Could you tell a little bit about the implementation/lifecycle of the PGLog?
>> When are they trimmed? Are there any settings to control how much of
>> it is kept in memory?
>>
>> Thanks
>> Sharath
>>
>> On Fri, Jun 3, 2016 at 10:34 PM, kefu chai <tchaikov@gmail.com> wrote:
>>> On Thu, Jun 2, 2016 at 2:23 PM, Sharath Gururaj <sharath.g@flipkart.com> wrote:
>>>> Hi All,
>>>>
>>>> We are testing an erasure coded ceph cluster fronted by rados gateway.
>>>> Recently many osds are going down due to out-of-memory.
>>>> Here are the details.
>>>>
>>>> Description of the cluster:
>>>> ====================
>>>> ceph version 0.94.2 (hammer)
>>>
>>> Sharath, have you tried the latest hammer (v0.94.7)? does it also have
>>> this issue?
>>>
>>>> 32 hosts, 6 disks (osds) per host so 32*6 = 192 osds
>>>> 17024 pgs, 15 pools, 107 TB data, 57616 kobjects
>>>> 167 TB used, 508 TB / 675 TB available
>>>> erasure coding reed-solomon-van with k=10, m=5
>>>> We are using rgw as client. erasure coded only for the .rgw.buckets pool
>>>> rest of the rgw metadata/index pools are replicated with size=3
>>>>
>>>> The Problem
>>>> ==========
>>>> We ran a load test against this cluster. The load test simply writes
>>>> 4.5 MB sized objects through a locust test cluster.
>>>> We observed very low throughput with saturation on disk iops.
>>>> We reasoned that this is because rgw stripe width is 4MB,
>>>> which results in the osds splitting it into 4MB/k = 400kb chunks,
>>>> which leads to random io behaviour.
>>>>
>>>> To mitigate this, we changed rgw stripe width to 40 MB (so that, after
>>>> chunking, the object sizes become 40/k = 4MB) and we modified the load
>>>> test to upload 40 MB objects.
>>>>
>>>> Now we observed a more serious problem.
>>>> A lot of OSDs across different hosts started getting killed by OOM killer.
>>>> We saw that the memory usage of OSDs were huge. ~10G per OSDs.
>>>> For comparison, we have a different replicated cluster with a lot more
>>>> data where OSD memory usage is ~600MB.
>>>>
>>>> At this point, we stopped the load test, and tried to restart the
>>>> individual OSDs.
>>>> Even without load, the OSD memory size grows to ~11G
>>>>
>>>> We ran the tcmalloc heap profiler against an OSD. Here is the graph
>>>> generated by google-pprof.
>>>> http://s33.postimg.org/5w48sr3an/mygif.gif
>>>>
>>>>
>>>> The graph seems to indicate that most of the memory is being allocated
>>>> by PGLog::readLog
>>>> Is this expected behaviour? Is there some setting that allows us to
>>>> control this?
>>>>
>>>> Please let us know further steps we can take to fix the problem.
>>>>
>>>> Thanks
>>>> Sharath
>>>
>>>
>>>
>>> --
>>> Regards
>>> Kefu Chai


* Re: OOM on OSDS with erasure coding
  2016-06-03 20:10       ` Sharath Gururaj
@ 2016-06-03 20:45         ` Sage Weil
  0 siblings, 0 replies; 6+ messages in thread
From: Sage Weil @ 2016-06-03 20:45 UTC (permalink / raw)
  To: Sharath Gururaj; +Cc: Samuel Just, kefu chai, ceph-devel

On Sat, 4 Jun 2016, Sharath Gururaj wrote:
> Thanks Samuel,
> 
> But how did you arrive at the number 1.3k per shard osd?
> we have ~17k pgs and 192 osds, giving ~88 PGs per osd.

Each logical PG has k + m shards, so

	17000 * 15 / 192 = 1328

sage
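
In code form (treating every PG as a 15-shard EC PG, as in the estimate
above; the replicated pools actually place 3 copies each, so the true
number is somewhat lower):

pgs, osds = 17024, 192
k, m = 10, 5

# Counting logical PGs only -- this is what the ~88 figure counts.
print(round(pgs / osds))              # ~89 PGs per OSD

# Counting PG *shards*: each EC PG places k + m shards, one per acting
# OSD, and every shard keeps its own copy of the pg log.
print(round(pgs * (k + m) / osds))    # ~1330 PG shards per OSD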

> 
> 
> 
> On Sat, Jun 4, 2016 at 1:24 AM, Samuel Just <sjust@redhat.com> wrote:
> > Oh, actually, I think the problem is simply that you have 1.3k pg
> > shards/osd.  We suggest more like 200.  Indeed, the main user of
> > memory for a particular pg is the pg log, so it makes sense that that
> > is where most of the memory would be allocated.  You can probably live
> > with fewer entries/pg: try adjusting osd_min_pg_log_entries and
> > osd_max_pg_log_entries (defaults are 3000 and 10000) down by a factor
> > of 10.
> > -Sam
> >
> > On Fri, Jun 3, 2016 at 12:08 PM, Sharath Gururaj <sharath.g@flipkart.com> wrote:
> >> Hi Kefu,
> >>
> >> I haven't tried the latest hammer.
> >> Is there any pg_log related fixes that have been applied?
> >>
> >> After a little more digging, our current suspicion is that the pg_log
> >> is growing in proportion to the number of file chunks in each OSD.
> >> since we have k=10, m=5 (total 15 chunks for each rados object), the
> >> memory usage has become quite high.
> >>
> >> Could you tell a little bit about the implementation/lifecycle of the PGLog?
> >> When are they trimmed? Are there any settings to control how much of
> >> it is kept in memory?
> >>
> >> Thanks
> >> Sharath
> >>
> >> On Fri, Jun 3, 2016 at 10:34 PM, kefu chai <tchaikov@gmail.com> wrote:
> >>> On Thu, Jun 2, 2016 at 2:23 PM, Sharath Gururaj <sharath.g@flipkart.com> wrote:
> >>>> Hi All,
> >>>>
> >>>> We are testing an erasure coded ceph cluster fronted by rados gateway.
> >>>> Recently many osds are going down due to out-of-memory.
> >>>> Here are the details.
> >>>>
> >>>> Description of the cluster:
> >>>> ====================
> >>>> ceph version 0.94.2 (hammer)
> >>>
> >>> Sharath, have you tried the latest hammer (v0.94.7)? does it also have
> >>> this issue?
> >>>
> >>>> 32 hosts, 6 disks (osds) per host so 32*6 = 192 osds
> >>>> 17024 pgs, 15 pools, 107 TB data, 57616 kobjects
> >>>> 167 TB used, 508 TB / 675 TB available
> >>>> erasure coding reed-solomon-van with k=10, m=5
> >>>> We are using rgw as client. erasure coded only for the .rgw.buckets pool
> >>>> rest of the rgw metadata/index pools are replicated with size=3
> >>>>
> >>>> The Problem
> >>>> ==========
> >>>> We ran a load test against this cluster. The load test simply writes
> >>>> 4.5 MB sized objects through a locust test cluster.
> >>>> We observed very low throughput with saturation on disk iops.
> >>>> We reasoned that this is because rgw stripe width is 4MB,
> >>>> which results in the osds splitting it into 4MB/k = 400kb chunks,
> >>>> which leads to random io behaviour.
> >>>>
> >>>> To mitigate this, we changed rgw stripe width to 40 MB (so that, after
> >>>> chunking, the object sizes become 40/k = 4MB) and we modified the load
> >>>> test to upload 40 MB objects.
> >>>>
> >>>> Now we observed a more serious problem.
> >>>> A lot of OSDs across different hosts started getting killed by OOM killer.
> >>>> We saw that the memory usage of OSDs were huge. ~10G per OSDs.
> >>>> For comparison, we have a different replicated cluster with a lot more
> >>>> data where OSD memory usage is ~600MB.
> >>>>
> >>>> At this point, we stopped the load test, and tried to restart the
> >>>> individual OSDs.
> >>>> Even without load, the OSD memory size grows to ~11G
> >>>>
> >>>> We ran the tcmalloc heap profiler against an OSD. Here is the graph
> >>>> generated by google-pprof.
> >>>> http://s33.postimg.org/5w48sr3an/mygif.gif
> >>>>
> >>>>
> >>>> The graph seems to indicate that most of the memory is being allocated
> >>>> by PGLog::readLog
> >>>> Is this expected behaviour? Is there some setting that allows us to
> >>>> control this?
> >>>>
> >>>> Please let us know further steps we can take to fix the problem.
> >>>>
> >>>> Thanks
> >>>> Sharath
> >>>
> >>>
> >>>
> >>> --
> >>> Regards
> >>> Kefu Chai


Thread overview: 6+ messages
2016-06-02  6:23 OOM on OSDS with erasure coding Sharath Gururaj
2016-06-03 17:04 ` kefu chai
2016-06-03 19:08   ` Sharath Gururaj
2016-06-03 19:54     ` Samuel Just
2016-06-03 20:10       ` Sharath Gururaj
2016-06-03 20:45         ` Sage Weil
