linux-xfs.vger.kernel.org archive mirror
* Question: Modifying kernel to handle all I/O requests without page cache
@ 2019-09-25 22:51 Jianshen Liu
  2019-09-26 12:39 ` Carlos Maiolino
  0 siblings, 1 reply; 5+ messages in thread
From: Jianshen Liu @ 2019-09-25 22:51 UTC (permalink / raw)
  To: linux-fsdevel, linux-xfs

Hi,

I am working on a project trying to evaluate the performance of a
workload running on a storage device. I don't want the benchmark
result to depend on a specific platform (e.g., a platform with X GiB
of physical memory), because that prevents people from reproducing the
same result on a different platform configuration. Think about
benchmarking a read-heavy workload: with data caching enabled, you may
end up just testing the performance of the system memory.

Currently, I'm thinking about how to eliminate the cache effects
created by the page cache. Direct I/O is a good option for testing a
single application but is not a good option for testing unknown
applications/workloads. Therefore, it is not feasible to ask people to
modify the application source code before running the benchmark.

Making changes within the kernel may be the only option because it is
transparent to all user-space applications. The problem is that I don't
know how to modify the kernel so that it does not use the page cache
for any I/O to a specific storage device. I have tried appending an
fadvise64() call with POSIX_FADV_DONTNEED to the end of each
read/write system call. The performance of this approach is far worse
than using Direct I/O, and it is also unable to eliminate the caching
effects under concurrent I/O. I'm looking for any advice pointing me
to an efficient way to remove the caching effects of the page cache.
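
For reference, the userspace analogue of what I patched in is roughly
the sketch below (a minimal illustration only; read_uncached() is a
made-up helper name, and the actual change lives in the kernel
read/write paths):

  #include <fcntl.h>
  #include <unistd.h>

  /* Buffered read followed by a hint to drop the cached pages for that
   * range: roughly what appending POSIX_FADV_DONTNEED to every read
   * amounts to. */
  static ssize_t read_uncached(int fd, void *buf, size_t count, off_t off)
  {
          ssize_t n = pread(fd, buf, count, off);

          if (n > 0)
                  (void)posix_fadvise(fd, off, n, POSIX_FADV_DONTNEED);
          return n;
  }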

Thanks,
Jianshen

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Question: Modifying kernel to handle all I/O requests without page cache
  2019-09-25 22:51 Question: Modifying kernel to handle all I/O requests without page cache Jianshen Liu
@ 2019-09-26 12:39 ` Carlos Maiolino
  2019-09-27  1:42   ` Jianshen Liu
  0 siblings, 1 reply; 5+ messages in thread
From: Carlos Maiolino @ 2019-09-26 12:39 UTC (permalink / raw)
  To: Jianshen Liu; +Cc: linux-xfs

Hi.


I am removing linux-fsdevel from the CC, because it's not really the forum for
this.

On Wed, Sep 25, 2019 at 03:51:27PM -0700, Jianshen Liu wrote:
> Hi,
> 
> I am working on a project trying to evaluate the performance of a
> workload running on a storage device. I don't want the benchmark
> result to depend on a specific platform (e.g., a platform with X GiB
> of physical memory).

Well, this does not sound realistic to me. Memory is only one of the
variables in how IO throughput will perform. You bypass memory; then what about
the IO controller, disks, storage cache, etc etc etc? All of these are 'platform
specific'.

> because that prevents people from reproducing the same
> result on a different platform configuration.

Every platform will behave differently; memory is only one factor.

> Think about benchmarking a read-heavy workload: with data caching
> enabled, you may end up just testing the performance of the system
> memory.
>

Not really true. It all depends on what kind of workload you are talking about.
And what you are trying to measure.

A read-heavy workload may well use a lot of page cache, but it all depends on
the IO patterns, and exactly what statistics you care about. Are you trying to
measure how well a storage solution will perform on a random workload? On a
sequential workload?
Are you trying to measure how well an application will perform? If that's the
case, does removing the page cache from the equation really matter? I.e., will it
give you realistic results?

If you are benchmarking systems with 'random' kinds of workloads, there are
several tools out there which can help, and you can configure them to use DIO.

> Currently, I'm thinking how to eliminate the cache effects created by
> the page cache. Direct I/O is a good option for testing with a single
> application but is not good for testing with unknown
> applications/workloads.

You can use many tools for that purpose, which can 'emulate' different
workloads, without needing to modify a specific application.

But if you are trying to create benchmarks for a specific application, whether
your benchmark uses DIO or not will depend on whether the application uses DIO or not.

> Therefore, it is not feasible to ask people to
> modify the application source code before running the benchmark.

Well, IMHO, your approach is wrong. First, if you are benchmarking how an application
will perform, you need to use the same IO patterns the application is using,
i.e. you won't need to modify it. If it does not use direct IO, benchmarking a system
using direct IO will give you very wrong data. And the opposite is true:
if the application uses direct IO, you don't want to benchmark a system by using
the page cache, because one of the things you really want to measure is how well the
application's own caching is performing.

Also, direct IO is not a good option to use when you don't know how the
I/O requests are issued.

All I/O requests submitted using direct IO must be aligned. So, if the
application does not issue aligned requests, the IO requests will fail.

I remember some filesystems had an option to 'open all files with O_DIRECT by
default', and many problems were created because IO requests to such files were
not all sector aligned.
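
Just to illustrate what 'aligned' means here, a minimal sketch (the
4096-byte block size and the /mnt/test/file path are only assumptions;
the real logical block size would need to be queried from the device,
e.g. via the BLKSSZGET ioctl):

  #define _GNU_SOURCE             /* O_DIRECT */
  #include <fcntl.h>
  #include <stdlib.h>
  #include <unistd.h>

  int main(void)
  {
          void *buf;
          int fd = open("/mnt/test/file", O_RDONLY | O_DIRECT);

          /* Buffer address, file offset and length must all be multiples
           * of the logical block size (4096 assumed here). */
          if (fd < 0 || posix_memalign(&buf, 4096, 4096))
                  return 1;

          ssize_t n = pread(fd, buf, 4096, 0);    /* aligned: ok */
          /* pread(fd, buf, 100, 3) would fail with EINVAL under O_DIRECT */

          free(buf);
          close(fd);
          return n < 0;
  }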

> 
> Making changes within the kernel may only be the option because it is
> transparent to all user-space applications.

I will hit the same point again :) and my question is: why? :) Will you be using
a custom kernel, with this modification? If not, you will not be gathering
trustworthy data anyway.

> The problem is I don't
> know how to modify the kernel so that it does not use the page cache
> for any IOs to a specific storage device. I have tried to append a
> fadvise64() call with POSIX_FADV_DONTNEED to the end of each
> read/write system calls. The performance of this approach is far from
> using Direct I/O. It is also unable to eliminate the caching effects
> under concurrent I/Os. I'm looking for any advice here to point me an
> efficient way to remove the cache effects from the page cache.
> 
> Thanks,
> Jianshen


Benchmarking systems is an 'art', and I am certainly not an expert at it, but at
first glance it looks like you are trying to create a 'generic benchmark' for some
generic random system. And I will tell you, this is not going to work well. We
have tons of cases and stories about people running benchmark X on system Z and
it performing 'well', but when they ran their real workload, everything started to
perform poorly, exactly because they did not use the correct benchmark in the
first place.


You have several layers in a storage stack, starting with how the
application handles its own IO requests, and each layer will behave
differently under each type of workload.

Apologies for repeating myself:

If you are trying to measure only a storage solution, there are several tools
around which can create different kinds of workload.

If you are trying to measure an application's performance on solution X, well,
it is pointless to measure direct IO if the application does not use it, or
vice versa; so, modifying an application, again, is not what you will want to do
for benchmarking, for sure.

Hope to have helped (and not created more questions :)

Cheers

-- 
Carlos


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Question: Modifying kernel to handle all I/O requests without page cache
  2019-09-26 12:39 ` Carlos Maiolino
@ 2019-09-27  1:42   ` Jianshen Liu
  2019-09-27 10:39     ` Carlos Maiolino
  2019-09-27 22:17     ` Dave Chinner
  0 siblings, 2 replies; 5+ messages in thread
From: Jianshen Liu @ 2019-09-27  1:42 UTC (permalink / raw)
  To: Carlos Maiolino; +Cc: linux-xfs

Hi Carlos,

Thanks for your reply.

On Thu, Sep 26, 2019 at 5:39 AM Carlos Maiolino <cmaiolino@redhat.com> wrote:
>
> On Wed, Sep 25, 2019 at 03:51:27PM -0700, Jianshen Liu wrote:
> > Hi,
> >
> > I am working on a project trying to evaluate the performance of a
> > workload running on a storage device. I don't want the benchmark
> > result to depend on a specific platform (e.g., a platform with X GiB
> > of physical memory).
>
> Well, this does not sound realistic to me. Memory is only one of the
> variables in how IO throughput will perform. You bypass memory; then what about
> the IO controller, disks, storage cache, etc etc etc? All of these are 'platform
> specific'.

I apologize for any confusion caused by my oversimplified project
description. My final goal is to compare the efficiency of different
platforms that use a specific storage device to run a given workload.
Since the platforms can be heterogeneous (e.g., x86 vs arm), the
comparison should be based on a reference unit that is relevant to the
capability of the storage device but is irrelevant to a specific
platform. With this reference unit, you can understand how much of the
capability of the specific storage device a platform can actually
deliver. Once you have this knowledge, you can consider whether
adding/removing CPUs, memory, more of the same model of storage device,
etc. would improve the platform efficiency (e.g., cost per reference
unit) with respect to the capability of the storage device under this
workload. Moreover, you can answer questions such as whether you get
one more full unit of performance when you add one more device to the
platform.

My question here is how to evaluate the platform-independent reference
unit for the combination of a given workload and a specific storage
device. Specifically, the reference unit should be a performance value
of the workload under the capability of the storage device. In other
words, this value should be neither enhanced nor throttled by the
testing platform. Yes, memory is one of the variables affecting the
I/O performance; CPU horsepower, network bandwidth, the type of host
interface, and software versions are others. But these are
variables I can easily control. For example, I can check whether
the CPU and/or the network are the performance bottlenecks. The I/O
controller, storage media, and the disk cache are encapsulated in the
storage device, so these are not platform-specific variables as long
as I keep using the same model of storage device. The use of the page
cache, however, may enhance the performance value, making the value
platform-dependent.

> > because that prevents people from reproducing the same
> > result on a different platform configuration.
>
> Every platform will behave differently; memory is only one factor.
>
> > Think about benchmarking a read-heavy workload: with data caching
> > enabled, you may end up just testing the performance of the system
> > memory.
> >
>
> Not really true. It all depends on what kind of workload you are talking about.
> And what you are trying to measure.
>
> A read-heavy workload may well use a lot of page cache, but it all depends on
> the IO patterns, and exactly what statistics you care about. Are you trying to
> measure how well a storage solution will perform on a random workload? On a
> sequential workload?
> Are you trying to measure how well an application will perform? If that's the
> case, does removing the page cache from the equation really matter? I.e., will it
> give you realistic results?
>
> If you are benchmarking systems with 'random' kinds of workloads, there are
> several tools out there which can help, and you can configure them to use DIO.
>
> > Currently, I'm thinking about how to eliminate the cache effects
> > created by the page cache. Direct I/O is a good option for testing a
> > single application but is not a good option for testing unknown
> > applications/workloads.
>
> You can use many tools for that purpose, which can 'emulate' different
> workloads, without needing to modify a specific application.

I don't want to emulate a workload. An emulated workload will most of
the time be different from the original real-world workload. For
example, replaying block I/O recordings produced by fio or blktrace
will probably give different performance numbers than running
the original workload.

> But if you are trying to create benchmarks for a specific application, whether
> your benchmark uses DIO or not will depend on whether the application uses DIO or not.

This is my main question. I want to run an application without
involving page caching effects even when the application does not
support DIO.

> > Therefore, it is not feasible to ask people to
> > modify the application source code before running the benchmark.
>
> Well, IMHO, your approach is wrong. First, if you are benchmarking how an application
> will perform, you need to use the same IO patterns the application is using,
> i.e. you won't need to modify it. If it does not use direct IO, benchmarking a system
> using direct IO will give you very wrong data. And the opposite is true:
> if the application uses direct IO, you don't want to benchmark a system by using
> the page cache, because one of the things you really want to measure is how well the
> application's own caching is performing.
>
> Also, direct IO is not a good option to use when you don't know how the
> I/O requests are issued.
>
> All I/O requests submitted using direct IO must be aligned. So, if the
> application does not issue aligned requests, the IO requests will fail.

Yes, this is one of the difficulties in my problem. The application
may not issue I/O with aligned offset, length, and buffer address. Thus, I
cannot blindly convert application I/O to DIO within the kernel.

> I remember some filesystems had an option to 'open all files with O_DIRECT by
> default', and many problems were created because IO requests to such files were
> not all sector aligned.
>
> >
> > Making changes within the kernel may be the only option because it is
> > transparent to all user-space applications.
>
> I will hit the same point again :) and my question is: why? :) Will you be using
> a custom kernel, with this modification? If not, you will not be gathering
> trustworthy data anyway.

I created a loadable module to patch a vanilla kernel using the kernel
livepatching mechanism.

> > The problem is that I don't
> > know how to modify the kernel so that it does not use the page cache
> > for any I/O to a specific storage device. I have tried appending an
> > fadvise64() call with POSIX_FADV_DONTNEED to the end of each
> > read/write system call. The performance of this approach is far worse
> > than using Direct I/O, and it is also unable to eliminate the caching
> > effects under concurrent I/O. I'm looking for any advice pointing me
> > to an efficient way to remove the caching effects of the page cache.
> >
> > Thanks,
> > Jianshen
>
>
> Benchmarking systems is an 'art', and I am certainly not an expert at it, but at
> first glance it looks like you are trying to create a 'generic benchmark' for some
> generic random system. And I will tell you, this is not going to work well. We
> have tons of cases and stories about people running benchmark X on system Z and
> it performing 'well', but when they ran their real workload, everything started to
> perform poorly, exactly because they did not use the correct benchmark in the
> first place.

I'm not trying to create a generic benchmark. I just want to create a
benchmark methodology focusing on evaluating the efficiency of a
platform for running a given workload on a specific storage device.

> You have several layers in a storage stack, starting with how the
> application handles its own IO requests, and each layer will behave
> differently under each type of workload.

My assumption is that we should run the same workload when comparing
different platforms.

> Apologies for repeating myself:
>
> If you are trying to measure only a storage solution, there are several tools
> around which can create different kinds of workload.

I would like to know whether there is a tool that can create a
workload identical to the original. But this still does not help to
measure the reference unit that I mentioned.

> If you are trying to measure an application's performance on solution X, well,
> it is pointless to measure direct IO if the application does not use it, or
> vice versa; so, modifying an application, again, is not what you will want to do
> for benchmarking, for sure.

The point is that I'm not trying to measure the performance of an
application on solution X. I'm trying to generate a
platform-independent reference unit for the combination of a storage
device and the application.

I have researched different knobs provided by the kernel, including
drop_caches, cgroups, and the vm subsystem, but none of them can help me
measure what I want. I would like to know whether there is a variable
in the filesystem that defines the size of the page cache pool. Also,
would it be possible to convert some of the application I/Os to DIO
when they are properly aligned? Are there any places in the kernel I
can easily change to bypass the page cache?

Thanks,
Jianshen

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Question: Modifying kernel to handle all I/O requests without page cache
  2019-09-27  1:42   ` Jianshen Liu
@ 2019-09-27 10:39     ` Carlos Maiolino
  2019-09-27 22:17     ` Dave Chinner
  1 sibling, 0 replies; 5+ messages in thread
From: Carlos Maiolino @ 2019-09-27 10:39 UTC (permalink / raw)
  To: Jianshen Liu; +Cc: linux-xfs

Hi.

I'm gonna move this question to the top, for a short answer:

> > But if you are trying to create benchmarks for a specific application, whether
> > your benchmark uses DIO or not will depend on whether the application uses DIO or not.
> 
> This is my main question. I want to run an application without
> involving page caching effects even when the application does not
> support DIO.


You simply can't. Aligned IO is a primitive of block devices (if I can use
these words). If you don't submit aligned IOs, you can't access block devices
directly.

You can't modify the kernel to do that either, because handling this is exactly
one of the goals of the buffer cache, other than improving performance of course.
If you submit an unaligned IO, the kernel will first read in the whole sectors
from the block device, modify them according to your unaligned IO, and write the
whole sectors back.

For reads, the process is the same: the kernel will read at least the whole sector,
never just a part of it.


Now, let me try a longer reply :P


On Thu, Sep 26, 2019 at 06:42:43PM -0700, Jianshen Liu wrote:
> Hi Carlos,
> 
> Thanks for your reply.
> 
> On Thu, Sep 26, 2019 at 5:39 AM Carlos Maiolino <cmaiolino@redhat.com> wrote:
> >
> > On Wed, Sep 25, 2019 at 03:51:27PM -0700, Jianshen Liu wrote:
> > > Hi,
> > >
> > > I am working on a project trying to evaluate the performance of a
> > > workload running on a storage device. I don't want the benchmark
> > > result to depend on a specific platform (e.g., a platform with X GiB
> > > of physical memory).
> >
> > Well, this does not sound realistic to me. Memory is only one of the
> > variables in how IO throughput will perform. You bypass memory; then what about
> > the IO controller, disks, storage cache, etc etc etc? All of these are 'platform
> > specific'.
> 
> I apologize for any confusion caused by my oversimplified project
> description. My final goal is to compare the efficiency of different
> platforms that use a specific storage device to run a given workload.
> Since the platforms can be heterogeneous (e.g., x86 vs arm), the
> comparison should be based on a reference unit that is relevant to the
> capability of the storage device but is irrelevant to a specific
> platform.

Storage vendors usually already provide the hardware limits, which you can use
as the reference units you are looking for, like the maximum IOPS and
throughput the storage solution can support. These are all platform- and
application-independent reference units you can use.

> With this reference unit, you can understand how much of the
> capability of the specific storage device a platform can actually
> deliver.

Again, you can use the numbers provided by the vendor. For example, XFS is
designed to be a high-throughput filesystem, and the goal is to be as close as
possible to the hardware limits, but of course, it all depends on everything
else.


> Once you have this knowledge, you can consider whether
> adding/removing CPUs, memory, more of the same model of storage device,
> etc. would improve the platform efficiency (e.g., cost per reference
> unit) with respect to the capability of the storage device under this
> workload.

Storage hardware limitations and vendor-provided numbers also apply here. And you
can't simply discard the application's behavior here. Everything you mentioned here
will be directly affected by the application you're using, so modifying the
application will give you nothing useful to work with.

> Moreover, you can answer questions such as whether you get
> one more full unit of performance when you add one more device to the
> platform.

For "Full unit of performance", you can again, use vendor-provided numbers :)

> My question here is how to evaluate the platform-independent reference
> unit for the combination of a given workload and a specific storage
> device.

Use the application you are trying to evaluate on different platforms, and
measure it.


> Specifically, the reference unit should be a performance value
> of the workload under the capability of the storage device. In other
> words, this value should be neither enhanced nor throttled by the
> testing platform. Yes, memory is one of the variables affecting the
> I/O performance; CPU horsepower, network bandwidth, the type of host
> interface, and software versions are others. But these are
> variables I can easily control. For example, I can check whether
> the CPU and/or the network are the performance bottlenecks. The I/O
> controller, storage media, and the disk cache are encapsulated in the
> storage device, so these are not platform-specific variables as long
> as I keep using the same model of storage device. The use of the page
> cache, however, may enhance the performance value, making the value
> platform-dependent.

Again, everything you measure will have no meaning if you don't use realistic
data. You can't simply bypass the buffer cache if the application does not
support it, and so it is pointless to measure how an application will 'perform'
in such a scenario.

> I don't want to emulate a workload. An emulated workload will most of
> the time be different from the original real-world workload. For
> example, replaying block I/O recordings produced by fio or blktrace
> will probably give different performance numbers than running
> the original workload.

And I think this is the crux of your issue.

You don't want an emulated workload, because it may not reproduce the real-world
workload.

Why, then, are you trying to find a way to bypass the page/buffer cache for an
application that does not support direct IO and won't be able to use it like
that?

You don't want to collect data using emulated workloads, but at the same time
you want to use something that is simply out of touch with reality? That does
not make any sense to me.

fio can give different performance numbers? Sure, I agree, no performance
measurement tool can beat the real workload of a specific application, but what
you are trying to do doesn't either, so what's the difference?


> > Benchmarking systems is an 'art', and I am certainly not an expert at it, but at
> > first glance it looks like you are trying to create a 'generic benchmark' for some
> > generic random system. And I will tell you, this is not going to work well. We
> > have tons of cases and stories about people running benchmark X on system Z and
> > it performing 'well', but when they ran their real workload, everything started to
> > perform poorly, exactly because they did not use the correct benchmark in the
> > first place.
> 
> I'm not trying to create a generic benchmark. I just want to create a
> benchmark methodology focusing on evaluating the efficiency of a
> platform for running a given workload on a specific storage device.

Ok, so you want to evaluate how platform X will behave with your application +
storage.

Why then do you want to modify that platform's original behavior, in this case,
let's say, by bypassing the Linux page/buffer cache?

By platform, do you mean hardware? Well, then, use the same software stack.

> 
> > You have several layers in a storage stack, starting with how the
> > application handles its own IO requests, and each layer will behave
> > differently under each type of workload.
> 
> My assumption is that we should run the same workload when comparing
> different platforms.

Yes, and if you don't want to use emulated workloads, you shouldn't try to
hack your software stack to behave in weird ways.

If you want to compare platforms, make sure to use the same software stack,
including the same configuration. That's all.

> > If you are trying to measure an application's performance on solution X, well,
> > it is pointless to measure direct IO if the application does not use it, or
> > vice versa; so, modifying an application, again, is not what you will want to do
> > for benchmarking, for sure.
> 
> The point is that I'm not trying to measure the performance of an
> application on solution X. I'm trying to generate a
> platform-independent reference unit for the combination of a storage
> device and the application.

You simply can't. Take any enterprise application out there and you will see that
the application vendors usually certify certain combinations of hardware +
software stack.

There is a reason for that. There are many variables in the way, not only the
page/buffer cache. You can't simply bypass the page/buffer cache and expect to
get a realistic base reference unit you can work with, especially if you are
not sure how the application behaves.

If you want base reference unit numbers for a storage solution, use the
vendor's reference numbers. They are platform-agnostic. Everything above
that will be totally interdependent.

> I have researched different knobs provided by the kernel, including
> drop_caches, cgroups, and the vm subsystem, but none of them can help me
> measure what I want.

Because I honestly think what you are trying to measure is unrealistic :)

> I would like to know whether there is a variable
> in the filesystem that defines the size of the page cache pool.

There is no such silver bullet :)

> Also,
> would it be possible to convert some of the application IOs to DIO
> when they are properly aligned?

Not that I know of, but well, I'm not really an expert in the DIO code; maybe
there's a way to fall back to buffered IO, although I don't think so.

> Are there any places in the kernel I
> can easily change to bypass the page cache?

No.


-- 
Carlos


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Question: Modifying kernel to handle all I/O requests without page cache
  2019-09-27  1:42   ` Jianshen Liu
  2019-09-27 10:39     ` Carlos Maiolino
@ 2019-09-27 22:17     ` Dave Chinner
  1 sibling, 0 replies; 5+ messages in thread
From: Dave Chinner @ 2019-09-27 22:17 UTC (permalink / raw)
  To: Jianshen Liu; +Cc: Carlos Maiolino, linux-xfs

On Thu, Sep 26, 2019 at 06:42:43PM -0700, Jianshen Liu wrote:
> > But if you are trying to create benchmarks for a specific application, whether
> > your benchmark uses DIO or not will depend on whether the application uses DIO or not.
> 
> This is my main question. I want to run an application without
> involving page caching effects even when the application does not
> support DIO.

LD_PRELOAD wrapper for the open() syscall. Check that the target is
a file, then add O_DIRECT to the open flags.
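
Something along these lines (an untested sketch only; it ignores
openat(), open64(), fopen() and friends, and the preload_odirect.c name
is made up):

  #define _GNU_SOURCE
  #include <dlfcn.h>
  #include <fcntl.h>
  #include <stdarg.h>
  #include <sys/stat.h>

  /* Build: gcc -shared -fPIC -o preload_odirect.so preload_odirect.c -ldl
   * Run:   LD_PRELOAD=./preload_odirect.so <application> */
  int open(const char *path, int flags, ...)
  {
          static int (*real_open)(const char *, int, ...);
          struct stat st;
          mode_t mode = 0;

          if (!real_open)
                  real_open = (int (*)(const char *, int, ...))
                                  dlsym(RTLD_NEXT, "open");

          if (flags & O_CREAT) {
                  va_list ap;

                  va_start(ap, flags);
                  mode = va_arg(ap, mode_t);
                  va_end(ap);
          }

          /* Only regular files get O_DIRECT; leave devices, pipes, etc. alone. */
          if (stat(path, &st) == 0 && S_ISREG(st.st_mode))
                  flags |= O_DIRECT;

          return real_open(path, flags, mode);
  }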

Won't help you for mmap() access, which always uses the page
cache, so things like executables will always use the page
cache regardless of what tricks you try to play.

So, as Carlos has said, what you want to do is largely impossible to
achieve.

> > All I/O requests submitted using direct IO must be aligned. So, if the
> > application does not issue aligned requests, the IO requests will fail.
> 
> Yes, this is one of the difficulties in my problem. The application
> may not issue I/O with aligned offset, length, and buffer address. Thus, I
> cannot blindly convert application I/O to DIO within the kernel.

LD_PRELOAD wrapper to bounce-buffer unaligned read()/write() requests.
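
i.e. something like the sketch below: round the offset down and the
length up to the block size, do an aligned read into a bounce buffer,
and copy out the slice the caller asked for. (Untested, assumes a
4096-byte block size and a seekable fd, and handles read() only;
write() needs the same treatment plus a read-modify-write of the
partial blocks.)

  #define _GNU_SOURCE
  #include <dlfcn.h>
  #include <stdlib.h>
  #include <string.h>
  #include <unistd.h>

  #define BLK 4096UL                      /* assumed logical block size */

  ssize_t read(int fd, void *buf, size_t count)
  {
          static ssize_t (*real_read)(int, void *, size_t);
          off_t pos, start;
          size_t skip, span;
          ssize_t got, ret;
          void *bounce;

          if (!real_read)
                  real_read = (ssize_t (*)(int, void *, size_t))
                                  dlsym(RTLD_NEXT, "read");

          pos = lseek(fd, 0, SEEK_CUR);
          if (pos < 0)                    /* pipe/socket: pass straight through */
                  return real_read(fd, buf, count);

          start = pos & ~((off_t)BLK - 1);                /* round offset down */
          skip  = pos - start;
          span  = (skip + count + BLK - 1) & ~(BLK - 1);  /* round length up */

          if (posix_memalign(&bounce, BLK, span))
                  return real_read(fd, buf, count);

          got = pread(fd, bounce, span, start);           /* aligned request */
          if (got <= (ssize_t)skip) {
                  ret = got < 0 ? -1 : 0;                 /* error, or EOF at/before pos */
          } else {
                  ret = got - skip;
                  if (ret > (ssize_t)count)
                          ret = (ssize_t)count;
                  memcpy(buf, (char *)bounce + skip, ret);
                  lseek(fd, pos + ret, SEEK_SET);         /* keep the file offset sane */
          }
          free(bounce);
          return ret;
  }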

> > I will hit the same point again :) and my question is: why? :) Will you be using
> > a custom kernel, with this modification? If not, you will not be gathering
> > trustworthy data anyway.
> 
> I created a loadable module to patch a vanilla kernel using the kernel
> livepatching mechanism.

That's just asking for trouble. I wouldn't trust a kernel that has
been modified in that way as far as I could throw it.

> > If you are trying to measure an application performance on solution X, well,
> > it is pointless to measure direct IO if the application does not use it or
> > vice-versa, so, modifying an application, again, is not what you will want to do
> > for benchmarking, for sure.
> 
> The point is that I'm not trying to measure the performance of an
> application on solution X. I'm trying to generate a
> platform-independent reference unit for the combination of a storage
> device and the application.

Sounds like an exercise that has no practical use to me - the model
will have to be so generic and full of compromises that it won't be
relevant to real world situations....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2019-09-27 22:17 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-25 22:51 Question: Modifying kernel to handle all I/O requests without page cache Jianshen Liu
2019-09-26 12:39 ` Carlos Maiolino
2019-09-27  1:42   ` Jianshen Liu
2019-09-27 10:39     ` Carlos Maiolino
2019-09-27 22:17     ` Dave Chinner
