From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Patrick Turley" <pturley@rocksteady.com>
Date: Tue, 06 Jan 2004 18:03:21 +0000
Subject: Re: [LARTC] Bandwidth Control Tolerances
Message-Id: <005101c3d47f$61d1f270$6401a8c0@pturley>
List-Id: <lartc.vger.kernel.org>
References: <002801c3d3eb$4634da30$6401a8c0@pturley>
In-Reply-To: <002801c3d3eb$4634da30$6401a8c0@pturley>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: lartc@vger.kernel.org

This is, of course, very valuable feedback. Unfortunately, given the
responses I've had so far, I see that I didn't make it clear what I'm really
looking for.

I believe that my colleague's test methodology is flawed. I believe that you
cannot generate reliable bandwidth measurements by ftp'ing files and
measuring the time it takes. I believe this because I have seen iperf
generate very stable measurements with a variability of only plus or minus 1
kbit/s. I think it is inappropriate to spend time examining the quality of
the underlying bandwidth algorithms when the problem is really with the
measurement technique. If I don't prove that his tests are flawed, then I
*will* be asked to do a bunch of investigation that I think will only waste
my time. The ideal response would be something like this:


Patrick,

Timing an FTP transfer is well known to be a poor choice for measuring
bandwidth for the following reasons:

1) Even under ideal conditions, you will be guaranteed to under-measure the
bandwidth because you are not accounting for FTP protocol overhead, which is
considerable.

2) FTP does a lot of related work that isn't directly helpful in simply
moving the bits, which is why your colleague is seeing such variability in
his measurements.

3) The resolution of the Linux clock depends on the underlying hardware but,
at user level, is one half second at best. This can introduce substantial
error and variability in such a simple measurement.

4) Iperf is specifically designed to measure bandwidth without protocol
overhead or any other operations that introduce undue error or variability,
and accounts for clock skew. This is why you're seeing such stable
measurements and your colleague is not. You don't measure voltage with a
light bulb - you use a voltmeter. Don't measure bandwidth with FTP; use a
bandwidth meter.


These are my assertions. If you can authoritatively agree or disagree with
any of these claims, please say so. Also, if any of you have measured HTB
accuracy and can point me to a web site, that would be ideal. Yes, I've
visited the HTB home page, but I haven't found what I'm looking for there.
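
To make assertions 1 and 3 concrete, here is a minimal Python sketch of how
a stopwatch-style file transfer under-reports a shaped rate. Every number in
it is an assumption chosen for illustration (the 1000 kbit/s limit, the file
sizes, the 1.5 s of session setup, the 1 s timer granularity); none of them
is a measurement from our tests, and the function name is hypothetical.

```python
import math


def timed_transfer_estimate(file_bytes, true_rate_kbit, setup_s, timer_step_s):
    """Return the rate (kbit/s) a stopwatch-style test would report.

    Hypothetical model: the link really delivers true_rate_kbit, but the
    stopwatch also counts setup_s of non-payload time and only reads in
    whole multiples of timer_step_s (rounded up).
    """
    payload_kbit = file_bytes * 8 / 1000.0
    ideal_s = payload_kbit / true_rate_kbit      # time spent moving payload
    elapsed_s = ideal_s + setup_s                # what actually passes
    observed_s = math.ceil(elapsed_s / timer_step_s) * timer_step_s
    return payload_kbit / observed_s


true_rate = 1000.0  # assumed shaper limit, kbit/s
for size_mb in (1, 10, 100):
    est = timed_transfer_estimate(size_mb * 1_000_000, true_rate, 1.5, 1.0)
    error_pct = 100.0 * (true_rate - est) / true_rate
    print(f"{size_mb:>4} MB: reported {est:7.1f} kbit/s, {error_pct:4.1f}% low")
```

In this model the reported rate can never exceed the true rate, and the
shortfall shrinks as the file grows, because the fixed setup time and the
timer rounding are spread over a longer transfer.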

Please see below for my additional responses to Martin's e-mail.

>  : I have measured the performance of HTB with iperf and found it to be
>  : very close to expected (i.e., within 5%). I have a colleague who is
>  : measuring the performance by ftp'ing large files and recording the time
>  : required to make the transfer. He is seeing an average throughput that
>  : is nearly 10% away from the theoretical, with occasional excursions to
>  : nearly 30%.
>
> How have you defined your PSCHED_CLOCK_SOURCE?  See this URL:
>
>   http://www.docum.org/stef.coene/qos/faq/cache/40.html

Thank you for the reference. Yes, I found this and read it. We are using
stock RedHat kernels and are unwilling to recompile. I will try to figure
out how the RedHat kernel is configured. Can you give me an easy way to
discover this? Something in /proc perhaps?

>  : My colleague is now questioning the quality of the traffic control
>  : algorithms and wondering two things:
>
> Let's be careful with the baby and the bathwater.  The algorithms have
> been vetted.  The implementation may not be ideal, but implementations
> always suffer from compromises, right?

I agree entirely. I am convinced that the bandwidth algorithms have been
closely examined by many people and work quite well. However, because my
colleague believes his tests are accurate, he has concluded otherwise. If
you can help me prove that the problem is the measurement technique, I would
be very grateful.

>  : 1) What tolerance can we guarantee and advertise?
>
> Measure the deviations from your specified bandwidth after changing your
> setting for PSCHED_CLOCK_SOURCE.  Advertise your measured tolerance
> accordingly.

Yes - that's fine, if you think you are measuring correctly. But that's the
problem at hand, isn't it?
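
For reference, once the measurements themselves can be trusted, turning them
into a tolerance figure is simple arithmetic. A minimal Python sketch
follows; the function name and the sample readings are hypothetical
placeholders, not results from our tests.

```python
def tolerance_report(target_kbit, samples_kbit):
    """Summarize how far measured rates stray from the configured rate.

    Returns (mean_deviation_pct, worst_deviation_pct), both as absolute
    percentages of target_kbit.
    """
    deviations = [abs(s - target_kbit) / target_kbit * 100.0
                  for s in samples_kbit]
    return sum(deviations) / len(deviations), max(deviations)


# Placeholder readings for a class configured at 1000 kbit/s (made up).
mean_dev, worst_dev = tolerance_report(1000.0, [990.0, 1010.0, 950.0, 1000.0])
print(f"mean deviation {mean_dev:.2f}%, worst case {worst_dev:.2f}%")
```

The worst-case figure is the one that matters for anything we advertise; the
mean only says how the shaper behaves on a typical run.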

>  : 2) Can the tolerance be improved, since the values he has measured are
>  : unacceptable?
>
> I don't know--see the above link, and check how your kernel was compiled.
> Others on this list may have further suggestions for you.
>
> Good luck,
>
> -Martin

As always, Martin, thanks very much for your help.
