* [OSSTEST PATCH] README.hardware-acquisition
From: Ian Jackson @ 2018-10-30 16:13 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, infra

New document-cum-checklist, for helping with hardware procurement.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
CC: infra@xenproject.org
---
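Note on the RELIABILITY section of the README below: the claim that
even a very small per-boot failure probability is intolerable can be
checked with a little arithmetic.  The following is only a sketch.
The function names and all the figures (failure rate, boots per day,
number of hosts) are hypothetical and chosen for illustration; they
are not measurements from any osstest instance, and boots are assumed
to fail independently.

```python
def false_failures_per_day(p_fail_per_boot: float,
                           boots_per_day: int,
                           hosts: int) -> float:
    """Expected number of spurious boot failures per day across the lab."""
    return p_fail_per_boot * boots_per_day * hosts


def p_at_least_one_failure(p_fail_per_boot: float, boots: int) -> float:
    """Probability that at least one of `boots` independent boots fails."""
    return 1.0 - (1.0 - p_fail_per_boot) ** boots


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    p = 0.001            # one random boot failure per thousand boots
    boots_per_day = 20   # boots per host per day
    hosts = 40           # hosts in the lab

    print(false_failures_per_day(p, boots_per_day, hosts))
    print(p_at_least_one_failure(p, boots_per_day * hosts))
```

With these assumed figures the lab would see roughly 0.8 spurious
failures a day, and about a 55% chance of at least one on any given
day.  Each of those would look like a `hypervisor could not boot' bug.
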
 README.hardware-acquisition | 310 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 310 insertions(+)
 create mode 100644 README.hardware-acquisition

diff --git a/README.hardware-acquisition b/README.hardware-acquisition
new file mode 100644
index 00000000..8f196109
--- /dev/null
+++ b/README.hardware-acquisition
@@ -0,0 +1,310 @@
+====================================
+# HARDWARE ACQUISITION FOR OSSTEST #
+====================================
+
+This document can be used as a checklist when procuring hardware for
+an osstest instance.  A few of the points have details specific to the
+Xen Project test lab in Massachusetts, but most of it will be relevant
+to all osstest installations.
+
+
+POWER
+=====
+
+osstest needs to turn each host on and off under program control.
+
+When a host is power cycled, all state in it must be reset.  This
+includes onboard control and management software (eg IPMI), since such
+systems can be buggy and bugs in them can be provoked by bugs in
+system software (ie, buggy versions of Xen can break the LOM, even if
+the LOM, unusually, is not simply flaky).
+
+However, it is often necessary to use the LOM (Lights Out Management)
+as part of the poweron/poweroff sequence as otherwise some machines
+draw enough current to wear out our mains PDU contacts too quickly.
+
+(I use the English word `mains' for the single phase 110V/220V-240V AC
+electrical power supply prevalent in datacentres.)
+
+Requirements for typical server hardware
+----------------------------------------
+
+ * If the system has a LOM it should be driveable with Free Software,
+   eg via the IPMI protocol.
+
+ * Redundant PSUs are not required.
+
+ * Provisioning: One PDU port is required per host.
+
+Requirements for embedded or devboard hardware
+----------------------------------------------
+
+ * There must be arrangements to control the actual power supply
+   to each board (node).  Options include:
+
+     (i) Each node has a separate mains power supply, each of which
+         we will plug into a PDU port.
+
+     (ii) A separate management or PDU board or backplane, which
+         has one single mains power input and which has relays
+         or similar to control power to individual nodes.
+         The management system must have its own separate network
+         connection and not be at risk of corruption from
+         bad software on nodes.
+
+ * Provisioning:
+    + Number of PDU ports required depends on the approach taken.
+    + With a separate PDU controller, a network switch port is required.
+
+
+SERIAL
+======
+
+We always use hardware serial for console output.  This is essential
+to capture kernel and hypervisor crash messages, including from early
+boot; as well as bootloader output, and so on.  We use our own serial
+concentrator hardware, separate from the systems under test.  Built-in
+console-over-LAN systems (eg IPMI serial over LAN) are not reliable
+enough for our purposes.
+
+Requirements for typical server hardware
+----------------------------------------
+
+ * At least one conventional RS232 UART, accessible to system
+   software in the conventional way.
+
+ * For ARM, supported as console by both Xen[1] and Linux[2].
+
+ * Presented on a standard 9-pin D connector.  (RJ45 is acceptable
+   if we know the pinout.)
+
+ * Provisioning: one serial concentrator port required per host.
+
+Requirements for embedded or devboard hardware
+----------------------------------------------
+
+ * At least one suitable UART
+
+ * Supported in software by both Xen[1] and Linux[2]
+
+ * With suitable physical presentation:
+    (i)
+       + Proper RS232 (full voltage, not TTL or 3.3V)
+       + presented on a 9-pin D or RJ45 connector
+       + with known pinout;
+   or
+    (ii)
+       + Connected somehow to a USB-to-serial adapter
+       + Adapter supported by Linux[2]
+       + Multiple adapters, giving one physical USB port
+         for all nodes (ie built-in hub) preferred
+   or
+    (iii) Some other suitable arrangement to be discussed.
+
+ * Provisioning: Requires serial concentrator port(s) and/or spare USB
+   port(s) on appropriate infrastructure host(s).
+
+
+PHYSICAL PRESENTATION
+=====================
+
+ * All equipment should be mounted inside one or more 19" rack
+   mount cases.
+
+ * In as few U as possible: usually 1U (or, exceptionally, maybe 2U)
+   for a single server-type host.
+
+ * Forbidden: External power adapters (laptop-style mains power supply
+   bricks); external USB hubs; any equipment not physically
+   restrained.  There is no shelf in the rack.
+
+ * Pair principle: Every host or node must be part of a set of several
+   identical hosts.  This allows us to distinguish hardware faults
+   from software bugs.  (In the case of a chassis with backplane, one
+   backplane is OK.)  Conversely, we want diversity to find the most
+   host-specific bugs, so usually around two of each type is best.
+
+ * Provisioning: Enough rack space must be available.
+
+
+MASS STORAGE
+============
+
+Each host needs some locally attached mass storage of its own.
+
+Requirements for typical server hardware
+----------------------------------------
+
+ * SATA controller supported by Linux[2]
+
+ * If the SATA controller has multiple modes (eg, AHCI vs RAID)
+   it is sufficient for it to be supported in one mode.
+
+ * Storage redundancy is not required: one disk will do.
+
+ * SSD is not required: rotating rust is cheaper and will do.
+
+Requirements for embedded or devboard hardware
+----------------------------------------------
+
+ * Some mass storage supported by Linux[2].  Best is an onboard SATA
+   controller, connected to a SATA HDD in the same enclosure.
+   High-endurance flash drives are another possibility.
+
+ * If the hardware always starts by booting from a mass storage device,
+   that boot device must be physically read-only and separate from the
+   primary mass storage.  See BOOT ARRANGEMENTS.
+
+
+REMOTE FIRMWARE ACCESS VIA SERIAL
+=================================
+
+Configuration of the primary system firmware must be possible remotely
+using only the power and serial access described above.  Specifically,
+it must be possible to interact with the firmware via the serial port.
+
+Requirements for typical server hardware with UEFI or BIOS
+----------------------------------------------------------
+
+ * `BIOS' configuration (including the UEFI equivalent) accessible and
+   useable via BIOS `serial console redirection'.
+
+ * UEFI shell (if provided) also available via serial.
+
+ * Specifically, boot order configuration available via serial.
+
+
+Requirements for embedded or devboard hardware
+----------------------------------------------
+
+ * See BOOT ARRANGEMENTS.
+
+
+BOOT ARRANGEMENTS, NETBOOT
+==========================
+
+Every host must netboot as its first boot source.  The netboot
+configuration must be able to `chain' to the local writeable mass
+storage.  This ensures that a host can be completely wiped, even if
+bad software has corrupted the mass storage.
+
+Requirements for typical server hardware with UEFI or BIOS
+----------------------------------------------------------
+
+ * PXE and/or UEFI netboot.
+
+Requirements for embedded or devboard hardware
+----------------------------------------------
+
+ * Some firmware must be available and provided which is capable of
+   netbooting Xen[1] and Linux[2], under control from the netboot
+   server.  A suitable version of u-boot can meet this need.
+
+ * The firmware which performs the netbooting must be on a read-only
+   storage device (flagged as such in hardware, not software) so that
+   it cannot be corrupted by system software.  So it must be on a
+   separate physical storage device to the primary mass storage (see
+   MASS STORAGE, above).
+
+ * This firmware will not usually be updated.
+
+
+NETWORKING
+==========
+
+Requirements
+------------
+
+ * Each host must have at least one RJ45 ethernet port compatible
+   with ordinary 100Mbit ethernet.   xxx
+
+ * The primary ethernet port must be compatible with Linux[2].
+
+ * In the case of a chassis with backplane, it is acceptable if the
+   chassis contains an ethernet switch, provided that it is a normal
+   and reliable ethernet switch (not a proprietary interconnect).
+
+ * In the case of a system with IPMI or similar LOM, it is best if the
+   LOM has its own physical ethernet port.
+
+
+CPU, CHIPSET, MOTHERBOARD, ETC.
+===============================
+
+General advice and preferences
+------------------------------
+
+ * We prefer multicore, multisocket and NUMA systems because they
+   expose a greater variety of exciting bugs.  But we don't care much
+   about performance and we want a wide variety of different hosts.
+   We want a mixture of systems with different CPU variants and
+   feature support.
+
+ * Memory requirements are modest.  8G or 16G per host is fine. xxx
+
+Compatibility with Xen and Linux - requirements
+-----------------------------------------------
+
+(Normally these issues are not a problem for x86, except perhaps for
+the network and storage controllers - see MASS STORAGE and NETWORKING,
+above.)
+
+ * [1] Xen: The CPU and other hardware must be supported by current
+   versions of xen-unstable, at the very least.
+
+ * [2] Linux: The CPU and other hardware must be supported by existing
+   widely available versions of Linux.  There are two principal
+   requirements:
+
+   + Baremetal boot from Debian stable or stable-backports:
+
+     A suitable Linux kernel binary which can boot baremetal on the
+     proposed hardware must be available from Debian (at least
+     `stable', or, if that is not possible, `stable-backports').  It is
+     not OK to require a patched version of Linux, or a version of
+     Linux built from a particular git branch, or some such.  If the
+     required kernel is not available in Debian, the vendor should
+     first work with the Debian project to ensure and validate that
+     the Debian stable-backports kernel binaries boot on the proposed
+     hardware.
+
+   + Boot under Xen with Linux kernel built from source code.
+
+     For x86, recent Linux LTS or mainline kernel source code must be
+     able to boot under Xen, on the proposed hardware.
+
+     For ARM, there is a special Xen ARM kernel branch.  It must be
+     able to boot under Xen, on the proposed hardware.
+
+ * Board-specific Linux and Xen versions are not acceptable.
+
+ * A hardware vendor offering a "board support package" is a red flag.
+   We will not be using a "board support package".  If we are offered
+   one we will need explicit confirmation, and perhaps verification,
+   of the points above.
+
+ * For ARM systems using Device Tree: xxx what to write here ?
+
+
+RELIABILITY
+===========
+
+ * osstest stresses systems in unusual ways.  The need to completely
+   wipe the machine for each test means test hosts are power cycled
+   more often than usual.
+
+ * Random failures due to unreliable hardware are not tolerable.  Some
+   hosts do not boot reliably.  Even a very small probability of a
+   random boot failure, per boot, is intolerable in this CI
+   environment: hosts are rebooted many times a day, and a random boot
+   failure looks just like a `hypervisor could not boot' bug.  (The
+   same bug would not be noticeable in a server farm where hosts are
+   nearly never rebooted.)
+
+
+NON-REQUIREMENTS
+================
+
+ * No VGA console needed.
+ * Redundant PSUs are not needed (see POWER, above).
+ * RAID is not needed (or wanted) (see MASS STORAGE, above).
-- 
2.11.0


