- LOVE:
- Running Debian GNU/Linux 4.0 (a.k.a. etch)
- Dell PowerEdge SC1425 - SATA - Dual Xeon 3.4GHz/2MB, 800FSB
- 8GB Dual Rank DDR2 Memory (4x2GB)
- 2 x 250GB SATA (7,200rpm) 1in Hard Drive
- Upg to Silver 3Y (24x7) Premier Enterprise Support
New, new 2010 server planning:
GOALS:
- Separate filesystem storage from computation
- Do backups UNDERNEATH the individual servers, in a single comprehensive system
- Snapshotting to allow for rollback
- Provision for true backup (e.g. to tape)
- Architecture designed for eventual scale-out
- But consolidated onto single fat server for short-term cost-effectiveness
- And then replicate this single fat server
- Can later add more front-end CPU servers, and re-task the existing fat servers as dedicated filesystem machines, without moving any files
- Even later than this, can then scale out storage further, without having to change front end CPU servers; eventual architecture is many CPU servers, many fileservers.
- Should be trivially able to provision new VMs, completely automated creation of NFS tree and VM image; similarly, modification (disk quota, RAM allocation) and deletion
- Individual VM owners should be able to completely control their databases.
- Assume we cannot ensure atomic preservation of DB file state across hardware failures etc.
- So must rely on filesystem-level backup
- Can we rely on atomic saving of binlogs?
- Should be responsible for doing their own daily backups to dumps, + binlogs
- But we will set all this up as part of the standard VM template
- BUT NFS does not permit atomic append operations! (Is this a problem? binlogs should be single-writer.)
- Use nfsv4 for better locking etc.? Certainly easier from protocol viewpoint, provided it's completely stable.
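A minimal sketch of the "completely automated creation of NFS tree" goal, assuming one directory tree per VM under a common NFS root and one /etc/exports line per VM. The function names, the subdirectory layout (root, home, var) and the export options chosen are illustrative assumptions, not a decided design; quota handling and VM image creation are left out.

```python
import os
import shutil
import tempfile


def provision_vm_tree(nfs_root, vm_name, client_ip):
    """Create the directory tree for one VM and return its /etc/exports line."""
    tree = os.path.join(nfs_root, vm_name)
    # Hypothetical layout: one subdirectory per top-level mount the VM needs.
    for sub in ("root", "home", "var"):
        os.makedirs(os.path.join(tree, sub), exist_ok=True)
    # no_root_squash lets the VM's own root user own files in its tree.
    options = "rw,sync,no_root_squash,no_subtree_check"
    return f"{tree} {client_ip}({options})"


def deprovision_vm_tree(nfs_root, vm_name):
    """Delete a VM's tree; the caller must also drop its exports line."""
    shutil.rmtree(os.path.join(nfs_root, vm_name))


if __name__ == "__main__":
    # Demo against a throwaway directory rather than a real NFS root.
    demo_root = tempfile.mkdtemp()
    print(provision_vm_tree(demo_root, "project1", "10.0.0.5"))
    deprovision_vm_tree(demo_root, "project1")
```

Restricting each export to a single client address is one way to keep projects isolated from each other at the NFS level; it does not replace quotas.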
THINKING:
Look at http://backdrift.org/live-migration-and-synchronous-replicated-storage-with-xen-drbd-and-lvm
THINKING:
Different levels of storage resilience
- RAID protects against physical disk failure
- Replication defends against whole-RAID or whole-server failure
- Snapshotting defends against data destruction WITHIN filesystem trees
- Backup (to tape!) defends against low-level corruption of the fileserver's own disk filesystem, or insidious corruption, or whole-facility disaster
THINKING:
- Separate storage from VMs
- Mount VMs on central NFS server, instead of on virtual disk images
- More robust, because disk writes to FS really write to physical disk, less chance of corruption
- Flexible allocation of storage (can we do quotas for isolation?)
- Easy to port storage between boxes: physical, virtual, different boxes
- Start with doing NFS server on VM on same box as user VMs
- Allows centralized backup of all projects
- Snapshots on fileserver are VERY GOOD for systems admin!!
So: initially a single multiprocessor box
- dom0: hosts VMs, and nothing else!
- NFS domU: hosts NFS server, backup scripts/processes
- Other domUs: individual projects -- only have /boot partitions as virtual disks!
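The project domU layout above (small /boot virtual disk, everything else over NFS) might look roughly like this as a generated xm-style config. This is a fragment under assumptions: the bridge name xenbr0, the use of ip=dhcp, the device name xvda1 and the helper name are all hypothetical, and how the kernel on the /boot disk gets loaded (e.g. a bootloader line) is deliberately left out.

```python
def render_domu_config(name, memory_mb, vcpus, nfs_server, nfs_path, boot_image):
    """Return xm-style config text for a domU whose root filesystem is on NFS."""
    # Kernel parameters for an NFS root; ip=dhcp is an assumption about the network.
    extra = f"root=/dev/nfs nfsroot={nfs_server}:{nfs_path} ip=dhcp"
    lines = [
        f'name = "{name}"',
        f"memory = {memory_mb}",
        f"vcpus = {vcpus}",
        # The only virtual disk is the small /boot image.
        f"disk = ['file:{boot_image},xvda1,w']",
        "vif = ['bridge=xenbr0']",
        f'extra = "{extra}"',
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_domu_config(
        "project1", 2048, 2,
        "10.0.0.2", "/srv/nfs/project1/root",
        "/var/xen/project1-boot.img",
    ))
```

Changing the RAM allocation then means regenerating one small text file, which fits the goal of automated VM modification.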
And a second box for backup.
Either:
- Master/slave hot spare, slave just mirrors master until cutover
- Live/live system: some projects on each, each mirrors the other symmetrically
Underlying filesystem:
- BTRFS still experimental
- ZFS? -- only user-space on Linux!
- Ext4? -- Looking good, but snapshotting?
- Use LVM for snapshotting?
Use Time Machine-like non-atomic "snapshotting"??? Needs more care about backup window, scheduling of backups.
Or do we expose LVM volumes via NFS, and mount filesystems on these? Hits the 2-layer problem again.
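For the Time Machine-like option, a rough sketch of what non-atomic "snapshotting" means in practice: each snapshot is a full directory tree, but files unchanged since the previous snapshot are hard links to the earlier copy. This is an illustration of the general hard-link technique, not a chosen tool; the function name is made up, previous and target must be on the same filesystem, and symlinks, ownership and special files are ignored.

```python
import filecmp
import os
import shutil
import tempfile


def snapshot(source, previous, target):
    """Copy `source` into `target`, hard-linking files unchanged since `previous`.

    `previous` may be None for the first snapshot. Not atomic: files that
    change while the walk is running are captured in whatever state they are in.
    """
    for dirpath, _dirnames, filenames in os.walk(source):
        rel = os.path.relpath(dirpath, source)
        os.makedirs(os.path.join(target, rel), exist_ok=True)
        for fname in filenames:
            src = os.path.join(dirpath, fname)
            dst = os.path.join(target, rel, fname)
            old = os.path.join(previous, rel, fname) if previous else None
            # shallow=True treats files with identical type, size and mtime
            # as equal without reading them.
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=True):
                os.link(old, dst)       # unchanged: share the inode, no extra space
            else:
                shutil.copy2(src, dst)  # new or changed: store a fresh copy


if __name__ == "__main__":
    work = tempfile.mkdtemp()
    src_dir = os.path.join(work, "src")
    os.makedirs(src_dir)
    with open(os.path.join(src_dir, "a.txt"), "w") as f:
        f.write("hello")
    snapshot(src_dir, None, os.path.join(work, "snap1"))
    snapshot(src_dir, os.path.join(work, "snap1"), os.path.join(work, "snap2"))
    # Link count of the unchanged file as seen from the second snapshot.
    print(os.stat(os.path.join(work, "snap2", "a.txt")).st_nlink)
```

The walk takes time proportional to the number of files, which is why the backup window and scheduling need care with this approach.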
What about databases?
- Live block-level replication?
- Or a central database server, AS WELL AS the central NFS server, and do database-level replication?
- What happens to inter-project isolation if we do this? Have to rely on DBA security, ugh.
If we put the DB server on a separate VM, with direct access to physical disk, we cannot give VM DBAs DB root access -- multilayer security isn't really a DB thing -- so VM DBAs would need to request new DB creation. Not practical.
- Or do we limit database resiliency to recovery from snapshots?
- Or expose physical filesystem redundancy to databases: run 2 instances of a DB, each mounted on a physically different filesystem, and do replication between these
- This only makes sense for the very most important databases
- What about journalling recovery from crashes? Database snapshotting?
Different possible levels of database resilience:
- 1 Can wind back to a sane snapshot, lose at most a couple of days of work
- 2 Can replay logfiles, lose at most a few seconds of work
- 3 ACID: never lose a transaction = snapshots + logfiles + transactional consistency for logfile writes
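Level 2 depends on having a dump plus the binlogs written since that dump. A sketch of how the standard VM template might build the daily dump command, assuming the databases are MySQL (the notes only say "binlogs") and that binary logging is enabled. The function name, file naming scheme and dump directory are hypothetical.

```python
import datetime
import os


def daily_dump_command(dump_dir, day):
    """Build a mysqldump argv that writes one dated dump file into dump_dir."""
    output_path = os.path.join(dump_dir, f"all-databases-{day.isoformat()}.sql")
    return [
        "mysqldump",
        "--all-databases",
        "--single-transaction",  # consistent snapshot for InnoDB tables
        "--flush-logs",          # start a fresh binlog at the dump point
        "--master-data=2",       # record the binlog position as a comment in the dump
        f"--result-file={output_path}",
    ]


if __name__ == "__main__":
    # The argv could be handed to subprocess.run() from a daily cron job.
    print(" ".join(daily_dump_command("/backup/dumps", datetime.date.today())))
```

Recovery to level 2 would then be: load the latest dump, then replay the binlogs written after the recorded position with mysqlbinlog piped into mysql. Whether those binlogs survive a crash intact on NFS is exactly the open question raised under GOALS.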
Current server is 2 x dual-core 32-bit Xeon, 8G RAM, 250G disk, RAID 1
Next generation server:
- 2 x quad-core 64-bit Nehalem architecture Xeon, 32G RAM, 500G to 1 Tbyte disk, RAID 1
CPU x2, disk x2 or x4, RAM x4
And then TWO OFF of each of these
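The scale factors relative to the current server can be checked from the figures above. A quick sketch; it counts cores only (ignoring per-core speed differences) and treats 1 Tbyte as 1000 GB, both of which are simplifying assumptions.

```python
def scale_factor(old, new):
    """How many times larger `new` is than `old`."""
    return new / old


# Figures from the notes; 1 Tbyte is counted as 1000 GB (an assumption).
cpu = scale_factor(2 * 2, 2 * 4)      # cores: 2 x dual-core -> 2 x quad-core
ram = scale_factor(8, 32)             # GB of RAM
disk_low = scale_factor(250, 500)     # GB of disk, low end
disk_high = scale_factor(250, 1000)   # GB of disk, high end

print(cpu, ram, disk_low, disk_high)  # prints: 2.0 4.0 2.0 4.0
```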
Each of these functions as both an NFS server and a Xen host. Depending on architecture choice, we may either run the NFS server on the Dom0 directly, or on a single DomU dedicated to that function.
Approximate pricing:
DELL R905, 32G RAM, dual-power, 2 x Xeon E5540, each with 4 64-bit cores @ 2.53 GHz, 2 x 500G disk: £3466 + VAT
See:
If we buy our own RAM and disks it gets cheaper: use 15000rpm SAS instead of 7200rpm SATA?
- Is there much difference between SATA and SAS in performance?
Host in a data centre? If not, must add managed UPS for each server.
- We have to host in data centres really.
Finally, we need a tape machine or similar capable of backing up 1 Tbyte.
- We'll back up to my office machine over the wires so no need for tape.