tc-liv-4 restarted
Oct 31st, 2013 by Tim Watts

The Tomcat services had failed, affecting a number of websites.

All are now working correctly.

ukriss blog fixed
Oct 28th, 2013 by Tim Watts

Following an earlier incident in which the filesystem filled up, the blog at https://ukriss.cerch.kcl.ac.uk/ failed.

This has now been fixed – apologies for the inconvenience.

Blogs server (parrot) failed
Oct 21st, 2013 by Tim Watts

The server ran out of memory, which resulted in the MySQL database daemon being killed. I have just added some more RAM.

Full service has been resumed.

System backups all running correctly at full speed
Oct 17th, 2013 by Tim Watts

The restructuring of the backup system has proven successful – the backups are now completing in a timely manner and all virtual servers are now protected again.

Just to remind folk:

This is primarily a disaster recovery system – but we can *usually* get you files back from the last 30 days.

PostgreSQL servers run local backups so we can recover individual databases.

The same goes for the SCM repositories: Mercurial, SVN and Git.

However, the myriad MySQL servers do not hot-backup, so we can only restore the whole server, not individual databases.

Repairing backup systems
Oct 10th, 2013 by Tim Watts

The backup system was still running very slowly (to the point of not being able to reliably complete backups in the time available).

Apart from the file fragmentation issue, I also discovered that the MS SQL Server instance the backup system uses was running very poorly, with a massive transaction log.

This (and the MS SQL servers on the VMware management server and the Veeam VMware monitoring server) needed a reindexing and compaction maintenance job, which was time-consuming.

This is now done. Until the backup system has had a chance to catch up over the next few days, there is a risk that it will not be able to restore certain machines.

Backup service suspended for 2 days
Oct 7th, 2013 by Tim Watts

miner.cch, the backup repository in Drury Lane, has a very fragmented file system.

Running a defragger in parallel with the backup system is causing both to run excessively slowly, so I have turned off backups until Wednesday to give the defragger a chance.

ereed.cch rebooted
Oct 7th, 2013 by Tim Watts

Webserver had fallen over.

System normal now.

Backup server (miner) is straining
Oct 3rd, 2013 by Tim Watts

And running at a very high load average.

Checks show that the 19TB XFS filesystem is 97% fragmented (yes that still happens – let no-one claim otherwise!).

Now running xfs_fsr to defrag the filesystem.
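
For anyone wondering where a figure like 97% comes from: as far as I know, the XFS fragmentation factor compares the actual number of extents with the ideal number. The sketch below is illustrative only. It assumes the report line looks like "actual N, ideal M, fragmentation factor P%", which is the format I recall from xfs_db's frag command; the function names and the sample numbers are hypothetical, not taken from miner.

```python
import re
from typing import Tuple


def parse_frag_line(line: str) -> Tuple[int, int]:
    """Extract (actual, ideal) extent counts from a frag report line.

    Assumed format: 'actual N, ideal M, fragmentation factor P%'.
    """
    match = re.search(r"actual (\d+), ideal (\d+)", line)
    if match is None:
        raise ValueError("unrecognised frag report line: %r" % line)
    return int(match.group(1)), int(match.group(2))


def fragmentation_factor(actual_extents: int, ideal_extents: int) -> float:
    """Return the fragmentation factor as a percentage of actual extents."""
    if actual_extents <= 0:
        return 0.0
    return 100.0 * (actual_extents - ideal_extents) / actual_extents


# Hypothetical sample line, not real output from the backup server.
sample = "actual 100, ideal 3, fragmentation factor 97.00%"
actual, ideal = parse_frag_line(sample)
print("%.2f%%" % fragmentation_factor(actual, ideal))
```

A high factor mostly says that files are split into many more extents than they need to be; how much that hurts depends on file sizes and access patterns.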

Rebooted servers behind http://www.elta-project.org/
Oct 2nd, 2013 by Tim Watts

The service had stopped responding.

It is fine now.

MySQL server my-liv-2 failed
Oct 1st, 2013 by Tim Watts

The failure was due to a full /var/lib/mysql filesystem. This has now been rectified and the 20-odd project servers affected have been checked and restarted where necessary.
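
A full filesystem like this one can be caught earlier with a simple usage check. The following is a minimal sketch, not what is actually deployed here: the function names and the 90% threshold are assumptions chosen for illustration, and in practice the path to check would be something like /var/lib/mysql.

```python
import shutil


def percent_used(path: str) -> float:
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def is_nearly_full(path: str, threshold_percent: float = 90.0) -> bool:
    """True when usage is at or above the (assumed) warning threshold."""
    return percent_used(path) >= threshold_percent


if __name__ == "__main__":
    # "." is used so the sketch runs anywhere; substitute the real mount point.
    path = "."
    state = "WARNING" if is_nearly_full(path) else "ok"
    print("%s: %.1f%% used (%s)" % (path, percent_used(path), state))
```

Run from cron, a check like this would give some warning before the database stops being able to write.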
