http://strandlines.net/ down
May 26th, 2015 by Tim Watts

There appears to be a problem with the MySQL database that Drupal is using – apologies for the inconvenience.

UPDATE

It’s back now – the problem was that the cache_block table was corrupt and needed repair.

New Systems Monitoring feature
Apr 22nd, 2015 by Tim Watts

I have just made an addition to the systems monitoring carried out by the DDH Nagios Server.

It now also tests the SSL certificates on each service that is supposed to have a paid-for trusted certificate. The results can be seen in this Nagios SSL summary. The warning state (yellow) is raised some time before the certificate expires, to give us time to order a new one. The warning turns critical (red) 7 days before the certificate actually expires, to leave enough time to deploy the new certificate.
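For anyone curious how a check like this can work, here is a minimal Python sketch. It is not the actual Nagios check in use. The 7-day critical window comes from the description above; the 30-day warning window, the function names and the use of Python's standard `ssl` module are all assumptions made for illustration.

```python
import socket
import ssl
from datetime import datetime, timedelta, timezone

# The 7-day critical window is from the post. The 30-day warning window is an
# assumed placeholder: the post only says "some time in advance".
WARNING_DAYS = 30
CRITICAL_DAYS = 7


def classify_expiry(not_after, now, warning_days=WARNING_DAYS, critical_days=CRITICAL_DAYS):
    """Return 'OK', 'WARNING' or 'CRITICAL' for a certificate expiry time."""
    remaining = not_after - now
    if remaining <= timedelta(days=critical_days):
        return "CRITICAL"
    if remaining <= timedelta(days=warning_days):
        return "WARNING"
    return "OK"


def fetch_not_after(host, port=443, timeout=10.0):
    """Connect to host:port and return the certificate's notAfter time (UTC)."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    # cert_time_to_seconds parses the "notAfter" string into seconds since the epoch.
    seconds = ssl.cert_time_to_seconds(cert["notAfter"])
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```

It could be called as `classify_expiry(fetch_not_after("www.example.org"), datetime.now(timezone.utc))`, where the hostname is a placeholder. Note that with the default context an already-expired or untrusted certificate makes the connection itself fail with an `ssl.SSLError`, which a real check would also want to report as critical.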

Tim

mkcheur.cch disk full
Apr 15th, 2015 by Tim Watts

It seems the logrotate process stopped working – now correcting and rebooting.

Apologies for the inconvenience.

Disk full on http://db.pbw.kcl.ac.uk/
Mar 20th, 2015 by Tim Watts

I have added some improved log rotation. Service will be restored in a few minutes.

Apologies for the loss of service.

pmsa2 needs a reboot
Feb 5th, 2015 by Tim Watts

As the website has stopped responding.

WordPress server blogs.cch rebooted
Feb 2nd, 2015 by Tim Watts

A few blogs were unhappy. Apologies for any inconvenience.

Few machines having trouble this morning
Jan 12th, 2015 by Tim Watts

tc-liv-3 had a problem with a jammed NFS mount and jainpedia needed a reboot. Now happy.

InsAph had killed its webserver too. Restarted and happy.

Scrambled has a problem and is being investigated now.

diamm rebooted
Jan 8th, 2015 by Tim Watts

-stg failed and lots of segfaults in iip

– Now working.

mkcheur.cch down
Jan 8th, 2015 by Tim Watts

/var has gone full – fixing now…

Update – now fixed – tomcat logs had filled the disk. Apologies for any inconvenience.
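A full disk like this can be flagged before it causes an outage by checking usage against thresholds. The following is only a sketch of that idea, not a description of how our monitoring is configured: the 80% and 90% thresholds and the function names are assumptions chosen for illustration.

```python
import shutil

# Assumed thresholds, for illustration only.
WARNING_PCT = 80.0
CRITICAL_PCT = 90.0


def classify_usage(used, total, warning_pct=WARNING_PCT, critical_pct=CRITICAL_PCT):
    """Return 'OK', 'WARNING' or 'CRITICAL' for the given byte counts."""
    if total <= 0:
        # A filesystem reporting no capacity cannot be judged; treat it as a problem.
        return "CRITICAL"
    used_pct = 100.0 * used / total
    if used_pct >= critical_pct:
        return "CRITICAL"
    if used_pct >= warning_pct:
        return "WARNING"
    return "OK"


def check_path(path):
    """Classify the filesystem that contains `path`, e.g. '/var'."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return classify_usage(usage.used, usage.total)
```

Running `check_path("/var")` regularly would give a warning while there is still time to rotate or clear the logs.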

ocve2.cch.kcl.ac.uk /vol/ocve2 failed to mount
Jan 7th, 2015 by Tim Watts

A few services related to ocve2.cch and cfeo.cch are currently down as /vol/ocve2 is failing to mount. There’s some damage in the filesystem and xfs_repair is currently running.

Will advise when it is back.

UPDATE – it is now back and running correctly. Apologies for any inconvenience.

2 disks replaced in the VMWare Cluster SAN
Jan 7th, 2015 by Tim Watts

Yesterday, Tuesday, I replaced another failed disk and pre-emptively replaced a failing disk (3 block errors) on the advice of Dell’s tech support.

The disk array is currently happy and all is well.

pmsa2 apache restarted
Oct 13th, 2014 by Tim Watts

Something was causing apache to consume 79% CPU and become very slow.

VMWare Cluster upgrades – 2014
Jul 28th, 2014 by Tim Watts

UPDATE 23rd July: vmhost-12 is now running with 192GB RAM.


Dear all,

We are expecting a new disk array to arrive in a few weeks to add to our VMWare cluster, along with additional RAM. We will be adding 6TB of additional fast storage to the cluster, based on an EqualLogic PS4100X SAN which will pair with our existing PS6500E unit.

We will also be doubling the host RAM on each server to 192GB. This will ensure our cluster remains state of the art for our needs over the coming years and that we can allocate more memory to virtual servers.

In order to add the disk array and RAM, I will need to take each of the 4 VMWare servers down a number of times. This will have little impact on service, except possibly to cause some slight slowdown in the running virtual servers, as they will be temporarily squeezed onto 3 servers instead of spread out over 4.

The plan is to add the RAM first, so that the later operations have less impact on performance.

Dates

25th June, Noon – 2pm : vmhost-11 having a RAM upgrade. This is the first of 4, so will cause some strain to the systems as vmhost-11 is taken out of service for 2 hours. After this one is done, there will be less impact.

UPDATE 16:12 25th June: vmhost-11 is now running with 192GB RAM…

Next upgrade: 23rd July: vmhost-12 will have its RAM upgraded.

UPDATE 23rd July: vmhost-12 is now running with 192GB RAM.

Next upgrade: 28th July: vmhost-13 AND vmhost-14 will have their RAM upgraded.

New backup server coming online
Jul 8th, 2014 by Tim Watts

We have a new backup server in Drury Lane.

I’m bringing this online now – this will share the load (and the space) with the original system to give us greater backup capacity.

3 servers rebooted
Jun 20th, 2014 by Tim Watts

ootw.cch had slightly weird kernel problems.

my-liv-2 was running with a load average of 11 (that’s bad)

tc-liv-4 had failed.

All have been rebooted now. All services are running normally.
