subversion.cch.kcl.ac.uk is moving server on Friday 27th June
Jun 19th, 2014 by Tim Watts

UPDATE – the date has shifted again, due to a last-minute clash.

The old subversion server (known as svn.cch.kcl.ac.uk and subversion.cch.kcl.ac.uk) lives on old hardware which is not backed up.

I have a new setup with essentially identical config on a new Debian 7 virtual machine.

On Friday 27th June, between 1pm and 1:30pm, I will switch the servers around. Please do not attempt an SVN checkout or commit during this half-hour window.

You do not need to do anything else – the server should look identical as far as the client software is concerned.

Obviously – if you see anything that seems wrong after 1:30pm on Friday, please email me.

Thank you,

Tim

Schenker and ccedb.org.uk restarted
Jun 16th, 2014 by Tim Watts

Nagios reported that the sdo2.cch website had failed (all domains, all vhosts), as had http://ccedb.org.uk/

Both are now running correctly.

VMWare vForum at Wembley
Jun 12th, 2014 by Tim Watts

The usual vendors showing their wares, some interesting talks on new VMWare related software and some interesting ideas.

The most interesting idea from my point of view is this:

http://www.hgst.com/solid-state-storage/enterprise-ssd/pcie-ssd coupled with Caching layer software.

Basically you put mega fast flashram (or loads of extra RAM) into your VMWare hosts and the PernixData software inserts it as a read/write caching layer between VMWare guests and external normal storage SANs. Interestingly it’s not that expensive (around $4-5 per GB) as they are using MLC commodity flash chips rather than the exceedingly expensive industrial SLC chips. Wisdom has it that current MLC is robust enough for this type of use, thanks to the fact it’s been developed and manufactured so much more due to demand for decent flash for phones and *pads.

To avoid issues with sudden host failure or sudden flash card problems, the PernixData software replicates the cache between hosts over one of the private VMWare network links.
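As a rough illustration of the idea only (this is not PernixData's implementation, and every class and method name here is made up), a write-back cache that mirrors unflushed blocks to a peer host might look like this toy sketch:

```python
class BackingStore:
    """Stands in for the slow external SAN (hypothetical)."""

    def __init__(self):
        self.blocks = {}
        self.reads = 0

    def read(self, block_id):
        self.reads += 1
        return self.blocks.get(block_id)

    def write(self, block_id, data):
        self.blocks[block_id] = data


class HostCache:
    """Toy write-back cache on one host; unflushed blocks are mirrored to a peer."""

    def __init__(self, store, peer=None):
        self.store = store
        self.peer = peer      # another HostCache, reached over a private link
        self.cache = {}       # local fast storage (flash or RAM)
        self.dirty = set()    # blocks written locally but not yet on the SAN
        self.replica = {}     # unflushed blocks held on behalf of the peer

    def read(self, block_id):
        # Only go to the SAN on a cache miss.
        if block_id not in self.cache:
            self.cache[block_id] = self.store.read(block_id)
        return self.cache[block_id]

    def write(self, block_id, data):
        # Acknowledge from cache, but keep a second copy on the peer host.
        self.cache[block_id] = data
        self.dirty.add(block_id)
        if self.peer is not None:
            self.peer.replica[block_id] = data

    def flush(self):
        # Push dirty blocks to the SAN and release the peer's copies.
        for block_id in sorted(self.dirty):
            self.store.write(block_id, self.cache[block_id])
            if self.peer is not None:
                self.peer.replica.pop(block_id, None)
        self.dirty.clear()

    def recover_peer(self):
        # The peer host has died: write out the blocks it never flushed.
        for block_id, data in self.replica.items():
            self.store.write(block_id, data)
        self.replica.clear()
```

The point of the `replica` dictionary is that a write acknowledged from cache always exists on two hosts until it reaches the SAN, so losing one host or one flash card does not lose data.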

ereed server rebooted to correct website failure
Jun 10th, 2014 by Tim Watts

http://ereed.cch.kcl.ac.uk/ is now working correctly.

CeRch Drupal server failed – now fixed
Jun 2nd, 2014 by Tim Watts

The virtual disk on which the MySQL database lives filled up, causing a number of sites to fail, including Strandlines and Historic Weather.

Apologies for the inconvenience.

VMWare Cluster Disk Failure
May 30th, 2014 by Tim Watts

Update: Now replaced

Nothing to worry about – Disk 19 failed. The system has one hotspare remaining and is functioning normally.

Dell have dispatched a replacement disk that should be here today.

Backup server down
May 19th, 2014 by Tim Watts

Update – 19/5/2014 – The cause of the failure was determined to be a failed ITS network switch. This has been fixed and everything is working normally.

Update – 15/5/2014 4pm – OK – Almost all backup sets have been retried with success. I think this will be safe for the time being until I do some hardware tests on the server.

Update – 15/5/2014 11am – It’s back. Thanks to Paul V who was passing and very kindly replugged its network into another Cisco switch.

It is currently running the backups that it should have done last night. However its Remote Access (LOM) card seems to have failed, which may have caused the network glitch (that model of LOM shares the host server’s network interface). It’s a recycled server – but we have several spares of that model so I should be able to effect a repair by Monday.

In the meantime, please continue to be careful as the backup system is “at-risk”.

 

Right now, I cannot ping miner.cch.kcl.ac.uk – this lives in Drury Lane and is physically where our backup files live.

I have reason to believe that it is a network issue not a server problem, but we are investigating.

For now, please be on guard as I cannot restore any files nor are backups currently running. We should still be covered by the last successful backups from 2 days ago though.

gsr2.cch failed
May 2nd, 2014 by Tim Watts

gsr2.cch failed because the SAN LUN behind it filled up. This has now been corrected and gsr2 restarted.

Bug in Password Expiration email program
Apr 28th, 2014 by Tim Watts

I have to apologise publicly.

There was a weird bug in the program that sends warning emails to account holders 3, 2, 1 and 0 weeks before their DDH LDAP password expires.

The mail module had an undocumented feature that caused it to do a perl die() if the SMTP server rejected the email at send time, which it might if a KCL user had left. This then prevented everyone else who was due a warning message from getting one.

I have now fixed this so all should be back to normal – but if anyone says they cannot log in, please let me know their name so I can check and fix their account.
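For anyone curious, the general shape of this kind of fix is to catch the send failure for each recipient rather than let it abort the whole run. The sketch below is in Python rather than the Perl the real program uses, and the function name and the `send` callback are made up for illustration:

```python
import smtplib


def send_expiry_warnings(recipients, send):
    """Call send(address) for every recipient, collecting failures instead of aborting.

    `send` is any callable that delivers one warning email and raises on
    failure (hypothetical; in real use it would wrap an smtplib.SMTP session).
    """
    failed = []
    for address in recipients:
        try:
            send(address)
        except (smtplib.SMTPException, OSError) as exc:
            # One rejected address (e.g. a user who has left) must not stop
            # the remaining recipients from getting their warning.
            failed.append((address, str(exc)))
    return failed
```

The returned list of failures can then be logged or mailed to the administrator, so rejected addresses are noticed rather than silently swallowed.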

Apologies for the inconvenience.

Tim

Updated the backup server miner to Debian 7
Apr 14th, 2014 by Tim Watts

Partly to address issues with the XFS filesystem throwing errors under very high load.

OpenSSL Vulnerability (CVE-2014-0160 aka “Heartbleed”)
Apr 10th, 2014 by Tim Watts

Firstly – kudos to Miguel for alerting me to the existence of this:

http://heartbleed.com/

It is a bug in the OpenSSL library that is used on many of our servers to implement https:// amongst other things. The bug allows 64KB blocks of RAM to be read from the server’s process space which in turn can be used to leak the SSL private key. Once an attacker has the private key they can decrypt all the SSL protected traffic to and from that server – eg steal passwords and other sensitive data.
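To give a feel for the class of bug, here is a toy model in Python. It is not OpenSSL's code: the function names and the fake "process memory" are invented for illustration. The vulnerable handler echoes back as many bytes as the client claims to have sent, without checking that claim against what actually arrived:

```python
def heartbeat_vulnerable(memory, payload_offset, claimed_len):
    """Echo claimed_len bytes from the payload onwards, trusting the client's length."""
    return bytes(memory[payload_offset:payload_offset + claimed_len])


def heartbeat_patched(memory, payload_offset, actual_len, claimed_len):
    """Drop the request if the claimed length exceeds what was really sent."""
    if claimed_len > actual_len:
        return None
    return bytes(memory[payload_offset:payload_offset + claimed_len])


# Toy "process memory": a 4-byte payload followed by unrelated secret data.
memory = bytearray(b"ping" + b"-----BEGIN PRIVATE KEY-----")

# The client sent 4 bytes but claims 20, so the reply runs on into the secret.
leak = heartbeat_vulnerable(memory, 0, 20)

# With the length check in place the same request is simply discarded.
safe = heartbeat_patched(memory, 0, actual_len=4, claimed_len=20)
```

In the real bug the over-read could be up to 64KB per request and could be repeated as often as the attacker liked, which is why private keys and passwords were at risk.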

I can report that we are now fully patched and the latest scan I’ve done shows that we appear to have no vulnerable servers. Only our Debian 6 and 7 servers were affected, and patches for both were available from Debian. Debian 4, 5 and SuSE servers do not have this bug.

As some of the patching work was rushed, I apologise for the unscheduled rebooting of many servers last night and this morning.

Fixed several tomcat issues
Mar 26th, 2014 by Tim Watts

tc-liv-3, tc-stg-1, tc-dev-1 all locked up. Now OK.

Known issue in Confluence
Mar 21st, 2014 by Tim Watts

I am aware of some attachments (including images) becoming unavailable – eg clicking on a link to a PDF in a page yields nothing.

If anyone else notices specific Confluence pages that have suddenly lost their attachments or images, please could you add your URLs to this Mantis issue:

https://issuetracker.cch.kcl.ac.uk/view.php?id=6076

I will be raising a support request with Atlassian tomorrow (Friday) about this.

Cheers – Tim

Update – Friday 21st March. The attachments are (mostly) still inside Confluence. However the links to access the documents are generating 505 errors which is new! This has been logged as a priority call to Atlassian – should have some news in a few hours.

ocve2.cch rebooted
Mar 21st, 2014 by Tim Watts

Tomcat had died – mostly affecting cfeo.cch.kcl.ac.uk (ocve2 runs tomcat for cfeo).

Restarted my-liv-1 MySQL server
Mar 20th, 2014 by Tim Watts

The MySQL daemon had been consuming 100% CPU for a long time and had become unresponsive.

It’s fine now.
