Installing a new MON box

Note: I inherited this machine partly installed (i.e. some grid software + java). All the host certificates were already there:
[root@lcgmon00 ~]# locate hostcert
/etc/grid-security/hostcert.pem
/etc/tomcat5/hostcert.pem
/opt/glite/var/rgma/.certs/hostcert.pem


Documentation
The APEL FAQ.
glite (software install)
yaim (configuration)

Software
As usual this is 32 bit software running on a 64 bit OS.
To get the software go here: glite 3.1.
Here's the repo:
[root@lcgmon00 log]# more /etc/yum.repos.d/glite.repo
[glite]
name=glite
baseurl=http://linuxsoft.cern.ch/EGEE/gLite/R3.1/glite-MON/sl4/i386/
gpgcheck=0
enabled=1
protect=1

yum install glite-MON


mySQL
The database was in a funny state, so we just started from scratch by removing and reinstalling mysql (this is dragged from the history file, so bits might be missing):
rpm -e mysql-server-5.0.45-7.el5
ls -alrt /var/lib/mysql
rm -rf /var/lib/mysql/*
yum install mysql-server-5.0.45-7.el5     (or was that just: yum install mysql-server ?)
/etc/init.d/mysqld restart
/usr/bin/mysqladmin -u root password 'cantputtherealpasswdhere'
/usr/bin/mysqladmin -u root -h lcgmon00.hep.ph.ic.ac.uk password 'cantputtherealpasswdhere'
/etc/init.d/mysqld restart

yum install mysql-server

This is what I have now:
[root@lcgmon00 tmp]# rpm -qa | grep mysql
mysql-devel-5.0.45-7.el5
mysql-server-5.0.45-7.el5
mysql-5.0.45-7.el5
mysql-5.0.45-7.el5
mysql-devel-5.0.45-7.el5


Ports
The mon box needs a number of open ports: 8443, 8088 and 3306 (incoming data from the CE, 155.198.216.206). Which ports to open for what seems to be a well kept secret.
As I am terrible with the iptables commands, for the record here are the ones I needed:
/sbin/iptables -I RH-Firewall-1-INPUT 10 -p tcp -m tcp --dport 8088 -j ACCEPT
/sbin/iptables -I RH-Firewall-1-INPUT 11 -p tcp -m tcp -s 155.198.216.206 --dport 3306 -j ACCEPT

Here is a link to the (working) iptables.
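
To avoid retyping the rules, here is a minimal sketch of a helper that only prints them (it does not touch the firewall). The function name accept_rule is made up for illustration; the chain name, positions, ports and CE address are the ones from the commands above.

```shell
#!/bin/sh
# Print (not run) an iptables rule that accepts TCP traffic on one port.
# Arguments: position in the chain, destination port, optional source IP.
# accept_rule is a hypothetical helper name, not part of any gLite tool.
accept_rule() {
    position=$1
    port=$2
    source_ip=${3:-}
    rule="/sbin/iptables -I RH-Firewall-1-INPUT $position -p tcp -m tcp"
    if [ -n "$source_ip" ]; then
        # Restrict the rule to a single sender, e.g. the CE.
        rule="$rule -s $source_ip"
    fi
    echo "$rule --dport $port -j ACCEPT"
}

# Reproduce the two rules used on lcgmon00.
accept_rule 10 8088
accept_rule 11 3306 155.198.216.206
```

Check the printed lines, then paste them into a root shell. On this kind of system, /sbin/service iptables save should write the running rules to /etc/sysconfig/iptables so they survive a reboot.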

Configuration
The siteinfo.def file (minus the passwords). Not sure about what it does to mysql - it never seems to stop it ?
/opt/glite/yaim/bin/yaim -d6 -c -s /opt/glite/yaim/siteinfo/lcgmon-site-info.def -n glite-MON

Tests
[run as root]

export RGMA_HOME=/opt/glite
not at lcgmon, but elsewhere this was needed:
export TRUSTFILE="$RGMA_HOME/etc/rgma/ClientAuthentication.props"
/opt/glite/bin/rgma-server-check
The Java version error can be ignored.

Copy a couple of entries from the old database and try to publish them
(I'd rather not try this again ...)
On the old mon box (gw25):
mysqldump -u accounting -p accounting LcgRecords --no-create-info --skip-opt --where="1 limit 10" > sql_dump.sql
On the new mon box:
mysql -u accounting -p accounting < sql_dump.sql
To check that the records are correctly inserted:
mysql -u accounting -p
mysql> use accounting;
mysql> select count(*) from LcgRecords;
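
A quick cross-check for the count, as a sketch: with --skip-opt mysqldump switches off extended inserts, so the dump should hold one INSERT statement per row. Counting those lines gives the number of rows to expect (10 for the dump above, if LcgRecords was empty beforehand). The function name count_dump_rows is made up for illustration.

```shell
#!/bin/sh
# Count the rows in a dump made with --skip-opt, which writes one
# INSERT statement per row (no multi-row extended inserts).
# count_dump_rows is a hypothetical helper name.
count_dump_rows() {
    grep -c '^INSERT INTO' "$1"
}
```

Usage: count_dump_rows sql_dump.sql, then compare the number with the result of the select count(*) above.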

And then run the APEL publisher: (check the cron job for syntax)
[root@lcgmon00 cron.d]# more edg-apel-publisher
PATH=/sbin:/bin:/usr/sbin:/usr/bin
1 2 * * * root env RGMA_HOME=/opt/glite APEL_HOME=/opt/glite /opt/glite/bin/apel-publisher -f /opt/glite/etc/glite-apel-publisher/publisher-config-yaim.xml >> /var/log/apel.log 2>&1

And check the apel log file: /var/log/apel.log
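
Reading the whole log gets tedious, so here is a small sketch that prints only suspicious lines from the end of it. The keywords "error" and "exception" are my guess at what a failure looks like, not the documented APEL log format, and recent_problems is a made-up name.

```shell
#!/bin/sh
# Print lines from the tail of a log that mention "error" or "exception"
# (case-insensitive). The keywords are an assumption, adjust as needed.
# Arguments: log file, optional number of trailing lines (default 200).
recent_problems() {
    logfile=$1
    lines=${2:-200}
    tail -n "$lines" "$logfile" | grep -i -E 'error|exception'
}
```

Usage: recent_problems /var/log/apel.log 500. No output means none of the keywords matched, which is not the same as a successful publish, so still look at the last few lines by hand.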

Putting the new mon box into service
If the tests pass (ask Cristina), edit /opt/glite/etc/glite-apel-sge/parser-config.xml on the CE (or whatever file is specified in the apel cron job in /etc/cron.d) to point to the new mon box.
See also the Issues section for how to stop the CE from trying to reprocess all log files in one go (it *will* run out of memory).

Issues
ce00 is a glite 3.0 CE and needs three types of files for the accounting:
/var/log/messages
/var/log/globus-gatekeeper.log
/var/spool/sge/default/accounting/accounting
Unfortunately we only kept the last 4 weeks' worth of /var/log/messages, as this was the default set in /etc/logrotate.conf (+ /etc/logrotate.d/syslog).

The default memory usage is set to 256 MB in /opt/glite/bin/apel-sge-log-parser. As our CE now has 4 GB of memory, I increased this to 512 MB (look for $JAVA_HOME/bin/java -Xmx512m).
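
The edit can be scripted. This is a sketch only: it assumes the heap limit appears in the script as a literal option of the form -Xmx256m, and raise_heap is a made-up name. It keeps the untouched script next to the edited one as <name>.orig.

```shell
#!/bin/sh
# Change the Java heap limit in a wrapper script, keeping a backup.
# Assumes the limit is written as -Xmx<number>m (e.g. -Xmx256m).
# Arguments: script path, new limit in MB.
raise_heap() {
    script=$1
    new_mb=$2
    # Keep the original (with its permissions) as <script>.orig.
    cp -p "$script" "$script.orig"
    # Rewrite the first -Xmx option on each line with the new value.
    sed "s/-Xmx[0-9][0-9]*m/-Xmx${new_mb}m/" "$script.orig" > "$script"
}
```

Usage: raise_heap /opt/glite/bin/apel-sge-log-parser 512. A package update may overwrite the script, so it is worth checking the value again after a yum update.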

Tell the new MON box which files have already been processed:
On old mon box:
mysqldump -u accounting -p --no-create-info --skip-opt accounting LcgProcessedFiles > files_dump.sql
And then insert it into the new database with:
mysql -u accounting -p accounting < files_dump.sql
It does help to make sure you've published everything else from the old mon box before doing this.

bdii
The mon box had been up for over a year, before I realized that it has a bdii component which wasn't working - clearly it's not vital.
ldap wasn't running, as the default /opt/bdii/etc/DB_CONFIG doesn't work in our setup. Using this DB_CONFIG allowed ldap to start.
Open the required port (2170, the one used in the ldapsearch below). The bdii log is in:
/opt/bdii/var/bdii.log
Still have errors in /opt/bdii/var/tmp/stderr.log, but as usual they are not very informative. After random googling, I replaced all instances of GlueTop in the ldif files in /opt/glite/etc/gip/ldif with MDS and now ldapsearch -x -H ldap://lcgmon00.hep.ph.ic.ac.uk:2170 -b mds-vo-name=resource,o=grid gives a sensible looking result.
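
For the record, the GlueTop replacement can be done in one go. A sketch under assumptions: the files are taken to end in .ldif, the replacement is a blind text substitution exactly as described above (I don't claim to know why it works), and replace_gluetop is a made-up name. Originals are copied to a separate backup directory first.

```shell
#!/bin/sh
# Replace every occurrence of GlueTop with MDS in the *.ldif files of
# one directory, after copying the originals to a backup directory.
# Arguments: ldif directory, backup directory (created if missing).
replace_gluetop() {
    ldif_dir=$1
    backup_dir=$2
    mkdir -p "$backup_dir"
    for f in "$ldif_dir"/*.ldif; do
        # Skip the unexpanded pattern when no file matches.
        [ -f "$f" ] || continue
        cp -p "$f" "$backup_dir/"
        sed 's/GlueTop/MDS/g' "$backup_dir/$(basename "$f")" > "$f"
    done
}
```

Usage: replace_gluetop /opt/glite/etc/gip/ldif /root/ldif-backup (the backup path is just an example), then rerun the ldapsearch above to see whether the output still looks sensible.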


Decommissioning the old box (gw25)
As they say.... my sql is weak, but my googling is strong.
MySql webpages:
fedora
mysql.com
random

On gw25, make an sql dump (1.8 GB it turns out, make sure there is enough space, in my case only available in /usr....):
mysqldump -u accounting -p accounting > /usr/accounting_dump.sql
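
Since the dump is large, a free-space check before starting saves a failed run. A minimal sketch using df in POSIX output mode; has_free_kb is a made-up name and the 2000000 KB in the usage line is just a round number a bit above the 1.8 GB seen here.

```shell
#!/bin/sh
# Succeed if the filesystem holding a directory has at least the
# requested number of kilobytes available.
# Arguments: directory, required free space in KB.
has_free_kb() {
    dir=$1
    needed_kb=$2
    # df -P prints one line per filesystem; column 4 is the free space,
    # and -k reports it in 1024-byte blocks.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$needed_kb" ]
}
```

Usage: has_free_kb /usr 2000000 && mysqldump -u accounting -p accounting > /usr/accounting_dump.sql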

Copy over to my machine and try to reconstruct the database (showing my lack of knowledge in mysql -- this is all done as root):
mysql does not run by default:
/etc/init.d/mysqld restart
Set a root password:
/usr/bin/mysqladmin -u root password 'myrootpassword'
mysql -u root -p
show databases;
use mysql;
SELECT * FROM user\G (the \G is for better formatting, no ';' needed)
to check what is in there.
/usr/bin/mysqladmin -u root -h deathstar.hep.ph.ic.ac.uk password 'myrootpassword'
fails as there is no entry for deathstar (why?)
UPDATE user SET Host = 'deathstar.hep.ph.ic.ac.uk' where Host = 'localhost.localdomain';
Now it'll work.
There is still anonymous user access (user field is empty).
Remove this by doing:
DELETE FROM mysql.user WHERE User = '';
Are there still any rows without passwords ? Try (replace 'root' and '127.0.0.1' as appropriate):
SET PASSWORD FOR 'root'@'127.0.0.1' = PASSWORD('myrootpassword');
To create a user called 'accounting':
CREATE USER 'accounting'@'localhost' IDENTIFIED BY 'someotherpassword';
GRANT ALL PRIVILEGES ON *.* TO 'accounting'@'localhost' WITH GRANT OPTION;
(Ok, this is another admin account, but I had it for the time being.)
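
If a second admin account is not wanted, the privileges can be limited to the one database. The sketch below only prints the SQL. It also prints a CREATE DATABASE, on the assumption that the database does not exist yet on the new machine: a plain mysqldump of a single database does not create it, and the import needs it to be there. limited_user_sql is a made-up name and the password is a placeholder.

```shell
#!/bin/sh
# Print SQL that creates a database and a user restricted to it.
# Arguments: database name, user name, host the user connects from.
# The password is a placeholder, replace it before use.
limited_user_sql() {
    db=$1
    user=$2
    host=$3
    cat <<EOF
CREATE DATABASE IF NOT EXISTS $db;
CREATE USER '$user'@'$host' IDENTIFIED BY 'someotherpassword';
GRANT ALL PRIVILEGES ON $db.* TO '$user'@'$host';
EOF
}
```

Usage: limited_user_sql accounting accounting localhost | mysql -u root -p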

Now go for it:
mysql -u accounting -p accounting < accounting_dump.sql
mysql -u accounting -p
use accounting
select count(*) from LcgRecords;
This is good enough for me (in /root/gw25).