Installing a new MON box
Note: I inherited this machine partly installed (i.e. with some grid software + java).
Plus all the hostkeys were already there:
[root@lcgmon00 ~]# locate hostcert
/etc/grid-security/hostcert.pem
/etc/tomcat5/hostcert.pem
/opt/glite/var/rgma/.certs/hostcert.pem
Documentation
The APEL FAQ.
glite (software install)
yaim (configuration)
Software
As usual this is 32 bit software running on a 64 bit OS.
To get the software go here: glite 3.1.
Here's the repo:
[root@lcgmon00 log]# more /etc/yum.repos.d/glite.repo
[glite]
name=glite
baseurl=http://linuxsoft.cern.ch/EGEE/gLite/R3.1/glite-MON/sl4/i386/
gpgcheck=0
enabled=1
protect=1
yum install glite-MON
mySQL
The database was in a funny state, so we just started from
scratch by removing and reinstalling mysql (this is dragged from the
history file, so bits might be missing):
rpm -e mysql-server-5.0.45-7.el5
ls -alrt /var/lib/mysql
rm -rf /var/lib/mysql/*
yum install mysql-server-5.0.45-7.el5   (or was that just yum install mysql-server ?)
/etc/init.d/mysqld restart
/usr/bin/mysqladmin -u root password 'cantputtherealpasswdhere'
/usr/bin/mysqladmin -u root -h lcgmon00.hep.ph.ic.ac.uk password 'cantputtherealpasswdhere'
/etc/init.d/mysqld restart
yum install mysql-server
This is what I have now:
[root@lcgmon00 tmp]# rpm -qa | grep mysql
mysql-devel-5.0.45-7.el5
mysql-server-5.0.45-7.el5
mysql-5.0.45-7.el5
mysql-5.0.45-7.el5
mysql-devel-5.0.45-7.el5
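The doubled entries are presumably the 32 bit and 64 bit builds of the same package (that is my assumption; plain rpm -qa does not show the architecture here). A small sketch to make it visible, where the function name mysql_pkgs is made up for this page:

```shell
# mysql_pkgs
# Lists installed packages with "mysql" in the name, one per line,
# with the architecture appended so i386 and x86_64 builds differ.
# (mysql_pkgs is a hypothetical helper name, not part of any package.)
mysql_pkgs() {
    rpm -qa --queryformat '%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}\n' \
        | grep mysql \
        | sort
}
```

Run mysql_pkgs as any user; if the assumption holds, each name shows up once per architecture.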
Ports
The mon box needs a number of open ports: 8443, 8088 and 3306 (incoming data
from the CE, 155.198.216.206). Which ports to open for what seems to be a well
kept secret.
As I am terrible with the iptables commands, here are the ones I needed, for the record:
/sbin/iptables -I RH-Firewall-1-INPUT 10 -p tcp -m tcp --dport 8088 -j ACCEPT
/sbin/iptables -I RH-Firewall-1-INPUT 11 -p tcp -m tcp -s 155.198.216.206 --dport 3306 -j ACCEPT
Here is a link to the (working) iptables.
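To avoid retyping these, the rules can be generated by a small helper. This is only a sketch: the name fw_rule and the dry-run behaviour (it prints the command instead of running it) are my own additions; the chain name and flags are the ones used in the two commands above.

```shell
# fw_rule POSITION PORT [SOURCE_IP]
# Prints (does not run) an iptables command that inserts an ACCEPT rule
# for a TCP port into the RH-Firewall-1-INPUT chain at POSITION,
# optionally restricted to a single source address.
# fw_rule is a hypothetical helper name.
fw_rule() {
    pos=$1
    port=$2
    src=${3:-}
    if [ -n "$src" ]; then
        echo "/sbin/iptables -I RH-Firewall-1-INPUT $pos -p tcp -m tcp -s $src --dport $port -j ACCEPT"
    else
        echo "/sbin/iptables -I RH-Firewall-1-INPUT $pos -p tcp -m tcp --dport $port -j ACCEPT"
    fi
}
```

Check the output first, then feed it to a root shell, e.g. fw_rule 10 8088 | sh. The position numbers depend on what is already in the chain, so look at /sbin/iptables -L RH-Firewall-1-INPUT --line-numbers before picking one.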
Configuration
The siteinfo.def file (minus the
passwords). I am not sure what the configuration run does to mysql: it never seems to stop it.
/opt/glite/yaim/bin/yaim -d6 -c -s /opt/glite/yaim/siteinfo/lcgmon-site-info.def -n glite-MON
Tests
[run as root]
export RGMA_HOME=/opt/glite
Not needed on lcgmon, but elsewhere this was required:
export TRUSTFILE="$RGMA_HOME/etc/rgma/ClientAuthentication.props"
/opt/glite/bin/rgma-server-check
The Java version error can be ignored.
Copy a couple of entries from the old database and try to publish them
(I'd rather not try this again ...)
On the old mon box (gw25):
mysqldump -u accounting -p accounting LcgRecords --no-create-info --skip-opt --where="1 limit 10" > sql_dump.sql
On the new mon box:
mysql -u accounting -p accounting < sql_dump.sql
To check that the records are correctly inserted:
mysql -u accounting -p
mysql> use accounting;
mysql> select count(*) from LcgRecords;
And then run the APEL publisher: (check the cron job for syntax)
[root@lcgmon00 cron.d]# more edg-apel-publisher
PATH=/sbin:/bin:/usr/sbin:/usr/bin
1 2 * * * root env RGMA_HOME=/opt/glite APEL_HOME=/opt/glite /opt/glite/bin/apel-publisher -f /opt/glite/etc/glite-apel-publisher/publisher-config-yaim.xml >> /var/log/apel.log 2>&1
And check the apel log file: /var/log/apel.log
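A quick way to skim the log for trouble. This is a sketch only: I am assuming that problems show up as lines containing "error" or "exception" (in any case), which is a guess about the log format, and apel_log_problems is a made-up name.

```shell
# apel_log_problems FILE
# Prints the lines of FILE that contain "error" or "exception"
# (case-insensitive), prefixed with their line number.
# Exit status follows grep: 0 if something matched, 1 if nothing did.
# The search pattern is an assumption about what a failure looks like.
apel_log_problems() {
    grep -i -n -E 'error|exception' "$1"
}
```

Usage: apel_log_problems /var/log/apel.log. An empty result does not prove the publisher worked, so still read the tail of the log by hand.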
Putting the new mon box into service
If the tests pass (ask Cristina), edit
/opt/glite/etc/glite-apel-sge/parser-config.xml on the CE (or whatever file is
specified in the apel cron job in /etc/cron.d) to point to the new mon box.
See also the Issues section on how to stop the CE from trying to reprocess
all log files in one go (it *will* run out of memory).
Issues
ce00 is a glite 3.0 CE and needs three types of files for the accounting:
/var/log/messages
/var/log/globus-gatekeeper.log
/var/spool/sge/default/accounting/accounting
Unfortunately we only kept the last 4 weeks' worth of /var/log/messages as this
was the default set in /etc/logrotate.conf (+ /etc/logrotate.d/syslog).
The default memory usage is set to 256 MB in /opt/glite/bin/apel-sge-log-parser.
As our CE now has 4 GB of memory, I increased this to 512 MB (look for
$JAVA_HOME/bin/java -Xmx512m).
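The edit can be scripted with sed. A sketch, assuming the parser script contains the literal flag -Xmx256m before the change (check with grep first); bump_heap is a made-up name, and the untouched original is kept as a .bak copy.

```shell
# bump_heap FILE OLD NEW
# Replaces the Java heap flag -Xmx<OLD>m with -Xmx<NEW>m in FILE.
# The original is kept as FILE.bak. Returns 1 and changes nothing
# if the expected old flag is not present.
# bump_heap is a hypothetical helper name.
bump_heap() {
    file=$1
    old=$2
    new=$3
    if ! grep -q -- "-Xmx${old}m" "$file"; then
        echo "no -Xmx${old}m found in $file" >&2
        return 1
    fi
    cp -p "$file" "$file.bak" || return 1
    # Write back into the existing file so its mode and owner stay as they were.
    sed "s/-Xmx${old}m/-Xmx${new}m/g" "$file.bak" > "$file"
}
```

For the change described above this would be: bump_heap /opt/glite/bin/apel-sge-log-parser 256 512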
Tell the new MON box which files have already been processed:
On old mon box:
mysqldump -u accounting -p --no-create-info --skip-opt accounting LcgProcessedFiles > files_dump.sql
And then insert it into the new database with:
mysql -u accounting -p accounting < files_dump.sql
It does help to make sure you've published everything else from the old mon box
before doing this.
bdii
The mon box had been up for over a year before I realized that it has a bdii
component which wasn't working; clearly it's not vital.
ldap wasn't running, as the default /opt/bdii/etc/DB_CONFIG doesn't work in our
setup. Using this DB_CONFIG allowed ldap
to start.
Open the required port (2170, as used in the ldapsearch command below).
The bdii log file is:
/opt/bdii/var/bdii.log
Still have errors in /opt/bdii/var/tmp/stderr.log, but as usual they are not
very informative. After random googling, I replaced all instances of GlueTop in
the ldif files in /opt/glite/etc/gip/ldif with MDS and now
ldapsearch -x -H ldap://lcgmon00.hep.ph.ic.ac.uk:2170 -b mds-vo-name=resource,o=grid
gives a sensible looking result.
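The GlueTop replacement was a blanket search and replace. Something along these lines would do it; swap_gluetop is a made-up name, the *.ldif file pattern is an assumption about how the files are named, and every changed file is backed up first.

```shell
# swap_gluetop DIR
# In every *.ldif file directly under DIR, replaces the string
# GlueTop with MDS. Each changed file is first copied to <file>.orig;
# files without GlueTop are left alone.
# swap_gluetop is a hypothetical helper name.
swap_gluetop() {
    dir=$1
    for f in "$dir"/*.ldif; do
        [ -f "$f" ] || continue             # glob matched nothing
        grep -q 'GlueTop' "$f" || continue  # nothing to change
        cp -p "$f" "$f.orig" || return 1
        sed 's/GlueTop/MDS/g' "$f.orig" > "$f"
    done
    return 0
}
```

Here that would be swap_gluetop /opt/glite/etc/gip/ldif, followed by a diff against the .orig copies to see what actually changed.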
Decommissioning the old box (gw25)
As they say.... my sql is weak, but my googling is strong.
MySql webpages:
fedora
mysql.com
random
On gw25, make an sql dump (1.8 GB it turns out; make sure there is enough space, in
my case only available in /usr):
mysqldump -u accounting -p accounting > /usr/accounting_dump.sql
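Checking the free space beforehand saves a half-written dump. A minimal sketch using df in POSIX output mode; has_space is a made-up name and the threshold is whatever you expect the dump to need.

```shell
# has_space DIR NEEDED_KB
# Succeeds (exit 0) if the filesystem holding DIR has at least
# NEEDED_KB kilobytes available, fails otherwise.
# has_space is a hypothetical helper name.
has_space() {
    # df -P keeps each filesystem on one line; -k reports in 1 KB blocks.
    # Column 4 of the second line is the available space.
    avail=$(df -Pk "$1" | awk 'NR==2 {print $4}')
    [ "$avail" -ge "$2" ]
}
```

For a dump of about 1.8 GB, something like has_space /usr 2000000 && echo "ok to dump" gives a little margin.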
Copy over to my machine and try to reconstruct the database (showing my lack of
knowledge in mysql; this is all done as root):
mysql does not run by default:
/etc/init.d/mysqld restart
Set a root password:
/usr/bin/mysqladmin -u root password 'myrootpassword'
mysql -u root -p
show databases;
use mysql;
SELECT * FROM user\G (the \G is for better formatting, no ';' needed)
to check what is in there.
/usr/bin/mysqladmin -u root -h deathstar.hep.ph.ic.ac.uk password 'myrootpassword'
fails as there is no entry for deathstar (why?)
UPDATE user SET Host = 'deathstar.hep.ph.ic.ac.uk' where Host = 'localhost.localdomain';
Now it'll work.
There is still anonymous user access (user field is empty).
Remove this by doing:
DELETE FROM mysql.user WHERE User = '';
Are there still any rows without passwords ? Try (replace 'root' and
'127.0.0.1' as appropriate):
SET PASSWORD FOR 'root'@'127.0.0.1' = PASSWORD('myrootpassword');
To create a user called 'accounting':
CREATE USER 'accounting'@'localhost' IDENTIFIED BY 'someotherpassword';
GRANT ALL PRIVILEGES ON *.* TO 'accounting'@'localhost' WITH GRANT OPTION;
(Ok, this is another admin account, but I had it for the time being.)
Now go for it:
(If the accounting database does not exist yet, create it first from the mysql prompt with CREATE DATABASE accounting; as the dump of a single database does not contain that statement.)
mysql -u accounting -p accounting < accounting_dump.sql
mysql -u accounting -p
use accounting
select count(*) from LcgRecords;
This is good enough for me (in /root/gw25).