EMI 3 CREAMCE

This install refers to Update 6 of the EMI3 CREAMCE release. The underlying operating system was CentOS release 6.4 (Final).


(0) ** Preliminaries **
yum install yum-priorities
yum install yum-protectbase
(both needed by emi-release-3.0.0-2.el6.noarch.rpm)

SGE
cd /var/sgeCA
ln -s sge_qmaster port6444


(1) ** Repos **

(a) EMI
EMI3 Configuration guide
wget http://emisoft.web.cern.ch/emisoft/dist/EMI/3/sl6/x86_64/base/emi-release-3.0.0-2.el6.noarch.rpm
rpm -i emi-release-3.0.0-2.el6.noarch.rpm

[root@cetest00 ~]# ls -l /etc/yum.repos.d/
total 36
-rw-r--r--. 1 root root 2545 Jul 22 11:24 CentOS-Base.repo
-rw-r--r--. 1 root root 638 Feb 25 08:57 CentOS-Debuginfo.repo
-rw-r--r--. 1 root root 630 Feb 25 08:57 CentOS-Media.repo
-rw-r--r--. 1 root root 3664 Feb 25 08:57 CentOS-Vault.repo
-rw-r--r--. 1 root root 253 Mar 7 15:10 emi3-base.repo
-rw-r--r--. 1 root root 264 Mar 7 15:10 emi3-contribs.repo
-rw-r--r--. 1 root root 273 Mar 7 15:10 emi3-third-party.repo
-rw-r--r--. 1 root root 262 Mar 7 15:10 emi3-updates.repo
-rw-r--r--. 1 root root 601 Jul 22 11:25 ICHEP.repo

(b) Certificates
wget http://repository.egi.eu/sw/production/cas/1/current/repo-files/EGI-trustanchors.repo -O /etc/yum.repos.d/EGI-trustanchors.repo

(c) EPEL
wget http://www.nic.funet.fi/pub/mirrors/fedora.redhat.com/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
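
A quick way to confirm that all repo files are in place before installing anything. This is only a sketch: missing_repos is a made-up helper, and the file names passed to it are the ones from the listing and the wget above.

```shell
#!/bin/bash
# missing_repos DIR FILE...: print every FILE that is not present in DIR.
# Prints nothing when all files are there.
missing_repos() {
    local dir=$1
    shift
    local f
    for f in "$@"; do
        [ -f "$dir/$f" ] || echo "$f"
    done
}

# Example call (prints nothing if everything is in place):
# missing_repos /etc/yum.repos.d emi3-base.repo emi3-contribs.repo \
#     emi3-third-party.repo emi3-updates.repo EGI-trustanchors.repo
```

Anything it prints still needs to be fetched before moving on to the installation.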


(2) ** Installation **
yum install ca-policy-egi-core
yum install emi-cream-ce
yum install emi-ge-utils


(3) ** Hostcert and hostkey **
Use the cert and key rescued from the EMI2 install:
[root@cetest00 ~]# ls -l /etc/grid-security/host*
-rw-r--r--. 1 root root 2047 Jul 22 13:14 hostcert.pem
-rw-------. 1 root root 1820 Jul 22 13:14 hostkey.pem
I use the cert_sorcerer for certificate renewal on the machine. (This is only relevant for UK sites.)
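
A small check on the permissions after copying the files over. This is a sketch: check_mode is a made-up helper, and the expected modes are simply the ones visible in the listing above (644 for the cert, 600 for the key). It relies on GNU stat.

```shell
#!/bin/bash
# check_mode FILE MODE: succeed only if FILE has exactly the octal mode MODE.
# Returns 1 on a mismatch and 2 if stat itself fails (e.g. file missing).
check_mode() {
    local file=$1 expected=$2 actual
    actual=$(stat -c '%a' "$file") || return 2
    if [ "$actual" != "$expected" ]; then
        echo "$file: mode $actual, expected $expected" >&2
        return 1
    fi
}

# Example calls:
# check_mode /etc/grid-security/hostcert.pem 644
# check_mode /etc/grid-security/hostkey.pem 600
```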


(4) ** Configuration **
Before running yaim for the first time (and only then): touch /etc/lrms/scheduler.conf
/opt/glite/yaim/bin/yaim -c -s /opt/glite/yaim/siteinfo/site-info-cetest00.def -n creamCE -n SGE_utils
I use this script to generate my users.conf and groups.conf.

At this point set /etc/lrms/cluster.state to Draining to finish configuration in peace.


(5) ** Downgrading APEL **
EMI3 APEL on the CE is incompatible with an EMI2 mon (APEL) box, but an EMI3 mon (APEL) box is incompatible with EMI2 APEL on the CEs. Yay!
EMI3 to EMI2 downgrade instructions.
wget http://repository.egi.eu/sw/production/umd/2/repofiles/sl5/UMD-2-base.repo -O /etc/yum.repos.d/UMD-2-base.repo
wget http://repository.egi.eu/sw/production/umd/2/repofiles/sl5/UMD-2-updates.repo -O /etc/yum.repos.d/UMD-2-updates.repo
yum install glite-apel-sge
Make a config file based on the EMI2 CE and install the cron job. The .gz to .gz filter script can be found here. Afterwards remove the UMD repos again so they cannot do any harm.
As the machine has not changed its name or IP address, no changes on the APEL box (lcgmon01) are necessary.
Check that it publishes on the mon box:
mysql -u root -p accounting
mysql> use accounting;
Database changed
mysql> select Max(EventDate), SubmitHost from EventRecords group by SubmitHost;

(6) ** bdii **
Add "slapd: ALL" to /etc/hosts.allow.
ldapsearch -LLL -x -H ldap://cetest00.grid.hep.ph.ic.ac.uk:2170 -b o=grid | perl -00pe 's/\r*\n //g'
Check that the ... part works (no 4444 values); this should return nothing:
ldapsearch -LLL -x -H ldap://cetest00.grid.hep.ph.ic.ac.uk:2170 -b o=grid | grep 4444
(check bdii-update.log otherwise)
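
The 4444 check can be wrapped so that it gives a clear yes/no answer, for example from a cron job or a post-install script. This is a sketch: check_bdii_output is a made-up name, and it only looks for the string 4444 on stdin, exactly as the grep above does.

```shell
#!/bin/bash
# check_bdii_output: read ldapsearch output on stdin.
# Succeeds if no line contains 4444; otherwise prints the offending
# lines to stderr and returns 1.
check_bdii_output() {
    local hits
    hits=$(grep 4444 || true)
    if [ -n "$hits" ]; then
        echo "4444 values still published:" >&2
        echo "$hits" >&2
        return 1
    fi
    echo "OK: no 4444 values"
}

# Example call:
# ldapsearch -LLL -x -H ldap://cetest00.grid.hep.ph.ic.ac.uk:2170 -b o=grid | check_bdii_output
```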


(7) ** Cancelled by CE admin **
Both the ldap and the tomcat user must be able to run qstat.
Check that cert, key and directory have the correct ownership:
chown tomcat:sgeadmin /var/sgeCA/sge_qmaster/default/userkeys/tomcat
chown tomcat:sgeadmin /var/sgeCA/sge_qmaster/default/userkeys/tomcat/cert.pem
chown tomcat:sgeadmin /var/sgeCA/sge_qmaster/default/userkeys/tomcat/key.pem
chown ldap:sgeadmin /var/sgeCA/sge_qmaster/default/userkeys/ldap
chown ldap:sgeadmin /var/sgeCA/sge_qmaster/default/userkeys/ldap/cert.pem
chown ldap:sgeadmin /var/sgeCA/sge_qmaster/default/userkeys/ldap/key.pem
Otherwise you get the now legendary "[description=Cancelled by CE admin] [failureReason=reason=3]" failure message in /var/log/cream/glite-ce-cream.log
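
The six chown commands follow one pattern, so they can be generated in a loop. A sketch only: userkey_chown_cmds and the KEYDIR variable are made up for this example, the directory default and the sgeadmin group are taken from the commands above. The function just prints the commands, so nothing changes until the output is piped to a shell as root.

```shell
#!/bin/bash
# Directory holding the per-user SGE keys (default taken from the notes above).
KEYDIR=${KEYDIR:-/var/sgeCA/sge_qmaster/default/userkeys}

# userkey_chown_cmds USER...: print the chown commands for each user's
# key directory, cert.pem and key.pem. Dry run: prints only, changes nothing.
userkey_chown_cmds() {
    local user path
    for user in "$@"; do
        for path in "" /cert.pem /key.pem; do
            echo "chown $user:sgeadmin $KEYDIR/$user$path"
        done
    done
}

# Review the output first, then apply as root:
# userkey_chown_cmds tomcat ldap
# userkey_chown_cmds tomcat ldap | sh
```

To check the end result, something like runuser -s /bin/bash -c qstat tomcat (and the same for ldap) should then succeed, assuming qstat is on the PATH for those users.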


(8) ** Local modifications (hacks) **
All hacks are summarised in a script I run after yaim: post_yaim_hacks.sh.
(a) sge_filestaging: Take into account shared home dirs.
(b) sge_submit.sh: Add RHEL 5 or 6 switch, add accounting string to make log files more human readable.
(c) jobwrapper.tpl: Add line to source tarball setup; ensure that jobs run in batch dir, not in home dir; add wmteg script
(d) filter_accounting_files.sh: Check if file exists. See (5) for details.
(e) cleanup-grid-accounts: remove cron job, this is done centrally
(f) glite-info-dynamic-ge: take into account our weird setup on we000
(g) glite_cream_load_monitor.conf: change number of allowed ftp connections
(h) restorecon -v /etc/my.cnf: yaim resets the SELinux context; without this the file is silently ignored

Stealth Upgrade (EMI2 -> EMI3, keeping EMI2 APEL)
[lost in time and space]

'Fixing' things
The only way to clean the blah job db:
runuser -s /bin/bash -c "/usr/sbin/blah_job_registry_purge /var/blah/user_blah_job_registry.bjr 1509494400" tomcat
Cleaning up the CREAM DB (rarely causes problems):
/usr/sbin/JobDBAdminPurger.sh -s done-failed,3
/usr/sbin/JobDBAdminPurger.sh might need the following line added:
classPath=${classPath}:/usr/share/java/bcprov.jar (after classPath=${classPath}:/usr/share/java/canl.jar)
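
The last argument of the blah_job_registry_purge call above looks like a Unix timestamp: 1509494400 is 2017-11-01 00:00:00 UTC. Assuming that reading is right (it is not confirmed anywhere in these notes), a cutoff for a different date can be worked out with GNU date. cutoff_epoch is a made-up helper name.

```shell
#!/bin/bash
# cutoff_epoch DATE: print DATE, interpreted as UTC, in seconds since the
# Unix epoch. Needs GNU date (for the -d option).
cutoff_epoch() {
    date -u -d "$1" +%s
}

# Example: reproduces the value used in the purge command above.
# cutoff_epoch 2017-11-01   # prints 1509494400
```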