EMI CREAM with SGE
These notes refer to a fresh install of EMI1, Update 10.
(0) Documentation
- System Administrator Guide for CREAM
- Known issues
- Troubleshooting
(1) Preliminaries
yum install yum-protectbase.noarch
yum install yum-priorities
(2) Certificates
wget http://repository.egi.eu/sw/production/cas/1/current/repo-files/EGI-trustanchors.repo -O /etc/yum.repos.d/EGI-trustanchors.repo
yum install ca-policy-egi-core
openssl pkcs12 -clcerts -nokeys -out /etc/grid-security/hostcert.pem -in cetest00.p12
openssl pkcs12 -nocerts -nodes -out /etc/grid-security/hostkey.pem -in cetest00.p12
chmod 600 /etc/grid-security/hostcert.pem
chmod 400 /etc/grid-security/hostkey.pem
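A quick way to confirm the credentials ended up with the modes used above (600 for the certificate, 400 for the key) is a small check like the one below. It is a sketch: the function names are hypothetical, and it assumes GNU coreutils `stat`, where `stat -c %a` prints the octal mode.

```shell
#!/bin/bash
# Hypothetical helpers: check that hostcert.pem/hostkey.pem exist and carry
# the permissions used on this page. Assumes GNU stat ("stat -c %a").

check_mode() {
    # $1 = file, $2 = expected octal mode (e.g. 400)
    local file=$1 expected=$2 actual
    if [ ! -e "$file" ]; then
        echo "MISSING  $file" >&2
        return 1
    fi
    actual=$(stat -c %a "$file")
    if [ "$actual" != "$expected" ]; then
        echo "BAD MODE $file: $actual (expected $expected)" >&2
        return 1
    fi
    echo "OK       $file ($actual)"
}

check_host_creds() {
    # $1 = directory holding the credentials, default /etc/grid-security
    local dir=${1:-/etc/grid-security} rc=0
    check_mode "$dir/hostcert.pem" 600 || rc=1
    check_mode "$dir/hostkey.pem" 400 || rc=1
    return $rc
}
```

Running `check_host_creds` with no argument checks /etc/grid-security; it returns non-zero if either file is missing or has the wrong mode.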
(3) EMI repos
cd /etc/pki/rpm-gpg/
wget http://emisoft.web.cern.ch/emisoft/dist/EMI/1/RPM-GPG-KEY-emi
cd /etc/yum.repos.d
wget http://emisoft.web.cern.ch/emisoft/dist/EMI/1/sl5/x86_64/updates/emi-release-1.0.1-1.sl5.noarch.rpm
rpm -i emi-release-1.0.1-1.sl5.noarch.rpm
I now have three emi repos (emi1-base.repo, emi1-third-party.repo and
emi1-updates.repo) and two epel repos (epel.repo, epel-testing.repo).
(4) Install software
yum install xml-commons-apis
yum install emi-cream-ce
yum install emi-ge-utils
yum install gridengine-qmaster
(5) Make some special users
Note that the default ids for the users in
/opt/glite/yaim/examples/edgusers.conf are already taken on our system,
so I just added 100 to each. On reflection, it would have been better to
add 400 to each, to get around the 'root is not allowed to run cron jobs for
users with uid < 500' restriction (only relevant for user 'glite').
groupadd -g 200 glexec
useradd -m -g glexec glexec
groupadd -g 201 glite (groupadd -g 255 glite)
useradd -m -u 201 -g glite glite (useradd -m -u 255 -g glite glite)
groupadd -g 252 edguser
useradd -m -u 252 -g edguser edguser
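The offset scheme above can be sketched as a helper that prints the groupadd/useradd commands for a shifted id and warns when the result stays below 500, where root's cron would refuse to run jobs for that user. The function name is hypothetical, and the default ids are inputs you supply: read them from edgusers.conf (the 101 for glite in the usage line below is simply 201 minus the 100 I added).

```shell
#!/bin/bash
# Hypothetical helper (not part of YAIM): print the commands that create a
# service account with id = default id + offset, warning if the id is < 500.

user_cmds() {
    # $1 = offset, $2 = account/group name, $3 = default uid/gid
    local name=$2 id=$(( $3 + $1 ))
    if [ "$id" -lt 500 ]; then
        echo "warning: $name gets uid $id (< 500, root's cron won't run jobs for it)" >&2
    fi
    echo "groupadd -g $id $name"
    echo "useradd -m -u $id -g $name $name"
}
```

For example, `user_cmds 400 glite 101` prints the commands for uid/gid 501. Printing rather than executing keeps it safe to try before committing to the numbers.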
(6) SGE specifics
In /var/sgeCA/, link port6444 to sge_qmaster:
ls -l
lrwxrwxrwx 1 root root 11 Nov 30 16:21 port6444 -> sge_qmaster
drwxr-xr-x 3 root root 4096 Oct 13 14:09 sge_qmaster
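The relative link in the listing above (port6444 -> sge_qmaster) can be created re-runnably as sketched below; the function name is hypothetical. The `-n` flag matters on a second run: without it, `ln` follows the existing port6444 link into sge_qmaster and creates a stray link inside that directory.

```shell
#!/bin/bash
# Hypothetical helper: create port6444 -> sge_qmaster inside the sgeCA directory.
# Safe to run repeatedly. Defaults to /var/sgeCA as on this CE.

link_port_dir() {
    local dir=${1:-/var/sgeCA}
    if [ ! -d "$dir/sge_qmaster" ]; then
        echo "no $dir/sge_qmaster directory" >&2
        return 1
    fi
    # -s: symbolic link, relative target "sge_qmaster" (as in the listing)
    # -f: replace an existing port6444
    # -n: treat an existing port6444 symlink as a file, not as a directory to descend into
    ln -sfn sge_qmaster "$dir/port6444"
}
```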
Everybody and their grandmother needs to be able to run qstat, so the SGE user keys must be owned by the right users:
chown -R ldap:sgeadmin /var/sgeCA/sge_qmaster/default/userkeys/ldap
chown -R edguser:sgeadmin /var/sgeCA/sge_qmaster/default/userkeys/edguser
chown -R tomcat:sgeadmin /var/sgeCA/sge_qmaster/default/userkeys/tomcat
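The three chown commands above can be folded into a loop, which makes it easier to add further accounts later. This is a sketch: the function name and the `USERKEYS_DIR` and `RUN` variables are hypothetical. Setting `RUN=echo` turns it into a dry run that only prints the commands.

```shell
#!/bin/bash
# Hypothetical helper: give each listed user ownership of its SGE userkeys dir.
# USERKEYS_DIR overrides the base path; RUN=echo prints instead of executing.

fix_userkey_owners() {
    local base=${USERKEYS_DIR:-/var/sgeCA/sge_qmaster/default/userkeys}
    local u
    # Users taken from this page; extend the list for other qstat users.
    for u in ldap edguser tomcat; do
        ${RUN:-} chown -R "$u:sgeadmin" "$base/$u"
    done
}
```

Usage: `RUN=echo fix_userkey_owners` to review the commands, then `fix_userkey_owners` as root.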
When restarting the bdii, make sure that
export PYTHONPATH=/usr/lib/python:$PYTHONPATH
is set.
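A sketch of a restart wrapper that sets PYTHONPATH as required above. Both function names are hypothetical. It calls the init script directly, assuming it lives at /etc/init.d/bdii, because on Red Hat-style systems `service` typically starts init scripts with a cleaned environment, which would drop the exported variable. Unlike the plain export above, it also avoids a trailing ':' when PYTHONPATH was empty and does not add the directory twice.

```shell
#!/bin/bash
# Hypothetical helpers: prepend a directory to PYTHONPATH once, then restart the bdii.

prepend_pythonpath() {
    case ":${PYTHONPATH:-}:" in
        *":$1:"*) ;;                                   # already present, leave as is
        *) PYTHONPATH="$1${PYTHONPATH:+:$PYTHONPATH}" ;;
    esac
    export PYTHONPATH
}

restart_bdii() {
    prepend_pythonpath /usr/lib/python
    # Assumed init script location; calling it directly keeps our environment.
    /etc/init.d/bdii restart
}
```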
Our queue configuration is somewhat special as we have one machine which has a
short version of our queue (for ops tests):
[root@ceprod07 ~]# qconf -sq grid.q | grep -E 's_rt|h_rt|s_cpu|h_cpu'
s_rt 49:05:00,[we000.grid.hep.ph.ic.ac.uk=1:00:00]
h_rt 49:05:00,[we000.grid.hep.ph.ic.ac.uk=1:02:00]
s_cpu 48:00:00
h_cpu 48:05:00
Regular users are not affected by this, but the bdii advertises the minimum of
all walltimes (calculated in /usr/libexec/glite-info-dynamic-sge).
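To see why the minimum is the wrong choice here, the sketch below converts the two h_rt values from the qconf output to seconds and takes their minimum and maximum. It only illustrates the arithmetic; the function names are hypothetical, and whether glite-info-dynamic-sge reads s_rt or h_rt is not stated on this page.

```shell
#!/bin/bash
# Illustration only: min vs max over per-host walltime limits given as h:mm:ss.

to_seconds() {
    # h:mm:ss -> seconds; hours may exceed 24 (49:05:00).
    # 10# forces base 10 so fields like "05" or "08" are not read as octal.
    local h m s
    IFS=: read -r h m s <<< "$1"
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

min_max_seconds() {
    # Prints "min max" (in seconds) over all arguments.
    local t v min= max=
    for t in "$@"; do
        v=$(to_seconds "$t")
        if [ -z "$min" ] || [ "$v" -lt "$min" ]; then min=$v; fi
        if [ -z "$max" ] || [ "$v" -gt "$max" ]; then max=$v; fi
    done
    echo "$min $max"
}
```

`min_max_seconds 49:05:00 1:02:00` prints `3720 176700`: the minimum is the ops-test host's 62 minutes, while the maximum is the 49-hour limit that regular jobs actually get.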
To get the bdii to advertise the correct values, I need to change (~ line 770)
$QUEUE_minlimits{$q}->{'rt'} = &minval( $QUEUE_minlimits{$q}->{'rt'}, $wallclocktime );
to
$QUEUE_minlimits{$q}->{'rt'} = &maxval( $QUEUE_minlimits{$q}->{'rt'}, $wallclocktime );
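Since this edit has to be reapplied whenever the script is replaced (for instance by a package update), it can be scripted. The sketch below is hypothetical and assumes the line is spelled exactly as quoted above; it touches only the 'rt' assignment, keeps a .orig backup (GNU sed `-i.orig`), and refuses to run if the pattern is absent, since the line number and spacing may differ between releases.

```shell
#!/bin/bash
# Hypothetical helper: switch the 'rt' limit from &minval to &maxval.

patch_rt_minval() {
    local f=${1:-/usr/libexec/glite-info-dynamic-sge}
    if ! grep -q "{'rt'} = &minval(" "$f"; then
        echo "pattern not found in $f" >&2
        return 1
    fi
    # In a sed replacement '&' means "the matched text", so it is escaped as \&.
    sed -i.orig "s/{'rt'} = &minval(/{'rt'} = \\&maxval(/" "$f"
}
```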
(7) Run yaim
/opt/glite/yaim/bin/yaim -c -s /opt/glite/yaim/siteinfo/siteinfo_cetest00.def -n creamCE -n SGE_utils
Note that for the CREAM CE, CMS requires the * notation in the groups.conf file
(and here's the users.conf file for completeness).
(8) The bdii (selinux)
Add 'slapd: ALL' to /etc/hosts.allow.
Apparently this one is wrong:
semanage fcontext -a -t slapd_db_t "/var/log/bdii(/.*)?"; restorecon -vR /var/log/bdii/
semanage port -a -t ldap_port_t -p tcp 2170
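If the hosts.allow change lives in a post-install script that may run more than once, appending blindly would duplicate the rule. A minimal sketch (hypothetical function name) that adds the line only when an identical line is not already present:

```shell
#!/bin/bash
# Hypothetical helper: ensure the exact line "slapd: ALL" is in hosts.allow once.

allow_slapd() {
    local f=${1:-/etc/hosts.allow}
    # -x: whole-line match, so a commented-out or longer rule does not count
    grep -qx 'slapd: ALL' "$f" 2>/dev/null || echo 'slapd: ALL' >> "$f"
}
```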
(9) Hacks (local configuration)
All of these are combined in the post_yaim_hacks.sh script:
- /usr/bin/sge_filestaging: shared home dirs
- /usr/bin/sge_submit.sh: log the submitting CPU name in the accounting
- /etc/blah.config: fix KNOWN ISSUE in Update 10
- /etc/glite-ce-cream/cream-config.xml: sandbox location (CREAM_SANDBOX_DIR
specifies where the WN thinks the sandboxes are mounted)
- /var/lib/bdii/gip/plugin/glite-info-dynamic-scheduler-wrapper: fix KNOWN ISSUE
in Update 10
- /var/lib/tomcat5/webapps/ce-cream/WEB-INF/jobwrapper.tpl: change the running
directory for jobs from the home dir to the batch dir, add an environment variable
for dCache, source the WN script (the WN is a tarball install)
- /etc/tomcat5/tomcat5.conf: modified memory settings
- /etc/glite-apel-sge/parser-config-yaim.xml: the accounting files are stored in
bzip2 format but apel needs gzip, so point it to the converted files
- /etc/cron.d/edg-apel-sge-parser: add the bzip2 -> gzip conversion script to the
cron job
- Edit voms.gridpp.ac.uk.lsc in vomsdir to deal with the dodgy UK certificate.
- Also check my.cnf. I use mysqltuner.pl to get a handle on the mysql settings.
- The clean home dirs script (I just use the last version that worked, to
avoid problems with permissions).
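The apel items above depend on a bzip2 -> gzip conversion step whose script isn't shown here. The sketch below is one way to do it, not the script used on this CE: the function name is hypothetical, both directories are placeholders for wherever the SGE accounting files live and wherever parser-config-yaim.xml points, and it assumes the compressed files end in .bz2. Each file is converted once, and output goes through a temporary file so the parser never sees a half-written .gz.

```shell
#!/bin/bash
# Hypothetical sketch: convert every *.bz2 in $1 to *.gz in $2 (once per file).

convert_bz2_to_gz() {
    local src=$1 dst=$2 f out rc=0
    mkdir -p "$dst"
    for f in "$src"/*.bz2; do
        [ -e "$f" ] || continue                 # glob matched nothing
        out="$dst/$(basename "$f" .bz2).gz"
        [ -e "$out" ] && continue               # already converted
        # pipefail in a subshell so a bzip2 failure is not masked by gzip's status
        if ( set -o pipefail; bzip2 -dc "$f" | gzip -c > "$out.tmp" ); then
            mv "$out.tmp" "$out"
        else
            rm -f "$out.tmp"
            echo "failed: $f" >&2
            rc=1
        fi
    done
    return $rc
}
```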
(10) On the worker nodes
Edit the cream-sge.sh script located in
/usr/bin on the worker nodes to recognize the new CE as a cream CE.
(11) mysql queries
[root@ceprod07 ~]# mysql -u creamdbuser -p
Enter password:
mysql> use creamdb;
mysql> select * from job_status WHERE jobId = 'CREAM890544290';
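The same lookup can be run without an interactive session using `mysql -e`. This sketch (hypothetical function names) first checks that the argument looks like a CREAM job id, assumed from the example above to be "CREAM" followed by digits, so arbitrary text never ends up inside the quoted SQL string. `-p` without a value still prompts for the password.

```shell
#!/bin/bash
# Hypothetical helpers: validate a CREAM job id, then query its job_status rows.

is_cream_job_id() {
    # Assumed format: CREAM followed by one or more digits
    [[ $1 =~ ^CREAM[0-9]+$ ]]
}

cream_job_status() {
    is_cream_job_id "$1" || { echo "not a CREAM job id: $1" >&2; return 1; }
    mysql -u creamdbuser -p creamdb \
        -e "SELECT * FROM job_status WHERE jobId = '$1';"
}
```

Usage: `cream_job_status CREAM890544290`.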