ARC6 Installation Guide

PREREQUISITES

Choosing the host

It is assumed that ARC CE is installed on top of an existing Linux computing cluster. Many Linux distributions are supported. It can also be installed on a complete virtual computing cluster environment in the cloud.

ARC is not intrusive to the existing system. We suggest deploying ARC CE on a dedicated (virtual) machine connected to the cluster network and filesystem. In a limited number of cases it is also possible to communicate with the cluster over SSH from a completely independent remote node.

Plan for storage areas

Several storage areas are necessary for job execution and data storage. You should mount/export the following directories:

  • session directory
  • datastaging cache directory (if planned)
  • decide to what extent to use a non-cross-mounted scratch directory on the worker nodes

[TODO: what is session dir] [TODO: what is cache] [TODO: local scratch]
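
As a rough sketch of how these areas later map to arc.conf (the paths are placeholders; sessiondir, scratchdir and cachedir are the assumed options of the [arex] and [arex/cache] blocks):

[arex]
# cluster-shared session directory (placeholder path)
sessiondir = /shared/grid/session
# local scratch directory on the worker nodes, if used (placeholder path)
scratchdir = /local/scratch

[arex/cache]
# data-staging cache directory, if planned (placeholder path)
cachedir = /shared/grid/cache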

Local resource management system (LRMS)

Install and configure your LRMS. ARC supports a variety of LRMS backends:

  • fork (default) - executes jobs on the ARC CE host. Targeted at testing and development, not real production jobs.
  • condor - uses an HTCondor-powered HTC resource
  • slurm - SLURM cluster
  • pbs - any flavor of PBS batch system, including Torque and PBSPro
  • ll - LoadLeveler
  • lsf - Load Sharing Facility
  • sge - Oracle Grid Engine (formerly Sun Grid Engine)
  • boinc - works as a gateway to BOINC volunteer computing

Check that you are able to submit jobs to the LRMS from the ARC CE host.
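
For example, assuming SLURM, a quick check from the ARC CE host could look like this (a plain LRMS test, not an ARC command):

sbatch --wrap="hostname"
squeue -u $USER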

You may consider setting up dedicated queues to use with ARC CE (e.g. per-VO queues).

Please also NOTICE that in some cases (depending on the LRMS) you need to share the batch system log directories with the ARC CE. [TODO: link to fulldoc]

Configure OS accounts

Plan for local accounts (or account pools) that will be used to execute jobs on the worker nodes.

These accounts should also be available on the ARC CE.

Please note that ARC services run as root on the ARC CE node and switch to the mapped local account when processing job data staging and job execution. This process is called mapping.
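
For example, a pool of accounts for one VO could be created on the worker nodes and the ARC CE like this (account and group names are illustrative; use your site's account management tools if you have them):

groupadd atlas
for u in atlas{001..100}; do useradd -g atlas -m $u; done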

INSTALLATION

Install ARC CE core packages from repositories. [TODO: build from source link]. [TODO: metapackage name].

Grid security heavily relies on PKI, and everything requires certificates/keys, including the ARC CE and its users:

  • for testing purposes a Test CA and host certificates signed by the Test CA are included (see the example after this list) [TODO: arcctl]
  • for production usage please obtain a certificate signed by one of the IGTF accredited CAs.
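
For example, a test host certificate can be generated with arcctl (a sketch assuming the test-ca subcommands available in your arcctl version; never use it for production):

arcctl test-ca init
arcctl test-ca hostcert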

ARC CE needs IGTF CA certificates deployed to verify users and other services, such as storage elements. To deploy IGTF CA certificates to the ARC CE host, run [1]:

arcctl deploy igtf-ca classic
[1]Use the --installrepo argument to enable repositories with IGTF CA certificates if ARC is not installed from the NorduGrid repos.

CONFIGURATION

Configuration of the ARC CE is done by modifying the pre-shipped ‘zero configuration’ available at /etc/arc.conf.

The purpose of this ‘zero configuration’ is to offer a minimalistic working computing element out of the box right after package installation, with zero additional configuration needed.

For production deployment you need to customize the configuration in accordance with your setup.

The most common configuration steps are the following:

Configure AuthZ

AuthZ rules define who can execute jobs on the computing element.

ARC CE authorization rules are configured with [authgroup] blocks.

In the shipped configuration an [authgroup: all] block is defined that matches any user.

Authgroups can be applied per interface ([arex/ws], [gridftpd]) and per queue with the allowaccess option in the corresponding block.

Example configuration 1

To authorize a single person (or several) by certificate subject name (SN):
  1. Create an authorization group in arc.conf and specify the SN directly with the subject keyword, or refer to a file that contains a list of SNs:

    [authgroup: staticdn]
    subject = /O=Grid/O=Big VO/CN=Main Boss
    
    [authgroup: dnfromfile]
    file = /etc/grid-security/local_users
    
  2. Apply the authgroup to the target interface or queue:

    [gridftpd]
    allowaccess = staticdn dnfromfile
    

Example configuration 2

To filter access based on VOMS certificate attributes, define one or more [authgroup] blocks using the voms keyword.

To verify VO membership signatures, the ARC CE needs so-called list-of-certificates (LSC) files, which can be installed by arcctl.

Example configuration for the atlas VO [2]:

  1. Deploy LSC files:

    arcctl deploy voms-lsc atlas --egi-vo
    
  2. Create an authorization group in arc.conf:

    [authgroup: atlas]
    voms = atlas * * *
    
  3. Apply the authgroup to the target interface or queue:

    [queue: atlas]
    allowaccess = atlas
    

For more information about possible authgroup options, including LCAS integration, please read the ARC CE System Administrator manual.

[2]In this example and the following ones the configuration is simplified; an actual config in most cases includes different authgroups for different VO groups and roles.

Configure mapping

Every grid user should be mapped to a local account to start processes and access files.

In the shipped zero configuration all users are mapped to the same nobody account, which will only work with the local fork backend.

There are several common options for mapping grid users.

Accounts pool

The recommended method, and the most transparent, secure and flexible one, is to map authorized users to an account pool (the so-called ARC simple pool method).

In this approach, every user authorized by the specified [authgroup] will be dynamically mapped to one of the available accounts.

Available pool account names are stored one per line in the pool file inside the pool directory. Leased names are stored in other files placed in the same directory and can be reassigned to other users after 10 days of inactivity.

Example configuration for atlas:

  1. Create an account pool:

    mkdir -p /etc/grid-security/pool/atlas
    for u in atlas{001..100}; do echo $u >> /etc/grid-security/pool/atlas/pool; done
    
  2. Configure mapping in arc.conf [3]:

    [mapping]
    unixgroupmap=atlas simplepool /etc/grid-security/pool/atlas
    
[3]atlas is the name used in [authgroup: atlas]

Legacy grid-mapfile based mapping

Legacy grid-mapfile based mapping is not recommended for typical production loads.

In this approach users are mapped to local accounts based on the certificate DN only. Mapping rules are stored line by line in so-called grid-mapfiles that describe which user is mapped to which account, for example:

"/O=Grid/O=NorduGrid/OU=uio.no/CN=Aleksandr Konstantinov" user1
"/O=Grid/O=NorduGrid/OU=hep.lu.se/CN=Oxana Smirnova" user2

In the simplest legacy case ARC can use a grid-mapfile for both authorization and mapping decisions, instead of or in addition to [authgroup] blocks.

Normally grid-mapfiles are referred to in arc.conf as [userlist] objects that can be used as a source for authZ and mapping.

To generate mapfiles automatically and keep them up to date (e.g. from a VOMS database), the nordugridmap utility can be used and configured with the [nordugridmap] block.

Using external LCMAPS rules

ARC can run an external plugin to map users. To support several production loads ARC ships with a built-in LCMAPS plugin.

LCMAPS itself should be installed and configured separately; this is beyond the scope of this guide. Consult the ARC CE System Administrator Manual [TODO].

Provide LRMS-specific information

One more critical configuration step is to supply the ARC CE with relevant information regarding your LRMS specifics.

Specify your LRMS type

In arc.conf there is a dedicated [lrms] block that defines the type of your LRMS and several options related to tuning its behaviour. For example, to instruct ARC to use SLURM, use the following config:

[lrms]
lrms = slurm
slurm_use_sacct = yes

Specify queues

In addition to specifying the LRMS itself, you should list all the queues you want to expose via the ARC CE using [queue: name] blocks, for example:
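
A minimal sketch (the queue names are examples and must match queues defined in your LRMS; the allowaccess line reuses the atlas authgroup from the examples above):

[queue: grid]

[queue: atlas]
allowaccess = atlas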

Configure A-REX

The ARC Resource-coupled EXecution service (A-REX) is a core service for the execution of compute jobs.

Enable job management interfaces

A-REX has several job management interfaces available. You can control which of them are enabled by configuring the corresponding blocks (see the example after this list):

EMI-ES
[arex/ws/emies]
RESTful
[arex/rest]
Gridftp
[gridftpd/jobs]
Internal
This interface is available implicitly
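
For example, to enable the EMI-ES and REST interfaces in addition to the WS container, a sketch could look like this (defining an empty block with default values is assumed to be enough to switch the corresponding interface on):

[arex/ws]

[arex/ws/emies]

[arex/rest]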

Enable data services

ARC comes with a powerful data-staging framework called DTR. [TODO: purpose of datastaging]

Define the [arex/data-staging] block to enable data-staging capabilities.

Configure the [arex/cache] block to enable caching. [TODO: advantages of having cache] [TODO: choose and consider whether to share cachedir]
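
A minimal sketch combining both (the cache path is a placeholder; enabling [arex/data-staging] with default values is assumed to be sufficient as a starting point):

[arex/data-staging]

[arex/cache]
# shared cache directory (placeholder path)
cachedir = /shared/grid/cache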

RunTimeEnvironments

RunTimeEnvironments are scripts that can extend the job execution cycle with a simple plug-in technique. ARC ships with several RTEs that are already available to be used and are classified as system-defined. You can add extra directories with so-called user-defined RTEs using the runtimedir configuration option in the [arex] block.

In ARC6 both system- and user-defined RTE directories are local to the ARC CE and SHOULD NOT be shared with the worker nodes.

To use one of the installed RTEs you should additionally enable it with the arcctl tool. For example, to enable the system-defined ENV/PROXY RTE, run:

arcctl rte enable ENV/PROXY
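
To check which RTEs are available and which are already enabled, you can also run (assuming the list subcommand of your arcctl version):

arcctl rte list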

[TODO: write dedicated document and link it here to describe list, default, and set-params stuff]

Information system

The ARC CE information system is aimed at collecting and publishing information to clients, to be used for matchmaking and/or monitoring the state and statistics of the resource.

It is mandatory to configure the information system for production cases, such as a WLCG computing element.

Defining general information

There are many information schemas and data renderings available to comply with existing standards. Several blocks are used to define information depending on the schema (a minimal example follows this list):

[infosys]
The most common block; it enables internal information collection from the ARC CE host and the LRMS.
[infosys/cluster]
Common information about the whole cluster, including the description of calculated total CPU values.
[queue: name]
For heterogeneous clusters, most of the information in the [infosys/cluster] block can be redefined on a per-queue basis.
[infosys/glue2]
Configures the GLUE2-specific values and enables the internal GLUE2 rendering.
[infosys/ldap]
Enables the dedicated LDAP/BDII services to publish information via the LDAP protocol.
[infosys/glue2/ldap]
Enables GLUE2-schema LDAP rendering of the collected information.
[infosys/nordugrid]
Enables LDAP rendering of the collected information according to the Nordugrid schema.
[infosys/glue1]
Configures the GLUE1.x-schema specific values and enables LDAP rendering of GLUE1.x.
[infosys/glue1/site-bdii]
Enables and configures GLUE1.x site-bdii functionality.
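
For example, a setup that publishes GLUE2 both internally and via LDAP, plus the NorduGrid LDAP schema, could be sketched as follows (empty blocks with default values are assumed to be sufficient to start with):

[infosys]

[infosys/cluster]

[infosys/glue2]

[infosys/ldap]

[infosys/glue2/ldap]

[infosys/nordugrid]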

Hinting clients about authorized VOs

TODO: describe advertizedvo per/cluster per/queue

Accounting

The ARC CE has built-in functionality to publish usage statistics to the SGAS and APEL centralized accounting services with the jura tool.

[TODO: some common flow and archiving necessity for republishing]

Publishing to SGAS

TODO: example

Publishing to APEL

TODO: example

Additional ARC services for advanced use-cases

Datadelivery service

TODO: description TODO: [arex] conf

Candypond

TODO: description TODO: develop, ship, and describe here candypond RTE

ACIX

TODO: describe ACIX (scanner, index, broker)

CONFIGURE FIREWALL

Different ARC CE services open a set of ports that should be allowed in the firewall configuration.

To generate iptables configuration based on arc.conf run:

arcctl deploy iptables-config

ENABLE AND RUN SERVICES

To enable and run all services as configured in arc.conf run:

arcctl service enable --as-configured --now

TEST BASIC FUNCTIONALITY

To test job submission on the same host as A-REX, the internal interface (org.nordugrid.internal) can be used to inject jobs directly.
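
For example, with a minimal xRSL job description (the file name and contents are illustrative; depending on your client setup you may also need to point arcsub at the local CE, e.g. with -c):

cat > test.xrsl <<'EOF'
&(executable="/bin/hostname")
 (stdout="stdout.txt")
 (jobname="internal-test")
EOF
arcsub -S org.nordugrid.internal test.xrsl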

To test submission via any of the regular interfaces, you can use the ARC client tools on another machine and run commands such as (see the example that follows):

arcstat
arctest
arcsub
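
For example (the CE hostname is a placeholder; check arctest --help for the test jobs available in your client version):

arctest -J 2 -c ce.example.org
arcstat -a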

To diagnose the ARC CE service you can interact with arcctl in many different ways. Some common examples follow:

  • Check which ARC services are enabled and running:

    arcctl service list
    
  • Inspect jobs and view accounting statistics on the ARC CE [TODO: expand]:

    arcctl job list
    arcctl job attr 1s1MDm6kspsnr0O5upx6UuPqABFKDmABFKDmPRIKDmABFKDmzyFsJm lrmsid
    arcctl job log 1s1MDm6kspsnr0O5upx6UuPqABFKDmABFKDmPRIKDmABFKDmzyFsJm
    arcctl job log 1s1MDm6kspsnr0O5upx6UuPqABFKDmABFKDmPRIKDmABFKDmzyFsJm --lrms
    arcctl job log 1s1MDm6kspsnr0O5upx6UuPqABFKDmABFKDmPRIKDmABFKDmzyFsJm --service
    arcctl accounting stats
    

[TODO] Links to some production configs as examples? Publish some configs once we create the new ARC6 configs from the old ARC5 ones for people.