PBS Professional 11.2 Release Notes
21 December 2011

Copyright (c) 2003-2011 Altair Engineering, Inc. All rights reserved.
PBS(TM), PBS Works(TM), PBS GridWorks(R), PBS Professional(R), PBS Analytics(TM), PBS Catalyst(TM), e-Compute(TM), and e-Render(TM) are trademarks of Altair Engineering, Inc. and are protected under U.S. and international laws and treaties. All other marks are the property of their respective owners.

ALTAIR ENGINEERING INC. Proprietary and Confidential. Contains Trade Secret Information. Not for use or disclosure outside ALTAIR and its licensed clients. Information contained herein shall not be decompiled, disassembled, duplicated or disclosed in whole or in part for any purpose. Usage of the software is only as explicitly permitted in the end user software license agreement. Copyright notice does not imply publication.

For the most recent information, log in to the PBS Professional website, www.pbsworks.com with your site ID and password and follow the "Client Login" link, then the "Download Software" link.

-------------------------------------------------------------------------
Contents

    Contact Information
    Licensing Information
    Supported Platforms
    Upgrading
    Special Notes (Security)
    Special Notes (Licensing)
    Special Notes (Installation)
    Special Notes (General)
    Special Notes (External Interface Changes)
    Special Notes (Documentation)
    Special Notes (Hooks)
    Special Notes (HPC Basic Profile)
    Special Notes (Windows)
    Special Notes (SGI)
    Special Notes (Cray)
    Special Notes (MPI)
    Special Notes (AIX)
    Special Notes (Linux)
    New Features
    Modifications and Bug Fixes
    Known Bugs, Errors, and Warnings
    Third Party Legal Notices

-------------------------------------------------------------------------
Contact Information

Altair Engineering, Inc.
1820 E. Big Beaver Road
Troy, MI 48083-2031 USA
www.pbsworks.com

PBS Sales:   pbssales@altair.com    248.614.2400
PBS Support: pbssupport@altair.com  248.614.2425

-------------------------------------------------------------------------
Licensing Information

Terms of use for this software are available online at http://www.pbspro.com/UserArea/agreement.html, and are also included in the PBS Professional Administrator's Guide and the PBS Professional User's Guide.

PBS uses a new Altair license server for licensing. The Altair license server must be installed and configured before PBS is installed and configured for permanent use. See the PBS Professional Installation & Upgrade Guide and the Altair License Management System Installation and Operations Guide for additional information.

-------------------------------------------------------------------------
Supported Platforms

For the following platforms, you can download all PBS components:

    HP-UX 11.23 and later on Itanium
    IBM AIX 5.3 with TL9 or later on POWER architectures
    IBM AIX 6.x and 7.1 on POWER architectures
    Red Hat Enterprise Linux 4 (AS, ES, WS) on x86, x86_64, and ia64
    Red Hat Enterprise Linux 5 on x86, x86_64, and ia64
    Red Hat Enterprise Linux 6 on x86 and x86_64
    CentOS 5.4, 5.5, 5.6, and 6.0 on x86 and x86_64
    SGI Altix with SGI ProPack 5, 6, and 7 on ia64
    SGI ICE/XE with SGI ProPack 5 and 6 on x86_64
    SGI ICE/XE/UV with SGI Performance Suite 1 on x86_64
    Solaris 10 on SPARC and x86_64
    SuSE SLES 9, 10, 11 on x86, x86_64, and ia64
    Windows 7 on x86 and x86_64
    Windows XP Professional, SP1 and later, on x86 and x86_64
    Windows Server 2003 on x86 and x86_64
    Windows Server 2008 on x86 and x86_64
    Windows Server 2008 R2 on x86_64
    Windows Vista on x86 and x86_64
    Cray Linux Environment (CLE) 2.2 on XT4 and XT5
    Cray Linux Environment (CLE) 3.0 on XT6
    Cray Linux Environment (CLE) 3.1 on XT5, XT6, XE5, and XE6
    Cray Linux Environment (CLE) 4.0 on XE6

For the following platforms, you can download only the license server:

    OS X 10.5/10.6 on x86 and x86_64

MPI

PBS runs jobs with all known implementations of MPI. PBS is integrated more tightly with some versions, providing support for resource tracking and job cleanup after a failure. These are listed here:

    MPICH 1.2.5, 1.2.6, and 1.2.7 on Linux
    MPICH2 1.0.3, 1.0.5, 1.0.7 on Linux
    MPICH-GM (mpich-1.2.6.14b) on Linux
    IBM POE on AIX 5.x and 6.x, including HPS support
    Intel MPI 2.0.022, 3, and 4 on Linux
    HP MPI 1.08.03 on HP-UX 11 on Itanium 2
    HP MPI 2.0.0 on Linux
    Platform MPI 8.0 on Linux
    LAM/MPI 6.5.9, 7.0.6, 7.1.1 on Linux
    SGI MPI (MPT) on Linux on SGI platforms, including over InfiniBand
    MVAPICH 1.2.7 and 2.0 on Linux

Support for CSA:

Support for CSA is deprecated. The following platforms supported CSA:

    ProPack version  OS version  Architecture  libcsa version  libjob version
    -------------------------------------------------------------------------
    PP6              SLES10      x86_64        libcsa.so.3     libjob.so.2
    PP6              SLES10      ia64          libcsa.so.3     libjob.so.2
    PP6              SLES11      ia64          libcsa.so.4     libjob.so.2
    PP6              SLES11      x86_64        libcsa.so.4     libjob.so.2
    PP7              SLES11      ia64          libcsa.so.4     libjob.so.2
    (none)           SLES10      x86_64        libcsa.so.1     libjob.so
    (none)           SLES11      x86_64        libcsa.so.1     libjob.so

Any system running PBS and Performance Suite will not support CSA.

Support for User Space mode on InfiniBand switches:

    Platform:             AIX 5.3 on Power6
    Parallel Environment: 4.3.2.5
    libnrt:               2.4.5.5
    Switches:             Qlogic InfiniBand SilverStorm 9120
    Connection to switch: Compute nodes connect to each Qlogic switch with 2 networks per HCA adapter. There are 4 links per switch and a total of 8 links per node.

Windows Domains:

Active Directory Service domains are supported. Windows NT domains are not supported. Mixed mode is not supported. Mixing Windows systems with non-Windows systems in one PBS complex is not supported. Peer scheduling between Windows and non-Windows complexes is not supported. For more specific Windows platform information, see Special Notes (Windows).
Licensing:

For a list of supported platforms for the license manager, see the license manager release notes. These are available from the PBS Works Documentation Download Area, at https://secure.altair.com/UserArea/docs.php

-------------------------------------------------------------------------
Upgrading

1.  PBS 11.0 and later uses a new Altair license server, which must be installed and configured before PBS is upgraded. See the PBS Professional Installation & Upgrade Guide and the Altair License Management System Installation and Operations Guide.

2.  Upgrading a Cray to PBS 11.0 and later requires a migration upgrade. See the PBS Professional Installation & Upgrade Guide.

3.  No rolling upgrades: You should upgrade all MoMs and the server/scheduler at the same time; an older MoM cannot interoperate with a newer server. For example, an 11.2 server cannot send jobs to a 10.4 or even an 11.1 MoM; the older MoMs will reject the jobs.

4.  To choose between overlay and migration when upgrading to PBS Professional 11.0 and later:

    If you are running PBS version 5.3 or later, an overlay upgrade is recommended.
    If you are running a PBS version older than 5.3, a migration upgrade is recommended.
    PBS on Windows can be upgraded via migration only.

-------------------------------------------------------------------------
Special Notes (Licensing)

PBS uses a new Altair license server, which MUST be installed and configured before PBS is installed and configured. You CANNOT use a FLEX server with PBS 11.0 or later. Follow the instructions in the Altair License Management System Installation and Operations Guide. Chapter 3, Licensing, in the PBS Professional Installation & Upgrade Guide describes how to configure PBS Professional to use the license server.

If you find that the PBS server host becomes unresponsive a short time after startup, the server may be trying to contact the wrong license server. See the Troubleshooting chapter in the PBS Professional Administrator's Guide.
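
As a first check when the server may be contacting the wrong license server, it can help to look at the license server location that the PBS server is configured with. The following is a minimal sketch, not a procedure from the guides. It assumes that the location is held in the pbs_license_info server attribute and is written in port@host form; the function names, the host lic.example.com, and the port 6200 are placeholders chosen for illustration.

```shell
#!/bin/sh
# Sketch only: function names, lic.example.com, and port 6200 are
# illustrative placeholders, not values taken from these release notes.

# Print the license server location currently configured on the PBS server.
# Requires qmgr and a running PBS server; nothing runs until this is called.
show_license_info() {
    qmgr -c "list server pbs_license_info"
}

# Build (but do not run) the qmgr command that would point PBS at a single
# license server. The argument is assumed to be in port@host form.
license_info_cmd() {
    case "$1" in
        *[!0-9]*@*|@*|*@)
            # Non-numeric port, missing port, missing host, or more than one "@".
            echo "expected port@host, got: $1" >&2
            return 1
            ;;
        *@*)
            printf 'qmgr -c "set server pbs_license_info = %s"\n' "$1"
            ;;
        *)
            echo "expected port@host, got: $1" >&2
            return 1
            ;;
    esac
}
```

For example, `license_info_cmd 6200@lic.example.com` prints the command for review so that an administrator can inspect it before running it by hand. Printing rather than executing is a deliberate choice here, because changing the license server location on a live server should be a conscious step.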

The pbs_license_file_location server attribute is deprecated. The installation script unsets this attribute during an upgrade. This is intentional. Use the pbs_license_info server attribute instead.

The minimum number of PBS Works licenses that can be checked out is one (1). If the pbs_license_min server attribute is set to zero or unset, PBS automatically sets the number to one.

For version 11.2, the default value for the pbs_license_linger_time server attribute is set to 1 year (31536000 seconds). The previous default value was 3600 seconds. This means that any unused licenses are checked back in to the license server after a year, instead of after one hour. This value can be overridden by setting the value of pbs_license_linger_time via qmgr.

-------------------------------------------------------------------------
Special Notes (Installation)

Under UNIX/Linux, we recommend that the PBS administrator create a new data service account called "pbsdata" before installing PBS. This account must be created correctly. See the PBS Professional Installation & Upgrade Guide for details.

-------------------------------------------------------------------------
Special Notes (General)

0.  Do not perform partial removal of custom resources. It is important to remove custom resources correctly. See the instructions in the PBS Professional Administrator's Guide.

1.  Dependencies on Linux shared object libraries:

    Altair distributes separate PBS packages for different supported Linux distributions. Customers running on SLES and RHEL distributions should take care to use the correct package for their systems. For example, in a complex made up of RHEL 4 and RHEL 5 hosts, the RHEL 4 package should be used on the RHEL 4 hosts, and the RHEL 5 package should be used on the RHEL 5 hosts.

    If PBS Professional 10.2 or later is deployed onto Linux distributions other than the supported SLES and RHEL distributions, there may be runtime problems with dependencies on shared object libraries, typically libssl.so or libcrypto.so. Customers who encounter such problems should first determine which supported distribution is most similar to that installed on their systems. For example, openSUSE 10.x is similar to SLES 10, and CentOS distributions are similar to RHEL. The customer should then try the PBS package that is supported for the similar system.

    If the runtime dependencies are still not resolved, there are two other approaches that may be tried. Before trying either approach, first run ldd on pbs_mom to get the name and version number of the libraries that PBS requires. pbs_mom needs to be extracted from the PBS rpm by using the Linux rpm2cpio and cpio commands. Here is an example using an installed SLES 10 x86_64 pbs_mom:

        shell> ldd /opt/pbs/default/sbin/pbs_mom
            libdl.so.2 => /lib64/libdl.so.2 (0x00002b23b0a59000)
            libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00002b23b0b5d000)
            libssl.so.0.9.8 => /usr/lib64/libssl.so.0.9.8 (0x00002b23b0c96000)
            libc.so.6 => /lib64/libc.so.6 (0x00002b23b0ddd000)
            libcrypto.so.0.9.8 => /usr/lib64/libcrypto.so.0.9.8 (0x00002b23b100d000)
            /lib64/ld-linux-x86-64.so.2 (0x00002b23b093c000)

    The Linux incompatibilities are generally associated with libcrypto.so, which is required by libssl.so. Having identified the requisite version of libcrypto.so and libssl.so, there are two options:

    (1) Locate and install an OpenSSL compatibility library that will satisfy the dependency. In the example above, the library needs to satisfy the dependency on libcrypto.so.0.9.8.

    (2) Download and install a version of OpenSSL that corresponds to the dependency determined above (e.g., libcrypto.so.0.9.8).

2.  Starting with version 10.4, PBS requires more disk space than prior versions. Package size of PBS with Python is almost double that of these older versions.
    For example, under SLES, the size is now 81 MB instead of 49 MB, and under RHEL, the size is now 42 MB instead of 21 MB.

3.  The default umask behavior changed in 10.4 due to the fix for SPID 4487. Under 10.2 and earlier versions of PBS, if -Wsandbox=PRIVATE was specified, the job's execution directory was created with permissions drwxr-xr-x. With this fix, the default job umask is applied, so unless a different umask is specified for the job's execution directory, it has permissions drwx------. To allow read access by Group or Other, it is necessary to specify the appropriate umask when submitting the job. See the PBS Professional User's Guide.

4.  Do not use different versions of PBS within the same complex. All machines using the same PBS server should run the same version of PBS. Do not mix major or minor versions. For example, do not run 11.2 and 10.2 in the same complex.

5.  Do not use qmgr to set or change the vnode sharing attribute, or a vnode's ncpus, vmem or mem. Instead, create a new vnode definition file and use "pbs_mom -s insert" to add the file. Then HUP the MOM. There are requirements for the vnode definition file. See the PBS Professional Administrator's Guide, section 3.5, "How to Configure MOMs and Vnodes".

6.  All printable characters are allowed in a resource value, except for comma, space, colon, quotes, and ampersand. Resource values are also subject to other restrictions, such as those imposed by select and place. In addition, if these resource values are used as pathnames, they are subject to the restrictions of the operating system.

7.  Issue 5909 (Bug 5375): pbs_mom emits error "unable to get my domain name"

    Starting with release 7.0, pbs_mom may emit the message

        "pbs_mom: pbs_mom, Warning: unable to get my domain name"

    when PBS is started. The mere presence of this warning on a system that has been upgraded from a previous release of PBS should not cause concern: the same check was performed in previous releases; the only change is that the warning is no longer suppressed.

    The actual cause of the warning may be one of the following:

    - pbs_mom is unable to find the fully-qualified domain name of the host on which it is executing

    - although the host's name (i.e. the value of uname -n) is fully qualified, the host's native name service has been configured to prefer other name service backends over DNS, and the preferred service uses an unqualified name for the host. For example, systems using a name service switch configuration file, nsswitch.conf, may have a line of the form

          hosts: files dns

      in conjunction with a hosts file entry in which the host's name is not fully qualified.

    To address the problem, administrators should ensure that converting the host's name (obtained via gethostname()) to an address (using gethostbyname()) and converting that address back to a name (using gethostbyaddr()) yields the original host name. See the pbs_hostn(8B) manual page for more information.

8.  SPID 15770: Change in qdel behavior

    The -Wdelay option to qdel is silently ignored. The delay between the SIGTERM and SIGKILL signals is determined by each queue's kill_delay attribute. The default value for this attribute is changed to 10 seconds.

9.  X11 forwarding limitation with Sun ssh 1.1

    A working X11 forwarding configuration depends on the underlying ssh version. Sun sshd version Sun_SSH_1.1 does not work correctly. Updating to version Sun_SSH_1.1.2 provides an sshd with working X11 forwarding. We recommend that customers use version 1.1.2 or newer. Sun sshd differs from OpenSSH (as installed on BSD/Linux systems) in important ways, so we do not recommend that customers replace Sun_SSH with OpenSSH.

10. Enabling X11 forwarding

    In order to enable X11 forwarding, add the path to the xauth binary to each MoM's pbs_environment file.
    For example:

        -bash-3.2$ echo $PATH
        /bin:/usr/bin
        -bash-3.2$ which xauth
        /usr/bin/X11/xauth

    The entry in the pbs_environment file should be the following:

        PATH=/bin:/usr/bin:/usr/bin/X11

-------------------------------------------------------------------------
Special Notes (External Interface Changes)

All external interface changes took place in previous releases.

Changes
-------

0.  New -T option to qstat command shows estimated start time.

1.  Server cannot be killed with SIGINT.

2.  The pbs-report command will be deprecated. It will be moved to the unsupported directory in a future release.

3.  New server and queue resource usage limits are incompatible with old server and queue resource usage limits. See "Managing Resource Usage" in the PBS Professional Administrator's Guide.

4.  Enabling job history changes the behavior of dependent jobs. If a job j1 depends on a finished job j2 for which PBS is maintaining history, then j1 will go into the held state. If job j1 depends on a finished job j3 that has been purged from the historical records, then j1 will be rejected, just as in previous versions of PBS where the job was no longer in the system.

5.  8.0: Change to pbsnodes output: "Host =" replaced with "Mom ="

6.  The resource arch is only used inside of a select statement.

7.  pbs_hostid command no longer used. Use "almutil -hostid".

8.  When a job is requeued, a new timeout message may be printed.

9.  When a reservation is confirmed, the message printed to the scheduler log is different from previous versions.

10. MOM now exits with an exit value of 1 if an error is detected while reading the configuration file. In previous versions of PBS, MOM did not exit.

11. Some MOM error messages no longer have a leading WARNING flag.

12. Behavior of qdel changed. See Special Notes (General).

13. Behavior of qsub, qstat, qselect, pbs_mom expanded.

14. Change in pbsnodes output: on hyper-threaded Linux systems, pbsnodes now reports the number of processors listed in /proc/cpuinfo.

15. Change to log levels: see the PBS Professional Administrator's Guide for the table titled "PBS Events and Log Event Classes"

16. The ability to submit a job with "-l ", i.e. without giving a value for the resource, was never supported and will not work in future releases.

17. (11.0) A message in the server log file indicating that a vnode name is already defined by another MOM is no longer logged.

18. Consumable resources that do not appear in the "resources" line in the sched_config file will no longer appear in a job's exec_vnode or reservation's resv_nodes. In previous releases they were included, but not used. Most notable examples are mpiprocs and ompthreads.

19. 11.2: Support in PBS for CSA is deprecated.

20. 11.2: New -X option to qsub.

21. 11.2: New accounting log field called "project=" added to S, E, and R records.

Files that no longer exist (11.0)
--------------------------
PBS_HOME/server_priv/nodes
PBS_HOME/server_priv/resvs/ and all *.RB files under this directory
PBS_HOME/server_priv/serverdb

Deprecated scheduler configuration parameters
---------------------------------------------
assign_ssinodes
cpus_per_ssinode
key
load_balancing_rr
mem_per_ssinode
preempt_checkpoint
preempt_fairshare
preempt_requeue
preempt_starving
preempt_suspend
sort_by
sort_priority option to job_sort_key
strict_fifo
sync_time

Obsolete and deprecated resources
---------------------------------
The -l nodes=nodespec form is replaced by the -l select= and -l place= statements.
The -l resc=rescspec form is replaced by the -l select= statement.
Properties are replaced by boolean resources.
The resource host is only used inside of a select statement.
The resource arch is only used inside of a select statement.
The nodect resource is obsolete. The ncpus resource should be used instead. Sites which currently have default values or limits based on nodect should change them to be based on ncpus.
The neednodes resource is obsolete.
The ssinodes resource is obsolete.
The ppn resource is deprecated.
The nodes resource is no longer used.
The mpp* resources are deprecated, meaning the following are deprecated:
    mppwidth
    mppdepth
    mppnppn
    mppmem
    mpparch
    mpphost
    mpplabels
    mppnodes

Server and Vnode deprecations
-----------------------------
The time-shared node type is no longer used. The :ts suffix is obsolete.
The cluster node type is no longer used.
The lictype vnode attribute is deprecated.
The license vnode attribute is deprecated.
The server attribute node_pack is deprecated.
The server attribute pbs_license_file_location is deprecated, and replaced by pbs_license_info.

Command deprecations
--------------------
The -a option to the qselect command is deprecated.
The -Wdelay=nnnn option to the qdel command is deprecated.
The -c option to the pbsnodes command is deprecated.
The pbs_hostid command is no longer available.
The -a option to the pbs_sched command is deprecated.

API deprecations
----------------
pbsrescquery
pbs_rescquery
avail
totpool
usepool
configrm

Altix: no longer used
---------------------
Requests for the following values will return a PBSE_RMUNKNOWN error:
    cpuset_small_mem
    cpuset_small_ncpus
    memreserved
    max_shared_nodes
    nodersrcs
    shared_cpusets
    small_job_spec

New server attributes
---------------------
reserve_retry_init
reserve_retry_cutoff
max_queued
max_queued_res.RES
max_run
max_run_res.RES
max_run_soft
max_run_soft_res.RES
job_history_enable
job_history_duration
max_concurrent_provision
backfill_depth (10.4)
est_start_time_freq (10.4)
pbs_license_info (11.0)

New vnode attributes
--------------------
hpcbp_enable
hpcbp_user_name
hpcbp_webservice_address
hpcbp_stage_protocol
provision_enable
current_aoe
in_multivnode_host (11.0)

New reservation attributes
--------------------------
reserve_retry

New queue attributes
--------------------
max_queued
max_queued_res.RES
max_run
max_run_res.RES
max_run_soft
max_run_soft_res.RES

New job attributes
------------------
Exit_status
Stageout_status
Submit_arguments
estimated
placement includes "vscatter" (11.0)
placement includes "exclhost" (11.0)
forward_x11_cookie (11.2)
forward_x11_port (11.2)
project (11.2)

New job states
--------------
F   Finished
M   Moved

New vnode states
----------------
provisioning
wait-provisioning
sharing attribute includes "default_exclhost", "force_exclhost" (11.0)
resv_exclusive (11.0)

New MOM configuration variables
-------------------------------
cpuset_error_action (10.4)
$aix_largepagemode
vnodedef_additive (11.0)

New resources
-------------
start_time (10.4)
exec_vnode (10.4)
vntype (11.0)
nchunk (11.0)
PBScrayhost (Cray only) (11.0)
PBScraynid (Cray only) (11.0)
PBScrayorder (Cray only) (11.0)
PBScraylabel_