Version 1.1 of sNow! cluster manager
Edit me

1.1.0

Major Fixes

  • FIX default console options and install repo
  • FIX issue #63 - missing mail program and empty license variable
  • Included error trapping in snow set node function
  • FIX issue #3 - dom0 memory limits
  • Fix user home creation on login
  • Add skel and ACL on user home creation
  • Fix permissions and owner for sssd.conf to be able to start the daemon
  • Fix: NET_LLF is optional
  • Fix enable public ssh instance in login role
  • Patch network config in CentOS with linkdelay=20 in order to fix issues with some 10Gb devices
  • Fix IPv6 issues related with RPCbind Server Activation Socket.

New Features and Changes

  • included OpenSUSE Leap 42.2 template
  • Diskless image support based on read-only nfsroot image
  • Included “set node” function in order to setup node based parameters
  • Extended snow “add node” function
  • Integrated database for compute nodes
  • Merged node_list function as a part of boot function
  • Included default image and template per node if defined in the database
  • The functions nreboot, npoweroff, nshutdown are now supporting a node list rather than just a range
  • included cpus, memory and disk size in the description of json file, as part of set_snow_json function
  • included last_deploy in the description of json file, as part of set_snow_json function
  • Included new CentOS 7.3 template with minimal packages
  • Included more comments in the snow.conf-example and renamed/removed some confusing variables.
  • Included logic for accommodating default node-based template
  • snow-install has been merged in this repository
  • Improvements in nfsroot and squashfs support
  • Included show nodes feature
  • Included snow list roles function. The style of the role scripts are normalised
  • Included chroot environment function
  • Included new functions to create raids and filesystems. Included an example of first_boot_hook
  • Improved eb_wrap and interactive.
  • Fairsharing is not longer default requirement. AccountingStorageEnforce=nojobs
  • Readahead operations disabled in the CentOS template in order to avoid performance issues in shared FS.
  • Removed System Imager from deploy role, as not longer required
  • Included memtest as image
  • Removed boot_delay in the boot process as it’s only required in deploy process
  • Introduced NET_COMP in order to define the DHCP service for the compute nodes

Known Issues

  • Read-only NFSROOT image only working for CentOS/RHEL. Tuned dracut module is required to enable it for SuSE

1.1.1

Major Fixes

  • Fix issue related with multi-cluster in ganglia monitoring
  • Fix issue with minimal domain role deployment

1.1.2

New Features and Changes

  • Increased memory of snow server to 8GB
  • Included installation proxy per node basis in order to achieve better scalability in the deployment

    Minor Fixes

  • Fixed OpenVPN AS role deployment
  • introduced ConnectTimeout=5 in snow list domains to avoid long waiting when a domain is not responsive
  • Fixed minor coding style issues

1.1.3

Minor Fixes

  • Fixed issue with public instance of ssh in login role
  • Introduced a delay after each domain booted while booting them all with snow boot domains to improve server responsiveness in the bridge management
  • Fixed issues related with gmetad
  • Fixed issue related with resolv.conf content
  • Fixed issues related with xdm and gdm roles
  • Minor improvements in the minimal role

1.1.4

New Features and Changes

  • Included Docker support in domain roles
  • included Docker Swarm roles (swarm-manager, swarm-worker) to accommodate docker based services.
  • Included torque master role. The support of Torque in sNow is not as mature as Slurm.

Minor Fixes

  • interactive CLI not longer requires a Slurm account

1.1.5

New Features and Changes

  • Included logic in the Torque and Maui role deployment in order to avoid incompatibility issues
  • Included support for Torque and Maui in the node deployment

Minor Fixes

  • included list roles in the snow CLI error message

1.1.6

New Features and Changes

  • Initial support for GateOne
  • Improvements in ganglia setup - use unicast

Minor Fixes

  • Fix minor issues in Torque 5.3.1 services startup
  • Fix issue 114 - snow add node populates the database as expected
  • Fix issue 111 - check if a node list is already defined in the database
  • Fix issue 118 - corrected squid3 path /var/spool/squid3
  • Fix issue 115 - replaced error message with an error exit when trying to add nodes that already exist in the database.
  • Fix issue 115 - moved interactive question in node remove outside the loop
  • Fix issue 123 - included NTP configuration in compute nodes

1.1.7

New Features and Changes

  • Fixed path divergency in SuSE for /usr/lib/systemd/system in dracut
  • Stateless based on SquashFS + OverlayFS working for (Open)SuSE and RHEL/CentOS
  • Included image_rootfs and image_type in the database.

Major Fixes

  • Remote file systems are excluded during the image generation. mksquasfs uses xz compression. Improvements in dracut support for SuSE
  • Included /etc/resolv.conf to avoid potential issues related with DNS service not available while mounting NFS or cluster file systems in the boot time

Minor Fixes

  • Fix issue 5 - update snow.conf permissions to 600 after snow init.
  • Fix issue 122 - ‘snow show nodes’ prints also the host name
  • Fix issue 66 - included warning message in /etc/hosts

Known Issues

  • Included NFSROOT option in overlayfs but it doesn’t allow to apply live changes in ro NFS image due a bug in OverlayFS. Remount is required to enable changes performed in NFS image.

1.1.8

New Features and Changes

  • “snow list domains” command also includes the hosts where the domains are allocated

Minor Fixes

  • Fix issue related with user environment under interactive job session

1.1.9

New Features and Changes

  • list domains provides High Availability and service locality information

1.1.10

New Features and Changes

  • LDAP master role generates certificates and populates the DB
  • LDAP DB has been migrated from HDB to MDB (new standard)
  • Improved privacy and security in default LDAP role

Minor Fixes

  • Fix issue 123: automatic start of NTPD at boot time
  • Fix recurrent issue 3: dom0 dedicated memory and preventing dom0 memory ballooning
  • Fix issue 131: no exit after trying to boot a non-deployed domain

Major Fixes

  • Fix broken compatibility in LDAP deployment due new standards

1.1.11

New Features and Changes

  • New domain role for BeeGFS server deployment
  • BeeGFS native client support
  • BeeGFS stateless image support - tested in CentOS 7.x
  • Lustre stateless image support - tested in CentOS 7.x
  • Support for Debian preseed deployment (partially fix issue #84 - deploy with ubuntu is still missing)
  • New deployment template for Debian 8.x
  • New deployment template for Debian 9.x
  • New deployment template for CentOS 7.4
  • Included GRES requirement in interactive
  • updated cpu-id-map to include Skylake architecture

Minor Fixes

  • Fix delay issue listing images.
  • Minor fix in OS release detection in install.sh
  • Included libx11-devel and openssl-devel to meet the OS packages requirements for some applications
  • Clean-up slurm job epilog and prolog
  • Fixed issue 125: Included DNS search list to snow.conf
  • Updated README files (fix issue #140)
  • sNow! CLI help more human readable (fix issue #126)
  • Fixed issue #139: check if snow CLI is executed by root
  • Fixed issue #138: style issue in the “snow list domains” output

Major Fixes

  • Fixed delay issues in diskless boot.
  • Fixed issues with stateless shutdowns due network stop before unmounting CFS.
  • Fixed ganglia configuration per cluster nodes.

Known Issues

  • Diskless based on OverlayFSroot over BeeGFS is fixed, but bug in systemd-machine-id-commit still affects old kernels < 4.2. There is a workaround to systemd-machine-id-commit + overlayfs bug (hostname) but not fully tested.

1.1.12

New Features and Changes

  • Included icinga2 role (web based setup not automated, so manual intervention is required after the deployment).
  • Included native support with Singularity based on HPCNow! repository

Minor Fixes

  • Fix typo in warning_message -> warning_msg.
  • Fix missing config file in memtest and localboot images.
  • Slurm configuration template updated in order to pick up changes in the latest release
  • Updated NFS server configuration with async for /sNow.
  • Default NFS mount options noatime and nodiratime in the clients.
  • Included performance considerations notes in the snow.conf-example
  • Updated slurdmdb configuration in order to fix issues with user creation.
  • SlurmDB user is root, in order to have consistency with SlurmCTLD user.
  • Updated the order of active-domains.conf-example to match the domain deployment order
  • Increased the number of loopback devices to 64 when virtualisation technology selected it Xen

Major Fixes

  • Fix memtest image url.

1.1.13

New Features and Changes

  • Included unattended installation in order to accommodate CI/CD
  • Included force option in snow init command
  • Included additional logic to manage boot/shutdown domains in HA mode
  • Boot function is now breakdown into boot_domain and boot_node
  • Default memory for domains is 2GB
  • merged /root/post-install.log into /root/snow-postinstall.log in Redhad/CentOS deployments

Minor Fixes

  • Fix conditional for domains shutdown/boot in HA mode
  • Merged /root/post-install.log into /root/snow-postinstall.log in Redhad/CentOS deployments
  • Fix issue 141: snow boot cluster says “cluster booted” when it only triggers the process.
  • Fix issue 135: Especial characters in the passwords defined in snow.conf could introduce some issues. It can be fixed by using single quotation marks. i.e. ‘$my_Str0Ng!! P455w@rD#’
  • Fix issue 145: Included additional logic to allow to execute snow help when the snow.conf is not available.

1.1.14

New Features and Changes

  • Initial support for Ubuntu 18.04 LTS as sNow! server
  • Removed sudosh from the HPCNow! working environment. Will include the package in the repository.
  • Included parallel bzip2 (pbzip2) in the decompression of sNow! domain template.

Minor Fixes

  • Fix warning messages in snow.log during domain deployment
  • Included full log history of snow command.

1.1.15

New Features and Changes

  • Extended support for Docker Swarm cluster provisioning.
  • Docker version must be set in snow.conf in order to ensure consistency in the Docker Swarm cluster.
  • Introduced support for OpenNebula private cloud provisioning.
  • Initial support for dynamic provisioning between Slurm, Docker Swarm and OpenNebula.
  • Included shellcheck in HPCNow! development environment.
  • Reduced memory footprint during the image gathering.
  • Installation has sNow nodes awareness, prelude to HA cluster self join.
  • Stateless provisioning is no longer fetching OS image into memory by default.
  • Increased the default number of CPUs available in sNow! server to 4.
  • The /home is no longer required to propagate SSH keys across the nodes and domains.

Minor Fixes

  • Fix issues with no-fetching option in stateless provisioning over NFS
  • Fix path of domain images. Default location is /sNow/domains/domain_name rather than /sNow/domains/domains/domain_name
  • Fixed console redirection in Xen kernels

1.1.16

New Features and Changes

  • Included lftp in order to enable a fast way to download and create local package repositories

Minor Fixes

  • Included cracklib-runtime package as a workaround of a broken package dependency
  • Fix issue in setting up the ganglia grid name