After an OpenStack installation, test the following:

Config check

  1. Reboot/restart the whole cluster (to ensure every config change is persisted)
    • Admin node (manually). When it comes back, use it to reboot the other nodes (via Crowbar) and check that the admin dashboard is reachable/responding
    • Controller
    • Compute Nodes
  2. Run chef-client on every node (to ensure every config change is in Chef)
    • root@admin# ssh controller -- chef-client
    • root@admin# ssh compute1 -- chef-client
    • root@admin# ssh compute2 -- chef-client
    • root@admin# ssh compute3 -- chef-client
    • root@admin# ssh compute4 -- chef-client
  3. Work through the checks below
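
The chef-client runs from step 2 can be wrapped in a small loop so that a failing node is not overlooked. This is only a sketch: the node names are the ones used above, while `NODES`, `RUNNER` and `run_chef_on_all` are names made up for this example and are not part of Chef or Crowbar.

```shell
#!/usr/bin/env bash
# Sketch: run chef-client on every node and report which nodes failed.
# NODES and RUNNER are illustrative variables, not Chef or Crowbar settings.

NODES="${NODES:-controller compute1 compute2 compute3 compute4}"
# Command used to reach a node; set RUNNER=echo for a dry run.
RUNNER="${RUNNER:-ssh}"

run_chef_on_all() {
    local failed=""
    local node
    for node in $NODES; do
        echo "== chef-client on $node"
        # </dev/null keeps ssh from reading the script's stdin
        if ! $RUNNER "$node" -- chef-client </dev/null; then
            failed="$failed $node"
        fi
    done
    if [ -n "$failed" ]; then
        echo "chef-client failed on:$failed"
        return 1
    fi
    echo "chef-client succeeded on all nodes"
}
```

On the admin node, load the file and call `run_chef_on_all` as root. Setting `RUNNER=echo` first prints the commands instead of executing them.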

Cluster check

  1. User-facing OpenStack parts are working
    • OpenStack Dashboard is reachable/responding
    • Endpoints work: run “nova list” from a workstation
  2. Start/Stop of VM works
    • Start two new VMs
    • Stop one of them
    • optional: this works via CLI tools on the workstation
  3. Network connectivity to VM works
    • ping other VM
    • SSH into the other VM
  4. Network connectivity from VM to Internet works, including DNS (e.g. ping + wget)
    • ping
    • ping
    • wget
    • optional: ping another VM
  5. Network connectivity from nodes to the Internet works (optional, as it is not needed for OpenStack usage)
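
The outbound checks from step 4 (ping by IP, ping by name, wget) could be bundled into one function that is run inside a VM. This is a rough sketch; the function names `check` and `check_outbound` are invented here, and the three targets are deliberately left as arguments because the right ones depend on your environment.

```shell
#!/usr/bin/env bash
# Sketch: outbound connectivity test to run inside a VM.

# Run one command quietly and print a PASS/FAIL line for it.
check() {
    local label="$1"
    shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS  $label"
    else
        echo "FAIL  $label"
        return 1
    fi
}

# Arguments (all chosen by the caller, nothing is assumed here):
#   $1 external IP address, $2 external host name, $3 URL to fetch
check_outbound() {
    local ip="$1" host="$2" url="$3" rc=0
    check "ping by IP ($ip)" ping -c 3 "$ip" || rc=1
    # Pinging a name only succeeds if DNS resolution works as well
    check "ping by name ($host)" ping -c 3 "$host" || rc=1
    check "HTTP fetch ($url)" wget -q -O /dev/null "$url" || rc=1
    return $rc
}
```

Call it as `check_outbound <ip> <hostname> <url>`. A FAIL on the second line while the first one passes usually points at DNS rather than routing.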

OpenStack Ops Checklist

  • “nova-manage service list”: check that all services show “:-)”; if not,
    • try to start the service on the node where it is down
  • “ps axw” on controller should include “nova-scheduler, nova-api-os-compute, nova-api-os-metadata”
    • if not start service via “start
    • if service does not start: check log file for error messages (/var/log/nova-*)
  • “ps axw” on compute should include “nova-network, nova-compute”
    • if not start service via “start
    • if service does not start: check log file for error messages (/var/log/nova-*)
  • check that the hard disks of the controller and nodes are not full
    • if the controller's disk was full: free up space, then restart MySQL and RabbitMQ (restart mysql; /etc/init.d/rabbitmq-server restart)
  • check if instance start/access/stop is possible
    • use a test or admin account to start an instance, then check that it starts and that an SSH connection is possible (security groups need to be set up correctly)
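
Two of the checks above, the service state and the disk usage, lend themselves to small filters. The following is a sketch under assumptions: `down_services` and `full_disks` are made-up helper names, and `down_services` assumes that the service listing starts with a header line and has the binary and host in the first two columns, which you should verify against your release.

```shell
#!/usr/bin/env bash
# Sketch: filters for the ops checklist. Both read command output on stdin.

# Print "binary host" for every service line without the ":-)" marker.
# Assumption: line 1 is a header, columns 1 and 2 are binary and host.
down_services() {
    awk 'NR > 1 && NF > 0 && index($0, ":-)") == 0 { print $1, $2 }'
}

# Print "mountpoint usage" for file systems at or above a threshold.
# Expects `df -P` output; $1 is the threshold in percent (default 90).
full_disks() {
    awk -v limit="${1:-90}" 'NR > 1 {
        use = $5
        sub(/%/, "", use)          # strip the percent sign
        if (use + 0 >= limit + 0) print $6, $5
    }'
}
```

On the controller this could be used as `nova-manage service list | down_services` and `df -P | full_disks 90`; for a compute node, pipe the output of `ssh compute1 -- df -P` into `full_disks` instead. Empty output means nothing was found.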

For more tips and tricks, see:

Default Config differences

  • Document every non-standard config change (e.g. network)
  • Give documentation to customer

Automate It!

  • Next step is to automate steps in this process :-)

