A little more on OpenShift and monitoring with Zabbix

I’ve been doing some more work with monitoring OpenShift using Zabbix and am getting more impressed with Zabbix all of the time. I have a Zabbix server collecting metrics from agents running on OpenShift Nodes and a Broker, and I learned a few things that I wanted to pass along. Zabbix is very configuration driven, and it is worthwhile understanding how metrics and their associated keys are structured, along with templates and groups. This makes it easier later when you are adding multiple related items and then potentially grouping them for graphs and other purposes. It is worthwhile to experiment with collecting and displaying various metrics as a throwaway activity because you are very likely to want to go back later and do it over again once you understand how to do what you need. I’ve found the Zabbix documentation to be useful for details and a few basics, but it can’t really convey how to implement a large-ish system of monitoring.
Monitoring serves a number of purposes, including identifying what is working, what is not working, and which patterns differ from usual for the same time period yesterday, last week, or in the same season of an earlier year. It is also essential to monitor with the intent of spotting additional capacity requirements, or excess capacity that is costing you money but is not needed.
Zabbix agents are configured to collect various metrics and report them back to the server, either by waiting to be polled (passive checks) or by actively pushing to the server (active checks). In the case of OpenShift, the agents can be configured to run the various commands that are either directly a part of the OpenShift distribution or involved in the infrastructure, such as messaging (ActiveMQ) or the database (MongoDB). Examples of those commands include oo-admin-chk, mco and a few others. However, Zabbix (server and agent) will wait no more than 30 seconds for an item collection to complete (the Timeout setting defaults to 3 seconds and cannot be raised above 30), and several of the oo commands may take longer than that. In one case I saw an oo command take more than a minute and 15 seconds to complete. As a result it is useful to be aware of different approaches to dealing with long-running commands. One way is to have a cron job execute the commands that you want, save the output to a file, and have the agent run scripts to extract the items from those files. Another is to use the zabbix_sender shell command (part of the base zabbix20 package) to push the metrics to the server without involving an agent at all.
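As a rough illustration of the zabbix_sender approach, here is a minimal Python sketch that a cron job could run. It is not the script I am using; the helper names (timed_run, sender_argv, collect) and the item keys are made up for the example. The only real external piece is the zabbix_sender command line with its -z (server), -s (host), -k (key) and -o (value) options. It assumes the matching items exist on the server as "Zabbix trapper" items.

```python
import subprocess
import time


def timed_run(argv, timeout_s):
    """Run a command and return (exit_code, elapsed_seconds).

    exit_code is None if the command did not finish within timeout_s.
    """
    start = time.monotonic()
    try:
        done = subprocess.run(argv, stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL, timeout=timeout_s)
        code = done.returncode
    except subprocess.TimeoutExpired:
        code = None
    return code, time.monotonic() - start


def sender_argv(server, host, key, value):
    """Build the zabbix_sender command line for a single value."""
    return ["zabbix_sender", "-z", server, "-s", host,
            "-k", key, "-o", str(value)]


def collect(argv, server, host, key_prefix, timeout_s=300):
    """Run one slow command with our own (longer) timeout and return the
    zabbix_sender command lines that report its exit code and duration.

    The ".exit_code" and ".seconds" key suffixes are hypothetical.
    """
    code, elapsed = timed_run(argv, timeout_s)
    status = -1 if code is None else code  # -1 marks "timed out"
    return [
        sender_argv(server, host, key_prefix + ".exit_code", status),
        sender_argv(server, host, key_prefix + ".seconds", round(elapsed, 1)),
    ]
```

From cron you would call something like collect(["oo-admin-chk"], "zabbix.example.com", "broker01", "openshift.chk") and pass each returned list to subprocess.run. The server name, host name and key prefix there are placeholders; the host name has to match the host as it is defined in Zabbix. Because the timeout belongs to the script rather than to Zabbix, a command that takes more than a minute is no longer a problem.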
I’m working on setting up an OpenShift Origin configuration in EC2 (using Ansible playbooks) and using Zabbix to monitor it. I will be putting a few of my Zabbix scripts into my github repository as I’m doing this. More on that soon.

Podcast review: FoodFight

I regularly listen to various podcasts while walking my dog, commuting and so on, and wanted to pass along a recommendation for the Food Fight podcast (http://foodfightshow.org/, @foodfightshow), which covers various devops subjects. It is excellent, stays on topic and has guests from various aspects of software technologies. The recent episode Show 64 – ‘The Future of DevOps’ is particularly good in that it features Andrew Clay Shafer (@littleidea) and includes some discussion on using metrics to evaluate and run businesses (in addition to various other thought-provoking discussion). One point made was that it becomes easy for organizations to begin a process improvement program and then simply focus on the metrics of that program rather than evaluating the result of the activities. By that I mean “our six sigma program is training x people per week and assisting y projects per month so it must be helping us” as opposed to verifying that “our costs per LOC, task, sprint, activity… have dropped and our quality factors have improved which might be because of our six sigma program”. Of course doing that means that you have to be able to collect and analyze those metrics, but I believe that there is more value in that than in simply assuming that some process improvement program is producing a useful result (for the cost). If you only count the number of activities that the program is associated with and assume that it is good, then it is easy to be tempted to add more money or more process to the program and impose it on more projects, which might not help anything at all while losing the opportunity to pursue more useful activities with that budget.
This is a much larger subject to consider than I’ve done here, but I wanted to pass along the reference because it was fresh in my mind.

Ruby rant

To follow up on a previous post, I still have to say that there has not been one single experience in dealing with anything having to do with Ruby where I have not had all sorts of version problems. An example of that is what appears to be a simple instruction in my installing OpenShift in EC2, where I need to install something called rubygem-thor. On the EC2 RHEL image as it comes, it is not in a yum repo. Trying to install it as a standalone rpm leads to needing ruby2. Various sites say that to install ruby2 I need rbenv. All I need to do for that is to get it from github. And it goes on and on. These sorts of aggravations do not sound all that difficult when you are sitting at a shell prompt and installing various things as needed by hand, but when you are trying to script it remotely using something like ansible it can take a while (all filled with non-productive time) to find a combination of steps that actually works.
While posting instructions on how to do something using individual manual steps is laudable I still believe that real value comes from something like:
- launch an EC2 instance of RHEL 6.4
- put your instance DNS name in a hosts file
- install ansible
- ansible-playbook xxxxx.yml

Zabbix, OpenShift and Ansible

I’ve been doing some work with OpenShift recently and needed to add some monitoring to it. I’ve used Nagios for other applications in the past, but this time it looked like Zabbix already had some OpenShift Origin tooling, so I decided to go that way. I’m using EC2 instances and found some very good installation instructions on the Zabbix site at https://www.zabbix.com/documentation/2.0/manual/quickstart
I had one or two issues, but the primary one was that the Zabbix UI would regularly put up a red banner saying that the Zabbix server was not running. That ended up being due to SELinux on the EC2 instance, which I turned off (switched to permissive mode) with
# echo 0 > /selinux/enforce
Someday, I need to spend the time to learn how to use selinux effectively rather than turning it off.

I started using Ansible a while back to set up/provision remote servers and have been very impressed with it. The primary advantages that I’ve experienced with it are that ssh access is the only requirement to use it (meaning the remote system does not need to be prepared for it) and that it lets me script (meaning control and document) the process that I’m using to set the remote systems up. I’m using it to set up a number of EC2 instances, and being able to group servers and perform an action en masse works well. An example of that would be installing git on each instance. I intend to add some info on how I’m using ansible to my github repo shortly, so hopefully you can find a little more there.

As I evolve this some more I will append to this.

OpenShift, Mac OSX, IPv4 vs IPv6 and lessons learned

I’ve been doing work with OpenShift Origin and using a pre-built VirtualBox VM for convenience. This has worked well for me in the office and during a recent novalug meeting, but when I tried it at home I had an issue with the OO host where it appeared to hang during boot. The OO console window went dark and would occasionally flash, and I could not ssh into it. If I control c’d the console I would get a shell prompt, but it was clearly not working correctly. It turned out that my Apple AirPort wifi network was causing a VirtualBox DHCP issue.
More on that here: http://www.virtualbox.org/manual/ch06.html
It includes the following: “On Macintosh hosts, functionality is limited when using AirPort (the Mac’s wireless networking) for bridged networking. Currently, VirtualBox supports only IPv4 over AirPort. For other protocols such as IPv6 and IPX, you must choose a wired interface.”
This problem caused me to lose a few hours pursuing it over the weekend but once I found this I moved to my ethernet at home and the problem went away. I’m sure that I could have eventually figured out how to proxy the OO instance but I had to keep moving forward.

A little yoga with my rowing and surprised at my heart rate

I bought a water rower a while back and revised my workouts to primarily focus on it. I also started following it with some yoga (using the Yoga Studio app on the iPad which I am very happy with). I use a heart rate monitor during my workouts and have been surprised at how yoga keeps my heart rate up more than I had expected that it would. This might be due to my lack of flexibility and the effort that I have to put in to even come close to the poses but I’m satisfied with the addition to my workouts.
The following graph shows 30 minutes of rowing followed by 30 minutes of yoga. While I don’t think that I’m going to set up a DougsYogaCam anytime soon (unless the Comedy Channel is interested) I like the change. Oh, and if you are someone that thinks that “The practice of Yoga is pagan at best, and occult at worst.” and leads to he/she/it/them/null being unhappy with me, boy are you way too late.

[Image IMG_2091: heart rate graph, 30 minutes of rowing followed by 30 minutes of yoga]

MEAN (Mongo,Express,AngularJS,NodeJS), AWS/EC2, Ansible and OpenShift

During the last couple of months I’ve been fortunate enough to work using a number of new(ish) software technologies and wanted to pass along a little about them.
One activity was a 3 week project where 3 of us used the MEAN stack (Mongo, Express, AngularJS and NodeJS) to quickly implement a document processing system. We decided that MEAN was the best stack for the system, and it turned out to be very efficient to use and a lot of fun to work with. The document processing system contains a number of functions, including polling a directory where a customer would upload XML and PDF files, picking up any found, transforming the content to JSON, inserting it into a MongoDB, supporting an AngularJS UI where a user could log in and perform various functions on the record content, uploading the files to an AWS S3 bucket and letting the user publish the record content into another XML format file. The work required moving quickly, and we were able to get the basic workflow going within a week and a half and complete the project in 3 weeks.
The code all ran on AWS EC2 instances (one for Mongo and one for the node based UI). Having the front and back ends both in Javascript enabled us all to work on both ends of the wire, which was also very helpful, and I recommend this stack for that reason. Another benefit was that it used open source software entirely, meaning the only costs were labor and AWS usage.
I’ve gained a great deal of respect for AWS, but using it in the real world does lead to a few lessons learned. One of the lessons was that EC2 transfers of files (upload via scp/sftp for example) could take a few seconds for multi-megabyte files, which would cause the polling code to detect the file (using the Node watcher.stalker function) before it was completely available. This caused us to have to refactor the code a few times to finally get it working correctly. The lesson learned in that case was that it would have been much better to use an sftp server that would let us control the upload by naming the file with a temporary name while it was in transit and renaming it to the final name when complete.
Handling this otherwise is not as easy as it sounds if you do not control the upload. I would like to have tried having our UI support an upload via the browser so that it was entirely under our control but we did not have time for that. Next time I hope to have that already working and in my back pocket (github here I come).
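To make the two options concrete, here is a small sketch in Python rather than the Node code we actually wrote. Both function names are invented for the example. The first is the fallback when you do not control the upload: treat a file as complete only once its size has stopped changing, which is a heuristic and will be fooled by an upload that stalls. The second is what an uploader you do control could do: write under a temporary name (the ".part" suffix is an arbitrary choice) and rename when done.

```python
import os
import time


def wait_until_stable(path, interval_s=1.0, checks=3, timeout_s=60.0):
    """Return True once the file size has been unchanged across `checks`
    consecutive comparisons, or False if timeout_s passes first."""
    deadline = time.monotonic() + timeout_s
    last_size = -1
    unchanged = 0
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:
            size = -1  # file is missing or not readable yet
        if size >= 0 and size == last_size:
            unchanged += 1
            if unchanged >= checks:
                return True
        else:
            unchanged = 0  # size changed, start counting again
        last_size = size
        time.sleep(interval_s)
    return False


def publish_atomically(data, final_path):
    """Uploader side: write under a temporary name, then rename, so a
    watcher that ignores "*.part" never sees a partial file."""
    tmp_path = final_path + ".part"
    with open(tmp_path, "wb") as fh:
        fh.write(data)
    # The rename is atomic when both names are on the same filesystem.
    os.replace(tmp_path, final_path)
```

With the rename approach the polling code only has to skip the temporary names; with the size check it has to pick an interval and a number of checks that suit the slowest upload it expects, which is part of why this is harder than it sounds.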
After that project was complete I started on one involving OpenShift and have been working in that area. With that I am getting the opportunity to use Ansible for installing on and provisioning EC2 instances for OpenShift. If you are not familiar with it, Ansible is an alternative to Puppet/Chef and other provisioning systems, with a significant advantage in its favor being that the only configuration required on the remote system is ssh access. This enables the rapid and simple creation of AWS EC2 instances followed by Ansible playbook script execution for the installation of your code. This has worked very well for me so far, and I expect to be making much more use of it from now on.
OpenShift Origin is described as “The open source upstream of OpenShift, the next generation application hosting platform developed by Red Hat. OpenShift Origin includes support for a wide variety of language runtimes and data layers including Java EE6, Ruby, PHP, Python, Perl, MongoDB, MySQL, and PostgreSQL”. It enables a variety of things but primarily lets you develop and run code in a standard packaged form that accommodates cloud architectures along with many other things. I expect to be passing along quite a bit more about that in the future so check back every now and then.

NodeJS, Mongo and Mongoose collection pluralization

If you happen to be using NodeJS, Mongo and Mongoose you can spend a couple of hours debugging before discovering that Mongoose pluralizes collection names (silently) unless you explicitly tell it not to, for example by passing the collection name as the third argument to mongoose.model(). You can see this happen in queries if you start mongod in verbose mode with -vv. http://stackoverflow.com/questions/7230953/what-are-mongoose-nodejs-pluralization-rules