
Wrapping AEM cURL Commands With Python

If you have ever had the experience (no pun intended) of using Adobe Experience Manager (AEM), you already know that curl commands are arguably the de facto way of interacting with AEM over HTTP.

Whenever you google for various AEM/CQ HOWTOs, it’s easy to find examples with curl commands.

Naturally, I started integrating those curl commands into my project’s application provisioning and deployment automation via Ansible’s shell module. However, it wasn’t long before I encountered a number of issues:

  • Lack of a consistent response payload format from AEM: some status messages are embedded within various HTML response bodies, some within JSON objects.
  • Some endpoints return status code 500 for non-server-error results (e.g. when an item to be created already exists), making them hard to differentiate from real server errors.
  • Some endpoints return status code 200 with an error message in the HTML response body.
  • Even though curl --fail exists, it’s not fail-safe. There doesn’t seem to be any way to identify a success/failure result without parsing the response headers and body.
  • This means curl commands could return exit code 0 even when the HTTP status code indicates an error, and Ansible would not fail the task; it would simply continue on to the next one.
  • Printing the response bodies to stdout won’t help much either; it’s painful for a human to go through a large volume of text to identify any error.

It’s obvious that curl commands alone are not enough. I needed better error handling: checking the status code, parsing the response body, and then translating the result into Ansible’s success/failed status. So I wrote PyAEM, a Python client for the Adobe Experience Manager (AEM) API.

Why Python? 1) It’s first class in Ansible. 2) It’s saner to handle the response (status code checking, HTML/JSON parsing) in Python than in shell. 3) Ditto for code lint, unit tests, coverage checks, and package distribution: Python wins!
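To illustrate point 2, here’s the kind of dual check (status code and response body) that’s painful in shell but straightforward in Python. This is a hypothetical sketch, not PyAEM’s actual internals; the ‘already exists’ message and the ‘error’ key are made-up examples:

import json

def parse_response(status_code, body):
    # AEM sometimes returns 500 for a non-error (e.g. the item to be
    # created already exists), and sometimes 200 with an error message
    # in the body, so both the status code and the body get inspected.
    if status_code == 500 and 'already exists' in body:
        return ('success', 'item already exists')
    if status_code == 200:
        try:
            payload = json.loads(body)
            if 'error' in payload:
                return ('failure', payload['error'])
        except ValueError:
            pass  # not JSON; some AEM responses are HTML
        return ('success', 'ok')
    return ('failure', 'unexpected status code {0}'.format(status_code))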

PyAEM ended up using pycurl to simplify porting those curl commands into Python. I initially tried Requests instead and managed to port the majority of the curl commands, until I got to the package manager API and kept getting different responses from AEM with Requests compared to the ones from the curl commands. Since AEM was a black box and I didn’t have any access to its source code, I couldn’t tell what it was about libcurl and package upload/download that was missing from Requests. So in the end I stuck with pycurl.
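To give a sense of the porting work, here’s a minimal pycurl call of the kind PyAEM wraps (illustrative only, not PyAEM’s actual code; the endpoint URL is just an example):

import pycurl
from StringIO import StringIO  # Python 2, matching the examples below

buf = StringIO()
curl = pycurl.Curl()
curl.setopt(pycurl.URL, 'http://localhost:4502/system/console/bundles.json')
curl.setopt(pycurl.USERPWD, 'admin:password')
curl.setopt(pycurl.WRITEFUNCTION, buf.write)
curl.perform()

status_code = curl.getinfo(pycurl.HTTP_CODE)  # check this...
body = buf.getvalue()                         # ...and parse this
curl.close()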

Here’s a code snippet showing how to use PyAEM to stop a bundle called ‘mybundle’
(check out the PyAEM API Reference to find out which other actions PyAEM currently supports):

import pyaem

aem = pyaem.PyAem('admin', 'password', 'localhost', 4502)

try:
    result = aem.stop_bundle('mybundle')

    if result.is_success():
        print 'Success: {0}'.format(result.message)
    else:
        print 'Failure: {0}'.format(result.message)
except pyaem.PyAemException, e:
    print e.message

Better. Now there is success/failure status handling, plus error handling by catching PyAemException.

As for Ansible, the next obvious step is to create Ansible modules which utilise PyAEM. These modules serve as a thin layer between Ansible and PyAEM; all they need to worry about is argument passing and status handling.

#!/usr/bin/python

import os
import json
import pyaem

def main():
    module = AnsibleModule(
        argument_spec = dict(
            host = dict(required=True),
            port = dict(required=True),
            bundle_name = dict(required=True)
        )
    )

    host = module.params['host']
    port = module.params['port']
    bundle_name = module.params['bundle_name']

    # AEM credentials are picked up from environment variables
    aem_username = os.getenv('crx_username')
    aem_password = os.getenv('crx_password')

    aem = pyaem.PyAem(aem_username, aem_password, host, port)
    result = aem.stop_bundle(bundle_name)

    # Translate the PyAEM result into Ansible's expected JSON output
    if result.is_failure():
        print json.dumps({ 'failed': True, 'msg': result.message })
    else:
        print json.dumps({ 'msg': result.message })

from ansible.module_utils.basic import *
main()

The above module can then be used in an Ansible playbook.

- name: 'Stop com.day.crx.crxde-support bundle'
  aem-stop-bundle: >
    host=somehost.com
    port=4503
    bundle_name=com.day.crx.crxde-support

Too simple!

This can actually be improved further by creating an Ansible role for AEM, distributed through Galaxy. Downloading an AEM package file from an artifact repository, uploading the package to AEM, installing it, then replicating it: that’s a repetitive pattern in AEM package management.

PyAEM is still at an early stage, but it’s stable enough (we use it in production). It currently only supports the actions that are used in my project. Having said that, I think the package is pretty solid, with 100% unit test coverage, zero lint violations, and an automated Travis CI build on every code change.

Since AEM is a proprietary product, PyAEM currently doesn’t have any automated integration tests (think AEM Docker containers :) ). However, it is verified to work with AEM 5.6.1 and Python 2.6.x/2.7.x via the internal project I’m working on.

Want to use it? PyAEM is available on PyPI. Anything missing? Contributions are welcome!


Human-Readable Ansible Playbook Log Output Using Callback Plugin

One problem I’ve had with Ansible playbooks since the early 0.x days is the verbose log output. JSONified by default, it’s hard to read, and pretty much impossible for a human to review when stdout or stderr contains tens or hundreds of lines combined into one lengthy string.

Here’s what it looks like:

changed: [gennou.local] => {"changed": true, "cmd": "/tmp/sample.sh",
"delta": "0:00:00.019164", "end": "2014-03-30 21:05:33.994066", "rc": 0,
"start": "2014-03-30 21:05:33.974902", "stderr": "", "stdout": "gazillion
texts here with lots of \n in between gazillion texts here with
lots of \n in between gazillion texts here with lots of \n
in between gazillion texts here with lots \n in between"}

When the --verbose flag is set, I believe the intention is for a human to eventually review the verbose log output. And whenever someone did review the log, they never failed to tell me that the JSONified message was impossible to read, to which I replied with “They will fix it someday.”

Well, Ansible is now at version 1.x and the problem is still there.

So, while we continue waiting, the workaround I use for now is to set up an Ansible callback plugin that listens to task events and logs the result in a human-readable format, with each field on its own line and newlines preserved.

Here’s how I set it up:

  1. Set the callback plugins directory in the Ansible configuration file (ansible.cfg):
    [defaults]
    callback_plugins = path/to/callback_plugins/
  2. Create a callback plugin file in the path/to/callback_plugins/ directory; I call mine human_log.py (a simplified sketch follows this list).
    Here’s the callback plugin gist: https://gist.github.com/cliffano/9868180
  3. Run the ansible-playbook command:
    ansible-playbook -i hosts playbook.yml
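For reference, the core of the plugin boils down to something like this (a simplified sketch of the gist above, not a verbatim copy; Ansible 1.x callback plugins are plain classes named CallbackModule whose methods match runner events):

FIELDS = ['cmd', 'start', 'end', 'delta', 'stdout', 'stderr']

def human_log(res):
    if isinstance(res, dict):
        for field in FIELDS:
            if field in res.keys():
                # Each field on its own line; \n inside the value now
                # renders as real newlines instead of literal '\n'.
                print '\n{0}:\n{1}'.format(field, res[field])

class CallbackModule(object):

    def runner_on_ok(self, host, res):
        human_log(res)

    def runner_on_failed(self, host, res, ignore_errors=False):
        human_log(res)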

And the log output looks like this:

cmd:
/tmp/sample.sh

start:
2014-03-30 21:05:33.974902

end:
2014-03-30 21:05:33.994066

delta:
0:00:00.019164

stdout:
gazillion texts here with lots of
in between
gazillion texts here with lots of
in between
gazillion texts here with lots of
in between
gazillion texts here with lots of
in between
stderr:

Now that’s more readable.

You can set the callback plugin per Ansible project if you want to. But I set mine up as part of my CI/Jenkins box provisioning; that way, all Jenkins jobs that execute an Ansible playbook end up with readable log output.

Note: I know that some people suggest using the debug module to split output into multiple lines. However, having to add register and debug fields all over the tasks would easily clutter the playbook. I find the callback plugin to be a cleaner and simpler solution.


Roombox – Node Knockout 2013

A few weeks ago I participated in Node Knockout 2013 (NKO4), a 48-hour hackathon with 385 teams competing for the top spot in 7 categories (team, solo, innovation, design, utility/fun, completeness, and popularity).

And here’s a video of what I hacked: Roombox, a Roomba vacuum cleaner turned into a boombox using node.js. The demo shows the Roomba playing the Rocky theme, the Beverly Hills Cop theme, Hey Jude (The Beatles), Scar Tissue (Red Hot Chili Peppers), the Super Mario Bros. theme, and the Airwolf theme.


Note: I put the wrong year for The Beatles’ Hey Jude in the video. I wanted to fix it, but it was already 1 am back then and I had to go to work in the morning. Sorry Beatles fans!

The result? Roombox finished 9th in the innovation category, and 14th in the solo category. Not bad for an idea that I improvised on the D-day itself. If there were a solo innovation category, Roombox would’ve finished 1st on that nonexistent leaderboard :).

Comments from some judges and fellow contestants:

Cool hack! I’m also amused by the rickroll fail :)

Hah now I need to get a Roomba. Great hardware project / hack.

This got innovation points for me as it never would have occurred to me to do this. Made me laugh and share with others.

Most out-of-the-world idea on NKO :D

Completely useless but very innovative!

I would have given you 5 stars on innovation, but I once heard a hard drive play Darth Vader’s theme song so there is a precedent.

How does Roombox work? To put it simply, Roombox parses abc notation sheets, maps the music notes to fit the Roomba’s note range, splits each song into 4 segments where each segment is registered to a Roomba song slot, and finally the Roomba is instructed to play the song. Most of the development effort was spent on finding a suitable music format, and on testing the music sheets, because in reality only a few songs sound decent on a vacuum cleaner.
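To make the note mapping and splitting concrete, here’s a rough Python sketch of those two steps (the real Roombox is node.js, built on the modules credited below; the 31-127 playable note range and the 16-notes-per-slot limit are my recollection of the Roomba Open Interface spec, so treat them as assumptions):

ROOMBA_MIN_NOTE = 31   # assumed lowest MIDI note the Roomba can play
ROOMBA_MAX_NOTE = 127  # assumed highest MIDI note the Roomba can play
NOTES_PER_SLOT = 16    # assumed max notes per Roomba song slot

def fit_to_roomba_range(midi_notes):
    # Transpose each note by whole octaves until it fits the range.
    fitted = []
    for note in midi_notes:
        while note < ROOMBA_MIN_NOTE:
            note += 12
        while note > ROOMBA_MAX_NOTE:
            note -= 12
        fitted.append(note)
    return fitted

def split_into_slots(midi_notes, num_slots=4):
    # Split the song into segments, one per Roomba song slot.
    return [midi_notes[i:i + NOTES_PER_SLOT]
            for i in range(0, num_slots * NOTES_PER_SLOT, NOTES_PER_SLOT)]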

Here’s the sketch I scribbled after deciding how I would hack Roombox.

Huge thanks to Mike Harsch for writing mharsch/node-roomba, and Sergi Mansilla for writing sergi/abcnode. And an apology to my wife and brother for suffering through the weekend listening to dozens of horrible songs being tested :p.

Update (08/12/2013):

DBrain told me about DJ Roomba from Parks and Recreation. If iRobot ever upgrades the Roomba’s sound system, the Roombox code would be totally useful for achieving ‘music player on a moving vacuum cleaner’ a la DJ Roomba.


NodeUp 53: NodeUp Listeners On NodeUp

About a month ago, I joined D-Shaw, Nizar Khalife, Erik Isaksen, and Matt Creager on NodeUp 53, where we discussed the NodeUp podcast and the node.js community from the NodeUp listeners’ point of view; I also talked a bit about Australia, kangaroos, and node. Thanks to Rod Vagg for pinging me about this particular episode.

Recording the show itself was an interesting experience :). For one, it started at 4am Melbourne EST. I totally missed the two alarms I set, and was finally awoken by my mobile’s push notification alert from dshaw’s tweet telling me to accept the Skype invitation about two minutes before 4. Ran down the stairs; my head spun a bit for the first hour lol.

Here’s the transcription of NodeUp 53, thanks to Noah Collins. One correction: as davglass tweeted, I said on the show that Facebook Photo migrated to node.js, when I should have said Flickr Photo. My bad, I’m sorry folks.


An Old Dryer, A Watts Clever, and A Ninja Blocks

This was another quick weekend hack to fix my old dryer’s busted timer problem (busted timer = having to stay around when it’s time to switch off the dryer).

Step one was to use a Watts Clever Easy-off Remote Control Socket, which allowed me to switch the power on and off remotely. This product comes with a remote control, which saved me from having to leave the house to get to the garage during winter. But that’s not all…

Step two was to program the socket on a Ninja Blocks, which gave remote control ability via the web. This allowed me to turn off the dryer all the way from my office.

Step three was to write a node.js script that talks to Ninja Blocks, which in turn switches the power socket on and off. This script was then executed from a scheduled Jenkins job, as sketched below.
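The script looked something along these lines. This is a hypothetical Python sketch rather than the original node.js script, and the Ninja Blocks REST endpoint and actuation payload here are assumptions from memory, not verified:

import json
import requests

ACCESS_TOKEN = 'my-ninja-access-token'  # hypothetical credentials
SOCKET_GUID = 'my-socket-device-guid'   # hypothetical device id

def switch_socket(on):
    # Actuate the power socket via the Ninja Blocks cloud API
    # (assumed endpoint and payload; check the Ninja Blocks API docs).
    requests.put(
        'https://api.ninja.is/rest/v0/device/{0}'.format(SOCKET_GUID),
        params={'user_access_token': ACCESS_TOKEN},
        headers={'Content-Type': 'application/json'},
        data=json.dumps({'DA': 1 if on else 0}))

if __name__ == '__main__':
    # The scheduled Jenkins job simply runs this to switch the dryer off.
    switch_socket(False)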

Voila, the old dryer had a new timer, albeit a long-winded one :p.


Monitor Jenkins From The Terminal

Here’s how I’ve been monitoring my Jenkins setup…

A combination of Nestor + watch + Terminator » one view for monitoring failing builds, one view for executor status, and one view for the job queue. A summary of Jenkins status info on a small amount of screen real estate that I can place in the corner of my workspace.

If you want to set up something similar, here are the commands (assuming JENKINS_URL is already set):

  • watch -c "nestor dashboard | grep FAIL"
  • watch nestor executor
  • watch nestor queue

DataGen Workers Optimisation

I released DataGen v0.0.9 during my lunch break yesterday. This version includes support for limiting how many workers can run concurrently, something I’ve wanted to add since day one. I finally got the time to do it last weekend, and it turned out to be an easy task thanks to Rod Vagg’s worker-farm module.

Why is this necessary?

The problem with previous versions of DataGen was that when you wanted to generate 20 data files, 20 worker processes would be created and run concurrently. It’s obviously not a great idea to have 20 processes fighting over 2 CPUs.

With v0.0.9, you can specify this limit using the new -m/--max-concurrent-workers flag (if unspecified, it defaults to the number of CPUs):

datagen gen -w 20 -m 2

When I first wrote about DataGen last year, I mentioned that I still needed to run some tests to verify my assumption about the optimal number of workers. So here it is one year later…

The first test is on a Linux box with 8 cores, where each data file contains 500,000 segments and each segment contains a segment ID, 6 strings, and 3 dates.

The second test is on an OSX box with 2 cores, where each data file also contains 500,000 segments, but this time each segment only contains a segment ID.

As you can see, performance is almost always best when the concurrently running worker processes are limited to the number of available CPUs (8 max concurrent workers on the first chart, and 2 on the second chart).

When you specify 20 workers and your laptop only has 2 CPUs, only 2 workers will generate data files concurrently at any time, and you can be sure that this will be faster than having 20 workers generating 20 data files at the same time. And that’s why DataGen’s default setting allows as many concurrent workers as there are available CPUs.
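DataGen itself is node.js and relies on worker-farm for this, but the idea is language-neutral. Here’s a rough Python illustration (not DataGen code) of capping concurrent workers at the CPU count:

import multiprocessing

def generate_file(index):
    # Placeholder for the per-file data generation work.
    print 'generating data file {0}'.format(index)

if __name__ == '__main__':
    num_files = 20  # equivalent to -w 20
    max_workers = multiprocessing.cpu_count()  # the -m default
    # Only max_workers processes run at any one time; the remaining
    # files queue up until a worker becomes free.
    pool = multiprocessing.Pool(processes=max_workers)
    pool.map(generate_file, range(num_files))
    pool.close()
    pool.join()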


Jenkins Build Slaves On A Budget

About half a year ago, our team started working on a project with a micro-service architecture, which means we had a lot of little applications to build as part of our delivery pipeline. One of the reasons we opted for this architecture was to gain the ability to replace a component without having to rebuild the whole system, enabling a faster feedback loop by releasing small chunks of changes in small parts of the system.

But there was one problem. Each application build was CPU-intensive: fetching source, installing dependencies, unit testing, code coverage, integration testing, acceptance testing, packaging, publishing to repositories, and deploying to target environments. And nope, we don’t have a build farm!

Our goal was to have each application build finish in less than a minute, which was fine when there was only one build, but failed horribly when lots of changes triggered multiple downstream builds, often up to 15 builds at the same time with only 4 executors (4 CPUs) available on the build master. Scaling up wasn’t going to take us far, so we had to scale out and distribute the build system earlier than we normally would have with past monolithic stonehenge projects.

We considered the idea of using the cloud, either an existing cloud CI solution or Amazon EC2, but in the end we had to rule this out due to extra cost, source code restrictions, and network latency. One developer then suggested using the developer machines as build slaves: each one has 8 CPUs, an SSD, and lots of RAM and disk space, plenty of firepower lying around under-utilised most of the time.

So we gave it a go, and it worked out really well. We ended up with an additional 7×4 = 28 build executors, and it’s not unusual to have those 15 applications built at the same time and finished within a minute. Here’s our setup:

  • Each build slave has to self-register with the master, because developer machines only have dynamic IP addresses and so can’t be pre-configured on the build master.
    This is where the Jenkins Swarm Plugin comes in handy: it allows each slave to join the master without the master needing to know any of the slaves beforehand (see the example command after this list).
  • Each build slave has to re-register when the machine is rebooted; we use upstart to do this.
  • Each build slave runs as its own user, separate from the developer’s user account. This allows a clean separation between user workspaces.
  • Each build slave is provisioned using Ansible, which is always handy when there are more build slaves to add in the future, or when updating multiple build slaves in one go.
  • Each build slave is allocated 50% of the available CPUs on the machine, to reduce any possibility of interrupting the developer’s work.

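For reference, self-registering a slave with the Swarm client looks something like this (the values are made up, and the flag names come from the Swarm plugin client, so double-check them against your plugin version):

java -jar swarm-client.jar -master http://buildmaster:8080 -executors 4 -fsroot /var/lib/jenkins-slave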
So there, build slaves on a budget :).

I think it’s easy to overlook the fact that developer/tester machines are often under-utilised, and that they can serve an additional purpose as Jenkins build slaves, a reasonable alternative before looking at costlier solutions.


CITCON 2013

I attended CITCON 2013 in Sydney last February. This year’s sessions covered more non-technical issues compared to CITCON 2010. The two most interesting topics for me were how the devops movement could potentially discourage collaboration, and how large non-tech companies try, and still fail, to implement continuous delivery.

Those were some of the problems that I’ve been battling for many years. In an organisation where dev and ops are two separate divisions, devops is often a shortcut for dev to do ops tasks while bypassing any ops involvement. A better alternative would be for the dev and ops teams to collaborate and stop fighting over issues like root access.

As for the second topic, continuous delivery is sometimes not as straightforward as it seems. One major obstacle to implementing continuous delivery is a conservative change management process. No matter how much you automate your delivery pipeline across your development, test, and staging environments, it would all be useless if production deployment requires manual approval for the sake of auditing.

Technology is often the easier part; the harder part is people and policies, changing a culture, accepting new ideas.

The best part of CITCON has always been its open space format, where ideas, opinions, and experiences flow during the discussions. And like at most tech conferences, the hallway discussions were not to be missed. The quote of the conference went to Jeffrey Fredrick for pointing out (in my interpretation of what he said) that technologists often get it wrong by focusing on selling the technology to the business (e.g. continuous delivery is awesome), instead of focusing on the business problem and how the technology can solve it (e.g. the business problem is time to market, and continuous delivery can help).

I also caught up with Michael Neale from CloudBees there (here are his CITCON 2013 notes), along with some familiar faces from CITCON 2010.