2013-05-01

Serializing Data Like a PHP Session

PHP has several built-in ways of serializing/unserializing data. The most cross-platform is json_encode; pretty much every programming stack can JSON decode data that has been encoded by any other stack. There's also PHP's native serialize function, which is not as cross-platform, but has the added benefit of being able to store and restore PHP objects to their original class.

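There's a third, and much lesser-known, PHP serialization format: the format that PHP uses to store session data. If you have ever popped open a PHP session file, or stored session data in a database, you may have noticed that this serialization looks very similar to the serialize function's output, but it is not the same. For example, with the default session serialize handler, an array that serialize turns into a:1:{s:3:"foo";s:3:"bar";} is stored in a session as foo|s:3:"bar";.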

Recently, I needed to serialize data so that it looked like PHP session data (don't ask why; I highly suggest not doing this if it can be avoided). It turns out that PHP has a function that encodes data in this format: session_encode. Great! I'll just pass my array of data to it and...

Oh wait. session_encode doesn't accept any arguments. You can't pass data to it. It just takes whatever is in the $_SESSION superglobal and serializes it. There is no built-in function in all of PHP that will serialize arbitrary data for you the same way that it would be serialized into a session.

There are a few userland implementations of PHP's built-in session serialization, mainly built around string splitting and regexes. All of them handle scalar values; some handle single-level arrays. None of them handle nested arrays and objects, and some have trouble if your data contains certain characters that are used in the encoding.

So I came up with my own functions for reading/writing arbitrary session data without overwriting the existing session (edit: I later noticed that a few people suggest a similar method in the comments on the PHP manual pages for session_encode and session_decode):


Using these functions requires an active session (session_start must have already been called). Edit: Thanks to Rasmus Schultz for also pointing out that session_encode might be disabled on some systems due to security concerns.

PHP already has a built-in way to serialize and unserialize session data. The problem is that it only serializes from and unserializes into the PHP $_SESSION global. Since we probably don't want to overwrite the current $_SESSION, we hold a copy of whatever data is already in it, swap in our own data to perform the serialization, then restore the original afterwards. And because we're using PHP's built-in session serialization, we get nested array and object serialization for free, and we didn't have to write our own parser.

2013-04-23

Loggly and Puppet

Update 2013-10-22: This post refers to Loggly generation 1, and most likely will not work with Loggly's new second generation product offering.

As a follow-up to my previous post on pulling data from Loggly using jQuery, this post will show how to use Puppet to automatically register and configure instances to send data to Loggly.

At ServiceTrade, we use Amazon Web Services for almost all of our infrastructure. All our production servers are EC2 instances. The configuration of all the instances is kept in Puppet manifests. Instances go down and come up all the time, and Puppet helps us make sure they are all configured exactly alike out of the box.

A server cannot send data to Loggly unless you have previously told Loggly to accept data from it. Unfortunately, with server instances being created and removed automatically, it would be impossible to keep up with hand-registering each instance with Loggly. Fortunately, we can use Loggly's API and some Puppet manifests to register our instances for us when they come up.

We use rsyslog on our instances for collecting system log data (syslog, kernel, mail logs, etc.). rsyslog can tail log files and forward them to other log files, or even to other servers via TCP or UDP. Loggly has great documentation on setting up rsyslog to forward log files to Loggly.

First, we need to have Puppet manage rsyslog. This ensures that rsyslog will be installed, and gives us control over rsyslog's master configuration file and a directory of instance-specific configuration files. Below is the rsyslog module file. All files are relative to the Puppet root directory.

modules/rsyslog/manifests/init.pp

As it says, the main configuration file will be in modules/rsyslog/files/rsyslog.conf. The config file is the standard one installed by our package manager with a few minor alterations, seen here:

modules/rsyslog/files/rsyslog.conf

That last line is important, because all our Loggly-specific configurations will go in /etc/rsyslog.d.

Now that rsyslog is set up, we need to tell each instance where to send its log files. Additionally, we need to register each instance with each log file it will be sending to Loggly. Each log file is sent to a different endpoint, which Loggly refers to as an input. Each input has an ID, and a specific port on the Loggly logging server that maps to that ID. We have already set up a specific Loggly user for API purposes, and we'll use that user to do the registration.

First, we set up a new module that will hold our Loggly API configuration.

modules/loggly/manifests/init.pp

The hash of inputs allows us to easily reference each input we've mapped in Loggly, without having to remember specific ID and port numbers elsewhere in our manifests.

We'll also set up a Puppet template for tailing our log files and forwarding them to Loggly:

modules/loggly/templates/rsyslog.loggly.conf.erb

The template will take the values defined in $loggly::inputs and create one config file per input, which will ultimately end up in /etc/rsyslog.d.

One last Loggly manifest file is needed. This one generates the config file for an input, then registers the server against the input using Loggly's API.

modules/loggly/manifests/device.pp

This manifest uses the $name passed to the definition to gather data from the $inputs hash to build a config file. It also execs a cURL call to Loggly's API to register the device for the input. The response to this call is stored for two reasons: first, if anything goes wrong, we have a record of Loggly's response to our request; and second, if the response file already exists, Puppet will know it does not need to make another call to the API.

All that remains is to use our loggly::device definition in a node definition:

manifests/site.pp

Since our input IDs and ports are bound to specific input names in our $inputs hash, we only need to know the names of the inputs we want to configure this instance to send to, and loggly::device does the rest.

Hopefully, at some point we (or someone else) will get around to releasing a proper Puppet module for this. Until then, I hope this post helps you get set up with the Loggly centralized logging service.

2013-04-12

Loggly from Javascript

Update 2013-10-22: This post refers to Loggly generation 1, and most likely will not work with Loggly's new second generation product offering.

For my most recent Dev Days project, I implemented centralized logging for our application, ServiceTrade. I don't want to worry about running our own indexing server, or storing the logs long term, so I investigated several SaaS logging solutions and eventually settled on Loggly. I was impressed with the ease of setting up our account, defining our logging inputs and even integrating with our Puppet configuration management infrastructure. For long-term storage, they push raw log files to an S3 bucket of your choosing. Their customer support seemed very eager to help with the one issue I had. All in all, I've been pleased with the product.

One thing about Loggly that could use a little work is saved searches. First off, when Loggly gives you a graph of events from a saved search (a very cool feature), the graph sometimes loses information when you zoom in and click on a section to see specific logs. Visiting the page for a saved search on a specific set of inputs and clicking on the graph to pull up the log lines for that search will give log lines across all inputs, not just the ones the saved search is limited to.

Secondly, the saved searches list page says you are limited to only 5 saved searches at the moment. The saved search feature is in beta, so hopefully they will allow saving of more (ideally unlimited) searches in the future. (Apparently, you can actually have up to 2000 saved searches; the wording on the saved searches list page is out of date.)

We are using their excellent API to pull down data and do our own visualizations of multiple saved searches. I'm using jQuery on a simple HTML page to query the API. There are a few caveats. The following information will hopefully prevent someone else from spending the half-hour I did trying to figure this out.

First of all, the API uses HTTP basic authentication. jQuery's get call does not handle HTTP authentication, so I had to use the more verbose ajax method.

Also, since the request is cross-domain, I had to use JSONP, which Loggly supports.

Finally, Loggly's API returns a Bad Request response if you send any parameters that it does not recognize. Unfortunately, unless you tell it otherwise, jQuery.ajax() always adds a timestamp query parameter to JSONP requests to prevent response caching. In order to get everything to work, I had to turn off this cache-busting behavior (by setting the cache option to true) so that the parameter is not sent.

Here is what the final call looks like:
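A minimal sketch of a call along these lines, not necessarily the exact code behind this post. The subdomain, credentials, endpoint path and query are hypothetical placeholders, and buildLogglyRequest is just an illustrative helper name; the dataType, cache, username and password settings are standard jQuery.ajax() options.

```javascript
// Build the settings object for jQuery.ajax() following the three caveats above.
// The endpoint path, subdomain, credentials and query are placeholders (assumptions).
function buildLogglyRequest(subdomain, username, password, query) {
    return {
        // Assumed endpoint shape; check Loggly's API documentation for the real path.
        url: 'https://' + subdomain + '.loggly.com/api/search',
        // Send only parameters the API recognizes, or it answers with Bad Request.
        data: { q: query },
        // The request is cross-domain, so ask for JSONP.
        dataType: 'jsonp',
        // For JSONP, jQuery defaults to cache: false and appends "_=<timestamp>".
        // Setting cache: true stops jQuery from adding that extra parameter.
        cache: true,
        // jQuery's HTTP authentication settings. They are documented for
        // XMLHttpRequest, so verify that they take effect for a JSONP request.
        username: username,
        password: password
    };
}

var settings = buildLogglyRequest('example', 'apiuser', 'secret', 'error');

// Only fire the request where jQuery is actually loaded (i.e. in the browser).
if (typeof jQuery !== 'undefined') {
    jQuery.ajax(settings).done(function (response) {
        console.log(response);
    });
}
```

Keeping the settings in a plain object makes it easy to check exactly which parameters will be sent before the request is made.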

You might also want to check out my upcoming blog post on using Puppet to automatically register servers with Loggly.

2013-02-11

Storytellers and Prognosticators: Lessons in Communication Style

Every other Friday, my team holds a retrospective meeting. One issue that came up in our most recent retrospective was a particularly contentious user story estimation meeting that had occurred a few days prior. Voices were raised, people were interrupted mid-sentence, and sarcasm was liberally deployed. This was a specific isolated incident, and I am very proud of my team that we were able to quickly own, discuss and remedy the situation. We are a better team of communicators as a result.

After the retrospective, I thought about different communication styles, and I tried to come up with different personas for the different types of communicators on my team. So far, I've broken the communication styles down into these general personas:

Storytellers love to talk about the past. Their goal is to remind everyone that experience is the best teacher. They are the archive of a team's experiences, reminding everyone of past obstacles and past triumphs. By trying to couch everything in terms of previously encountered situations, storytellers sometimes have trouble recognizing changed circumstances. You can recognize a storyteller by phrases like "Do you remember when..." and "Last time this happened..."

Prognosticators are in some ways the opposite of storytellers. They love to talk about likely outcomes and visions of the future. Their ideas tend to be expressed as innovative approaches to problems and as warnings about potential pitfalls. Prognosticators can sometimes derail a conversation by making predictions based on erroneous assumptions. They can be recognized by phrases like "Something we need to watch out for..." and "It's possible that..."

Prognosticators and storytellers tend to feed off each other, with the latter talking about how a current situation is similar to the past, and the former talking about how the current situation is different. When they get on a roll together, it may be difficult for the other personas to break into the conversation. Both must be careful not to take the conversation off on tangents and drive it away from productive outcomes.

Inquisitors communicate by asking questions (the Socratic method). Their strength is getting others to think about what they are saying by asking for clarification, and by steering the tone and direction of the conversation to discover new possibilities. Inquisitors are excellent at keeping storytellers and prognosticators on track. It is important that inquisitors remember to contribute knowledge instead of always asking questions to which they already know the answers; otherwise, they risk looking condescending. Inquisitors can be recognized by phrases like "What if..." and "What did you mean by..." and "Have you considered..."

Evaluators are good listeners. Unlike the other personas, which tend to drive conversations, evaluators do not speak often, and when they do, they tend to be soft-spoken. Evaluators are good at judging ideas objectively and combining multiple points of view into one cohesive vision. It is important that evaluators do not let themselves get talked over, and that they do not completely hold back from contributing; inquisitors can help draw an evaluator into the conversation. Evaluators can be recognized by phrases like "That's an interesting thought..." and "I was thinking about..."

Most people are more comfortable in one, or a combination of two, personas. I call this their base communicator. These are not discrete communication styles. Elements of any one may be combined with elements of another, and no person is purely one persona or another. From day to day, even within the course of a single conversation, people flow in and out of these different personas.

In conversation, especially in larger groups, it is important to recognize which persona is speaking. Remember that familiarity breeds complacency! As team cohesion grows, you will soon learn to recognize each team member's base communicator, and when that happens it is even more important to pay attention to what they are saying and how they are saying it. Knowing someone's general communication style does not mean you can assume that they will always communicate that way in every circumstance.