Debugging Demos

I have been at RailsConf this week. I gave a talk on Natural Language Processing with Ruby in the Machine Learning track. As part of my talk, I wanted to reprise a demo I did at RubyConf 2015 where I used a distributed system to do sentiment analysis based on emoji. At the time I felt I had used some of the hottest technologies. I used Docker containers for each of the different pieces of the pipeline and then used Kubernetes to orchestrate the whole thing. It seemed like a logical way to deploy something that I hoped to spin up and down on a regular basis as the need for a Kubernetes demo arose.

Last week I dug into the code to try to get the demo running again after a 12-month hiatus. It is amazing what you forget during that time. I am sharing some of what I learned in this blog post so that, hopefully, someone else doesn’t have to fight the same battle I did.

The battle of the shebang

When I built this demo, Kubernetes was not even 1.0; now we are up to 1.5. So it was not surprising that some of the kubectl and gcloud shell commands had changed slightly. A quick look at the docs and I was back in business, but when I ran kubectl get pods, all of my pods had the status CrashLoopBackOff. So I did the logical thing: deleted them and tried to deploy again. This time I ran kubectl get pods immediately after creating the controller, and the status was RunContainerError. So I rebuilt the container and tried again, only to get the same error.

Although I know exec-ing into a running container is usually not a good idea, it seemed like the only way to debug my current issue. So I started the container with docker run. This time I got a new error.

standard_init_linux.go:178: exec user process caused "exec format error"

I still have no idea what that means, but I guessed that there was an issue with my entrypoint and spent the better part of an hour searching for solutions online and trying different things to see what the issue was. Finally, out of desperation, I tried using the Dockerfile as it appeared on my slides. It worked. I compared the two versions, and the only difference was that the version that didn’t work had the Apache V2 license header as a comment at the start of the file. The version that worked had the shebang at the top of the file. I moved the shebang above the Apache license header, rebuilt the container, and when I tried to run it with Docker, it worked.

As part of getting the code ready to release to GitHub, I had added the license headers. But I had never done a demo with that version of the files. I’d always done them with a different version that didn’t have the license headers. As a result, I had shared some code as open source that did not work.
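A check along these lines could catch a misplaced shebang before a release. On Linux, a script is only handed to its interpreter when the file's first two bytes are "#!", so a license comment placed above the shebang is one plausible way to end up with an "exec format error". This is a minimal sketch, not code from the demo: the helper names shebang_first? and check_script and the sample file contents are made up for illustration.

```ruby
require "tempfile"

# Hypothetical helper, not from the original demo: true only when the
# file starts with the two bytes "#!".
def shebang_first?(path)
  File.binread(path, 2) == "#!"
end

# Write the given contents to a throwaway file and check where the
# shebang sits.
def check_script(contents)
  Tempfile.create("script") do |file|
    file.write(contents)
    file.flush
    shebang_first?(file.path)
  end
end

# Illustrative contents only: a license comment above the shebang,
# and the same lines with the shebang moved to the top.
header_first = "# Licensed under the Apache License, Version 2.0\n" \
               "#!/usr/bin/env ruby\nputs 'hi'\n"
interpreter_first = "#!/usr/bin/env ruby\n" \
                    "# Licensed under the Apache License, Version 2.0\nputs 'hi'\n"

puts check_script(header_first)      # false
puts check_script(interpreter_first) # true
```

Running something like this over every executable file before adding license headers would have flagged the broken version.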

The difference between ruby:latest and ruby:onbuild

The next hurdle I ran into was more straightforward to fix. The previous demo had only used Ruby’s standard library. The new version of the demo needed the google-cloud-ruby gem. When I tried deploying the code that did sentiment analysis, it would error out because Ruby could not find the needed libraries. I wondered if using Ruby 2.4 was the issue, but I was fairly certain that it was not. I had run the code locally, with Docker, using Ruby 2.3, and it worked fine.
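One way to surface this kind of failure early is a small preflight check that tries to require each library and reports the ones that cannot be loaded. The sketch below is an assumption about how that could look, not part of the demo: missing_libraries is a hypothetical helper, and the second library name is deliberately fake.

```ruby
# Hypothetical preflight helper: returns the names that cannot be required.
def missing_libraries(names)
  names.reject do |name|
    begin
      require name
      true # loaded now, or was already loaded
    rescue LoadError
      false # not installed in this environment
    end
  end
end

# "json" ships with Ruby; the second name is a placeholder for a gem
# that has not been installed in the image.
missing = missing_libraries(["json", "a_gem_that_is_not_installed"])
puts "missing: #{missing.join(', ')}" unless missing.empty?
```

Libraries that ship with Ruby load in a bare image, while anything that has to be installed as a gem shows up in the missing list until the image installs it.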

So I went to the documentation. I read through the Docker Hub page for the Ruby image I was using and realized that if I had gem dependencies, I needed to use ruby:onbuild and supply a Gemfile.lock for Bundler. Making the necessary changes fixed my code, and the only painful part was downloading the ruby:onbuild image over slow Wi-Fi.

Shadowing

At this point, my demo was up and running, but when I looked at the pods, I noticed that some of them had restarted several times. Looking at the logs from my cluster, I saw that those pods were dying because of an exception deep down in the Google Cloud Language gem. When I looked at the error more closely, I saw that somehow the gem was getting passed the URI for the Rinda server that serves as the backbone of the demo. Finally, I found a warning that explained what I had done wrong:

/usr/local/lib/ruby/2.4.0/uri.rb:97: warning: previous definition of URI was here

I had been lazy and hadn’t used any classes or namespaces for my code. So when I declared a constant called URI and used it for the Rinda server’s address, I was shadowing the URI module that exists in the standard library. Fixing my naming fixed the issue, and my demo was stable.

Now the demo is up and running properly, and I will release the working version on GitHub soon. My talk went well, and I learned and relearned some things in the process, including a valuable lesson about proper namespacing.
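For reference, the naming fix can be sketched roughly as follows. The module name SentimentDemo, the constant RINDA_URI, and the address are all hypothetical, since the post does not show the real ones. Keeping the address in a namespaced constant leaves the standard library's URI module untouched, so any gem that calls URI.parse keeps working.

```ruby
require "uri"

# Hypothetical namespace and address; the original post does not show them.
module SentimentDemo
  RINDA_URI = "druby://localhost:8000"
end

# The problem described above would look like this at the top level. It
# replaces the standard library's URI module with a String and makes Ruby
# print an "already initialized constant" warning:
#   URI = "druby://localhost:8000"

parsed = URI.parse(SentimentDemo::RINDA_URI)
puts URI.class     # Module
puts parsed.scheme # druby
puts parsed.port   # 8000
```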