Jump Start Solutions: An optimization problem and design challenge
Life is full of optimization problems. In university, I learned the maxim “Sleep, school, or social life. Pick any two.” At my first software job, I learned another one: “Fast, cheap, or good. Pick any two.” With experience, you realize these quips are amusing but unrealistic. You can always get a bit of all three, but due to outside factors, you often can’t do any one of them to perfection. For example, in university, no matter how hard I tried, I couldn’t get perfect grades. I had reached my capacity for academic success. So the wise thing to do, and what I did, was redirect some of the time I was spending on school toward my social life and making friends.
One of my main work projects this year is Jump Start Solutions. These are a fast way to get hands-on and learn with Google Cloud. Each solution is a package of Terraform, application code, and interactive tutorials. The solutions can be deployed with one click in the UI, so you don’t have to install anything on your dev environment to build applications and learn new things.
It turns out, Jump Start Solutions is another optimization problem. Early on, we decided that the solutions should deploy quickly. We also wanted them to be free or cheap to try. We felt it was important that the solutions be more realistic than existing “Hello, World” type demos. And we wanted to show folks how to properly set up things like secrets, networking, and IAM. Lastly, since the program aims to accelerate learning, everything in a solution must be easy to explain and understand.
Those constraints gave us another optimization problem, “Fast, cheap, realistic, and understandable. Pick any N.” Except we didn’t know if N was 1 or 3. We just knew it wasn’t going to be 4.
You can think of these things as a four-team tug-of-war. Each time you start to optimize one constraint, you move further away from the goal with another.
For example, realistic applications with properly designed security, CI/CD, networking, etc., are complex. They have a lot of components connected in complicated ways. This makes them hard to understand, especially for those new to the cloud.
Likewise, realistic CI/CD and deployment pipelines are often slow, especially on an initial release that requires complete provisioning and a new build. Including a realistic CI/CD setup in our solutions slowed down deployments.
In the end, we had to make some compromises. One of the significant compromises we made was prebuilding containers and deploying our applications directly from Terraform. Deploying directly from Terraform isn’t a best practice. Instead, a better approach is to use Terraform for provisioning and use a CI/CD pipeline to deploy your application and run any post-deploy setup steps. In most cases, you also wouldn’t deploy your application from a container image someone else built months ago. You’d build with the latest dependencies and code changes, run tests or security scans, and then deploy. But again, all that takes time. Using prebuilt artifacts and having a single step in our provisioning and deployment process allowed us to cut deployment time by half or more. Ultimately, the improved user experience was worth deviating from best practice.
We made another compromise around deployment time as well. Initially, we had hoped to get deployment time under 5 minutes. We figured folks would only be willing to wait that long. But 5 minutes to provision infrastructure, set up networks, deploy an application, and do any post-deploy steps like loading a test database was only achievable for some of the solutions we wanted to build. We could have also made things fast by reducing the complexity, but we felt strongly that basic “Hello, World” applications weren’t an adequate learning environment. So, we relaxed our deployment time goal from 5 minutes to 10 minutes. We also created a review process for any solutions that took longer so we could be confident the learning goals were worth the added wait time.
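To make that rule concrete, here is a minimal sketch of the "10-minute goal, review anything slower" check. Only the 10-minute goal comes from this post. The class, the function name, and the sample solutions and timings are all made up for illustration and don't reflect how our actual review process is implemented.

```python
from dataclasses import dataclass

# The 10-minute goal is from the post; everything else below is hypothetical.
TARGET_MINUTES = 10.0


@dataclass
class Solution:
    name: str
    deploy_minutes: float


def needs_review(solution: Solution, target: float = TARGET_MINUTES) -> bool:
    """Return True when a deployment runs longer than the target."""
    return solution.deploy_minutes > target


# Invented examples, not measurements of real solutions.
samples = [
    Solution("small-web-app", 6.5),
    Solution("multi-zone-cluster", 14.0),
]

# Collect the names of solutions that would go through the extra review.
flagged = [s.name for s in samples if needs_review(s)]
print(flagged)
```

A solution that lands exactly on the target passes here. Whether the boundary should count as "took longer" is a judgment call, and this sketch just picks one option.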
Finally, there was the issue of cost. We want to keep costs down and prioritize using free tier and trial credits as much as possible. However, real-life architectures don’t usually use the smallest possible database, a micro k8s cluster, or precisely one VM. If you want to help folks learn concepts like multi-zone architectures, load balancing across VM clusters, or sharing large amounts of data, you must provide enough cloud resources to teach those skills. Our original requirements for a Jump Start Solution set a target max price. But we also make a point to review each solution to ensure that any time we don’t use free resources, it is necessary to support the learning goals for that solution.
While I wish we could have pulled off a magic trick and managed “fast, cheap, realistic, and understandable, you get all 4!”, the world doesn’t work that way. Ultimately, I found it fascinating to think through acceptable versus unacceptable tradeoffs in this optimization problem. I’m proud of what we managed to do even when we couldn’t do everything perfectly.