Service Meshes, Kiali, and Continuous Verification
What happens when observability data meets configuration management and deploys?
This is Monitoring Monitoring, a quasi-monthly email newsletter about early-stage startups and projects in the observability space. Subscribe here.
There are good reminders that Kubernetes might not be the solution to all of our cloud software problems. In a recent tweet, Jaana Dogan compared Kubernetes to one of the deeper layers (L4) of the OSI model: container orchestration might be a foundational part of cloud native software, but it is also something that many developers and operations teams can avoid interacting with—if they pay someone else to deal with it.
So what’s a friendlier application-focused abstraction if you don’t want to get lost in the deep weeds of Kubernetes and you’re not on the serverless bandwagon? In early 2020, your answer might be some kind of service mesh.
A service mesh manages how microservices (macroservices?) communicate with each other, including what happens when new software gets deployed. There are good security and networking use cases, but the focus of this newsletter this week is what happens when the service mesh meets monitoring.
Service mesh-palooza
The open-source ecosystem around service meshes is thriving. Istio, Linkerd, and Envoy have spawned a number of paid and free solutions. Both AWS (AWS App Mesh) and Azure (Azure Service Fabric Mesh) have based their managed solutions on Envoy, while Google Cloud (Traffic Director) has a solution based on Istio. There is also a fascinating open-source project tailored for financial institutions called SOFAStack that seems to be powering (parts of) large banks in China.
In terms of enterprise SaaS meshes, there’s also Aspen Mesh (incubated by F5, based on Istio), Tetrate (based on Istio and Envoy), and Banzai Cloud’s Backyards.
When it comes to monitoring, the interesting aspect of service meshes is that they make observability data that was previously difficult to collect a built-in feature of the mesh, for any kind of service. Traditional metrics (like request rate) and traces come labeled with the context of the workloads generating each request, cool visualizations included.
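As a concrete (if minimal) sketch of what that workload-labeled context looks like, here is a Python snippet that asks Prometheus for the per-workload request rates a mesh's sidecar proxies report automatically. The Prometheus address is hypothetical, and the snippet assumes Istio's standard istio_requests_total metric is being scraped; adjust for your own setup.

```python
# Minimal sketch: asking Prometheus for mesh-generated, workload-labeled metrics.
# Assumes Istio's standard `istio_requests_total` metric is already being scraped.
import requests

PROMETHEUS = "http://prometheus.example.com:9090"  # hypothetical address

# Per-workload request rate over the last 5 minutes, broken down by the
# source/destination labels the sidecar proxies attach automatically.
query = (
    'sum(rate(istio_requests_total{reporter="destination"}[5m])) '
    "by (source_workload, destination_workload, response_code)"
)

resp = requests.get(f"{PROMETHEUS}/api/v1/query", params={"query": query}, timeout=10)
resp.raise_for_status()

for series in resp.json()["data"]["result"]:
    labels = series["metric"]
    rate = float(series["value"][1])
    print(f'{labels.get("source_workload")} -> {labels.get("destination_workload")} '
          f'({labels.get("response_code")}): {rate:.2f} req/s')
```

None of this required changes to application code; the proxies in the mesh attach the source and destination context for every service they front.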
Enter Kiali, a service-mesh-meets-observability project for Istio that seems to be supported by Red Hat/IBM. The interesting twist with Kiali is in the tagline: “Service mesh observability and configuration”.
Configuration meets observability
There are several compelling ideas in Kiali, including emerging methods for correlating observability data. Kiali, like most major APM/logging vendors, combines metrics and traces, and there are different technical approaches to bringing “the three pillars of observability” together in a single interface to make troubleshooting easier. See this technical talk from Chronosphere cofounder Rob Skillington on deep-linking metrics and traces for how this is done.
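One approach worth sketching is attaching an exemplar trace ID to a metric sample, so a dashboard can jump from a latency spike straight into the trace that produced it. The snippet below is only an illustration of that idea, with a made-up tracing UI URL and data model; it is not Kiali's or any vendor's actual implementation.

```python
# Illustrative sketch of deep-linking: a metric sample carries an "exemplar"
# trace ID, which is enough to build a link into a tracing UI.
from dataclasses import dataclass
from typing import Optional

TRACING_UI = "https://tracing.example.com/trace"  # hypothetical tracing UI URL

@dataclass
class Sample:
    metric: str
    labels: dict
    value: float
    trace_id: Optional[str] = None  # exemplar: a trace that produced this sample

def deep_link(sample: Sample) -> Optional[str]:
    """Build a URL that jumps from a metric sample straight to the full trace."""
    return f"{TRACING_UI}/{sample.trace_id}" if sample.trace_id else None

slow_request = Sample(
    metric="request_duration_seconds",
    labels={"destination_workload": "checkout", "response_code": "500"},
    value=2.7,
    trace_id="4bf92f3577b34da6a3ce929d0e0e4736",
)
print(deep_link(slow_request))
```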
Another concept in Kiali is how it uses its privileged position in your infrastructure—it sees all service-to-service communication in the mesh—to help teams manage complex configuration using two different approaches:
Validations: automatic identification of configuration mistakes at runtime (see the toy sketch after this list).
Wizards: user interfaces that help teams modify service mesh configuration rules without hand-editing complex files.
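To make the validations idea concrete, here is a toy sketch of the kind of check involved: compare declared routing rules against the services the mesh can actually see at runtime. The service and rule names are invented, and this illustrates the concept rather than Kiali's implementation.

```python
# Toy re-implementation of the idea behind a validation (not Kiali's code or
# data model): does every routing rule point at a destination the mesh knows about?
known_services = {"frontend", "checkout", "payments"}

routing_rules = [
    {"name": "checkout-route", "destination": "checkout"},
    {"name": "payments-v2-route", "destination": "payments-v2"},  # misspelled host
]

def validate(rules, services):
    """Return (rule name, error message) pairs for misconfigured rules."""
    errors = []
    for rule in rules:
        if rule["destination"] not in services:
            errors.append(
                (rule["name"], f'destination "{rule["destination"]}" does not exist')
            )
    return errors

for name, message in validate(routing_rules, known_services):
    print(f"validation failed for {name}: {message}")
```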
The general idea is that if problems are detected, it's possible to fix them immediately inside Kiali. Metrics are combined with awareness of runtime configuration so that a human can answer questions and fix problems in the same tool. It's a cool idea (certified dope by Kelsey Hightower), and it's new to see significant configuration changes integrated directly into an observability project.
Deploys meet observability
As with diagnosing problems caused by configuration, observability data has always been central to determining whether a deploy was successful. A popular solution for Kubernetes deployments is Spinnaker, open-source continuous delivery software that integrates with the usual monitoring vendors. Spinnaker is also available as enterprise-focused SaaS from Armory, OpsMx, and Mirantis.
Verica, founded by Casey Rosenthal, an author of the recently published Chaos Engineering book, extends this idea further with continuous verification (CV). CV is a method of proactively identifying issues in a complex system using automation that verifies assumptions about a service: effectively what happens when you fully automate well-designed chaos engineering experiments, as Netflix did with its Chaos Automation Platform, ChAP.
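As a deliberately tiny sketch of that idea, the snippet below encodes one assumption about a service as a metric query plus a threshold, then re-verifies it on a schedule, the way you might while a chaos experiment runs. The query, threshold, and Prometheus address are all made up for illustration; Verica's product and Netflix's ChAP are far more sophisticated.

```python
# Sketch of continuous verification: encode an assumption about a service as an
# automated check and re-run it on a schedule. Query/threshold/address are made up.
import time
import requests

PROMETHEUS = "http://prometheus.example.com:9090"  # hypothetical address

ASSUMPTION = {
    "name": "checkout error rate stays under 1%",
    "query": (
        'sum(rate(istio_requests_total{destination_workload="checkout",'
        'response_code=~"5.."}[5m]))'
        ' / sum(rate(istio_requests_total{destination_workload="checkout"}[5m]))'
    ),
    "threshold": 0.01,
}

def verify(assumption):
    """Query the observability backend and check whether the assumption still holds."""
    resp = requests.get(
        f"{PROMETHEUS}/api/v1/query", params={"query": assumption["query"]}, timeout=10
    )
    resp.raise_for_status()
    result = resp.json()["data"]["result"]
    observed = float(result[0]["value"][1]) if result else 0.0
    return observed <= assumption["threshold"], observed

# "Continuous": keep re-verifying, e.g. while a chaos experiment is in progress.
for _ in range(3):
    holds, observed = verify(ASSUMPTION)
    print(f'{ASSUMPTION["name"]}: {"holds" if holds else "VIOLATED"} '
          f"(observed {observed:.4f})")
    time.sleep(60)
```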
In a 2019 blog post and conference talk, Casey argues that CV techniques are the evolution of what teams have learned from continuous integration and delivery.
Continuous verification features also seem to be publicly available from Harness, a continuous-delivery-as-a-service startup. Harness achieves this by integrating with APM and logging vendors, with some machine learning sprinkled on top.
The promising future of using observability data in new and clever ways
There’s a lot of attention on startups focused on the core problems of observability right now: different technical approaches to instrumenting, collecting, processing, storing, and visualizing data. Many of these emerging techniques have been covered in this newsletter, from eBPF and Prometheus databases to observability pipelines.
However, some of the most compelling startups and projects right now are exploring what happens when you use observability data in clever ways to solve problems facing technical teams as they ship software, even if those teams decide they don't need to deal with Kubernetes. New ways to safely deploy software, proactively identify issues in a complex system, or detect configuration errors are just a few possibilities. There will be many more.
Thanks for reading. If you enjoyed this newsletter, feel free to share using the link below or subscribe here.
Disclosure: Opinions my own. I am not employed by, consulting for, or an investor in any of the mentioned companies or their competitors.