When I started in datacenters, we had places where we used network addressing and layer 3/layer 4 firewalls as the primary way to manage service identity for the internal services that needed it. Even at that time, there were some annoying attack scenarios you had to deal with in multitenant environments, and compromise scenarios attacking your layer 2 and 3 infrastructure (MAC spoofing, ARP poisoning, etc.), which meant it wasn’t a totally reliable source of identity information. Luckily, for those reasons and other trends, people have largely moved away from source IP as a source of workload identity.
With the advent of highly automated infrastructure systems and the birth of “cloud” infrastructure, we saw other approaches to providing identity to compute workloads emerge. The most well known of these is probably the AWS EC2 metadata service. Fundamentally, this method boils down to using a link-local address to let the workload access an API and get information about itself over the network. This includes operational characteristics and the user data used to power cloud-config. Cloud-config powers a lot of the first-boot features, like executing arbitrary scripts or configuring the system based on data provided by the metadata service.
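As a rough sketch of what this looks like from inside an instance, the snippet below builds the requests a workload would send to the EC2 metadata service at its link-local address. It assumes the token-based (IMDSv2) flow. The helper names `token_request`, `metadata_request`, and `fetch` are made up for illustration, and actually sending these requests only works from inside an instance.

```python
import urllib.request

# Link-local address of the EC2 instance metadata service.
IMDS_BASE = "http://169.254.169.254"


def token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    """Build the PUT request that asks the service for a session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def metadata_request(path: str, token: str) -> urllib.request.Request:
    """Build a GET request for a metadata path, e.g. 'meta-data/instance-id'."""
    return urllib.request.Request(
        f"{IMDS_BASE}/latest/{path.lstrip('/')}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch(path: str, timeout: float = 2.0) -> str:
    """Fetch one metadata value. Only works from inside an EC2 instance."""
    with urllib.request.urlopen(token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    with urllib.request.urlopen(metadata_request(path, token), timeout=timeout) as resp:
        return resp.read().decode()
```

On an instance, `fetch("meta-data/instance-id")` would return the instance ID and `fetch("user-data")` the user data, if any was provided at launch.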
However, if the instance is configured with an IAM role and instance profile, this service also provides a set of AWS IAM credentials that can be used to call permitted AWS services (based on the access configuration of that role). By default, those credentials are effectively accessible to any process that can source network traffic from the system. Another endpoint this metadata service provides is an identity document. This is a cryptographically signed document containing configuration information about the instance. Because this identity document is made available only to the workload itself through this link-local address, people also use possession of it as proof of workload identity.
This approach has some shortcomings, many of which these service providers have worked on strategies to mitigate. AWS, on its documentation page for the metadata service (at the time of writing), calls out one of the largest ones: any workload that can communicate over the network with that link-local address can see all that data and those credentials. Outside of obvious attacks resulting in system compromise, there are a surprising number of attack vectors that let an attacker make a vulnerable workload or system perform a web request on their behalf, and with this network metadata approach all of them potentially expose your service identity and credential data.
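One partial mitigation on the application side is to refuse outbound requests whose destination is, or resolves to, a link-local address. The sketch below is one hedged way to write that check. `targets_link_local` and its `resolve` parameter are hypothetical names, and a real deployment would also need to handle redirects and DNS rebinding, which this does not.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def _default_resolve(host: str) -> list[str]:
    """Resolve a hostname to its IP addresses using the system resolver."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def targets_link_local(url: str, resolve=_default_resolve) -> bool:
    """Return True if the URL's host is, or resolves to, a link-local address."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        # IP literals are checked directly, without a DNS lookup.
        addresses = [ipaddress.ip_address(host)]
    except ValueError:
        # Anything else is treated as a hostname and resolved first.
        addresses = [ipaddress.ip_address(a) for a in resolve(host)]
    return any(a.is_link_local for a in addresses)
```

A request handler that fetches user-supplied URLs could call `targets_link_local(url)` before making the request and reject the URL when it returns `True`.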
It is easy to build broad API support for this credential discovery approach. Service providers can do this themselves in their libraries and SDKs, as AWS does, and others can do it by using presigned URLs with instance role credentials or by validating the instance identity document.
Another approach we see for providing credentials to applications, particularly beyond the IaaS level, is injecting data into the process's running environment. This broadly comes in two flavors: actual process environment variables, or a file or filesystem provided within the filesystem presented to the process. I feel like the dangers of storing credentials and secrets in your process's environment variables have been documented extensively, but in summary, infrastructure systems and oftentimes applications themselves (or their dependent libraries or frameworks) in many cases don’t protect the process environment as if it contains secret data. It is still common to find logs that freely include environment variables, or applications that send them as part of exception reports or debug error messages/screens. It is rarer to find applications that allow attackers to read arbitrary files from the file system, but they do exist, generally through honest developer mistake or misconfiguration rather than by design, as many of the environment variable exposures appear to be.
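To make the two flavors concrete, here is a minimal sketch. It assumes the platform mounts the secret as a file, and it shows one way to scrub environment variables before they reach a log line or error report. The function names and the name-matching heuristic in `redacted_environ` are assumptions for illustration, not a complete defense.

```python
import os
from pathlib import Path

# Substrings that suggest a variable holds a secret (illustrative, not exhaustive).
SENSITIVE_MARKERS = ("SECRET", "TOKEN", "PASSWORD", "KEY")


def read_secret_file(path: str) -> str:
    """Read a credential from a file the platform mounted for this process."""
    return Path(path).read_text(encoding="utf-8").strip()


def redacted_environ(environ=None) -> dict[str, str]:
    """Return a copy of the environment that is safer to log."""
    environ = os.environ if environ is None else environ
    return {
        name: "[REDACTED]" if any(m in name.upper() for m in SENSITIVE_MARKERS) else value
        for name, value in environ.items()
    }
```

Reading the credential from a file only when it is needed keeps it out of the process environment, so a debug page that dumps environment variables has nothing sensitive to show.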
These patterns are commonly used with Kubernetes secrets or config maps, and with container PaaS or function-as-a-service platforms like Lambda. To me, the filesystem approach in particular feels like the most prudent path at this point for a service provider.
It is noteworthy that these application environment methods require more awareness and integration between the scheduler and the workload. In higher-level cases like function or container as a service, this probably feels natural. However, it gets more complex if you want to support many operating systems, and it depends heavily on the features available in your scheduler/hypervisor layer.
Very few workloads we deploy today are an island. Most communicate with service providers whose PaaS services we consume, or with other services we write and deploy separately in a modern microservice environment. Traditionally, we dealt with long-lived credentials that we stored and treated as secrets in our production configuration management or deployment processes.
There are other solutions to machine identity out there. Enterprise organizations have been dealing with Active Directory and its related machine account dynamics on Windows for decades. Even earlier than that, systems like Kerberos had solutions to this problem. Most of these solutions do not fare well when applied to the service provider use case. I do not know of any service providers today that operate with multiple customers sharing an Active Directory environment, for instance. I think many customers would be uncomfortable with that scenario.
As technology and security practices have evolved, the benefits of moving to short-lived credentials have been widely discussed. Doing so caps the duration of risk when someone gets access to credentials that have been used in your production environment. This includes attack vectors like someone compromising a production backup, snapshot, or set of log files that inadvertently included credentials for your environment.
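Here is a small sketch of what consuming short-lived credentials can look like on the workload side: cache them and refresh shortly before they expire. The `fetch` callable stands in for whatever issues credentials in your environment. It, `Credentials`, and `CachedCredentials` are hypothetical names, not part of any provider SDK.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Credentials:
    token: str
    expires_at: float  # Unix timestamp, in seconds


class CachedCredentials:
    """Hand out cached credentials, refreshing them before they expire."""

    def __init__(
        self,
        fetch: Callable[[], Credentials],
        refresh_margin: float = 60.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch
        self._margin = refresh_margin
        self._clock = clock
        self._current: Optional[Credentials] = None

    def get(self) -> Credentials:
        # Refresh when nothing is cached or expiry is within the safety margin.
        if self._current is None or self._clock() >= self._current.expires_at - self._margin:
            self._current = self._fetch()
        return self._current
```

Because callers always go through `get()`, a credential that leaks into a log or snapshot stops being useful once its `expires_at` passes, which is the property the paragraph above is after.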
Many of the advancements people think of as cloud are highly
automated infrastructure and modern deployment practices. Giving customers a secure
root of identity solution can encourage them to adopt other useful
infrastructure automation patterns, like treating servers as short lived and replaceable
rather than long lived and maintained. This sort of identity can be
foundational to first boot scripts interacting with other services from your
provider or reaching out to your centralized identity provider to get other
credentials they need to access other systems. Getting people to think about their
infrastructure differently, rather than any technical capability, is the most
impactful improvement most organizations make during a cloud transformation.