This article introduces cloud computing for IT architects. Each topic it covers could fill a complete article or even a book, so this overview cannot be complete. Instead, it is intended as a starting point in the IASA business technology architecture body of knowledge, where topic areas provide context that IT architects should be aware of.
Abstraction
One way to categorize cloud services is by the level of abstraction they offer. Traditionally, computing requires physical hardware: servers in racks in datacenters owned by the customer (or a service provider). Infrastructure as a service (IaaS) provides access to virtualized computing resources, such as servers, storage, and networks, which the customer can configure and manage. Platform as a service (PaaS) adds a layer of software tools and frameworks, such as databases, middleware, and development platforms, which simplify the deployment and management of applications. Software as a service (SaaS) delivers ready-to-use applications that run on the cloud provider’s infrastructure, which the customer can access through a web browser or an API.
The NIST definition of cloud computing lists five essential characteristics: on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service. If your company or solution could benefit from these characteristics, public cloud may be a fit for you. There may also be good reasons why it isn’t a fit, for instance when you own and operate an existing datacenter, deliver acceptable functionality at an acceptable cost, are comfortable continuing to do so, and have stable workloads that don’t need the characteristics above. There are trade-offs everywhere.
Responsibilities
The level of abstraction offered by cloud services also affects the division of responsibilities between customers and cloud providers. In general, the higher the level of abstraction, the more responsibilities shift from the customer to the provider. For example, in IaaS the customer is responsible for managing the operating system, the applications, the data, and the security of the virtual machines, while the provider is responsible for maintaining the physical infrastructure, the virtualization layer, and the network connectivity. In PaaS the customer manages the application code, the data, and identity and access, while the provider takes care of the rest, including the operating system, the middleware, the database, and the security of the platform. In SaaS the customer brings configuration, data, and identity and access to use the software, and the provider manages the software, from development to deployment to maintenance, as well as the lower layers including compute, storage, and networking. For simplicity, we describe only the relationship between customers and cloud providers. For completeness, we should mention that several types of service providers exist who can manage various layers between customers and cloud providers.
Similarly, we use the term public cloud as a single place to go to, in contrast with on-premises. In practice, there are multiple public cloud providers, which offer IaaS, PaaS, and/or SaaS services in various shapes and sizes. Most companies use multi cloud, for instance by using infrastructure from one or more providers, low code from another, and CRM from yet another.
Private cloud can mean many things. For instance, it can be traditional hosting, where the customer owns the hardware, optionally with an on-demand self-service layer for IT teams. The term is also used for a dedicated environment (or the perception of one) in a public cloud service, intended to improve security and acceptance.
Hybrid cloud describes a mix of public cloud and on-premises services. For instance, a manufacturing plant needs local compute capacity, and may also need public cloud capacity for intelligent services or on-demand capacity.
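The division of responsibilities per service model described in this section can be written down as a small lookup table. The following Python sketch is illustrative only: the layer names are a simplification of the layers named in the text, the assignment of identity and access to the customer in IaaS is an assumption, and `responsible_party` is a hypothetical helper, not part of any provider's tooling.

```python
# Simplified layers, ordered from closest to the customer to closest to the hardware.
LAYERS = [
    "identity and access",
    "data",
    "application code",
    "operating system",
    "virtualization layer",
    "physical infrastructure",
]

# Layers the customer manages in each model; the provider manages the rest.
# In SaaS the customer also brings configuration, which is not modeled here.
# Assumption: identity and access stays with the customer in IaaS as well.
CUSTOMER_MANAGED = {
    "on-premises": set(LAYERS),
    "IaaS": {"identity and access", "data", "application code", "operating system"},
    "PaaS": {"identity and access", "data", "application code"},
    "SaaS": {"identity and access", "data"},
}


def responsible_party(model: str, layer: str) -> str:
    """Return 'customer' or 'provider' for a layer in a given service model."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer}")
    return "customer" if layer in CUSTOMER_MANAGED[model] else "provider"


def customer_layer_count(model: str) -> int:
    """Number of layers the customer still manages in a given model."""
    return len(CUSTOMER_MANAGED[model])
```

Reading the table from on-premises to SaaS shows the customer's set of layers shrinking, while data and identity and access remain with the customer in every model.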
Trust
Before you start deploying workloads in public clouds, or even experimenting with them, you should investigate whether you trust the providers: what are your standards for compliance, privacy, and security, and does the cloud provider facilitate them? Since the concepts of on-premises and public cloud are different, you may have to reconsider existing implementation guidelines and ideally go back to the originating requirements. For instance, for on-premises there may be requirements stating that deployments should be made in two datacenters separated by a certain distance and connected by a certain connection. For public cloud that may be less relevant or not even possible, but the underlying requirement for high availability and/or disaster recovery may very well be met through other means. Additionally, you want to consider trade-offs such as maintainability versus portability: what level of abstraction do you prefer, and how do you look at vendor lock-in? For instance, serverless functions may require little maintenance, but what are your options if you want to leave the cloud provider?
Identity and access
An essential difference between traditional physical hardware hosting and public cloud is that cloud services sit in datacenters owned by cloud providers, whereas traditional hosting could be inside a customer’s building. As a result, identity management with authentication and authorization becomes crucial: as soon as a customer gets access to a cloud environment, others can and will try to get access too. This is why usernames and passwords alone are not enough: logins should include additional factors to prove that the user and/or application is indeed who they say they are. Network access control is important too: when your cloud services have public endpoints, other actors can and will access them. So before deploying a workload, you want to understand if and how it should be exposed externally. As mentioned before, the level of abstraction is relevant: for SaaS, network access may be less of an issue, whereas for IaaS it is an issue when a virtual machine exposes management ports for SSH or RDP to the internet. See: Topic Area Security
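The concern about management ports exposed to the internet can be checked mechanically. The sketch below is a minimal illustration in Python, assuming a simplified, hypothetical `InboundRule` structure; real cloud providers each have their own firewall or security group formats, which would need to be mapped onto it. Ports 22 (SSH) and 3389 (RDP) are the standard defaults.

```python
from dataclasses import dataclass

# Default management ports for SSH and RDP.
MANAGEMENT_PORTS = {22: "SSH", 3389: "RDP"}
# CIDR ranges that mean "any source on the internet" (IPv4 and IPv6).
ANY_SOURCE = {"0.0.0.0/0", "::/0"}


@dataclass(frozen=True)
class InboundRule:
    """Hypothetical, provider-neutral view of an inbound allow rule."""
    name: str
    source_cidr: str
    port_from: int
    port_to: int


def exposed_management_ports(rules):
    """Return (rule name, protocol, port) for each management port open to any source."""
    findings = []
    for rule in rules:
        if rule.source_cidr not in ANY_SOURCE:
            continue  # restricted to a specific range, not flagged by this sketch
        for port, protocol in MANAGEMENT_PORTS.items():
            if rule.port_from <= port <= rule.port_to:
                findings.append((rule.name, protocol, port))
    return findings


example_rules = [
    InboundRule("web", "0.0.0.0/0", 443, 443),
    InboundRule("admin-ssh", "0.0.0.0/0", 22, 22),
    InboundRule("office-rdp", "203.0.113.0/24", 3389, 3389),
]
# Only "admin-ssh" is flagged: HTTPS is not a management port,
# and the RDP rule is limited to one documentation-range network.
flagged = exposed_management_ports(example_rules)
```

A check like this only covers network exposure; it complements, and does not replace, the authentication factors discussed above.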
Cost
In public cloud services as described earlier, customers don’t own the hardware; instead, they pay for what they use. This shift from capital expense to operational expense brings a number of issues to be aware of. The first is that it is very easy to get started with cloud services, ranging from lightweight containers to the largest virtual machines with the latest GPUs. When you just experiment for a couple of minutes, there may not seem to be an issue, but too often those services are not removed after experimentation and you only find out when you receive the invoice at the end of the month. So it is important to be aware of the services you start and their cost, and to monitor costs regularly, with automated alerts. Public cloud providers offer pricing calculators as a starting point. Pay-per-use also means you want to be efficient with the resources you use and scale them to your needs: when your workload’s usage changes over time, you want to scale proactively to provide enough capacity, but not too much. It is also worth understanding your cloud provider’s pricing options, to determine whether committing to a certain capacity is more economical, or whether running a workload at a time or location with surplus capacity is cheaper.
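Two of the cost considerations above reduce to simple arithmetic: at what usage a committed capacity becomes cheaper than pay-per-use, and when spend should trigger an alert. The Python sketch below illustrates both. All prices, the budget, and the threshold values are made-up assumptions, and the function names are hypothetical; actual pricing models differ per provider.

```python
def break_even_hours(on_demand_hourly_rate: float, committed_monthly_price: float) -> float:
    """Hours per month above which a committed price beats pay-per-use."""
    return committed_monthly_price / on_demand_hourly_rate


def cheaper_option(hours_used: float, on_demand_hourly_rate: float,
                   committed_monthly_price: float) -> str:
    """Compare one month of pay-per-use against a fixed committed price."""
    on_demand_cost = hours_used * on_demand_hourly_rate
    return "committed" if committed_monthly_price < on_demand_cost else "pay-per-use"


def crossed_thresholds(spend_to_date: float, monthly_budget: float,
                       thresholds=(0.5, 0.8, 1.0)):
    """Return the budget fractions that current spend has reached or passed."""
    ratio = spend_to_date / monthly_budget
    return [t for t in thresholds if ratio >= t]


# Illustrative numbers only: 0.50 per hour on demand, 219.00 per month committed.
hours_needed = break_even_hours(0.5, 219.0)      # 438.0 hours per month
experiment = cheaper_option(200, 0.5, 219.0)     # "pay-per-use"
always_on = cheaper_option(730, 0.5, 219.0)      # "committed"
alerts = crossed_thresholds(850.0, 1000.0)       # [0.5, 0.8]
```

In this example a workload that runs around the clock favors the commitment, while a short-lived experiment does not, which is why the usage pattern should be understood before committing.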
Operations
When operating cloud services, cost is only one aspect: you also want to provide a good experience to your users. Automatic scaling sounds simple, but it can mean that the first additional users experience delays. So when you recognize usage patterns over time, it may be an option to scale proactively, for instance ahead of business hours, so that all users have a great experience. When a workload isn’t used at all, it may be an option to stop or completely delete it, and then bring it back online when it is needed. This is possible with cloud providers that offer software-defined services which can be created automatically through infrastructure as code. A common scenario is workstations used by business or IT employees, who normally don’t use them at night; stopping them outside working hours saves compute cost. See: Topic Area DevOps
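Proactive scaling based on a known usage pattern can be expressed as a schedule. The following Python sketch assumes a workload that is busy on weekdays during business hours; the hours, the one-hour warm-up, and the instance counts are illustrative assumptions, and `desired_instances` is a hypothetical function whose result would be handed to whatever scaling mechanism the provider offers.

```python
from datetime import datetime


def desired_instances(now: datetime, business_start: int = 8, business_end: int = 18,
                      warmup_hours: int = 1, peak: int = 4, off_peak: int = 1) -> int:
    """Return how many instances should run at a given local time.

    Capacity is raised `warmup_hours` before business hours begin, so the first
    users of the day do not wait for new instances to start.
    """
    is_weekday = now.weekday() < 5  # Monday is 0, Sunday is 6
    in_window = (business_start - warmup_hours) <= now.hour < business_end
    return peak if is_weekday and in_window else off_peak


monday_early = desired_instances(datetime(2024, 1, 8, 7, 30))   # inside the warm-up hour
monday_night = desired_instances(datetime(2024, 1, 8, 22, 0))   # after business hours
saturday = desired_instances(datetime(2024, 1, 6, 10, 0))       # weekend
```

Setting `off_peak` to 0 models the scenario of stopping workstations at night entirely, at the price of a start-up delay for anyone who does need one outside the schedule.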
Performance
Performance in public cloud is different from on-premises: there may be more compute, storage, and network capacity available, but it is probably not deployed side by side the way your traditional web server and database server were. Cloud provider locations consist of buildings, datacenters, zones, and connections that are intentionally distributed. It therefore becomes important to design solutions that optimize network traffic, both in programming logic and in deployments. Additionally, it is important to understand the deployment location and the location of your users, so that the solution is present in the areas where your users connect to it.
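One simple way to reason about deployment location is to weigh measured latency by where users are. The sketch below is a hypothetical Python illustration: the region names, latency figures, and user shares are invented for the example, and in practice you would substitute your own measurements.

```python
def weighted_latency(region: str, latency_ms: dict, user_share: dict) -> float:
    """Average latency for a region, weighted by the share of users per location."""
    return sum(share * latency_ms[region][location]
               for location, share in user_share.items())


def best_region(latency_ms: dict, user_share: dict) -> str:
    """Pick the candidate region with the lowest user-weighted latency."""
    return min(latency_ms, key=lambda region: weighted_latency(region, latency_ms, user_share))


# Invented figures: round-trip latency in milliseconds from each user location
# to each candidate region, and the fraction of users in each location.
latency_ms = {
    "region-a": {"europe": 20, "north-america": 110},
    "region-b": {"europe": 100, "north-america": 25},
}
user_share = {"europe": 0.7, "north-america": 0.3}

chosen = best_region(latency_ms, user_share)  # "region-a" for these figures
```

An average hides the experience of the minority of users, so a large gap between locations may be a reason to deploy in more than one region rather than pick a single winner.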
Reliability
Your traditional web and database layers may have been deployed redundantly in a single location, or even in two locations, depending on their importance. In the public cloud you can do that too. In fact, it is recommended that you expect failures: compared to one or two servers in a rack that you can see, the increased complexity means more components can and will fail. Preparing for this starts with design patterns that retry sending a message when an attempt fails, and ends with preparing for a complete datacenter location to be unavailable by deploying a copy of the solution in another location.
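The retry pattern mentioned above is commonly implemented with exponential backoff and jitter, so that many clients do not retry at the same instant. The following Python sketch shows one possible shape; `TransientError`, `send_with_retry`, and the default values are assumptions made for illustration, and a real implementation would catch the specific transient errors of the client library in use.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying (timeouts, throttling)."""


def send_with_retry(send, message, max_attempts: int = 4, base_delay: float = 0.5,
                    sleep=time.sleep):
    """Call send(message), retrying transient failures with exponential backoff.

    The wait doubles after every failed attempt, plus random jitter.
    The last failure is re-raised so the caller can decide what to do next.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send(message)
        except TransientError:
            if attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay))
```

Retries assume that sending the same message twice is harmless; if it is not, the receiving side needs a way to detect duplicates. The `sleep` parameter exists only so that the waiting can be replaced in tests.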
Sustainability
Last but not least is the topic of sustainability. On the one hand, public cloud services often operate at an efficiency that is hard to achieve for a customer in another industry. On the other hand, public cloud datacenters use vast amounts of energy, and that is not going to decrease anytime soon with the rise of artificial intelligence. What are your sustainability goals? How do they relate to the sustainability goals of your cloud provider, and do you agree with the way they are trying to increase their sustainability? See: Topic Area Sustainability
These aspects of a well-architected solution are described in detail by cloud providers, including guidance on how to use their individual cloud services: AWS, Azure, and [GCP](https://cloud.google.com/architecture/framework).