Introduction

Just some notes for the AWS Certified Developer - Associate.

Components

For this book I made a web component you can use to fetch AWS icons:

<aws-icon
  icon="iam"
  href="https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html"
></aws-icon>

Resources

Identity and Access Management (IAM)

It consists of applying Policies to Users, Groups and Roles. IAM is universal, not regional.

The root account is created when you first create the AWS account. It has admin access to the entire account, so it's good practice to create a separate IAM user for day-to-day activities. Remember to always set up MFA on your root account. From here you can also set a rotation period for the passwords on your account.

Policies

A JSON document which defines one or more permissions.

Policies are the rules that will determine if someone can access a given resource.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "*",
      "Resource": "*"
    }
  ]
}
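
To make the evaluation rule concrete, here is a minimal sketch of how a policy document like the one above could be checked against a request. It is a simplified model, not the real IAM engine: it only assumes that an explicit Deny wins, an explicit Allow grants access, and everything else is denied by default. The function names and the example bucket ARN are hypothetical, wildcard matching is delegated to `fnmatchcase`, and conditions, principals and case-insensitive action names are ignored.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """Policy fields may hold a single string or a list of strings."""
    return [value] if isinstance(value, str) else list(value)


def is_allowed(policy, action, resource):
    """Simplified check: explicit Deny wins, explicit Allow grants,
    and anything that matches no statement is denied by default."""
    allowed = False
    for statement in policy["Statement"]:
        action_match = any(
            fnmatchcase(action, pattern)
            for pattern in _as_list(statement["Action"])
        )
        resource_match = any(
            fnmatchcase(resource, pattern)
            for pattern in _as_list(statement["Resource"])
        )
        if not (action_match and resource_match):
            continue
        if statement["Effect"] == "Deny":
            return False
        allowed = True
    return allowed


# The administrator policy from these notes: every action on every resource.
admin_policy = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}],
}

# A hypothetical read-only policy for one bucket.
read_only_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:Get*",
            "Resource": "arn:aws:s3:::example-bucket/*",
        }
    ],
}

print(is_allowed(admin_policy, "ec2:RunInstances", "*"))
print(is_allowed(read_only_policy, "s3:PutObject", "arn:aws:s3:::example-bucket/a.txt"))
```

For real policies, the IAM Policy Simulator described later in these notes is the tool to trust.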

Users

End users of AWS. New users have no permissions by default. You need to attach policies to them (directly, or through groups or roles) before they can access resources.

Access Keys

Users who need programmatic access get an access key ID and a secret access key. You can view the secret access key only when it is created; if you want to see it again, you must generate a new key pair.

These keys are not the same as a username and password: you can't use them to sign in to the console, only to access the AWS APIs and the command line.

Groups

Groups let you collect users together and apply Policies to all of them at once.

Roles

Create roles and then assign them to Users, Applications or Services to grant them permission to access AWS resources.

IAM Policy Simulator

It allows you to test IAM policies before pushing them to production.

Validate if the current policies work as expected - great for troubleshooting.

Questions

Which IAM entity can you use to delegate access to other trusted entities such as IAM users, applications, or AWS services like EC2?

  • IAM Web Identity Federation
  • IAM Role
  • IAM Group
  • IAM User

You can use IAM roles to delegate access to IAM users managed within your account, to IAM users under a different AWS account, to a web service offered by AWS such as Amazon Elastic Compute Cloud (Amazon EC2), or to an external user authenticated by an external identity provider (IdP) service that is compatible with SAML 2.0 or OpenID Connect, or a custom-built identity broker. IAM Roles.

Elastic Compute Cloud

Secure, resizable compute capacity in the cloud. It is like a VM that is hosted on AWS instead of in your own data center.

Pricing Options

  • On Demand: pay only for the time your instance is running;
  • Reserved: reserve capacity for one or three years. Up to 72% discount;
  • Spot: purchase unused capacity at a massive discount. You set the maximum price you are willing to pay, and when the Spot price rises above it the instance is terminated;
  • Dedicated: physical server running EC2 instances only for you. The most expensive option.

On Demand

Very flexible; a good fit for a first move to the cloud or for ordinary applications.

Reserved Instances

There are three types of reservations:

  1. Standard RIs: up to 72% off the on-demand price. You cannot change to larger or smaller instance types;
  2. Convertible RIs: up to 54% off the on-demand price. You can change to another instance type of equal or greater value;
  3. Scheduled RIs: schedule the reservations between a window of time of the day, week or month.
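
As a rough illustration of what these discounts mean over a year, the sketch below compares an always-on instance under each model. The hourly rate of $0.10 is a made-up figure for the example, and the discounts are the "up to" maximums quoted above, so real savings will usually be smaller.

```python
HOURS_PER_YEAR = 24 * 365


def yearly_cost(hourly_rate, discount=0.0, hours=HOURS_PER_YEAR):
    """Cost of running one instance for `hours`, after a fractional discount."""
    return hourly_rate * (1 - discount) * hours


# Hypothetical on-demand rate, in dollars per hour (not a real AWS price).
ON_DEMAND_RATE = 0.10

on_demand = yearly_cost(ON_DEMAND_RATE)
standard_ri = yearly_cost(ON_DEMAND_RATE, discount=0.72)     # best case
convertible_ri = yearly_cost(ON_DEMAND_RATE, discount=0.54)  # best case

print(f"On Demand:      ${on_demand:,.2f}")
print(f"Standard RI:    ${standard_ri:,.2f}")
print(f"Convertible RI: ${convertible_ri:,.2f}")
```

The comparison only holds if the instance really runs all year; a workload that runs a few hours a day can still be cheaper On Demand.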

Spot Instances

These are perfect for applications that are only feasible at very low compute prices, or that need a large amount of additional computing capacity.

Dedicated

Great for meeting regulatory requirements that do not support multi-tenant virtualization.

  1. On-Demand: can be purchased at an hourly rate;
  2. Reserved: can be purchased as a reservation for up to 72% off the on-demand price.

Instance Types

The instance type determines the hardware of the host computer. Each instance type offers different compute, memory, and storage capabilities.

There is a wide selection of instance types, based on the requirements of the application and hardware needed.

Elastic Block Store

Highly available and scalable storage volumes you can attach to EC2 instances.

  • Production Workloads: designed for critical workloads;
  • Highly Available: automatically replicated within a single availability zone;
  • Scalable: dynamically increase capacity and change the volume type with no downtime or performance impact.

General Purpose SSD - gp2 or gp3

A reasonable price for a reasonable performance.

Suitable for boot disks and general applications.

| gp2 | gp3 |
| --- | --- |
| 3 IOPS/GiB | Baseline of 3,000 IOPS for all volumes |
| Up to 16,000 IOPS per volume | Up to 16,000 IOPS per volume |
| Up to 99.9% durability | Up to 99.9% durability |
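
The difference between the two baselines is easier to see with numbers. The sketch below computes the baseline IOPS for a volume of a given size, using the 3 IOPS/GiB rule and the 16,000 IOPS ceiling from the comparison above. The 100 IOPS floor for small gp2 volumes is an extra detail not listed in these notes, and the function names are hypothetical.

```python
GP2_IOPS_PER_GIB = 3
GP2_MIN_IOPS = 100      # floor for small gp2 volumes (not listed in the notes)
GP2_MAX_IOPS = 16_000
GP3_BASELINE_IOPS = 3_000


def gp2_baseline_iops(size_gib):
    """gp2 baseline scales with size, clamped between the floor and ceiling."""
    return min(max(GP2_IOPS_PER_GIB * size_gib, GP2_MIN_IOPS), GP2_MAX_IOPS)


def gp3_baseline_iops(size_gib):
    """gp3 baseline is the same for every volume size."""
    return GP3_BASELINE_IOPS


for size in (20, 500, 1_000, 10_000):
    print(size, "GiB:", gp2_baseline_iops(size), "vs", gp3_baseline_iops(size))
```

Under these rules a gp2 volume needs to reach 1,000 GiB before its baseline matches what gp3 gives every volume.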

Provisioned IOPS SSD - io1 and io2

The highest-performance option and also the most expensive.

Suitable for OLTP (Online Transaction Processing) and latency-sensitive applications.

| io1 | io2 | io2 Block Express |
| --- | --- | --- |
| 50 IOPS/GiB | 500 IOPS/GiB | - |
| Up to 64,000 IOPS per volume | Up to 64,000 IOPS per volume | Up to 64 TB. Up to 256,000 IOPS per volume |
| Up to 99.9% durability | 99.999% durability | 99.999% durability |

Throughput Optimized HDD - st1

This is not an SSD; it's a hard disk drive optimized for large amounts of data.

Great for big data, databases, data warehouses, ETL and log processing.

Max throughput of 500 MB/s per volume.

It cannot be a boot volume.

Cold HDD (sc1)

The lowest-cost storage option available.

A good option for data that needs to be accessed only a few times per day.

Max throughput of 250 MB/s per volume.

It cannot be a boot volume.

IOPS vs Throughput

| IOPS | Throughput |
| --- | --- |
| I/O operations per second | Amount of data read or written per second |
| Important metric for quick transactions, low latency apps | Important metric for large databases, large I/O size, complex queries |
| The ability to read and write very quickly | The ability to deal with large datasets |
| Choose Provisioned IOPS SSD (io1 or io2) | Choose Throughput Optimized HDD (st1) |
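
The two metrics are linked by the size of each I/O operation: throughput is roughly IOPS multiplied by the I/O size. The sketch below shows that relationship. The workloads and I/O sizes are hypothetical examples chosen for the arithmetic, not published volume specifications.

```python
def throughput_mib_per_s(iops, io_size_kib):
    """Throughput in MiB/s for a given IOPS rate and I/O size in KiB."""
    return iops * io_size_kib / 1024


# Many small operations: high IOPS, moderate throughput.
small_io = throughput_mib_per_s(iops=16_000, io_size_kib=16)

# Few large operations: low IOPS, high throughput.
large_io = throughput_mib_per_s(iops=500, io_size_kib=1024)

print(f"16,000 IOPS x 16 KiB = {small_io} MiB/s")
print(f"   500 IOPS x 1 MiB  = {large_io} MiB/s")
```

This is why a transactional database cares about IOPS while a log-processing job that streams large sequential blocks cares about throughput.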

Resources

Elastic Load Balancer

A load balancer distributes network traffic across a group of servers.

Application Load Balancer

HTTP and HTTPS. They operate at Layer 7 (Application Layer) of the OSI model.

Network Load Balancer

TCP and high performance. They operate at Layer 4 (Transport Layer) of the OSI model.

Classic Load Balancer (legacy)

HTTP and HTTPS

Gateway Load Balancer

For third-party virtual appliances running in AWS.

7 Layer Model

A conceptual framework which describes the function of a network.

Simple Storage Service

S3 is object-based storage: it stores data as objects rather than in file systems or data blocks.

Basics

  • The total volume of data and the number of objects you can store are unlimited
  • S3 objects can be from 0 bytes up to 5 terabytes
  • Store files in Buckets (similar to folders)

All AWS accounts share the same namespace, so each S3 bucket name must be globally unique (https://<bucket-name>.s3.<region>.amazonaws.com/<key-name>).

S3 is a key-value store, and it stores a key, value, version ID and metadata (e.g. content-type, last-modified, team-name, etc.).
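
A small sketch of how that URL pattern is assembled from its parts. The bucket, region and key below are made-up examples, and the helper name is hypothetical; the only logic added is percent-encoding the key so that spaces and similar characters are valid in a URL.

```python
from urllib.parse import quote


def s3_object_url(bucket, region, key):
    """Build a URL following the pattern
    https://<bucket-name>.s3.<region>.amazonaws.com/<key-name>."""
    # quote() keeps "/" as-is, so key prefixes still look like folders.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"


url = s3_object_url("example-notes-bucket", "eu-west-1", "reports/2024 summary.pdf")
print(url)
```

Because the bucket name becomes part of the host name, two accounts cannot both own a bucket with the same name.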

Availability

S3 is a highly available (99.5% - 99.99% depending on the S3 tier) and highly durable (11 9's durability) service.

Secure your data

By default every bucket is private (no public access). Only the owner can read, delete and upload files in a bucket unless access is granted explicitly.

You can enable server side encryption on the buckets.

You can define Bucket Policies to define which actions a user can take on the buckets.

You can protect the access using Access Control Lists (ACLs) to define which AWS account can access each resource.

Encryption Exam Tips

  • Encryption in Transit: data is encrypted in transit using SSL/TLS (HTTPS)
  • Encryption at Rest (server side encryption SSE)
    • SSE-S3: enabled by default, the keys are provided and managed by AWS
    • SSE-KMS: the keys are provided by AWS and managed by you
    • SSE-C: the keys are provided and managed by you
  • Client Side Encryption is when you encrypt the file yourself before uploading it
  • Cross-Origin Resource Sharing (CORS) can be enabled to let content in one bucket access resources that are located in another S3 bucket
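
To show how the three server side encryption options differ in practice, the sketch below builds the request headers an upload would carry for each mode. The header names are the ones used by the S3 REST API as far as I know; the function itself, its parameters and the key values are hypothetical, and no request is actually sent.

```python
import base64
import hashlib


def sse_headers(mode, kms_key_id=None, customer_key=None):
    """Return the upload headers for a server side encryption mode."""
    if mode == "SSE-S3":
        # Keys provided and managed by AWS.
        return {"x-amz-server-side-encryption": "AES256"}
    if mode == "SSE-KMS":
        # Keys live in KMS; optionally name the key you manage.
        headers = {"x-amz-server-side-encryption": "aws:kms"}
        if kms_key_id is not None:
            headers["x-amz-server-side-encryption-aws-kms-key-id"] = kms_key_id
        return headers
    if mode == "SSE-C":
        # You send your own 256-bit key with every request.
        if customer_key is None or len(customer_key) != 32:
            raise ValueError("SSE-C needs a 32-byte key")
        key_b64 = base64.b64encode(customer_key).decode("ascii")
        md5_b64 = base64.b64encode(hashlib.md5(customer_key).digest()).decode("ascii")
        return {
            "x-amz-server-side-encryption-customer-algorithm": "AES256",
            "x-amz-server-side-encryption-customer-key": key_b64,
            "x-amz-server-side-encryption-customer-key-MD5": md5_b64,
        }
    raise ValueError(f"unknown mode: {mode}")


print(sse_headers("SSE-S3"))
print(sse_headers("SSE-KMS", kms_key_id="hypothetical-key-id"))
print(sorted(sse_headers("SSE-C", customer_key=bytes(32))))
```

The practical difference is who holds the key: with SSE-C the key travels with each request, so losing it means losing access to the object.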

Tiers

S3 Standard

Is a highly available and highly durable storage. Designed for frequent access and suitable for most workloads.

It's stored in at least 3 different Availability Zones.

S3 Standard-Infrequent Access (S3-IA)

Designed for infrequently accessed data, meaning data that you may access a few times a month but not on a daily basis.

Great for long term storage, backups and disaster recovery files.

It provides rapid access, but you pay to access the data (low per-GB storage price and a per-GB retrieval fee).

It's stored in at least 3 different Availability Zones.

S3 One Zone-Infrequent Access (S3 One Zone-IA)

Same as S3-IA, but the data is stored in a single AZ and it costs 20% less than regular S3-IA.

Glacier and Glacier Deep Archive

There are 2 Glacier Options: Glacier and Glacier Deep Archive.

They are both very cheap and designed for data that needs to be accessed once per year, so good for archiving data.

Retrieving data from Glacier can take from 1 minute to 12 hours, while Glacier Deep Archive has a default retrieval time of 12 hours.

S3 - Intelligent Tiering

Automatically moves your data to the most cost-effective tier based on how often you access it.

S3 Exam Tips (dev)

  • S3 is object-based storage that allows you to upload files
  • It is not suitable for installing an operating system or running a database
  • Files from 0 bytes up to 5TB
  • The total volume of data and number of objects you can store is unlimited
  • Files are stored in buckets
  • S3 is a global namespace, which means that bucket names must be globally unique
  • An S3 Object consists of Value, Key, Version ID and Metadata

Secure Buckets

  • By default every bucket is private (only the owner can read, delete or upload)
  • You can use Bucket Policies that are applied at a bucket level
  • You can use Access Control Lists (ACLs) that are applied at an object level
  • S3 buckets can be configured to create Access Logs (disabled by default), which will log all requests made to a bucket. These logs can be written to another bucket.

Tiers

| Storage Class | AZ | Use Case |
| --- | --- | --- |
| S3 Standard | 3 | Suitable for most workloads, e.g. websites, content distribution, mobile and gaming applications, and big data analytics. |
| S3 Standard-Infrequent Access | 3 | Long-term, infrequently accessed critical data, e.g. backups, data store for disaster recovery files, etc. Min storage duration: 30 days. |
| S3 One Zone-Infrequent Access | 1 | Long-term, infrequently accessed non-critical data. Min storage duration: 30 days. |
| S3 Glacier | >= 3 | Long-term data archiving that occasionally needs to be accessed within a few hours or minutes. Min storage duration: 90 days. |
| S3 Glacier Deep Archive | >= 3 | Rarely accessed data archiving with a default retrieval time of 12 hours. Min storage duration: 180 days. |
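
The minimum storage durations matter for billing. The sketch below assumes that an object deleted before its minimum duration is still billed for the full minimum; this assumption is mine and is not stated in the notes above. The dictionary simply restates the durations from the tiers summary, and the function name is hypothetical.

```python
# Minimum storage duration in days, as listed in the tiers summary.
MIN_STORAGE_DAYS = {
    "S3 Standard": 0,
    "S3 Standard-Infrequent Access": 30,
    "S3 One Zone-Infrequent Access": 30,
    "S3 Glacier": 90,
    "S3 Glacier Deep Archive": 180,
}


def billable_days(storage_class, days_stored):
    """Days charged for an object, assuming early deletion is billed
    up to the minimum storage duration of its class."""
    return max(days_stored, MIN_STORAGE_DAYS[storage_class])


print(billable_days("S3 Standard", 10))
print(billable_days("S3 Standard-Infrequent Access", 10))
print(billable_days("S3 Glacier Deep Archive", 200))
```

Under that assumption, short-lived objects are usually better left in S3 Standard even though its per-GB price is higher.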

CloudFront (CDN)

A system of distributed servers which deliver webpages and other web content.

It's used to improve the delivery performance of a website for users all around the world.

Cache

An Edge Location is a collection of servers in geographically dispersed data centers. The default Time to Live (TTL) is 1 day, after which the object is cleared from the cache (you may be charged if you clear the cache yourself).

These edge locations are used by CloudFront to cache copies of your objects, so people who are far away from your server can access your content from a closer location.

The closest edge location receives the request. If the object is not cached yet, it forwards the request to the origin and then caches the response locally.
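
The request flow described above can be sketched as a small TTL cache. This is a toy model of a single edge location, not CloudFront's implementation: the class name, the `fetch_from_origin` callback and the explicit `now` argument (used instead of a real clock so the example is deterministic) are all assumptions made for the illustration.

```python
class EdgeCache:
    """Toy model of one edge location caching objects for a fixed TTL."""

    def __init__(self, fetch_from_origin, ttl_seconds=86_400):
        self._fetch = fetch_from_origin  # called only on a cache miss
        self._ttl = ttl_seconds          # default: 1 day
        self._store = {}                 # path -> (body, expires_at)

    def get(self, path, now):
        entry = self._store.get(path)
        if entry is not None and now < entry[1]:
            return entry[0], "hit"
        # Missing or expired: go back to the origin and cache the result.
        body = self._fetch(path)
        self._store[path] = (body, now + self._ttl)
        return body, "miss"


origin_requests = []


def fetch_from_origin(path):
    origin_requests.append(path)
    return f"content of {path}"


edge = EdgeCache(fetch_from_origin, ttl_seconds=60)
print(edge.get("/index.html", now=0))
print(edge.get("/index.html", now=30))
print(edge.get("/index.html", now=60))
print("origin requests:", len(origin_requests))
```

Only the first request and the one after expiry reach the origin; everything in between is served from the edge.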

CloudFront Exam Tips (dev)

  • CloudFront Origin: this is the origin of all the files that the distribution will serve. This can be an S3 bucket, an EC2 instance, an Elastic Load Balancer, or Route53
  • CloudFront Distribution: this is the name given to the Origin and configuration settings for the content you wish to distribute using CloudFront, which is a Content Delivery Network (CDN)
  • Edge Locations: this is the location where content is cached. It is not the same as an AWS Region/AZ
  • S3 Transfer Acceleration: CloudFront Edge Locations are used by S3 Transfer Acceleration to reduce latency for S3 uploads

Athena

Athena is a serverless interactive query service that enables you to run standard SQL queries on data stored in S3.

You pay per query/TB scanned.
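
Since the bill depends on the data scanned, the cost of a query can be estimated with simple arithmetic. In the sketch below, the price of $5 per TB and the definition of a TB as 1024^4 bytes are assumptions for the example; check the pricing page for your region before relying on either.

```python
BYTES_PER_TB = 1024 ** 4  # assumption: binary terabyte


def athena_query_cost(bytes_scanned, price_per_tb=5.0):
    """Estimated cost in dollars of one query.

    price_per_tb is an assumed example price, not an official figure.
    """
    return bytes_scanned / BYTES_PER_TB * price_per_tb


full_scan = athena_query_cost(2 * BYTES_PER_TB)
partial_scan = athena_query_cost(BYTES_PER_TB // 10)

print(f"Scanning 2 TB:   ${full_scan:.2f}")
print(f"Scanning 0.1 TB: ${partial_scan:.2f}")
```

The same query gets cheaper when it reads less data, which is the main lever you have on an Athena bill.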

Athena Exam Tips (dev)

  • Athena is an Interactive Query service
  • It uses standard SQL to query data from S3
  • It's serverless, so there is no infrastructure for you to configure
  • The only thing you have to do is point Athena to the data you want to query in S3 and define a table schema

Lambda

Run code in AWS without provisioning any servers.

You are charged based on the number of requests, their duration, and the amount of memory used by your Lambda.

| Charged for | Cost | Note |
| --- | --- | --- |
| Requests | $0.20 per 1 million requests | The first 1 million requests per month are free |
| Duration | $0.00001667 per GB-second | The first 400,000 GB-seconds per month are free |
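
Putting the two charges together, a monthly bill can be estimated as in the sketch below. It uses only the prices and free tier quoted in these notes; the function name and the example workloads are hypothetical, and real bills depend on region, architecture and other charges not modelled here.

```python
PRICE_PER_MILLION_REQUESTS = 0.20
PRICE_PER_GB_SECOND = 0.00001667
FREE_REQUESTS = 1_000_000
FREE_GB_SECONDS = 400_000


def lambda_monthly_cost(requests, avg_duration_ms, memory_mb):
    """Estimate the monthly cost in dollars for one workload."""
    # GB-seconds = invocations x seconds per invocation x memory in GB.
    gb_seconds = requests * (avg_duration_ms / 1000) * (memory_mb / 1024)
    billable_requests = max(requests - FREE_REQUESTS, 0)
    billable_gb_seconds = max(gb_seconds - FREE_GB_SECONDS, 0)
    request_cost = billable_requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    duration_cost = billable_gb_seconds * PRICE_PER_GB_SECOND
    return request_cost + duration_cost


# 3 million requests of 200 ms at 512 MB: duration stays in the free tier.
small = lambda_monthly_cost(3_000_000, avg_duration_ms=200, memory_mb=512)

# 5 million requests of 1 second at 1024 MB.
large = lambda_monthly_cost(5_000_000, avg_duration_ms=1000, memory_mb=1024)

print(f"small workload: ${small:.2f}")
print(f"large workload: ${large:.2f}")
```

Duration usually dominates the bill, so reducing execution time or memory tends to save more than reducing the number of requests.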

Versioning

You can manage multiple versions of lambda functions using aliases, and use $LATEST to reference the latest one.

Concurrent Executions

There is a limit on the number of Lambda executions that can run at the same time in the same region: 1,000 per region. You can raise the limit by submitting a request to the AWS Support Center.

You can also reserve some concurrent executions for some critical functions.

Lambda Exam Tips (dev)

  • Lambda triggers: be aware of the services that can trigger a lambda function
  • Serverless Technology: Lambda, API Gateway, DynamoDB, S3, SNS, SQS
  • Independent: Lambda functions are independent, each event will trigger a single function
  • Extremely Cost Effective: Pay only when your code executes
  • Continuous Scaling: Lambda scales automatically
  • Event-Driven: Lambda functions are triggered by an event or action

Versioning

  • $LATEST tag refers to the latest uploaded Lambda code
  • You can use versioning and aliasing to point your applications to a specific version if you don't want to use $LATEST
  • If you use an alias instead of $LATEST, it will not use the latest code automatically
  • If no alias is specified at the end of the ARN, then it will use $LATEST
  • ARN example
    • arn:aws:lambda:us-west-2:123456789012:function:my-function:Prod
    • arn:aws:lambda:us-west-2:123456789012:function:my-function:$LATEST
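
The rule about the end of the ARN can be sketched as a small parser. The function name is hypothetical and the validation is deliberately loose; it only assumes the colon-separated layout visible in the two example ARNs above.

```python
def lambda_qualifier(arn):
    """Return the alias or version an ARN points to.

    Expected layout:
    arn:<partition>:lambda:<region>:<account>:function:<name>[:<qualifier>]
    With no qualifier, the function resolves to $LATEST.
    """
    parts = arn.split(":")
    if (
        len(parts) not in (7, 8)
        or parts[0] != "arn"
        or parts[2] != "lambda"
        or parts[5] != "function"
    ):
        raise ValueError(f"not a Lambda function ARN: {arn}")
    return parts[7] if len(parts) == 8 else "$LATEST"


base = "arn:aws:lambda:us-west-2:123456789012:function:my-function"
print(lambda_qualifier(base + ":Prod"))
print(lambda_qualifier(base + ":$LATEST"))
print(lambda_qualifier(base))
```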

Concurrent Execution Limits

  • The default limit is 1,000 concurrent executions per region
  • It's likely that you'll hit the limit at some point
  • If you hit the limit, requests are throttled and you'll see a 429 HTTP response (TooManyRequestsException)
  • You can get the limit raised by AWS Support
  • Reserved concurrency guarantees that a set number of concurrent executions are always available for critical functions
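
The interaction between the regional limit and reserved concurrency can be modelled with a few counters. This is a simplified sketch, not how Lambda is implemented: the class, the function names and the tiny limit of 10 are hypothetical, and it assumes that reserved capacity is carved out of the regional pool and also caps the function that owns it.

```python
class ConcurrencyPool:
    """Toy model of a regional concurrency limit with reservations."""

    def __init__(self, account_limit=1000, reserved=None):
        self.reserved = dict(reserved or {})
        # Whatever is not reserved is shared by all other functions.
        self.unreserved_limit = account_limit - sum(self.reserved.values())
        self.running = {}  # function name -> in-flight executions

    def try_invoke(self, function):
        """Return 200 if the execution can start, 429 if it is throttled."""
        in_flight = self.running.get(function, 0)
        if function in self.reserved:
            if in_flight >= self.reserved[function]:
                return 429
        else:
            shared_in_flight = sum(
                count
                for name, count in self.running.items()
                if name not in self.reserved
            )
            if shared_in_flight >= self.unreserved_limit:
                return 429
        self.running[function] = in_flight + 1
        return 200


pool = ConcurrencyPool(account_limit=10, reserved={"critical": 4})
batch_results = [pool.try_invoke("batch") for _ in range(7)]
critical_results = [pool.try_invoke("critical") for _ in range(5)]
print(batch_results)
print(critical_results)
```

Even when the batch function exhausts the shared pool, the critical function can still start its reserved executions.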

VPC Access

  • It's possible to enable Lambda to access resources that are inside a private VPC
  • In order to access a private VPC, it needs the VPC ID, private subnet ID, and security group ID
  • Lambda creates ENIs (Elastic Network Interfaces) using IPs from the private subnets
  • The security group allows your function to access resources under the VPC

API Gateway

It's a serverless service that allows you to publish, maintain, secure and monitor APIs at any scale. It provides a single endpoint for all client traffic interacting with the backend of your application.

It supports RESTful APIs and WebSocket APIs.

It integrates with CloudWatch for logging and monitoring, and it supports throttling.

API Gateway Exam Tips (dev)

  • It's the front door of your application, providing an endpoint for your application running in AWS
  • It's serverless, so it's low cost and scales automatically
  • It supports throttling, so it prevents your application from being overloaded with too many requests
  • Everything is logged into CloudWatch, such as API calls, latency and errors
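
Throttling of this kind is commonly implemented with a token bucket, which is also the algorithm the API Gateway documentation describes. The sketch below is a generic token bucket, not API Gateway's code: the class name, the rate and burst values, and the explicit `now` argument (used instead of a real clock to keep the example deterministic) are assumptions for the illustration.

```python
class TokenBucket:
    """Allow `rate_per_second` requests on average, with bursts up to `burst`."""

    def __init__(self, rate_per_second, burst):
        self.rate = rate_per_second
        self.capacity = burst
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0             # time of the previous request

    def allow(self, now):
        """Return True if a request arriving at time `now` is accepted."""
        # Refill tokens for the time elapsed, without exceeding capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # a gateway would answer 429 Too Many Requests


bucket = TokenBucket(rate_per_second=2, burst=3)
burst = [bucket.allow(now=0.0) for _ in range(4)]
later = [bucket.allow(now=0.5), bucket.allow(now=0.5)]
print(burst)
print(later)
```

The burst size absorbs short spikes, while the refill rate caps the sustained load that reaches the backend.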