Introduction
Just some notes for the AWS Certified Developer - Associate.
Components
For this book I made a web component you can use to fetch AWS icons:
<aws-icon
icon="iam"
href="https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html"
></aws-icon>
Resources
Identity and Access Management
It consists of applying Policies to Users, Groups and Roles. IAM is universal, not regional.
The root account is created when you first create the AWS account. It has admin access to the entire account, so it's good practice to create a separate IAM user for day-to-day activities. Remember to always set up MFA on your root account. From here you can also set up a rotation period for the password on your own account.
Policies
A JSON document which defines one or more permissions.
Policies are the rules that will determine if someone can access a given resource.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "*",
"Resource": "*"
}
]
}
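As a rough illustration of how a policy document like the one above is read, the following Python sketch checks a request against the statements of a policy. It is a simplification under stated assumptions: the `is_allowed` helper is hypothetical, only `Effect`, `Action` and `Resource` are considered, and wildcards are matched with `fnmatchcase`. Real IAM evaluation also takes conditions, principals, permission boundaries and more into account.

```python
import json
from fnmatch import fnmatchcase

# The policy from the notes above: allow every action on every resource.
POLICY = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {"Effect": "Allow", "Action": "*", "Resource": "*"}
  ]
}
""")


def _as_list(value):
    """Policy fields may hold a single string or a list of strings."""
    return [value] if isinstance(value, str) else list(value)


def is_allowed(policy, action, resource):
    """Hypothetical helper: explicit Deny wins, otherwise any Allow, else deny."""
    allowed = False
    for statement in policy["Statement"]:
        # Simplification: case-sensitive wildcard matching on both fields.
        matches_action = any(fnmatchcase(action, p) for p in _as_list(statement["Action"]))
        matches_resource = any(fnmatchcase(resource, p) for p in _as_list(statement["Resource"]))
        if matches_action and matches_resource:
            if statement["Effect"] == "Deny":
                return False  # an explicit Deny always overrides an Allow
            allowed = True
    return allowed  # no matching Allow means an implicit deny
```

With the policy above, `is_allowed(POLICY, "s3:GetObject", "arn:aws:s3:::my-bucket/file.txt")` is true, because `"*"` matches every action and every resource (the bucket name is made up for the example).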
Users
End users of AWS. New Users have no permissions by default. You need to grant them permissions (by attaching policies directly, through groups, or by letting them assume roles) before they can access resources.
Access Keys
New users are assigned an access key ID and a secret access key when their account is created. You can view the secret access key only while creating the account; if you want to see it again, you must regenerate the keys.
These keys are not the same as a username and password: you can't use them to sign in as the user, only to access the AWS APIs and the command line.
Groups
Allow you to group users and apply Policies to all of them at once.
Roles
Create roles and then assign them to Users, Applications or Services to grant them permission to access AWS resources.
IAM Policy Simulator
It allows you to test IAM policies before pushing them to production.
Validate if the current policies work as expected - great for troubleshooting.
Questions
Which IAM entity can you use to delegate access to other trusted entities such as IAM users, applications, or AWS services like EC2?
- IAM Web Identity Federation
- IAM Role
- IAM Group
- IAM User
Answer: IAM Role. You can use IAM roles to delegate access to IAM users managed within your account, to IAM users under a different AWS account, to a web service offered by AWS such as Amazon Elastic Compute Cloud (Amazon EC2), or to an external user authenticated by an external identity provider (IdP) service that is compatible with SAML 2.0 or OpenID Connect, or a custom-built identity broker.
Elastic Compute Cloud
Secure, resizable compute capacity in the Cloud. It is like a VM that is hosted on AWS instead of in your own data center.
Pricing Options
- On Demand: pay only for the time you are running;
- Reserved: reserve capacity for one or three years. Up to 72% discount;
- Spot: purchase unused capacity at a massive discount. You decide the price you are willing to pay for each instance; the Spot price changes with demand, and when it goes higher than your price the instance gets terminated;
- Dedicated: physical server running EC2 instances only for you. The most expensive option.
On Demand
Super flexible, perfect for a first time in the cloud or for normal applications.
Reserved Instances
There are three types of reservations:
- Standard RIs: up to 72% off on-demand price. You cannot change to larger or smaller instance types;
- Convertible RIs: up to 54% off on-demand price. You can change to a different instance type of equal or greater value;
- Scheduled RIs: schedule the reservations between a window of time of the day, week or month.
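To make the discounts above concrete, here is a small Python sketch that applies the best-case percentages to an on-demand rate. The `discounted_hourly_price` helper and the $0.10/hour rate are made up for illustration; real prices depend on instance type, region and payment option.

```python
def discounted_hourly_price(on_demand_price, discount_percent):
    """Hourly price after applying a percentage discount to the on-demand rate."""
    if not 0 <= discount_percent <= 100:
        raise ValueError("discount_percent must be between 0 and 100")
    return on_demand_price * (100 - discount_percent) / 100


# Hypothetical on-demand rate of $0.10/hour, used only for illustration.
ON_DEMAND = 0.10
standard_ri = discounted_hourly_price(ON_DEMAND, 72)     # best-case Standard RI
convertible_ri = discounted_hourly_price(ON_DEMAND, 54)  # best-case Convertible RI
```

Under these assumptions the Standard RI comes out at roughly $0.028/hour and the Convertible RI at roughly $0.046/hour.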
Spot Instances
These are perfect for applications that need very low compute prices or applications that need a large amount of additional computing capacity.
Dedicated
Great for meeting regulatory requirements that do not support multi-tenant virtualization.
- On-Demand: can be purchased at an hourly rate;
- Reserved: can be purchased as a reservation for up to 72% off the on-demand price.
Instance Types
Determines the hardware of the host computer. Each instance type offers different compute, memory, and storage capabilities.
There is a wide selection of instance types, based on the requirements of the application and hardware needed.
Elastic Block Store
Highly available and scalable storage volumes you can attach to EC2 instances.
- Production Workloads: designed for critical workloads;
- Highly Available: automatically replicated within a single availability zone;
- Scalable: dynamically increase capacity and change the volume type with no downtime or performance impact.
General Purpose SSD - gp2 or gp3
A reasonable price for reasonable performance.
Suitable for boot disks and general applications.
gp2 | gp3 |
---|---|
3 IOPS/GiB | Baseline of 3,000 IOPS for all volumes |
Up to 16,000 IOPS per volume | Up to 16,000 IOPS per volume |
Up to 99.9% durability | Up to 99.9% durability |
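The difference between the two baselines in the table can be sketched in a few lines of Python. This only uses the figures from the table; it deliberately ignores other details such as burst behaviour or any minimum floor, and the function names are made up for the example.

```python
GP2_IOPS_PER_GIB = 3       # from the table above
GP3_BASELINE_IOPS = 3_000  # from the table above
MAX_IOPS = 16_000          # per-volume ceiling for both types


def gp2_baseline_iops(size_gib):
    """gp2 baseline scales with volume size, capped at the per-volume maximum."""
    return min(size_gib * GP2_IOPS_PER_GIB, MAX_IOPS)


def gp3_baseline_iops(size_gib):
    """gp3 baseline is the same for every volume size."""
    return GP3_BASELINE_IOPS
```

For a 500 GiB volume this gives a baseline of 1,500 IOPS on gp2 against 3,000 IOPS on gp3; gp2 only catches up at 1,000 GiB.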
Provisioned IOPS SSD - io1 and io2
The highest-performance option and also the most expensive.
Suitable for OLTP (Online Transaction Processing) and latency-sensitive applications.
io1 | io2 | io2 Block Express |
---|---|---|
50 IOPS/GiB | 500 IOPS/GiB | - |
Up to 64,000 IOPS per volume | Up to 64,000 IOPS per volume | Up to 64TB. Up to 256,000 IOPS per volume |
Up to 99.9% durability | 99.999% durability | 99.999% durability |
Throughput Optimized HDD - st1
This is not an SSD, it's a hard disk, and it's optimized for large amounts of data.
Great for big data, databases, data warehouses, ETL and log processing.
Max throughput of 500 MB/s per volume.
It cannot be a boot volume.
Cold HDD - sc1
The lowest-cost storage option available.
A good option for data that needs to be accessed only a few times per day.
Max throughput of 250 MB/s per volume.
It cannot be a boot volume.
IOPS vs Throughput
IOPS | Throughput |
---|---|
I/O operations per second | Number of bits read or written per second |
Important metric for quick transactions, low latency apps | Important metric for large databases, large I/O size, complex queries |
The ability to read and write very quickly | The ability to deal with large datasets |
Choose Provisioned IOPS SSD (io1 or io2) | Choose Throughput Optimized HDD (st1) |
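The two metrics are linked by the size of each I/O operation: throughput is roughly IOPS multiplied by the I/O size. The Python sketch below shows that arithmetic; the function name and the example figures are illustrative only and are not tied to any specific volume type.

```python
def throughput_mib_per_s(iops, io_size_kib):
    """Approximate throughput in MiB/s for a given IOPS rate and I/O size in KiB."""
    return iops * io_size_kib / 1024


# Many small operations versus few large ones (illustrative numbers).
small_io = throughput_mib_per_s(iops=16_000, io_size_kib=16)  # 250.0 MiB/s
large_io = throughput_mib_per_s(iops=500, io_size_kib=1024)   # 500.0 MiB/s
```

The second workload moves twice as much data with a fraction of the operations, which is why large sequential workloads care about throughput rather than IOPS.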
Elastic Load Balancer
A load balancer distributes network traffic across a group of servers.
Application Load Balancer
Handles HTTP and HTTPS traffic. It operates at Layer 7 (Application Layer) of the OSI model.
Network Load Balancer
Handles TCP traffic where High Performance is needed.
Classic Load Balancer (legacy)
Handles HTTP and HTTPS traffic.
Gateway Load Balancer
Routes traffic to third-party virtual appliances running in AWS.
7 Layer Model
A conceptual framework which describes the function of a network.
Simple Storage Service
S3 is an Object-Based Storage. It stores data as objects rather than in file systems or data blocks.
Basics
- The total volume of data and the number of objects are unlimited
- S3 objects can be from 0 bytes up to 5 terabytes
- Store files in Buckets (similar to folders)
All AWS accounts share the same namespace, so each S3 bucket name must be globally unique (https://<bucket-name>.s3.<region>.amazonaws.com/<key-name>).
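The URL format above can be filled in programmatically. The following Python sketch builds an object URL from its three parts; the `s3_object_url` helper and the bucket, region and key values are made up for the example.

```python
from urllib.parse import quote


def s3_object_url(bucket, region, key):
    """Build a URL in the format shown above for an object in a bucket."""
    # Percent-encode the key but keep "/" so key prefixes stay readable.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key, safe='/')}"


url = s3_object_url("my-notes-bucket", "eu-west-1", "docs/exam notes.txt")
```

Because the bucket name is part of the hostname, two accounts cannot both own a bucket called `my-notes-bucket`.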
S3 is a key-value store, and it stores a key, value, version ID and metadata (e.g. content-type, last-modified, team-name, etc.).
Availability
S3 is a highly available (99.95% - 99.99% depending on the S3 tier) and highly durable (11 9's durability) service.
Secure your data
By default every bucket is private (no public access), so only the owner can read, delete and update files in a bucket.
You can enable server side encryption on the buckets.
You can use Bucket Policies to define which actions a user can take on the buckets.
You can protect access using Access Control Lists (ACLs) to define which AWS accounts can access each resource.
Encryption Exam Tips
- Encryption in Transit: data can be encrypted in transit with SSL/TLS or HTTPS
- Encryption at Rest (server side encryption, SSE)
  - SSE-S3: enabled by default, the keys are provided and managed by AWS
  - SSE-KMS: the keys are provided by AWS and managed by you
  - SSE-C: the keys are provided and managed by you
- Client Side Encryption is when you encrypt the file yourself before uploading it
- CORS (Cross-Origin Resource Sharing) can be enabled to allow one bucket to access resources that are located in another S3 bucket
Tiers
S3 Standard
It is a highly available and highly durable storage class. Designed for frequent access and suitable for most workloads.
Data is stored in at least 3 different Availability Zones.
S3 Standard-Infrequent Access (S3-IA)
Designed for infrequently accessed data, meaning data that you may access a few times a month but not on a daily basis.
Great for long term storage, backups and disaster recovery files.
It provides Rapid Access, but you pay to access the data (low per-GB storage price and a per-GB retrieval fee).
Data is stored in at least 3 different Availability Zones.
S3 One Zone-Infrequent Access (S3 One Zone-IA)
Same as S3-IA, but the data is stored in only one AZ and it costs 20% less than regular S3-IA.
Glacier and Glacier Deep Archive
There are 2 Glacier options: Glacier and Glacier Deep Archive.
They are both very cheap and designed for data that needs to be accessed about once per year, so they are good for archiving data.
Retrieving data from Glacier can take from 1 minute to 12 hours, while Glacier Deep Archive has a default retrieval time of 12 hours.
S3 - Intelligent Tiering
Automatically moves your data to the most cost-effective tier based on how often you access it.
S3 Exam Tips (dev)
- S3 is an Object-Based storage that allows you to upload files
- It is not suitable for installing an OS or running a database
- Files from 0 bytes up to 5TB
- The total volume of data and number of objects you can store is unlimited
- Files are stored in buckets
- S3 is a global namespace, which means that bucket names must be globally unique
- An S3 Object consists of Key, Value, Version ID and Metadata
Secure Buckets
- By default every bucket is private (only the owner can read, delete or upload)
- You can use Bucket Policies that are applied at a bucket level
- You can use Access Control Lists (ACLs) that are applied at an object level
- S3 buckets can be configured to create Access Logs (disabled by default), which will log all requests made to a bucket. These logs can be written to another bucket.
Tiers
Storage Class | AZ | Use Case |
---|---|---|
S3 Standard | 3 | Suitable for most workloads, e.g. websites, content distribution, mobile and gaming applications, and big data analytics. |
S3 Standard-Infrequent Access | 3 | Long-term, infrequently accessed critical data, e.g. backups, data store for disaster recovery files, etc. Min storage duration: 30 days. |
S3 One Zone-Infrequent Access | 1 | Long-term, infrequently accessed non-critical data. Min storage duration: 30 days. |
S3 Glacier | >= 3 | Long-term data archiving that occasionally needs to be accessed within a few hours or minutes. Min storage duration: 90 days. |
S3 Glacier Deep Archive | >= 3 | Rarely accessed data archiving with a default retrieval time of 12 hours. Min storage duration: 180 days. |
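The minimum storage durations in the table matter for cost: an object deleted early is still charged for the minimum period. A minimal Python sketch of that rule, assuming the durations from the table and assuming S3 Standard has no minimum (the table lists none); the `billable_days` helper is hypothetical.

```python
# Minimum storage durations in days, taken from the table above.
# Assumption: S3 Standard has no minimum, since the table lists none.
MIN_STORAGE_DAYS = {
    "S3 Standard": 0,
    "S3 Standard-Infrequent Access": 30,
    "S3 One Zone-Infrequent Access": 30,
    "S3 Glacier": 90,
    "S3 Glacier Deep Archive": 180,
}


def billable_days(storage_class, days_stored):
    """Days charged for an object: never less than the class minimum."""
    return max(days_stored, MIN_STORAGE_DAYS[storage_class])
```

For example, an object kept in S3 Glacier for 10 days is billed as if it had been stored for 90.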
CloudFront (CDN)
A system of distributed servers which deliver webpages and other web content.
It's used to improve the delivery performance of a website for users all around the world.
Cache
Edge Locations are a collection of servers located in geographically dispersed data centers. Objects are cached for a Time to Live (TTL), 1 day by default, after which they are cleared from the cache (you'll be charged if you clear the cache yourself).
These edge locations are used by CloudFront to cache copies of your objects, so people who are far away from your server can access your content from a closer location.
The closest edge location gets the request, forwards it to the origin configured in the CloudFront Distribution when the object is not cached yet, and then caches the response locally.
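The request flow described above can be modelled with a tiny TTL cache. This Python sketch is a toy model of a single edge location, not how CloudFront is implemented; the `EdgeCache` class and the `fetch_from_origin` callback are made up for the example, and time is passed in explicitly to keep it deterministic.

```python
DEFAULT_TTL_SECONDS = 24 * 60 * 60  # the 1-day default mentioned above


class EdgeCache:
    """Toy model of one edge location caching objects fetched from an origin."""

    def __init__(self, fetch_from_origin, ttl=DEFAULT_TTL_SECONDS):
        self._fetch = fetch_from_origin
        self._ttl = ttl
        self._entries = {}  # path -> (content, expiry time in seconds)

    def get(self, path, now):
        """Return (content, cache_hit) for a request arriving at time `now`."""
        entry = self._entries.get(path)
        if entry is not None and now < entry[1]:
            return entry[0], True  # served from the edge cache
        content = self._fetch(path)  # cache miss or expired: go to the origin
        self._entries[path] = (content, now + self._ttl)
        return content, False
```

The first request for a path is a miss and reaches the origin; later requests within the TTL are answered from the edge, and the first request after the TTL expires goes back to the origin again.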
CloudFront Exam Tips (dev)
- CloudFront Origin: this is the origin of all the files that the distribution will serve. This can be an S3 bucket, an EC2 instance, an Elastic Load Balancer, or Route53
- CloudFront Distribution: this is the name given to the Origin and configuration settings for the content you wish to distribute using CloudFront (Content Delivery Network - CDN)
- Edge Locations: this is the location where content is cached. It is not the same as an AWS Region/AZ
- S3 Transfer Acceleration: CloudFront Edge Locations are used by S3 Transfer Acceleration to reduce latency for S3 uploads
Athena
Athena is a serverless interactive query service that enables you to run standard SQL queries on data stored in S3.
You pay per query/TB scanned.
Athena Exam Tips (dev)
- Athena is an Interactive Query service
- It uses standard SQL to query data from S3
- It's serverless so you don't need to configure anything
- The only thing you have to do is point Athena to the data you want to query in S3 and define a table schema
Lambda
Run code in AWS without provisioning any servers.
You are charged based on the number of requests, their duration, and the amount of memory used by your Lambda.
Charged for | Cost | Note |
---|---|---|
Requests | $0.20 per 1 million requests | The first 1 million requests per month are free |
Duration | $0.00001667 per GB-second | The first 400,000 GB-seconds per month are free |
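The table can be turned into a rough monthly estimate. The Python sketch below uses only the prices and free allowances from the table; the `lambda_monthly_cost` helper and the workload figures are made up, and real bills can differ (for example by region or architecture).

```python
PRICE_PER_MILLION_REQUESTS = 0.20  # USD, from the table above
PRICE_PER_GB_SECOND = 0.00001667   # USD, from the table above
FREE_REQUESTS = 1_000_000          # free requests per month
FREE_GB_SECONDS = 400_000          # free GB-seconds per month


def lambda_monthly_cost(requests, avg_duration_s, memory_mb):
    """Estimate the monthly bill in USD from the figures in the table above."""
    # GB-seconds = executions x seconds per execution x memory in GB.
    gb_seconds = requests * avg_duration_s * (memory_mb / 1024)
    billable_requests = max(requests - FREE_REQUESTS, 0)
    billable_gb_seconds = max(gb_seconds - FREE_GB_SECONDS, 0)
    request_cost = billable_requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    duration_cost = billable_gb_seconds * PRICE_PER_GB_SECOND
    return request_cost + duration_cost


# Hypothetical workload: 3 million requests, 1 second each, 512 MB of memory.
cost = lambda_monthly_cost(3_000_000, 1.0, 512)
```

For this workload the estimate is about $18.74: $0.40 for the 2 million billable requests plus roughly $18.34 for the 1,100,000 billable GB-seconds.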
Versioning
You can manage multiple versions of Lambda functions using aliases, and use $LATEST to reference the latest one.
Concurrent Executions
There is a limit on the number of Lambda executions that can run at the same time in the same region: 1,000 per region. You can raise the limit by submitting a request to the AWS Support Center.
You can also reserve some concurrent executions for some critical functions.
Lambda Exam Tips (dev)
- Lambda triggers: be aware of the services that can trigger a Lambda function
- Serverless Technology: Lambda, API Gateway, DynamoDB, S3, SNS, SQS
- Independent: Lambda functions are independent, each event will trigger a single function
- Extremely Cost Effective: pay only when your code executes
- Continuous Scaling: Lambda scales automatically
- Event-Driven: Lambda functions are triggered by an event or action
Versioning
- The $LATEST tag refers to the latest uploaded Lambda code
- You can use versioning and aliasing to point your applications to a specific version if you don't want to use $LATEST
- If you use an alias instead of $LATEST, it will not use the latest code automatically
- If no alias is specified at the end of the ARN, then it will use $LATEST
- ARN examples:
arn:aws:lambda:us-west-2:123456789012:function:my-function:Prod
arn:aws:lambda:us-west-2:123456789012:function:my-function:$LATEST
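The rule about the end of the ARN can be sketched as a small parser. This Python helper is hypothetical and only handles function ARNs shaped like the two examples above (it assumes the `aws` partition); it returns the alias or version if one is present and falls back to $LATEST otherwise.

```python
def lambda_qualifier(arn):
    """Return the version or alias named in a Lambda function ARN."""
    parts = arn.split(":")
    # Unqualified: arn:aws:lambda:<region>:<account>:function:<name> (7 parts).
    # A qualified ARN adds an eighth part: the alias or version.
    well_formed = (
        len(parts) in (7, 8)
        and parts[:3] == ["arn", "aws", "lambda"]
        and parts[5] == "function"
    )
    if not well_formed:
        raise ValueError(f"not a Lambda function ARN: {arn}")
    return parts[7] if len(parts) == 8 else "$LATEST"
```

Called on the first example above it returns `Prod`; called on the same ARN without a suffix it returns `$LATEST`.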
Concurrent Execution Limits
- The limit is 1,000 concurrent executions per region
- It's likely that you will hit the limit at some point
- If you hit the limit you'll see a 429 HTTP response (TooManyRequestsException)
- You can get the limit raised by AWS Support
- Reserved concurrency guarantees that a set number of concurrent executions are always available for critical functions
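To get a feeling for when the limit above is reached, concurrency can be approximated as the request rate multiplied by the average execution time. The Python sketch below applies that rule of thumb; the function names and traffic figures are made up for the example.

```python
import math

REGION_CONCURRENCY_LIMIT = 1_000  # default limit quoted above


def estimated_concurrency(requests_per_second, avg_duration_s):
    """Approximate number of executions in flight at the same time."""
    return math.ceil(requests_per_second * avg_duration_s)


def would_throttle(requests_per_second, avg_duration_s, limit=REGION_CONCURRENCY_LIMIT):
    """True when the estimate exceeds the limit, so 429 responses are likely."""
    return estimated_concurrency(requests_per_second, avg_duration_s) > limit
```

At 200 requests per second with a 0.5 second average duration, about 100 executions run at once, well under the limit; at 3,000 requests per second the same function would need about 1,500 and would be throttled.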
VPC Access
- It's possible to enable a Lambda function to access resources inside a private VPC
- In order to access a private VPC, it needs the VPC ID, private subnet ID and security group ID
- Lambda creates ENIs (Elastic Network Interfaces) using IPs from the private subnets
- The security group allows your function to access resources inside the VPC
API Gateway
It's a serverless service that allows you to publish, maintain, secure and monitor APIs at any scale. It provides a single endpoint for all client traffic interacting with the backend of your application.
It supports RESTful APIs or WebSocket APIs.
It supports CloudWatch and Throttling.
API Gateway Exam Tips (dev)
- It's the front door of your application, providing an endpoint for your application running in AWS
- It's serverless, so it's low cost and scales automatically
- It supports throttling, so it prevents your application from being overloaded with too many requests
- Everything is logged into CloudWatch, such as API calls, latency and errors
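Throttling is commonly implemented with a token bucket, which allows short bursts while enforcing an average rate. The Python sketch below is a generic toy version that illustrates the idea, not API Gateway's actual implementation; the `TokenBucket` class and its parameters are made up, and time is passed in explicitly to keep it deterministic.

```python
class TokenBucket:
    """Toy token-bucket limiter: `rate` tokens refill per second, up to `burst`."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0             # time of the previous request, in seconds

    def allow(self, now):
        """Return True if a request arriving at time `now` is accepted."""
        elapsed = now - self.last
        # Refill for the time that has passed, never above the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # the caller would answer with a throttling error
```

With `TokenBucket(rate=1, burst=2)`, two requests at the same instant are accepted, a third is rejected, and one more is accepted a second later once a token has been refilled.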