My objective is to troubleshoot 502 errors in my load balancer. The first step is to set up access logs. The instructions for that say to create an S3 bucket, create a policy for it, and then configure the load balancer. I refuse to do this in the console; I’m using Terraform to create resources and Kubernetes to create the ingress.
TL;DR: here’s a gist. Then skip down to step 5 to configure the k8s ingress.
Here’s what finally worked:
1. Create an S3 bucket. Note that “otel-demo-alb-access-logs” is my BUCKET_NAME, needed later.
resource "aws_s3_bucket" "alb_log_bucket" {
  bucket = "otel-demo-alb-access-logs"
  tags = {
    Notes = "debug 502s from the collector endpoint"
  }
}
If the bucket already exists, add it to your terraform state: terraform import aws_s3_bucket.alb_log_bucket "otel-demo-alb-access-logs"
2. Give it the encryption method that ALBs support. That is SSE-S3 (AES256); access logs can't be delivered to a bucket whose default encryption is KMS.
resource "aws_s3_bucket_server_side_encryption_configuration" "dumb_encryption_thing" {
  bucket = aws_s3_bucket.alb_log_bucket.id
  rule {
    apply_server_side_encryption_by_default {
      # SSE-S3 uses Amazon S3-managed keys, so there is no kms_master_key_id to set
      sse_algorithm = "AES256"
    }
  }
}
3. Give it the magic security policy. Here, “otel-demo-alb” is my PREFIX (needed later).
resource "aws_s3_bucket_policy" "work_dangit" {
  bucket = aws_s3_bucket.alb_log_bucket.id
  policy = data.aws_iam_policy_document.work_dangit_policy.json
}
data "aws_elb_service_account" "main" {}
data "aws_caller_identity" "current" {}
data "aws_iam_policy_document" "work_dangit_policy" {
  statement {
    principals {
      type        = "AWS"
      identifiers = [data.aws_elb_service_account.main.arn]
    }
    actions = [
      "s3:PutObject",
    ]
    resources = [
      aws_s3_bucket.alb_log_bucket.arn,
      "${aws_s3_bucket.alb_log_bucket.arn}/otel-demo-alb/AWSLogs/${data.aws_caller_identity.current.account_id}/*",
    ]
  }
}
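Please notice that the account ID in the “principals” is not yours, but a mysterious AWS one: the Elastic Load Balancing account for your region, which the aws_elb_service_account data source looks up. Your own account ID only appears in the resource path. This is very tricky.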
4. Now it is time to terraform apply
After that works, it’s time to configure the ALB to send logs to this bucket. I’m not creating that in terraform; I’m creating it using an ingress resource in Kubernetes. The ALB is then spun up by the AWS Load Balancer Controller, installed in my EKS cluster. (That was hard. I used this terraform.)
5. Add the log configuration as an annotation to your ingress resource.
There’s an incantation to pass the s3 information through to the ALB. Here are the annotations in my ingress.yaml. Here, “otel-demo-alb-access-logs” is my BUCKET_NAME and “otel-demo-alb” is my PREFIX.
annotations:
  alb.ingress.kubernetes.io/group.name: otel-demo
  alb.ingress.kubernetes.io/scheme: internet-facing
  alb.ingress.kubernetes.io/target-type: ip
  kubernetes.io/ingress.class: alb
  alb.ingress.kubernetes.io/load-balancer-attributes: access_logs.s3.enabled=true,access_logs.s3.bucket=otel-demo-alb-access-logs,access_logs.s3.prefix=otel-demo-alb
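That last annotation is one long comma-separated string, and a typo in it is easy to miss. Here is a minimal sketch of building it from the bucket name and prefix. The function name access_log_attributes is my own invention for illustration, not part of kubectl or the controller.

```shell
# access_log_attributes is a hypothetical helper, not part of any tool.
# $1 = BUCKET_NAME, $2 = PREFIX
access_log_attributes() {
  printf 'access_logs.s3.enabled=true,access_logs.s3.bucket=%s,access_logs.s3.prefix=%s\n' "$1" "$2"
}

# Print the full annotation line, ready to paste into ingress.yaml
echo "alb.ingress.kubernetes.io/load-balancer-attributes: $(access_log_attributes otel-demo-alb-access-logs otel-demo-alb)"
```

The prefix here has to match the one in the bucket policy's resource path, or the ALB will be denied.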
I updated my ingress with kubectl apply -f ingress.yaml. Maybe yours is in a helm chart or something.
To see whether it worked, look at the events for your ingress.
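One way to see those events is kubectl describe, which prints an Events section at the bottom of its output. This is a sketch only: show_ingress_events is a made-up wrapper name, and the ingress name and namespace are whatever yours happen to be.

```shell
# show_ingress_events is a hypothetical wrapper around kubectl describe.
# $1 = ingress name, $2 = namespace (assumed to be "default" if omitted)
show_ingress_events() {
  kubectl describe ingress "$1" --namespace "${2:-default}"
}

# Usage (the ingress name is an assumption, substitute your own):
#   show_ingress_events my-ingress
```

Look for events from the controller near the end of the output; a failed reconcile shows up there as a warning.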
bad news?
If it didn’t work, you’ll probably see Failed deploy model due to InvalidConfigurationRequest: Access Denied for bucket: otel-demo-alb-access-logs. Please check S3bucket permission.
Check the encryption policy on the bucket. I did that by creating an object. It should look like this:
aws s3api put-object --bucket $BUCKET_NAME --key whatever
{
    "ETag": "\"d41d8cd98f00b204e9800998ecf8427e\"",
    "ServerSideEncryption": "AES256"
}
Check the policy on the bucket. If you have jq installed it’s easier to see:
aws s3api get-bucket-policy --bucket $BUCKET_NAME | jq -r .Policy | jq
It should look something like this. Note that I’m granting permission to the ALB account ID for us-west-2. This is strange. You have to dig around on this page to find the secret account numbers.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::797873946194:root"
      },
      "Action": "s3:PutObject",
      "Resource": [
        "arn:aws:s3:::otel-demo-alb-access-logs/otel-demo-alb/AWSLogs/034328471234/*",
        "arn:aws:s3:::otel-demo-alb-access-logs"
      ]
    }
  ]
}
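The object path in "Resource" has three moving parts (bucket, prefix, and your own account ID), and any one of them being off gets you the same Access Denied. Here is a small sketch that prints the path the policy ought to contain, so you can compare it by eye. The function name expected_log_resource is hypothetical.

```shell
# expected_log_resource is a hypothetical helper: it prints the object ARN
# pattern that the bucket policy must allow for ALB access logs.
# $1 = BUCKET_NAME, $2 = PREFIX, $3 = your own AWS account ID
expected_log_resource() {
  printf 'arn:aws:s3:::%s/%s/AWSLogs/%s/*\n' "$1" "$2" "$3"
}

expected_log_resource otel-demo-alb-access-logs otel-demo-alb 034328471234
```

If you don't know your account ID offhand, aws sts get-caller-identity --query Account --output text will print it.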
good news
If it did work, list the objects in your bucket.
aws s3api list-objects --bucket $BUCKET_NAME
There should be some .gz objects in there soon, containing logs of each request served.
You can pull one down with
aws s3 cp s3://$BUCKET_NAME/$OBJECT_PATH log.gz
and then unzip it
gunzip log.gz
and then look at it. The fields are listed here. Surely there is an easier way to view these, but finding that is out of scope for me today.
Here’s a scary bash one-liner that downloads all of the logs and unzips them into one file, listing the filenames as it goes:
aws s3api list-objects --bucket "$BUCKET_NAME" --prefix "$PREFIX" | jq -r '.Contents[].Key' | grep 'gz$' | while read -r f; do echo "$f" >&2; aws s3 cp "s3://$BUCKET_NAME/$f" - | gunzip; done > logs.txt
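Since the whole point is 502s, here is a rough sketch for pulling just those out of logs.txt. It leans on the documented ALB log format, where the first twelve fields are unquoted and space-separated, so awk's default splitting lines up: field 2 is the time, field 5 the target, field 9 the status the ALB returned, and field 10 the status the target returned. The function name only_502s is made up.

```shell
# only_502s is a hypothetical helper: reads ALB access log lines on stdin and
# prints time, target:port, elb_status_code and target_status_code for each
# request the load balancer answered with a 502.
only_502s() {
  awk '$9 == 502 { print $2, $5, $9, $10 }'
}

# Usage, assuming logs.txt was produced by the one-liner above:
#   only_502s < logs.txt
```

A target status of "-" next to a 502 means the ALB never got a response from the target at all.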
Now I can get back to troubleshooting my 502s, and then on to the next yak.
… and the problem turned out to be: the pod it couldn’t connect to was listening on 127.0.0.1, which made port forwarding work and everything else fail. 😥