Athena Basics - AWS Serverless

What is Amazon Athena?

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.

Key Concepts

1. Serverless

Athena is serverless, meaning you do not need to provision or manage any servers. You simply run your queries and pay for the data scanned.
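
Because billing follows the data scanned, a rough cost estimate is simple arithmetic. The sketch below is illustrative only: the price of USD 5.00 per TB, the 10 MB per-query minimum, and the binary definition of a terabyte are assumptions not stated in this article, and `estimate_query_cost` is a hypothetical helper name. Check the Athena pricing page for your region before relying on the numbers.

```python
# Assumed pricing inputs (not from this article); verify for your region.
PRICE_PER_TB_USD = 5.00
MIN_BYTES_PER_QUERY = 10 * 1024**2  # assumed 10 MB minimum billed per query
BYTES_PER_TB = 1024**4              # assumes a binary terabyte


def estimate_query_cost(bytes_scanned: int) -> float:
    """Return a rough cost in USD for one query that scans `bytes_scanned` bytes.

    Rounding to the nearest megabyte is ignored in this sketch.
    """
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    # Apply the assumed per-query minimum before pricing.
    billed_bytes = max(bytes_scanned, MIN_BYTES_PER_QUERY)
    return billed_bytes / BYTES_PER_TB * PRICE_PER_TB_USD


# Under these assumptions, a query that scans one full terabyte costs 5.0 USD.
cost_for_one_tb = estimate_query_cost(1024**4)
```

The same function shows why scanning less data matters: halving the bytes scanned halves the estimate, down to the assumed per-query minimum.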

2. Data Sources

Amazon Athena queries data in place in Amazon S3, so you can analyze large datasets without first loading them into a separate database or building ETL pipelines.

3. SQL Queries

Athena supports standard SQL for queries (its engine is based on Trino and Presto) and Hive-style DDL for defining tables, so users can write familiar queries to extract insights from the data.

Step-by-Step Setup

  1. Create an S3 Bucket:
    aws s3 mb s3://your-bucket-name
  2. Upload Data to S3: Upload your CSV, JSON, or Parquet files into the S3 bucket.
  3. Create a Database in Athena:
    CREATE DATABASE your_database_name;
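  4. Create a Table: Define an external table over your data. The definition below reads comma-delimited text (CSV), and the column order must match the field order in the files. JSON or Parquet files need a different ROW FORMAT or STORED AS clause.

    CREATE EXTERNAL TABLE your_table_name (
        column1 STRING,
        column2 INT,
        column3 FLOAT
    )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    LOCATION 's3://your-bucket-name/path/';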
                    
  5. Run a Query: Execute a SQL query.
    SELECT * FROM your_table_name LIMIT 10;
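Note: Make sure the IAM identity (user or role) you are using has permissions for both Amazon S3 and Athena. Athena also writes query results to an S3 location, so set a query result location in the console settings or workgroup before running your first query.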
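
The final step can also be run from code instead of the console. The following is a minimal sketch, not a definitive implementation: `run_athena_query` is a hypothetical helper, and it assumes `client` behaves like the boto3 Athena client (`boto3.client("athena")`), whose `start_query_execution`, `get_query_execution`, and `get_query_results` calls it uses. The `athena-results/` prefix in the usage comment is a made-up example of a query result location.

```python
import time

# States after which Athena will not change the query status again.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Start a query, wait for it to finish, and return the first page of rows.

    `client` is assumed to behave like boto3.client("athena").
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        status = execution["QueryExecution"]["Status"]
        if status["State"] in TERMINAL_STATES:
            break
        time.sleep(poll_seconds)

    if status["State"] != "SUCCEEDED":
        reason = status.get("StateChangeReason", "no reason given")
        raise RuntimeError(f"Query {query_id} ended as {status['State']}: {reason}")

    # Only the first page is read; larger results need pagination via NextToken.
    result = client.get_query_results(QueryExecutionId=query_id)
    return [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in result["ResultSet"]["Rows"]
    ]


# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   rows = run_athena_query(
#       boto3.client("athena"),
#       "SELECT * FROM your_table_name LIMIT 10",
#       "your_database_name",
#       "s3://your-bucket-name/athena-results/",
#   )
```

Passing the client in as an argument keeps the function easy to exercise with a stub. For SELECT statements, the first returned row typically holds the column names.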

Best Practices

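  • Use columnar formats such as Parquet or ORC so Athena reads only the columns a query needs.
  • Partition your data so that queries filtering on the partition columns scan less data.
  • Compress your files to reduce the data scanned, which lowers cost and often improves performance.
  • Use the AWS Glue Data Catalog to manage your schema and metadata.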
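
Partitioning usually starts with how objects are laid out in S3. As a sketch of one common convention, the hypothetical helper below builds Hive-style `key=value` prefixes by date. The `year`/`month`/`day` scheme is an assumption chosen for illustration. The table must also declare matching partition columns and have its partitions registered before Athena can skip prefixes a query does not need.

```python
from datetime import date


def partition_prefix(base_location: str, day: date) -> str:
    """Build a Hive-style partition prefix under `base_location` for one day."""
    base = base_location.rstrip("/")
    return f"{base}/year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"


print(partition_prefix("s3://your-bucket-name/path/", date(2024, 1, 15)))
# prints: s3://your-bucket-name/path/year=2024/month=01/day=15/
```

Files uploaded under such prefixes let a query that filters on `year`, `month`, and `day` read only the matching prefixes instead of the whole table location.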

FAQ

What formats does Athena support?

Athena supports multiple formats including CSV, JSON, ORC, Parquet, and Avro.

Is there a minimum charge for using Athena?

There is no minimum fee or upfront commitment. You pay for the data your queries scan, although each individual query is billed for at least 10 MB.

Can I query data from multiple S3 buckets?

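Yes. Tables in the same database can point to locations in different S3 buckets, and a single query can join them, provided your IAM identity can read every bucket involved.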