Athena Basics - AWS Serverless
What is Amazon Athena?
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.
Key Concepts
1. Serverless
Athena is serverless, meaning you do not need to provision or manage any servers. You simply run your queries and pay for the data scanned.
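Because billing follows the amount of data scanned, a rough estimate of a query's cost is simple arithmetic. The sketch below is illustrative only: the default price of 5 USD per TB, the 10 MB per-query minimum, and the binary definition of a terabyte are assumptions that should be checked against current AWS pricing for your region, and `estimate_query_cost` is a hypothetical helper, not part of any AWS SDK.

```python
# Rough cost estimate for one query billed by data scanned.
# ASSUMPTIONS: the price per TB, the per-query minimum, and 1 TB = 1024**4 bytes
# are illustrative defaults, not values taken from the text above.

BYTES_PER_MB = 1024 ** 2
BYTES_PER_TB = 1024 ** 4


def estimate_query_cost(bytes_scanned: int,
                        price_per_tb_usd: float = 5.0,
                        min_billed_bytes: int = 10 * BYTES_PER_MB) -> float:
    """Return an estimated cost in USD for one data-scanning query."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    # Small queries are rounded up to the assumed per-query minimum.
    billed_bytes = max(bytes_scanned, min_billed_bytes)
    return billed_bytes / BYTES_PER_TB * price_per_tb_usd


if __name__ == "__main__":
    full_scan = BYTES_PER_TB          # query reads the whole 1 TB dataset
    pruned_scan = BYTES_PER_TB // 10  # query reads about a tenth of it
    print(f"Full scan:   ${estimate_query_cost(full_scan):.2f}")
    print(f"Pruned scan: ${estimate_query_cost(pruned_scan):.2f}")
```

The comparison in the usage section is the practical point: anything that reduces the bytes a query has to read reduces its cost in proportion.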
2. Data Sources
Amazon Athena queries data in place in Amazon S3, so you can analyze large datasets without first loading them into a separate database.
3. SQL Queries
Athena supports standard ANSI SQL, including joins, aggregations, and window functions, so you can extract insights from the data without learning a new query language.
Step-by-Step Setup
- Create an S3 Bucket:
  aws s3 mb s3://your-bucket-name
- Upload Data to S3: Upload your CSV, JSON, or Parquet files into the S3 bucket.
- Create a Database in Athena:
  CREATE DATABASE your_database_name;
- Create a Table: Define a table over your data. This example assumes comma-delimited text files; LOCATION must point to a folder (prefix), not to a single file.
  CREATE EXTERNAL TABLE your_table_name (
    column1 STRING,
    column2 INT,
    column3 FLOAT
  )
  ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LOCATION 's3://your-bucket-name/path/';
- Run a Query: Execute a SQL query.
  SELECT * FROM your_table_name LIMIT 10;
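The final step can also be run from code instead of the console. The following is a minimal sketch, assuming a client object that behaves like `boto3.client("athena")`; `run_query` is a hypothetical helper, and the `athena-results/` prefix in the usage comment is a placeholder for wherever you want query results written.

```python
import time


def run_query(client, sql, database, output_location,
              poll_seconds=1.0, max_polls=60):
    """Start a query and poll until it reaches a terminal state.

    `client` is assumed to behave like boto3.client("athena").
    Returns a (query_execution_id, final_state) tuple.
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    for _ in range(max_polls):
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return query_id, state
        time.sleep(poll_seconds)  # query is still QUEUED or RUNNING

    raise TimeoutError(f"query {query_id} still running after {max_polls} polls")


# Usage sketch (needs boto3 and AWS credentials; all names are placeholders):
#   import boto3
#   athena = boto3.client("athena")
#   query_id, state = run_query(
#       athena,
#       "SELECT * FROM your_table_name LIMIT 10",
#       "your_database_name",
#       "s3://your-bucket-name/athena-results/",
#   )
```

Passing the client in as an argument keeps the helper free of credentials handling and makes it easy to exercise with a stub in tests.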
Best Practices
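- Store data in a columnar format such as Parquet or ORC so queries read only the columns they need.
- Partition your data so queries that filter on the partition columns scan less data.
- Compress your files to reduce the amount of data scanned, which lowers cost and often improves performance.
- Use the AWS Glue Data Catalog to manage your schema and metadata.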
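To make the partitioning advice concrete, here is a small sketch of a Hive-style `key=value` folder layout and a matching query filter. It assumes a table partitioned by three string columns named `year`, `month`, and `day`; those column names and both helper functions are hypothetical choices for illustration. New partitions also have to be registered with the table (for example with `MSCK REPAIR TABLE` or `ALTER TABLE ... ADD PARTITION`) before queries can see them.

```python
from datetime import date


def partition_prefix(base_prefix: str, day: date) -> str:
    """Build a Hive-style S3 prefix for one day's files."""
    base = base_prefix.rstrip("/")
    return f"{base}/year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"


def partition_filter(day: date) -> str:
    """Build a WHERE-clause fragment that limits a query to one day's partition.

    Assumes the partition columns are declared as STRING in the table.
    """
    return (f"year = '{day.year:04d}' AND month = '{day.month:02d}' "
            f"AND day = '{day.day:02d}'")


if __name__ == "__main__":
    d = date(2024, 1, 15)
    # Where to upload that day's files:
    print(partition_prefix("s3://your-bucket-name/path/", d))
    # How to query only that day's files:
    print(f"SELECT * FROM your_table_name WHERE {partition_filter(d)} LIMIT 10;")
```

Filtering on the partition columns lets the engine skip every folder that does not match, which is where the scan reduction comes from.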
FAQ
What formats does Athena support?
Athena supports multiple formats including CSV, JSON, ORC, Parquet, and Avro.
Is there a minimum charge for using Athena?
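There is no upfront fee or monthly minimum; you pay only for the data scanned by your queries. Note that each query is billed for a minimum of 10 MB of scanned data.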
Can I query data from multiple S3 buckets?
Yes. Each table points to its own S3 location, so you can define tables over data in different buckets and query or join them in a single SQL statement.