AWS Athena 查詢 VPC Flow logs 的 TOP 10 語法

2020-08-06 AWS

先把文件放在最前面「Querying Amazon VPC Flow Logs」,在這篇記錄 TOP10 常見的 Query 語法

在這篇文章的必要條件是已經開啟 VPC Flow Logs,參考「VPC Flow Logs – Publishing to Amazon S3

Environment

  • Region:us-east-1
  • Account Id (範例):0123456789
  • VPC Flow logs Bucket:vpcflow-logs-WETX02MSQ8213l
  • Bucket Object 結構:s3://vpcflow-logs-WETX02MSQ8213l/AWSLogs/0123456789/vpcflowlogs/us-east-1/2020/08/06/*.log.gz

s3://<your_bucket>/AWSLogs/<your_account_id>/vpcflowlogs/<your_region>/YYYY/MM/dd/*.log.gz

Athena

Athena 可以透過 SQL 語法直接查詢 S3 中的資料,非常方便

  • 建立 VPC Flow logs Table

建立 Table 不一定要新建 Database,可以直接在 default 這個資料庫建立與 S3 關聯的 Table

CREATE EXTERNAL TABLE IF NOT EXISTS vpc_flow_logs (
  version int,
  account string,
  interfaceid string,
  sourceaddress string,
  destinationaddress string,
  sourceport int,
  destinationport int,
  protocol int,
  numpackets int,
  numbytes bigint,
  starttime int,
  endtime int,
  action string,
  logstatus string
)
PARTITIONED BY (`date` date)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ' '
LOCATION 's3://vpcflow-logs-WETX02MSQ8213l/AWSLogs/0123456789/vpcflowlogs/us-east-1'
TBLPROPERTIES ("skip.header.line.count"="1");

如果都沒有改過則會是 Default Schema,也可以 Custom Schema 讓顯示的資訊更多

  • 建立 Table Partition,按照日期切分以寫文章的日期 2020-08-06 為例
ALTER TABLE vpc_flow_logs
ADD PARTITION (`date`='2020-08-06')
location 's3://vpcflow-logs-WETX02MSQ8213l/AWSLogs/0123456789/vpcflowlogs/us-east-1/2020/08/06';

先用簡單的 SELECT 撈一下確認資料正常

SELECT * FROM default.vpc_flow_logs LIMIT 10;

TOP 10 VPC Flow logs Query

範例幾個很常見的 TOP 10 Query 示範

  • TOP 10 Packet count of Interface-Id

要查哪一個 Interface-Id 浪費了大量 VPC traffic 費用時很好用。

SELECT SUM(numpackets) AS
  packetcount,
  interfaceid
FROM raw
GROUP BY interfaceid
ORDER BY packetcount DESC
LIMIT 10;
  • TOP 10 按照 Source -> DestProtocol 封包數量統計
SELECT 
  action, 
  sourceaddress, 
  destinationaddress,
  sum( CASE WHEN protocol = 6 THEN numpackets ELSE 0 END ) AS tcp,
  sum( CASE WHEN protocol = 17 THEN numpackets ELSE 0 END ) AS udp,
  sum( CASE WHEN protocol = 2 THEN numpackets ELSE 0 END ) AS icmp,
  sum( numpackets ) AS total
FROM raw
GROUP BY action, sourceaddress, destinationaddress
ORDER BY action DESC, sourceaddress, destinationaddress
LIMIT 10;
  • TOP 10 按照 Source -> Desttraffic bytes 做統計

可以找到不正常的 traffic 是誰在傳輸

SELECT 
  sourceaddress,
  destinationaddress,
  sum(numbytes) as traffic_in_bytes
FROM raw
WHERE action = 'ACCEPT'
GROUP BY sourceaddress, destinationaddress
ORDER BY traffic_in_bytes DESC
LIMIT 10;

給 Mr. 沙先生一點建議

彙整

分類

展開全部 | 收合全部

License

訂閱 Mr. 沙先生 的文章

輸入你的 email 用於訂閱