
SOCLabs/Adenium

Adenium: Normalizer

Adenium Framework and Normalizer

The Adenium Framework provides an environment for controlling applications that run on Spark, a distributed processing framework. Dynamic control of those applications, distribution of input data, and transmission of output results are all handled through one consistent process.

Normalizer is an application that runs on the Adenium Framework and normalizes events (logs) from a variety of sensors. It supports a general-purpose, regular-expression-based tokenize method, predefined normalization fields, and user-defined fields. It is designed to run standalone, so it can operate as an independent application or as part of a specific framework such as Spark.

Adenium Framework

Framework Components

The Adenium Framework consists of the following components.

  • Kafka : Input data to be processed by the engine is stored in a Kafka queue.
  • Zookeeper : Stores framework control commands, along with the configuration and operational information used by the engine.
  • Adenium framework : Built on Spark. Data read from Kafka is processed with Spark Streaming, and the Controller executes the control commands stored in Zookeeper and broadcasts operational information to the application.
  • The processing results of the Adenium framework are stored in Kafka, so several framework instances can be chained to pipe results from one to the next. Piped data can be passed on to follow-up processing, such as storage in Hadoop or ES or transmission over TCP or HTTP, by implementing a framework application.

Partition Distribution Logic ( Spark - Kafka Queue )

In distributed processing, moving data between processing units is called shuffling, and keeping shuffling to a minimum is critical to performance. Adenium's stream data is stored in Kafka queues and passed between modules through them. How the distributed data is placed in those queues therefore plays a key role in minimizing shuffling.

Kafka manages one logical queue as a unit called a Topic, and each Topic is divided into distributed processing units called Partitions (0 ... n). The module that decides which data is placed (ingested, allocated, ...) in which Partition is called the Partitioner. If no Partitioner is specified, Kafka assigns partitions with its default Partitioner.

Partitioning Logic

To minimize shuffling, the Framework forces partition assignment by default. Depending on the environment, either forced assignment or random distribution can be selected. If a separate producer that generates the raw data provides its own partitioning, forced partition assignment must be disabled.

Partitioning depends on the number of partitions in the Topic. By default, input data with the same partition key is distributed over 3 partitions. When processed data is written back to a Topic, it keeps the partition it was originally assigned. As a result, the same partition is preserved when the data is piped to another framework for further processing.
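The exact partitioning algorithm is not described here, so the following is only a hypothetical Java sketch of the behavior above. Each key is pinned to a window of three consecutive partitions (the default spread), and each record picks one partition inside that window. The class name `PartitionSketch`, the use of `String.hashCode`, and the random choice within the window are all assumptions. This is not Adenium's actual partitioner.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical illustration only: not Adenium's actual partitioner.
final class PartitionSketch {
    static final int DEFAULT_SPREAD = 3; // "distributed over 3 partitions by default"

    /**
     * Pins a key to a window of `spread` consecutive partitions and picks
     * one partition inside that window at random.
     */
    static int choosePartition(String key, int numPartitions, int spread) {
        if (numPartitions <= 0) throw new IllegalArgumentException("numPartitions must be > 0");
        int window = Math.max(1, Math.min(spread, numPartitions)); // never wider than the topic
        int base = Math.floorMod(key.hashCode(), numPartitions);    // same key -> same window
        int offset = ThreadLocalRandom.current().nextInt(window);   // spread load inside the window
        return (base + offset) % numPartitions;
    }
}
```

Downstream stages would then write each processed record with the partition number it was read from, which keeps the assignment stable when the output is piped to another framework.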

Framework Runtime Control

The Adenium Framework provides an interactive interface between the administrator and the engine at runtime. While the Spark application is running, the administrator can send commands to control the framework's behavior.

Command Channels

Zookeeper and Kafka serve as the message channels between the administrator and the framework. A command can be sent either to a specific Kafka Topic or by changing the value of a specific Zookeeper znode. The Controller inside the framework monitors incoming commands and either performs the specified action or broadcasts reference data to the application.

Confirming Command Delivery

When a command is sent by changing a Zookeeper znode value, the engine reads the command and then deletes the node's value. Deletion of the value means that the controller has received the command. Whether the command then executed correctly can be checked in the Spark application log.

Adenium Controller Operation

To process incoming commands, the Controller creates a separate thread, isolated from the framework's main thread, and the two threads communicate through a one-way queue. The separate thread interprets commands that arrive in Zookeeper or Kafka and performs the specified actions.

Command processing flow
  1. When the engine starts, a separate thread for command listening is started.
  2. (Separate thread) When a command message arrives in Zookeeper or Kafka, the thread interprets it, performs actions such as re-reading the required reference information, and pushes the result to a queue inside the framework.
  3. (Main thread) On every micro-batch, the main thread checks whether new messages have arrived in the queue and, if so, broadcasts them to the workers.
  4. Workers use the broadcast object for processing.

  • Isolating command listening in its own thread means that, even when reading data from the DB, Zookeeper nodes, and similar sources is slow, the data can be initialized without stalling the main thread.
  • Sharing the data to be broadcast to workers through a message queue guarantees that the broadcast data is immutable.
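The controller pattern above can be illustrated with a minimal Java sketch. A plain `LinkedBlockingQueue` stands in for the Zookeeper/Kafka command source, and a `ConcurrentLinkedQueue` plays the one-way queue. `ControllerSketch`, `submit`, and `drain` are hypothetical names. In a real Spark driver, the main thread would pass each drained payload to `SparkContext.broadcast` before the next micro-batch, and that step is omitted here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

// Hypothetical sketch of the controller pattern; names are illustrative only.
final class ControllerSketch<T> {
    // Stand-in for commands arriving from Zookeeper or Kafka.
    private final BlockingQueue<String> incoming = new LinkedBlockingQueue<>();
    // One-way queue: only the listener writes, only the main thread reads.
    private final ConcurrentLinkedQueue<T> outbox = new ConcurrentLinkedQueue<>();
    private final Thread listener;

    /** `interpret` does the slow work (e.g. reloading reference data) and must not return null. */
    ControllerSketch(Function<String, T> interpret) {
        listener = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    String command = incoming.take();       // blocks only this thread
                    outbox.offer(interpret.apply(command)); // publish the result (ideally immutable)
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();         // stop quietly on shutdown
            }
        }, "command-listener");
        listener.setDaemon(true);
    }

    void start() { listener.start(); }

    void stop() { listener.interrupt(); }

    /** Simulates a command message arriving on the control channel. */
    void submit(String rawCommand) { incoming.add(rawCommand); }

    /** Called by the main thread once per micro-batch; never blocks. */
    List<T> drain() {
        List<T> ready = new ArrayList<>();
        for (T item = outbox.poll(); item != null; item = outbox.poll()) {
            ready.add(item);
        }
        return ready;
    }
}
```

If reloading takes several batches, `drain()` simply keeps returning an empty list, so the micro-batch loop is never held up by the listener.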

Control Commands

Command Structure

A control command can bundle several commands into one message.

-command1 value1 value2 -command2 value3 value4

[For the Adenium Normalizer's control commands, see the Normalizer control commands]

-load:[item] Loads the information for the item from the DataSource (DB) and stores it in Zookeeper
-clean:[item] Deletes the information for the item from Zookeeper
-update:[item] Loads the information for the item from Zookeeper and broadcasts it
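The following small Java sketch shows how a bundled command string could be split into individual commands. It assumes that every token starting with "-" begins a new command and that the tokens after it are its values. `CommandLineSketch`, `parse`, `verbAndItem`, and the item name "agents" used in the test are hypothetical. This is not the framework's own parser.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helpers for illustration; not the framework's own parser.
final class CommandLineSketch {
    /** Splits "-cmd1 v1 v2 -cmd2 v3" into an ordered map of command -> values. */
    static Map<String, List<String>> parse(String line) {
        Map<String, List<String>> commands = new LinkedHashMap<>();
        List<String> current = null;
        for (String token : line.trim().split("\\s+")) {
            if (token.isEmpty()) continue;          // empty input yields one empty token
            if (token.startsWith("-")) {
                current = new ArrayList<>();
                commands.put(token, current);       // a repeated command replaces the earlier one
            } else if (current != null) {
                current.add(token);                 // value belongs to the latest command
            }                                       // values before any command are ignored
        }
        return commands;
    }

    /** "-load:agents" -> {"load", "agents"}; a command without ':' has an empty item. */
    static String[] verbAndItem(String command) {
        String body = command.startsWith("-") ? command.substring(1) : command;
        int colon = body.indexOf(':');
        return colon < 0
            ? new String[] {body, ""}
            : new String[] {body.substring(0, colon), body.substring(colon + 1)};
    }
}
```

Because the map is keyed by command, this sketch keeps only the last set of values when the same command appears twice in one message.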

Restore

The Adenium Framework provides a restore feature for recovering from failures or reprocessing data from a specific point in time. On every batch, the Framework stores the offset range it processed for each partition of the Topic in a designated Zookeeper node. The restore feature reprocesses data starting from the stored offset range for each partition of the Kafka Topic.

If no stored offset information exists, or data cannot be read from the stored offsets, processing starts from the current (latest) offset.

Enabling Restore

Restore is enabled by starting the Framework with the -kf:restore option.

Restore Limitations

The Kafka Topic and its partitions must not have changed. Events processed in the last batch may be processed again during a restore.
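The fallback rule above can be sketched in Java as follows. The sketch assumes that the value stored per partition is the start offset of the last processed batch. It also interprets "cannot be read" as the stored offset lying outside the range that Kafka currently retains. `RestoreSketch` and its parameters are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the restore fallback; names are illustrative only.
final class RestoreSketch {
    /**
     * saved:    offsets stored in Zookeeper per partition (entries may be missing)
     * earliest: oldest offset Kafka still retains per partition
     * latest:   current (latest) offset per partition
     * Returns the offset to resume from for every partition in `latest`.
     */
    static Map<Integer, Long> startOffsets(Map<Integer, Long> saved,
                                           Map<Integer, Long> earliest,
                                           Map<Integer, Long> latest) {
        Map<Integer, Long> result = new HashMap<>();
        for (Map.Entry<Integer, Long> e : latest.entrySet()) {
            int partition = e.getKey();
            long end = e.getValue();
            long begin = earliest.getOrDefault(partition, end);
            Long stored = saved.get(partition);
            boolean readable = stored != null && stored >= begin && stored <= end;
            // No stored offset, or it can no longer be read: start from the latest offset.
            result.put(partition, readable ? stored : end);
        }
        return result;
    }
}
```

Resuming from the start of the last stored range is consistent with the limitation that events from the final batch may be processed twice.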

State Information (Configuration and Operational Data)

Adenium manages its configuration and operational information in Zookeeper. When operational data exceeds 512 KB, it is split into 512 KB pieces that are stored in sequential sub nodes (0, 1, 2 ...). The Zookeeper znode layout is shown below. Every node name is lowercase, with words separated by "_".

* VersionRoot							: Framework version
*  |-- app
*  |   |-- watch						: per-application child nodes used as command channels
*  |   |   |-- [application name]		: created automatically from the application name given at submit time
*  |   |-- offsets						: Kafka offsets stored per application
*  |   |-- common
*  |   |   |-- var_fields       		: normalization field definitions
*  |   |-- normalizer					: operational information used by the normalization engine
*  |   |   |-- parser_log				: log on/off state of the normalization engine
*  |   |   |-- parser_ref				: set of normalization reference data
*  |   |   |   |-- geo_ip_range			: IP ranges per country
*  |   |   |   |-- company_ip_range		: IP ranges of device owners
*  |   |   |   |-- signatures			: Signatures
*  |   |   |   |-- agents				: Agents
*  |   |   |   |-- tokenize_rules		: Tokenize rules
*  |   |   |   |-- arrange_rules		: Regex capture order to field rules
*  |   |   |   |-- replace_fields		: rules for replacing normalized values with other values
*  |   |   |   |-- company_ips			: IPs of device owners
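The 512 KB splitting rule described before the tree can be illustrated with a hedged Java sketch. It assumes that "512 KB" means 512 * 1024 bytes and only shows how a payload maps to sequential child names "0", "1", "2", .... The actual Zookeeper writes (for example through Curator) are left out. The framework splits only when the data exceeds the limit, whereas this sketch always produces children. `ZnodeChunkSketch` is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of chunking a payload into sequential child znodes.
final class ZnodeChunkSketch {
    static final int CHUNK_BYTES = 512 * 1024; // assumption: "512 KB" = 512 * 1024 bytes

    /** Splits a payload into children named "0", "1", "2", ... of at most chunkBytes each. */
    static Map<String, byte[]> split(byte[] payload, int chunkBytes) {
        Map<String, byte[]> children = new LinkedHashMap<>();
        for (int i = 0, seq = 0; i < payload.length; i += chunkBytes, seq++) {
            int end = Math.min(payload.length, i + chunkBytes);
            children.put(Integer.toString(seq), Arrays.copyOfRange(payload, i, end));
        }
        return children;
    }

    /** Reassembles the payload by reading children in numeric order starting at "0". */
    static byte[] join(Map<String, byte[]> children) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int seq = 0; children.containsKey(Integer.toString(seq)); seq++) {
            byte[] part = children.get(Integer.toString(seq));
            out.write(part, 0, part.length);
        }
        return out.toByteArray();
    }
}
```

Reading the children by counting up from "0", rather than sorting their names as strings, avoids placing "10" before "2".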

Field Overview

An Adenium event is a combination of interpreted fields, and the set of interpreted fields is defined as a normalized event.

Normalization first interprets an event using two kinds of rules: rules that classify the event, and rules that decide which field each classified item corresponds to (Arrange rules). The fields produced by this first pass are then interpreted further using three kinds of information. General derived information comes from well-known rules (e.g., the country an IP belongs to). Reference-derived information comes from management data (e.g., whether an IP is internal to a company, based on the IP ranges registered for that company). Substitution information defines how similar values should be replaced when needed. Finally, the results are merged in a folding step to complete normalization.

Field Structure

An Adenium field consists of a unique ID that identifies the field, a key name, and a value. IDs 1 to 49 are reserved by Adenium, and the IDs and key names reserved by Adenium cannot be reused when fields are extended. The normalization engine handles every field by its unique ID. Key names are supported for key-value storage formats (JSON).

Normalized Event Storage Format

By default, an Adenium normalized event is stored as Field ID + TAB + Value. If the event must be stored or transmitted in a key-value form such as JSON, it can be stored using the key name mapped to each field ID.
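The following Java sketch shows the storage format above. The leading "adenium" header matches the Parser Tester output later in this document. The sketch assumes that values never contain a TAB, and it uses a caller-supplied ID-to-key-name table to produce key-value (JSON-style) output. `FieldCodecSketch` and its methods are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical codec for "adenium<TAB>id<TAB>value..." records; assumes values contain no TAB.
final class FieldCodecSketch {
    /** Encodes fields as "adenium<TAB>id<TAB>value<TAB>id<TAB>value...". */
    static String encode(Map<Integer, String> fields) {
        StringBuilder sb = new StringBuilder("adenium");
        for (Map.Entry<Integer, String> f : fields.entrySet()) {
            sb.append('\t').append(f.getKey()).append('\t').append(f.getValue());
        }
        return sb.toString();
    }

    /** Decodes a record back into an ordered field ID -> value map. */
    static Map<Integer, String> decode(String line) {
        String[] parts = line.split("\t", -1);
        if (!"adenium".equals(parts[0]) || parts.length % 2 == 0) {
            throw new IllegalArgumentException("not an adenium record");
        }
        Map<Integer, String> fields = new LinkedHashMap<>();
        for (int i = 1; i + 1 < parts.length; i += 2) {
            fields.put(Integer.parseInt(parts[i]), parts[i + 1]);
        }
        return fields;
    }

    /** Re-keys fields by key name for JSON-style output; unknown IDs keep their number. */
    static Map<String, String> toNamed(Map<Integer, String> fields, Map<Integer, String> keyNames) {
        Map<String, String> named = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> f : fields.entrySet()) {
            named.put(keyNames.getOrDefault(f.getKey(), Integer.toString(f.getKey())), f.getValue());
        }
        return named;
    }
}
```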

ํ•„๋“œ์˜ ๊ตฌ๋ถ„

๊ตฌ๋ถ„ ๋‚ด์šฉ
์˜ˆ์•ฝํ•„๋“œ ํ•ด๋‹น ํ•„๋“œ๊ฐ€ ์–ด๋–ค ์˜๋ฏธ์ธ์ง€ ๋ฏธ๋ฆฌ ์˜ˆ์•ฝ๋˜์–ด ์žˆ๋Š” ํ•„๋“œ
์‚ฌ์šฉ์ž ์ •์˜ํ•„๋“œ (UDF) ์‚ฌ์šฉ์ž๊ฐ€ ์„ค์ •์œผ๋กœ ์ •์˜ํ•œ ํ•„๋“œ. ์‚ฌ์šฉ์ž์ •์˜ํ•„๋“œ๋ฅผ ํŠน์ • ์ด๋ฒคํŠธ์˜ ์ •๊ทœํ™” ๊ณผ์ •์—์„œ ์ถ”์ถœํ•˜๋ ค๋ฉด, ํŠน์ • Token์ด ์–ด๋–ค UDF-ID์— ํ•ด๋‹นํ•˜๋Š” ์ง€๋ฅผ Arrange์„ค์ •์œผ๋กœ ์ง€์ •ํ•œ๋‹ค.
์›๋ณธํ•„๋“œ ์›๋ณธ ์ด๋ฒคํŠธ์—์„œ ์ถ”์ถœํ•œ ๊ทธ๋Œ€๋กœ ์‚ฌ์šฉ๋˜๋Š” ํ•„๋“œ
ํŒŒ์ƒํ•„๋“œ ์›๋ณธ ์ด๋ฒคํŠธ์—์„œ ์ถ”์ถœํ•œ ์ •๋ณด๋ฅผ ์ด์šฉํ•˜์—ฌ ์ƒ์„ฑํ•˜๋Š” ํ•„๋“œ ex) IP๋ฅผ ์ด์šฉํ•œ ๊ตญ๊ฐ€๋ช… ํ•„๋“œ
์น˜ํ™˜ํ•„๋“œ ํŠน์ •ํ•œ ์กฐ๊ฑด์— ํ•ด๋‹นํ•  ๊ฒฝ์šฐ, ์ง€์ •ํ•œ ๊ฐ’์œผ๋กœ ์น˜ํ™˜ํ•˜๋Š” ์น˜ํ™˜ ํ•„๋“œ

Field Types and Values

Adenium fields come in the following four formats.

Type | Format
TypeString | String format
TypeNumeric | Numeric format
TypeDateString | Date string format
TypeDateMills | Date in milliseconds (Long Integer) format

Predefined Fields

ID | Key name | Description | Class
1 | CATEGORY1 | Major category | Derived
2 | CATEGORY2 | Middle category | Derived
3 | CATEGORY3 | Minor category | Derived
4 | SIGNATURE | Event signature defined per sensor | Derived
5 | SEVERITY | Severity | Original
6 | COUNT | Attack count | Original
7 | REPEATCOUNT | Attack repeat count | Original
8 | SRCIP | Source (attacker) IP address | Original
9 | SRCPORT | Source (attacker) port | Original
10 | SRCMAC | Source (attacker) MAC address | Original
11 | SRCCOUNTRY | Source (attacker) country code | Derived
12 | DESTIP | Destination IP | Original
13 | DESTPORT | Destination port | Original
14 | DESTMAC | Destination MAC address | Original
15 | DESTCOUNTRY | Destination country code | Derived
16 | SRCDIRECTION | Source detection direction | Derived
17 | DESTDIRECTION | Destination detection direction | Derived
18 | URL | URL | Original
19 | URI | URI | Original
20 | URIPARAMS | URI parameters | Original
21 | HEADER | HTTP header | Original
22 | PROTOCOL | Protocol | Original
23 | PAYLOAD | Payload | Original
24 | CODE | Sensor status code | Original
25 | RCVDBYTES | Received byte count | Original
26 | SENTBYTES | Sent byte count | Original
27 | MESSAGEID | Message ID | Original
28 | SRCZONE | Source zone | Original
29 | DESTZONE | Destination zone | Original
30 | SERVICE | Service | Original
31 | DURATION | Duration | Original
32 | ACLNM | ACL name | Original
33 | ACTION | Allow/Deny, IPS action | Derived
34 | RAWDATA | Raw data | Original
35 | SENDER | Sender | Original
36 | ATTACHMENT | Attachment file name | Original
37 | START ATTACK TIME | Attack start time | Original
38 | END ATTACK TIME | Attack end time | Original
39 | LOGTIME | Log collection time | Original
40 | SYSLOGTIME | Syslog generation time | Original
41 | SYSLOGHOST | Syslog host of the system that generated the event | Original
42 | AGENTID | Agent ID | Derived
43 | AGENTIP | Agent IP | Derived
44 | COMPANYID | Owner (customer) ID | Derived
45 | COMPANYNM | Owner name | Derived
46 | COMPANYGROUPID | Owner group ID | Derived
47 | DEVICETYPE | Sensor type | Derived
48 | DEVICEMODEL | Sensor model name | Derived
49 | VENDOR | Sensor vendor name | Derived

Adenium Normalizer

Adenium Normalizer is an event normalization engine built on the Adenium framework. It consists of the Framework plus a Parser.

Following the Adenium framework's design, the Normalizer uses Kafka queues as its default input and output sources and loads the reference information needed for normalization from Zookeeper. Logs that fail normalization can be stored in a separate Kafka Topic for management.

The Parser supports a general-purpose, regular-expression-based tokenize method and follows the Adenium framework's field definitions. It is designed to run standalone, so it can operate as an independent application or as part of a specific framework.

Parser Operation

  1. The Parser receives events in Syslog format, tokenizes them with regular expressions, and normalizes them.

  2. The Parser requires the regular expressions used for tokenizing and the reference data used for normalization.

  3. The Parser's default input/output sources are standard in/out, and they can be changed to suit the environment (e.g., database, file, stream, Kafka ...).

  4. The Parser runs in any environment with JVM 8.0 or later, either as an independent process or on a distributed processing framework such as Spark.

  5. Parser execution flow

    Figure_1

Event Protocol

Default transport protocol = Syslog

Events to be normalized arrive as the body of a Syslog message. Transport protocol handling is designed as a separate layer, so a different protocol can be supported by adding a protocol-handling layer when needed.

Section | Field | Description
Header | Priority | Facility and priority of the source that generated the event
Header | DateTime | Time the event occurred
Header | Hostname | Host of the system that generated the event
Body | - | Event content (the normalization target)
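The header layout in the table matches RFC 3164, which the parser log later in this document also names, and in which the Priority value encodes facility * 8 + severity. The Java sketch below parses such a line with a single regular expression. It illustrates the format under that assumption and is not Adenium's own syslog layer. `Syslog3164Sketch` is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical RFC 3164 header parser for illustration only.
final class Syslog3164Sketch {
    // <PRI>Mmm dd hh:mm:ss HOSTNAME BODY  (the day may be space-padded, e.g. "Aug  7")
    private static final Pattern HEADER = Pattern.compile(
        "^<(\\d{1,3})>([A-Z][a-z]{2})\\s+(\\d{1,2})\\s+(\\d{2}:\\d{2}:\\d{2})\\s+(\\S+)\\s+(.*)$");

    static final class Event {
        final int facility, severity;
        final String month, day, time, hostname, body;

        Event(int pri, String month, String day, String time, String hostname, String body) {
            this.facility = pri / 8; // RFC 3164: PRI = facility * 8 + severity
            this.severity = pri % 8;
            this.month = month;
            this.day = day;
            this.time = time;
            this.hostname = hostname; // used to look up candidate Agents
            this.body = body;         // the part that is tokenized and normalized
        }
    }

    /** Returns null when the line does not look like an RFC 3164 message. */
    static Event parse(String line) {
        Matcher m = HEADER.matcher(line);
        if (!m.matches()) return null;
        return new Event(Integer.parseInt(m.group(1)), m.group(2), m.group(3),
                         m.group(4), m.group(5), m.group(6));
    }
}
```

For the sample event used by the Parser Tester, `<44>` decodes to facility 5 and severity 4 (warning), and the hostname is the token after the timestamp.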

Reference Data

Reference data consists of the regular expressions and supplementary information used for normalization. The reference data that applies to each sensor is selected by host name, and items are separated by TAB. The default delimiter can be changed.

Agent

An Agent represents a specific "sensor + owner" combination. In other words, the same device (sensor) is treated as a different Agent when it belongs to a different owner.

Field | Type | Description
agentIp | String | IP or host name [syslog header : host]
agentId | Long | Unique ID identifying the Agent
companyId | Long | ID of the Agent owner
companyName | String | Owner name
companyGroupId | Long | Owner group ID
sensorId | Long | ID of the sensor model registered to the Agent
sensor | String | Sensor model name
sensorType | String | Sensor model type [ FW, WAF, DDOS, IPS, IDS ... ]
vendorId | Long | Sensor vendor ID
vendorName | String | Vendor name
active | Boolean | Whether the Agent is in use

Data sample

192.168.0.1	1234	2425	SUNLEAF	77	68257	WebFront1	WF	26	PIOLINK	Y
Tokenization rule

Regular expressions for tokenizing the Syslog body of each sensor

Field | Type | Description
id | Int | ID identifying the rule
sensorId | Long | Associated sensor ID
sensorType | String | Associated sensor type
regEx | Regex | Regular expression applied to the Syslog body

Data sample

100	68257	WF src_ip=\"(.+?)\".+src_port=\"(.+?)\".+dest_ip=\"(.+?)\"
Arrange Rule

Rules that link tokenization results to normalization fields

Field | Type | Description
tokenizeRuleId | Int | Tokenize rule ID
captureOrder | Int | Regex match group sequence
fieldId | Int | Normalization field ID

Data sample

100 1 8
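To make the two tables concrete, the following hypothetical Java sketch applies one tokenize rule and turns each capture group into a (field ID, value) pair using captureOrder -> fieldId. Only the entry 1 -> 8 appears in the data sample. The entries 2 -> 9 and 3 -> 12 used in the test are assumptions that match the parser log later in this document. `TokenizeArrangeSketch` is a hypothetical name.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of tokenize + arrange for a single rule.
final class TokenizeArrangeSketch {
    /**
     * arrange: captureOrder -> fieldId for this tokenize rule.
     * Returns null when the rule's regex is not found in the body (the rule fails).
     */
    static List<Map.Entry<Integer, String>> apply(Pattern rule, Map<Integer, Integer> arrange, String body) {
        Matcher m = rule.matcher(body);
        if (!m.find()) return null;
        List<Map.Entry<Integer, String>> result = new ArrayList<>();
        for (int group = 1; group <= m.groupCount(); group++) {
            Integer fieldId = arrange.get(group);
            // Groups without an arrange entry, or that did not participate, are skipped.
            if (fieldId != null && m.group(group) != null) {
                result.add(new SimpleEntry<>(fieldId, m.group(group)));
            }
        }
        return result;
    }
}
```

With the sample rule, the body `src_ip="116.46.237.195",src_port="52453",dest_ip="211.169.244.10"` yields [(8, 116.46.237.195), (9, 52453), (12, 211.169.244.10)].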
Company Ip

IP information of the owner where the Agent is installed. It is used to make the final determination of the Agent for a normalized result.

Field | Type | Description
companyId | Long | ID of the Agent owner
publicIp | String | Public IP of the Agent owner
privateIp | String | Private IP of the Agent owner

Data sample

2425	192.168.0.1	-
Company Ip Range

Used together with the Company Ip information as supplementary information for the final Agent determination.

Field | Type | Description
companyId | Long | ID of the Agent owner
sip | String | Start IP of the range
eip | String | End IP of the range

Data sample

2425	192.168.0.1	192.168.1.254
Fields

Definitions of the fields that make up a normalization result

Field | Type | Description
id | Int | Field ID
fieldName | String | Field name

Data sample

4	Signature
Replace Fields

Defines fields whose values must be changed after normalization

Field | Type | Description
fieldId | Int | ID of the field to change
outstr | String | String to be replaced
instr | String | String to replace it with
vendor | Long | Vendor ID that the replacement is conditioned on

Data sample (the Korean value reads "System folder access vulnerability")

4	110600275	시스템 폴더 접근 취약점	26
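The following hypothetical Java sketch applies Replace Fields rows. It follows the column order of the data sample: the second column (110600275) is the tokenized value, and the third column is the text that replaces it, as the parser log later shows for the signature field. The names `match` and `replacement` are illustrative stand-ins for the two string columns. Exact string equality is an assumption.

```java
import java.util.List;

// Hypothetical sketch of Replace Fields; names are illustrative only.
final class ReplaceFieldSketch {
    /** One row of the Replace Fields table, in the column order of the data sample. */
    static final class Rule {
        final int fieldId;
        final String match;       // value produced by tokenizing (e.g. "110600275")
        final String replacement; // value written to the normalized result
        final long vendorId;      // the rule only applies to events from this vendor

        Rule(int fieldId, String match, String replacement, long vendorId) {
            this.fieldId = fieldId;
            this.match = match;
            this.replacement = replacement;
            this.vendorId = vendorId;
        }
    }

    /** Returns the replacement for (fieldId, value) under the given vendor, or the value unchanged. */
    static String apply(List<Rule> rules, long vendorId, int fieldId, String value) {
        for (Rule r : rules) {
            if (r.fieldId == fieldId && r.vendorId == vendorId && r.match.equals(value)) {
                return r.replacement;
            }
        }
        return value;
    }
}
```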
Signatures

Sensor signatures and categories defined per vendor

Field | Type | Description
vendorId | Long | Vendor ID
signature | String | Signature
category1 | String | Category 1
category2 | String | Category 2
category3 | String | Category 3
category4 | String | Category 4

Data sample (the Korean signature reads "System access vulnerability")

26	시스템접근 취약점	Security	Exploit	Overflow_Buffers	Web_Application_Vulnerability
GeoIp range

IP ranges per country

Field | Type | Description
nationCdoe | String | Country code
startIp_dec | Long | Start IP in decimal format
endIp_dec | Long | End IP in decimal format

Data sample

AU	16777216	16777471
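The decimal columns are plain IPv4 addresses read as unsigned 32-bit numbers, so 16777216 is 1.0.0.0 and 16777471 is 1.0.0.255. The following hypothetical Java sketch converts an address and performs the range lookup. `GeoIpSketch` is not part of Adenium.

```java
import java.util.List;

// Hypothetical sketch of a GeoIp range lookup; names are illustrative only.
final class GeoIpSketch {
    /** Dotted IPv4 to its unsigned decimal form, e.g. "1.0.0.0" -> 16777216. */
    static long toDecimal(String ipv4) {
        String[] octets = ipv4.trim().split("\\.");
        if (octets.length != 4) throw new IllegalArgumentException("not IPv4: " + ipv4);
        long value = 0;
        for (String octet : octets) {
            int b = Integer.parseInt(octet);
            if (b < 0 || b > 255) throw new IllegalArgumentException("bad octet: " + ipv4);
            value = (value << 8) | b;
        }
        return value;
    }

    /** One row of the GeoIp range table: inclusive [start, end] in decimal form. */
    static final class Range {
        final String nationCode;
        final long start;
        final long end;

        Range(String nationCode, long start, long end) {
            this.nationCode = nationCode;
            this.start = start;
            this.end = end;
        }
    }

    /** Returns the country code of the first range containing the IP, or null. */
    static String lookup(List<Range> ranges, String ipv4) {
        long ip = toDecimal(ipv4);
        for (Range r : ranges) {
            if (ip >= r.start && ip <= r.end) return r.nationCode;
        }
        return null;
    }
}
```

The linear scan keeps the sketch short. For a full country table, sorting the ranges by start and using binary search would be the usual choice.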

Applying Normalization Rules

Events are parsed and normalized as follows.

Normalization Workflow

Figure_2

1. Transport Protocol Processing

To normalize an event, the Parser must first decide which rules to apply. For that decision, it extracts the host name from the transport protocol (Syslog) header.

2. Extracting Candidate Agents and Normalization Rules

The host name corresponds to the agentIp field of the [Agent] reference data and is the key for selecting the normalization rules to apply to an event. They are called candidate Agents because several Agents can share a single host name.
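The following hypothetical Java sketch shows candidate selection. It parses TAB-separated Agent rows in the column order of the Agent table and returns every Agent whose agentIp equals the Syslog hostname. Skipping inactive Agents is an assumption based on the `active` column. `AgentSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of Agent rows and candidate selection.
final class AgentSketch {
    static final class Agent {
        final String agentIp, companyName, sensor, sensorType, vendorName;
        final long agentId, companyId, companyGroupId, sensorId, vendorId;
        final boolean active;

        Agent(String[] c) { // column order of the Agent table
            agentIp = c[0]; agentId = Long.parseLong(c[1]); companyId = Long.parseLong(c[2]);
            companyName = c[3]; companyGroupId = Long.parseLong(c[4]); sensorId = Long.parseLong(c[5]);
            sensor = c[6]; sensorType = c[7]; vendorId = Long.parseLong(c[8]);
            vendorName = c[9]; active = "Y".equalsIgnoreCase(c[10]);
        }
    }

    /** Parses one TAB-separated Agent row with 11 columns. */
    static Agent parse(String row) {
        String[] c = row.split("\t", -1);
        if (c.length != 11) throw new IllegalArgumentException("expected 11 columns, got " + c.length);
        return new Agent(c);
    }

    /** All active Agents whose agentIp equals the Syslog hostname; there may be several. */
    static List<Agent> candidates(List<Agent> agents, String hostname) {
        List<Agent> out = new ArrayList<>();
        for (Agent a : agents) {
            if (a.active && a.agentIp.equals(hostname)) out.add(a);
        }
        return out;
    }
}
```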

3. Parsing
  1. Multiple-Regular Expression and Tokenizing

For each candidate Agent, all regular expressions in the Tokenize Rules that belong to its SensorId are selected and applied. Events from the same sensor may need different regular expressions. In addition, when a host name is shared, the candidate Agents may be linked to different kinds of sensors. Multiple-Regular Expression is supported for both cases.

Multiple-Regular Expression limitations and cautions: events from the same device may arrive in different formats depending on the sensor version and sensor settings. When a host name is shared (shared devices, shared zones), several kinds of sensors may also be registered. The Parser therefore applies the selected regular expressions in order and treats the first one that succeeds as the tokenizing rule. Take care that the sequentially applied expressions do not all succeed on the same event. In particular, a rule such as .* that succeeds on every event is very likely to cause side effects, depending on the order in which the rules are applied.

  2. Arrange Result

    When tokenizing succeeds, the [Arrange] rules determine which token corresponds to which field, and the result is produced as a List of Tuple (field ID => String).
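The first-match-wins rule and the `.*` pitfall described under Multiple-Regular Expression can be shown with a few lines of Java. `FirstMatchSketch` is hypothetical. It returns the `Matcher` of the first rule found in the body, so a catch-all rule listed first shadows every more specific rule after it.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of sequential rule application; not the Parser's own code.
final class FirstMatchSketch {
    /** Tries the rules in order; the first rule whose regex is found in the body wins. */
    static Matcher firstMatching(List<Pattern> rules, String body) {
        for (Pattern rule : rules) {
            Matcher m = rule.matcher(body);
            if (m.find()) return m; // later rules are never tried
        }
        return null; // no rule matched; the event could then be treated as a normalization failure
    }
}
```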

4. Determining the Owner for Shared Devices and Shared Zones

With shared devices and shared zones, several sensors share one device to send logs, so events from multiple sensors arrive through a single registered Agent. Because those sensors may have different owners, the original owner of each event must be determined.

  1. Compare with Company Ip

After normalization, the agent IP field is compared with the registered Company IPs, and the matching Company is chosen.

  2. Compare with Company Ip Range

    If step 1 cannot identify the Company from the Company IPs, the SrcIP and DestIP fields of the normalized result are compared with the Company Ip Ranges, and the matching Company is chosen.

  3. When the Company cannot be determined

    For example, IP ranges may overlap because of management issues with the Company IP table, or multiple Agents with the same sensor information may be found. In such cases, the device owner is determined from the most recently registered Agent.
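The following hypothetical Java sketch shows the three-step owner decision. It rests on several assumptions. Candidates are given oldest registration first. Matches are only accepted for companies that own one of the candidate Agents. Company IPs are compared as exact strings. Range rows are already converted to decimal. `OwnerDecisionSketch` is not Adenium's code.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the three-step owner decision; not Adenium's code.
final class OwnerDecisionSketch {
    /** Dotted IPv4 to decimal (no validation in this sketch). */
    static long toDecimal(String ipv4) {
        long value = 0;
        for (String octet : ipv4.trim().split("\\.")) value = (value << 8) | Integer.parseInt(octet);
        return value;
    }

    /**
     * candidates: company IDs of the candidate Agents, oldest registration first (assumption)
     * companyIps: companyId -> registered Company IPs (public or private)
     * ranges:     {companyId, startIpDecimal, endIpDecimal} rows from Company Ip Range
     */
    static long decide(List<Long> candidates, String agentIp, String srcIp, String destIp,
                       Map<Long, List<String>> companyIps, List<long[]> ranges) {
        if (candidates.isEmpty()) throw new IllegalArgumentException("no candidate agents");
        // Step 1: the agent IP matches a registered Company IP.
        for (long company : candidates) {
            List<String> ips = companyIps.get(company);
            if (ips != null && ips.contains(agentIp)) return company;
        }
        // Step 2: SrcIP or DestIP falls inside a registered Company Ip Range.
        for (String ip : new String[] {srcIp, destIp}) {
            if (ip == null || ip.isEmpty()) continue;
            long v = toDecimal(ip);
            for (long[] r : ranges) {
                if (candidates.contains(r[0]) && v >= r[1] && v <= r[2]) return r[0];
            }
        }
        // Step 3: still undecided, so use the most recently registered candidate.
        return candidates.get(candidates.size() - 1);
    }
}
```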

5. Generating the Result

Once parsing is complete, the normalization result is generated from the tokenize result. The fields in the result are classified as "reserved fields" or "user-defined fields", depending on whether the normalization library knows what each field means. They are also classified by origin into three groups:

  • original fields: fields extracted from the event
  • derived fields: informational fields generated from information extracted from the event
  • substitution fields: fields that replace an original or derived value when it meets certain conditions

  1. Field categories

    See "Field Overview"

  2. Field substitution

    When a normalized field meets the conditions defined in the substitution field information (field ID, specific vendor), the tokenized string is changed to the specified string ( instr => outstr )

  3. Generating the normalization result

    The values of all fields after substitution become the normalization result.

  4. Predefined fields

    See "Field Overview"

Tracing Normalization Through Logging

The regular-expression-based normalization module is driven by configuration, so different settings produce different normalization results. The library therefore provides a way to trace, step by step, how a desired (or unexpected) normalization result was produced.

The normalization library can turn on or off, through configuration, a log that records each step of the normalization process in order. When the logging option is enabled, a List of String containing that step-by-step record is returned together with the normalization result.

Starting the Normalizer

The Normalizer runs in Spark standalone mode and is launched with the spark-submit command.

Run command

spark-submit --master [SPARK_MASTER_NODE] --deploy-mode client --supervise --class com.adenium.app.logNormalizer.LogNormalizer \
--driver-java-options "-Dlog4j.configuration=file:log4j.properties -Ddm.logging.name=Normalizer -Ddm.logging.path=logs" \
--jars [DEPENDENCY_LIB_PATH] \
--conf "spark.streaming.blockInterval=100ms" \
--conf "spark.locality.wait=100ms" \
--conf "spark.executor.logs.rolling.strategy=size" \
--conf "spark.executor.logs.rolling.maxSize=100000" \
--conf "spark.executor.logs.rolling.maxRetainedFiles=5" \
--conf "spark.streaming.backpressure.enabled=true" \
--conf "spark.streaming.kafka.maxRatePerPartition=4305" \
--conf "spark.executor.heartbeatInterval=20" \
--total-executor-cores [TOTAL_EXECUTOR_CORES] --executor-memory [EXECUTOR_MEMORY] \
--driver-memory [DRIVER_MEMORY] --name Normalizer [ADENIUM_NORMALIZER_PATH] \
-sp:master [SPARK_MASTER_NODE] \
-sp:app Normalizer \
-zk:conn [ZK_CONN] \
-sp:duration [SP_DURATION] \
-kf:broker [KAFKA_BROKERS] \
-kf:topic [IN_TOPIC] \
-kf:out_topic [OUT_TOPIC] \
-kf:err_topic [ERR_TOPIC] \
-kf:ctrl [CTRL_TOPIC] \
-kf:save \
-kf:restore

Run Options

  1. Spark submit options: these follow the standard submit options for standalone mode.

    Option | Description
    --master [SPARK_MASTER_NODE] | Spark master node host
    --jars [DEPENDENCY_LIB_PATH] | Dependency libs path
    --total-executor-cores [TOTAL_EXECUTOR_CORES] | Number of cores allocated to executors
    --executor-memory [EXECUTOR_MEMORY] | Memory allocated to each executor
    --driver-memory [DRIVER_MEMORY] | Memory allocated to the driver
    --name Normalizer [ADENIUM_NORMALIZER_PATH] | Adenium Normalizer bin path
  2. Adenium Normalizer options

    Option | Description | Format
    -sp:master [SPARK_MASTER_NODE] | Spark master node host | spark://localhost:7077
    -sp:app Normalizer | Application name | Normalizer
    -zk:conn [ZK_CONN] | Zookeeper host | localhost:2181
    -sp:duration [SP_DURATION] | Micro-batch duration (sec) | 4
    -kf:broker [KAFKA_BROKERS] | Kafka broker list | localhost:9092, host1:9092
    -kf:topic [IN_TOPIC] | Topic receiving raw logs | rawlog_Topic
    -kf:out_topic [OUT_TOPIC] | Topic storing normalized logs | normalized_Topic
    -kf:err_topic [ERR_TOPIC] | Topic storing failed logs | fail_topic
    -kf:ctrl [CTRL_TOPIC] | Topic receiving control commands | ctrl_topic
    -kf:save | Save the latest offsets | Flag only, no value
    -kf:restore | Restore mode | Flag only, no value

Normalizer Logging

Enabling the parser logging option in Adenium Normalizer shows how each event is normalized. The logging option can be sent at runtime as a command message.

  • How to send the option

    Write the following command to the parser_log node, one of the znode paths described in State Information (Configuration and Operational Data).

    -logon

  • Unlike the framework's other control commands, the engine does not delete the value of the -logon command when it receives it.

  • The logs can be found in the work logs of the Spark executors. For a description of the log, see "Tracing Normalization Through Logging".

Build and Environment

Build and Runtime Environment

Runtime Environment

Java SE Runtime Environment 8

Build script

pom.xml

Maven Build

  • Normalizer Framework
mvn clean package -P Normalizer  # without dependencies
mvn clean install -P Normalizer  # with dependencies
  • Parser Tester
mvn clean package -P Parser  # without dependencies
mvn clean install -P Parser  # with dependencies

Library dependencies

<dependencies>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
    </dependency>

    <!-- Apache Spark -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.2.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.11</artifactId>
        <version>2.2.1</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-kafka_2.11</artifactId>
        <version>1.6.3</version>
    </dependency>


    <!-- Apache Kafka -->
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka_2.11</artifactId>
        <version>0.8.2.1</version>
    </dependency>

    <!-- Apache Curator -->
    <dependency>
        <groupId>org.apache.curator</groupId>
        <artifactId>curator-framework</artifactId>
        <version>2.5.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.curator</groupId>
        <artifactId>curator-client</artifactId>
        <version>2.5.0</version>
    </dependency>

    <!-- Apache Zookeeper -->
    <dependency>
        <groupId>org.apache.zookeeper</groupId>
        <artifactId>zookeeper</artifactId>
        <version>3.4.5</version>
        <type>pom</type>
    </dependency>

    <!-- Log4J -->
    <dependency>
        <groupId>log4j</groupId>
        <artifactId>log4j</artifactId>
        <version>1.2.17</version>
    </dependency>
</dependencies>

Parser Tester

A separate tool is provided for testing the normalization process. The test tool is a console utility distributed as a standalone executable JAR. The reference data needed for normalization is supplied as files, logs are read from standard input, and normalization results are written to standard output.

Test Tool Workflow

Sample Dataset

Each reference data file is a TAB-separated text file whose first line is the column header. Sample log files contain one log per line. All files are UTF-8 encoded.

Samples : sample event logs for testing

RefData : reference data for normalization

  • agentInfo.ref : Agent information
  • arrangeRules.ref : rules that map tokenize results to normalization fields
  • companyIpRange.ref : owner IP ranges
  • companyServerIp.ref : owner IP information
  • fields.ref : normalization fields
  • geoIpRange.ref : IP bands per country
  • replaceFields.ref : replace fields
  • signatures.ref : sensor signatures
  • tokenizeRules.ref : regular expressions
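Based on the file format described above (TAB-separated, header on line 1, UTF-8), a reference file could be loaded with a small Java helper like the following. `RefFileSketch` is hypothetical, since the tester's own loader is not shown in this document. Skipping blank lines is an added assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical loader for TAB-separated .ref files; not the Parser Tester's own code.
final class RefFileSketch {
    /** Reads TAB-separated rows, skipping the header on line 1 and any blank lines. */
    static List<String[]> read(Reader source) {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String header = in.readLine(); // line 1: column names, ignored here
            if (header == null) return rows;
            String line;
            while ((line = in.readLine()) != null) {
                if (!line.trim().isEmpty()) rows.add(line.split("\t", -1));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    /** Convenience overload for a .ref file on disk, decoded as UTF-8. */
    static List<String[]> read(Path file) {
        try {
            return read(Files.newBufferedReader(file, StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```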
Test

The test program can receive event logs through a "|" (pipe) when it is started, or logs can be typed directly into the console after the program starts.

Run Options

-path [ File path ] : Loads the reference data from the specified path. Default : ../resources

-logon : Declared on its own without a value, this option enables logging. If -logon is omitted, no log is printed.

Run
  1. Run

    After starting the program, type an event log, or pass event logs in through "|" when starting it.

    > java -jar AdeniumParser.jar -path Resource\RefData
    
    > cat sample.log | java -jar AdeniumParser.jar -path Resource\RefData
    
  2. Log input

    If the program was started without event logs, typing a log at the prompt starts normalization.

    > java -jar AdeniumParser.jar -path Resource\RefData
    Ref Files From : Resource\RefData\
    <44>Aug  7 17:49:53 192.169.0.1 (warning) kernel: [WEBFRONT/0x00726001] Violated SQL Injection - the form field isn't allowed. (log_id="2085090068",app_name="07_purunetedu",app_id="7",src_if="waf",src_ip="116.46.237.195",src_port="52453",dest_ip="211.169.244.10",dest_port="80",forwarded_for="",host="www.purunetedu.com",url="/mystudyroom/questionbank/classlistAjax.prn",sig_warning="Middle",url_param="",block="no",evidence_id="1910044145",owasp="A1",field="subjCd",sigid="110600275",data="0-0-0")
    
  3. Normalization

    The normalization result is printed after the adenium header as TAB-separated key (field ID) and value pairs, as shown below.

    adenium 17      OUT     16      OUT     15      KR      11      KR      40      7 Aug 17:49:53  41  211.169.244.2       6       1       7       1       47      WF      48      WebFront K2400  49      PIOLINK 42      12346776        46      77      44      2425    45      SUNLEAF 4       시스템 폴더 접근 취약점    8       116.46.237.195  18      www.purunetedu.com      12      211.169.244.10  13  80  9       52453   33      no
    
  4. Exit

    Enter a blank line, or q, quit, bye, or exit.

  5. Logging

    With the -logon option, the log of the normalization process is shown.

    > java -jar AdeniumParser.jar -path Resource\RefData -logon
    
    [ State ] ========== parser.execute : success ? true
     fields log[ Syslog: RFC3164 ] Header3164(44,Aug,7,17:49:53,211.169.244.2)
    [ SOCDeviceTypeHint ] 211.169.244.2
    [ filterAgents ] host ( normal : 211.169.244.2 )
    [ lookupDeviceRules ] ( type, name) = (1234568257,WF)
    [ tryTokenizeRules ] (WF,1234568257) in (WF,1234568257)
    [ findArrangeRule ] 100
    [ arrangeResult ] : (1,8,Some(116.46.237.195))
    [ arrangeResult ] : (2,9,Some(52453))
    [ arrangeResult ] : (3,12,Some(211.169.244.10))
    [ arrangeResult ] : (4,13,Some(80))
    [ arrangeResult ] : (5,18,Some(www.purunetedu.com))
    [ arrangeResult ] : (6,33,Some(no))
    [ arrangeResult ] : (7,4,Some(110600275))
    [ decideAgent ] agent = (2425,SUNLEAF,PIOLINK)
    + makeField: (Some(action),no)
    + makeField: (Some(srcPort),52453)
    + makeField: (Some(destPort),80)
    + makeField: (Some(destIp),211.169.244.10)
    + makeField: (Some(url),www.purunetedu.com)
    + makeField: (Some(srcIp),116.46.237.195)
    + makeField: (Some(signature),110600275)
    + makeField: (Some(companyNm),SUNLEAF)
    + makeField: (Some(companyId),2425)
    + makeField: (Some(companyGroupId),77)
    + makeField: (Some(AgentId),12346776)
    + makeField: (Some(vendor),PIOLINK)
    + makeField: (Some(deviceModel),WebFront K2400)
    + makeField: (Some(deviceType),WF)
    + makeField: (Some(RepeatCount),1)
    + makeField: (Some(count),1)
    + makeField: (Some(SyslogHost),211.169.244.2)
    + makeField: (Some(SyslogTime),7 Aug 17:49:53)
    makeField: Some(srcCountry)
    makeField: Some(destCountry)
    makeField: Some(srcDirection)
    makeField: Some(destDirection)
    + makeField: (Some(signature),시스템 폴더 접근 취약점)
    added: (destDirection,Some(OUT))
    added: (srcDirection,Some(OUT))
    added: (destCountry,Some(KR))
    added: (srcCountry,Some(KR))
    added: (SyslogTime,Some(7 Aug 17:49:53))
    added: (SyslogHost,Some(211.169.244.2))
    added: (count,Some(1))
    added: (RepeatCount,Some(1))
    added: (deviceType,Some(WF))
    added: (deviceModel,Some(WebFront K2400))
    added: (vendor,Some(PIOLINK))
    added: (AgentId,Some(12346776))
    added: (companyGroupId,Some(77))
    added: (companyId,Some(2425))
    added: (companyNm,Some(SUNLEAF))
    changed: (signature,Some(시스템 폴더 접근 취약점))
    added: (signature,Some(110600275))
    added: (srcIp,Some(116.46.237.195))
    added: (url,Some(www.purunetedu.com))
    added: (destIp,Some(211.169.244.10))
    added: (destPort,Some(80))
    added: (srcPort,Some(52453))
    added: (action,Some(no)) ==========
    

    Log messages

    - [ State ] : whether normalization succeeded or failed

    - [ SOCDeviceTypeHint ] : the hostname from the Syslog header

    - [ filterAgents ] : the Agents found registered under that hostname

    - [ lookupDeviceRules ] : the ID of the regular expression to use for tokenizing, and the Sensor type

    - [ tryTokenizeRules ] : the regular expression applied for tokenizing

    - [ findArrangeRule ] : the ID of the rule that maps the tokenize result to normalized fields

    - [ arrangeResult ] : one mapping result, as (token order, normalized field ID, value)

    - [ decideAgent ] : the owner and Agent information finally selected

    - + makeField : a normalized field as generated, before field changes are applied

    - added : a field that was not changed

    - changed : a field whose value was changed
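Steps 1 to 4 amount to a loop over standard input. Typed input and piped input (cat sample.log | ...) both arrive on standard input, so a single loop covers both launch styles. This is a minimal sketch that assumes one event per line and treats an exit word as the whole line. NormalizerLoop, run and the placeholder normalizer are hypothetical stand-ins, not the AdeniumParser API. In this sketch a blank line also stops piped input.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.util.Set;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical sketch of the read/normalize/print loop described in steps 1 to 4. */
final class NormalizerLoop {
    // Exit words from step 4; a blank line also ends the session.
    private static final Set<String> EXIT_WORDS = Set.of("q", "quit", "bye", "exit");

    static boolean isExit(String line) {
        String t = line.trim();
        return t.isEmpty() || EXIT_WORDS.contains(t);
    }

    /**
     * Normalizes one event log per line until end of input, a blank line, or an exit word.
     * Returns the number of lines normalized.
     */
    static int run(BufferedReader in, Function<String, String> normalize, Consumer<String> sink) {
        int processed = 0;
        try {
            String line;
            while ((line = in.readLine()) != null && !isExit(line)) {
                sink.accept(normalize.apply(line));  // emit right away so interactive users see each result
                processed++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return processed;
    }

    public static void main(String[] args) {
        // Placeholder for the real rule-based normalizer, which is not reproduced here.
        Function<String, String> placeholder = log -> "adenium\t" + log.length();
        BufferedReader stdin = new BufferedReader(new InputStreamReader(System.in));
        run(stdin, placeholder, System.out::println);
    }
}
```

Each result goes to the sink as soon as it is produced and is not collected until exit. This is what an interactive user expects when typing logs one at a time.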
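The [ State ] line shows the RFC 3164 Syslog header split into Header3164(44,Aug,7,17:49:53,...), and [ SOCDeviceTypeHint ] takes the hostname from that header. This is a minimal sketch of the header split, assuming a strict "<PRI>Mmm dd hh:mm:ss HOSTNAME" prefix. Rfc3164Header and syslogTime are hypothetical names. The day-month-time format of syslogTime is copied from the SyslogTime value in the example output, not taken from Adenium's code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for an RFC 3164 syslog header: "<PRI>Mmm dd hh:mm:ss HOSTNAME". */
final class Rfc3164Header {
    // RFC 3164 pads single-digit days with a space ("Aug  7"), hence " {1,2}".
    private static final Pattern HEADER = Pattern.compile(
        "^<(\\d{1,3})>([A-Z][a-z]{2}) {1,2}(\\d{1,2}) (\\d{2}:\\d{2}:\\d{2}) (\\S+)");

    final int pri;
    final String month;
    final int day;
    final String time;
    final String host;

    private Rfc3164Header(int pri, String month, int day, String time, String host) {
        this.pri = pri;
        this.month = month;
        this.day = day;
        this.time = time;
        this.host = host;
    }

    int facility() { return pri / 8; }  // PRI = facility * 8 + severity
    int severity() { return pri % 8; }

    /** Day, month, time: the order used by SyslogTime in the example output ("7 Aug 17:49:53"). */
    String syslogTime() { return day + " " + month + " " + time; }

    static Rfc3164Header parse(String line) {
        Matcher m = HEADER.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("no RFC 3164 header: " + line);
        }
        return new Rfc3164Header(Integer.parseInt(m.group(1)), m.group(2),
            Integer.parseInt(m.group(3)), m.group(4), m.group(5));
    }
}
```

In the sample log, PRI 44 decodes to facility 5 and severity 4 (Warning). This agrees with the "(warning)" text that follows the header.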
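The lines from [ lookupDeviceRules ] to [ arrangeResult ] trace two steps. First, a regular expression splits the log into ordered tokens. Second, arrange rule 100 maps each token order to a normalized field ID. A later change step then replaced signature 110600275 with its name. This is a minimal sketch of those steps, assuming one capture group per token. WF_RULE, ARRANGE and applyChange are hypothetical reconstructions written to reproduce the arrangeResult lines for this sample. The real rules live in the reference data and are not shown in this document.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of tokenize -> arrange -> change for the sample WebFront log. */
final class TokenizeArrangeSketch {
    // Tokenize rule: capture group N is token order N.
    static final Pattern WF_RULE = Pattern.compile(
        "src_ip=\"([^\"]*)\",src_port=\"([^\"]*)\",dest_ip=\"([^\"]*)\",dest_port=\"([^\"]*)\""
            + ".*?host=\"([^\"]*)\".*?block=\"([^\"]*)\".*?sigid=\"([^\"]*)\"");

    // Arrange rule: token order (index + 1) -> normalized field ID, as printed by arrangeResult:
    // 8 srcIp, 9 srcPort, 12 destIp, 13 destPort, 18 url, 33 action, 4 signature.
    static final int[] ARRANGE = {8, 9, 12, 13, 18, 33, 4};

    /** Returns field ID -> value in token order, or an empty map when the rule does not match. */
    static Map<Integer, String> tokenizeAndArrange(String log) {
        Map<Integer, String> fields = new LinkedHashMap<>();
        Matcher m = WF_RULE.matcher(log);
        if (!m.find()) {
            return fields;
        }
        for (int order = 1; order <= m.groupCount(); order++) {
            fields.put(ARRANGE[order - 1], m.group(order));
        }
        return fields;
    }

    /** Change step: replaces a field's value through a lookup table, leaving unknown values as-is. */
    static void applyChange(Map<Integer, String> fields, int fieldId, Map<String, String> lookup) {
        fields.computeIfPresent(fieldId, (id, value) -> lookup.getOrDefault(value, value));
    }
}
```

Fields that the lookup leaves untouched correspond to the "added" lines in the log. A replaced value corresponds to a "changed" line.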
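Each output line is the adenium header followed by alternating field IDs and values. A consumer can therefore rebuild the field map by splitting on Tab. This is a minimal sketch that assumes values never contain a Tab. AdeniumLine is a hypothetical name. The ID-to-name table covers only the seven IDs whose names can be paired from the arrangeResult and makeField lines above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical reader for one normalized line: "adenium", then Tab-separated ID/value pairs. */
final class AdeniumLine {
    // Names paired with IDs via the arrangeResult and makeField lines of the -logon example.
    private static final Map<Integer, String> NAMES = Map.of(
        8, "srcIp", 9, "srcPort", 12, "destIp", 13, "destPort",
        18, "url", 33, "action", 4, "signature");

    static Map<Integer, String> parse(String line) {
        String[] parts = line.split("\t", -1);  // limit -1 keeps empty values
        // A valid line is the header plus complete ID/value pairs, so its length is odd.
        if (!parts[0].equals("adenium") || parts.length % 2 == 0) {
            throw new IllegalArgumentException("not an adenium line: " + line);
        }
        Map<Integer, String> fields = new LinkedHashMap<>();
        for (int i = 1; i < parts.length; i += 2) {
            fields.put(Integer.parseInt(parts[i]), parts[i + 1]);
        }
        return fields;
    }

    static String name(int id) {
        return NAMES.getOrDefault(id, "field#" + id);
    }
}
```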

Definitions and Abbreviations

  • Event : In general, an occurrence or a change of state at a specific point in time. By definition it always includes time information. It is close in meaning to a log, which is a record of some occurrence where time information is less strictly required. In this document, "event" and "log" are used interchangeably when there is no risk of confusion. Note, however, that records of a program's or library's own operation are also called logs (or logging).
  • Parser : In general, a component that performs parsing, that is, it breaks a string into meaningful tokens and builds a parse tree from them.
  • Normalization : Converting something into a "regular" state that follows fixed rules or criteria, or restoring something abnormal to a normal state. In this document, normalization means the parsing process together with the interpretation of the resulting parse tree. When there is no risk of confusion, "parsing" and "normalization" are used interchangeably.
  • Sensor : The software or device that produced an event.
  • Agent : In general, a program that collects information periodically, or performs some other service, without user intervention. This document uses the term differently. Here an Agent means "a specific Sensor plus its owner information." The same device (Sensor) owned by different owners therefore counts as different Agents.
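An Agent here is a Sensor plus its owner. A natural model therefore makes both part of equality, and a lookup by hostname, as in the [ filterAgents ] step, may return several Agents for one device. This is a minimal sketch using Java records. The type and method names are hypothetical. This document does not describe the rule that [ decideAgent ] uses to pick one Agent, so the sketch leaves it out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the Agent definition: a Sensor plus its owner. */
final class AgentModel {
    record Sensor(String host, String vendor, String model) {}

    // The owner is part of the record, so one Sensor with two owners yields two unequal Agents.
    record Agent(Sensor sensor, int companyId, String companyNm) {}

    private final Map<String, List<Agent>> byHost = new HashMap<>();

    void register(Agent agent) {
        byHost.computeIfAbsent(agent.sensor().host(), h -> new ArrayList<>()).add(agent);
    }

    /** All Agents registered under a Syslog hostname; more than one means the owner is still undecided. */
    List<Agent> filterAgents(String host) {
        return byHost.getOrDefault(host, List.of());
    }
}
```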