
A question-answering chatbot using multi-LLM and multi-RAG


kyopark2014/korean-chatbot-using-amazon-bedrock


Building an Enhanced Korean Chatbot Using RAG

RAG (Retrieval-Augmented Generation) strengthens the capabilities of an LLM (Large Language Model) and enables a wide range of applications. This project explains techniques for improving RAG performance and uses them to build a Korean chatbot that makes it easy to leverage enterprise or personal data.

  • Multimodal: analyzes images as well as text.
  • Multi-RAG: uses multiple knowledge stores.
  • Multi-Region LLM: invokes LLMs in several regions concurrently, shortening the time from question to answer and easing the limit on concurrent on-demand invocations.
  • Agent: uses results obtained from external APIs during conversations.
  • Internet search: when the RAG knowledge stores hold no relevant documents, an internet search raises their usefulness.
  • Korean/English dual search: when Korean and English documents are mixed in the RAG stores, a Korean query cannot retrieve the English documents. Searching both Korean and English documents from a Korean question improves RAG performance.
  • Priority search: sorting the retrieved documents by relevance improves the LLM's output.
  • Improved Kendra performance: uses Kendra's FAQs from LangChain.
  • Vector/keyword search: combines vector (semantic) search with lexical (keyword) search to raise the probability of finding relevant documents.
  • Code generation: generates Python/Node.js code based on existing code.

The code implemented here is based on LangChain. You can also try the following prompt-engineering examples:

  • Translation: translates the input sentence.
  • Grammatical Error Correction: explains the errors in an English sentence and shows the corrected sentence.
  • Review analysis (Extracted Topic and Sentiment): extracts the topic and sentiment of an input review.
  • Information Extraction: extracts information such as email addresses from the input sentence.
  • Removing PII: removes personally identifiable information (PII) from the input sentence.
  • Complex Question: solves a complex problem step by step.
  • Child Conversation: answers with vocabulary appropriate to the conversation partner.
  • Timestamp Extraction: extracts the timestamp from the input.
  • Free Conversation: chats casually, like a friend.

Architecture Overview

The overall architecture is shown below. A user's question is delivered over WebSocket to AWS Lambda, which answers it using RAG and an LLM. The chat history is used to rewrite the user's question into a revised question, which is then sent to the knowledge stores, Kendra and OpenSearch. Each store holds data suited to its purpose; even when they hold the same data, the two stores complement each other because they search documents differently. If the stores contain documents in both Korean and English, a Korean question cannot retrieve the English documents. Therefore, when the question is in Korean, the Korean documents are searched first, and the question is then translated into English to search the English documents as well. This way, even a Korean question retrieves English documents and yields better results. If neither knowledge store holds relevant documents, the Google Search API checks the internet for related web pages, and the results are used in the same way as RAG documents.

์ƒ์„ธํ•˜๊ฒŒ ๋‹จ๊ณ„๋ณ„๋กœ ์„ค๋ช…ํ•˜๋ฉด ์•„๋ž˜์™€ ๊ฐ™์Šต๋‹ˆ๋‹ค.

Step 1: The user's question is delivered to Lambda over WebSocket through API Gateway. Lambda reads the question from the JSON body. Since the user's previous conversation history is needed, it is loaded from Amazon DynamoDB; this load is performed only once, for the first request.
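As a rough illustration of this step, a WebSocket Lambda handler might look like the sketch below. The event field names (`user_id`, `body`) and the warm-container cache are assumptions for illustration, not the repository's exact schema.

```python
import json

# In-memory flag so the DynamoDB history load runs only once per user
# (assumption: the Lambda container stays warm between requests).
history_loaded = {}

def lambda_handler(event, context):
    connection_id = event['requestContext']['connectionId']  # WebSocket connection
    body = json.loads(event['body'])                         # the question arrives as JSON

    user_id = body['user_id']
    question = body['body']

    if user_id not in history_loaded:
        # load_chat_history(user_id) would read prior turns from DynamoDB here
        history_loaded[user_id] = True

    # ... revise the question, query RAG, and stream the answer back over connection_id ...
    return {'statusCode': 200}
```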

Step 2: So that the user and the chatbot can have an interactive conversation that reflects the conversation history, a revised question is generated from the chat history and the user's question. Providing the chat history to the LLM as context, together with an appropriate prompt, produces the revised question.
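A minimal sketch of this rewriting step, assuming a plain-text prompt (the repository uses a LangChain prompt; the wording here is purely illustrative):

```python
def build_revision_prompt(chat_history, question):
    # Render the history into a plain-text prompt that asks the LLM
    # to turn the latest question into a standalone one.
    history_text = "\n".join(f"{role}: {text}" for role, text in chat_history)
    return (
        "Given the conversation below, rewrite the last question as a "
        "standalone question that keeps its original meaning.\n\n"
        f"{history_text}\n\nQuestion: {question}\nStandalone question:"
    )

# revised_question = llm.invoke(build_revision_prompt(history, question))  # send to the LLM
```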

Step 3: The revised question is sent to OpenSearch to obtain relevant documents.

Step 4: If the question is in Korean, the revised question is translated into English so that English documents can also be searched.

Step 5: The translated revised question is sent to Kendra and OpenSearch.

Step 6: If the relevant documents obtained with the translated question are in English, they are translated with the LLM. Since there are several documents, Multi-Region LLMs are used to minimize latency.

Step 7: The N relevant documents from the Korean question plus the N relevant documents in English add up to at most 2N documents. From these, documents are selected to fit the context window, placing the most relevant ones at the top of the context.
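The selection in step 7 can be sketched as below; the character-based budget is a stand-in for a real token count, and the field names follow the doc_info structure shown later in this README (lower assessed_score means more relevant, matching the FAISS distance used by priority search).

```python
def build_context(docs, max_chars=8000):
    """Pack documents into the context, most relevant first, until the
    budget (approximated here in characters) is exhausted."""
    selected = []
    used = 0
    for doc in sorted(docs, key=lambda d: d['assessed_score']):
        excerpt = doc['metadata']['excerpt']
        if used + len(excerpt) > max_chars:
            break  # context window budget exhausted
        selected.append(excerpt)
        used += len(excerpt)
    return "\n\n".join(selected)
```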

Step 8: Documents below a relevance threshold are discarded, so it is possible that not a single RAG document is selected. In that case an internet search is performed through the Google Search API, the resulting documents go through priority search, and only results above the relevance threshold are used as RAG context.

Step 9: A context is built from the selected relevant documents and passed to the LLM together with the revised question to generate the answer to the user's question.

์ด๋•Œ์˜ Sequence diagram์€ ์•„๋ž˜์™€ ๊ฐ™์Šต๋‹ˆ๋‹ค. ๋งŒ์•ฝ RAG์—์„œ ๊ด€๋ จ๋œ ๋ฌธ์„œ๋ฅผ ์ฐพ์ง€๋ชปํ•  ๊ฒฝ์šฐ์—๋Š” Google Search API๋ฅผ ํ†ตํ•ด Query๋ฅผ ์ˆ˜ํ–‰ํ•˜์—ฌ RAG์ฒ˜๋Ÿผ ํ™œ์šฉํ•ฉ๋‹ˆ๋‹ค. ๋Œ€ํ™”์ด๋ ฅ์„ ๊ฐ€์ ธ์˜ค๊ธฐ ์œ„ํ•œ DynamoDB๋Š” ์ฒซ๋ฒˆ์งธ ์งˆ๋ฌธ์—๋งŒ ํ•ด๋‹น๋ฉ๋‹ˆ๋‹ค. ์—ฌ๊ธฐ์„œ๋Š” "us-east-1"๊ณผ "us-west-2"์˜ Bedrock์„ ์‚ฌ์šฉํ•˜๋ฏ€๋กœ, ์•„๋ž˜์™€ ๊ฐ™์ด ์งˆ๋ฌธ๋งˆ๋‹ค ๋‹ค๋ฅธ Region์˜ Bedrock Claude LLM์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.

For bulk file uploads or deletions, the event-driven structure below can be used. It allows documents or code placed into S3 at scale to be ingested into the RAG knowledge stores without information leakage.

Implementing Enhanced RAG

Multi-RAG

Using multiple RAG sources increases the latency from request to response, so the knowledge stores must be queried concurrently with parallel processing. For details, see the related blog post on building a Korean chatbot with Multi-RAG and Multi-Region LLMs.

from multiprocessing import Process, Pipe

# Worker: query one knowledge store and send the results back over the pipe.
def retrieve_process_from_RAG(conn, query, top_k, rag_type):
    relevant_docs = []
    if rag_type == 'kendra':
        rel_docs = retrieve_from_kendra(query=query, top_k=top_k)
    else:
        rel_docs = retrieve_from_vectorstore(query=query, top_k=top_k, rag_type=rag_type)

    if len(rel_docs) >= 1:
        for doc in rel_docs:
            relevant_docs.append(doc)

    conn.send(relevant_docs)
    conn.close()

# Query all knowledge stores (e.g. Kendra and OpenSearch) concurrently.
relevant_docs = []
processes = []
parent_connections = []
for rag in capabilities:
    parent_conn, child_conn = Pipe()
    parent_connections.append(parent_conn)

    process = Process(target=retrieve_process_from_RAG, args=(child_conn, revised_question, top_k, rag))
    processes.append(process)

for process in processes:
    process.start()

for parent_conn in parent_connections:
    rel_docs = parent_conn.recv()

    if len(rel_docs) >= 1:
        for doc in rel_docs:
            relevant_docs.append(doc)

for process in processes:
    process.join()

Multi-Region LLM

Define profiles for the LLMs in each region. For details, see cdk-korean-chatbot-stack.ts.

const claude3_sonnet = [
  {
    "bedrock_region": "us-west-2", // Oregon
    "model_type": "claude3",
    "model_id": "anthropic.claude-3-sonnet-20240229-v1:0",   
    "maxOutputTokens": "4096"
  },
  {
    "bedrock_region": "us-east-1", // N.Virginia
    "model_type": "claude3",
    "model_id": "anthropic.claude-3-sonnet-20240229-v1:0",
    "maxOutputTokens": "4096"
  }
];

const profile_of_LLMs = claude3_sonnet;

When creating the Bedrock client you can specify bedrock_region. By selecting the LLM as below, a different region's LLM can be used each time an event arrives at Lambda.

import json
import os
import boto3
from botocore.config import Config
from langchain_aws import ChatBedrock

HUMAN_PROMPT = "\n\nHuman:"  # Claude stop sequence (the anthropic SDK's HUMAN_PROMPT)

profile_of_LLMs = json.loads(os.environ.get('profile_of_LLMs'))
selected_LLM = 0

def get_chat(profile_of_LLMs, selected_LLM):
    profile = profile_of_LLMs[selected_LLM]
    bedrock_region =  profile['bedrock_region']
    modelId = profile['model_id']
    print(f'LLM: {selected_LLM}, bedrock_region: {bedrock_region}, modelId: {modelId}')
    maxOutputTokens = int(profile['maxOutputTokens'])
                          
    # bedrock   
    boto3_bedrock = boto3.client(
        service_name='bedrock-runtime',
        region_name=bedrock_region,
        config=Config(
            retries = {
                'max_attempts': 30
            }            
        )
    )
    parameters = {
        "max_tokens":maxOutputTokens,     
        "temperature":0.1,
        "top_k":250,
        "top_p":0.9,
        "stop_sequences": [HUMAN_PROMPT]
    }
    # print('parameters: ', parameters)

    chat = ChatBedrock(   # new chat model
        model_id=modelId,
        client=boto3_bedrock, 
        model_kwargs=parameters,
    )       
    
    return chat

To translate documents in parallel, as in lambda(chat), multiple processes are used inside Lambda; Pipe() passes the results of the parallel workers back.

def translate_relevant_documents_using_parallel_processing(docs):
    selected_LLM = 0
    relevant_docs = []    
    processes = []
    parent_connections = []
    for doc in docs:
        parent_conn, child_conn = Pipe()
        parent_connections.append(parent_conn)
            
        chat = get_chat(profile_of_LLMs, selected_LLM)
        bedrock_region = profile_of_LLMs[selected_LLM]['bedrock_region']

        process = Process(target=translate_process_from_relevent_doc, args=(child_conn, chat, doc, bedrock_region))
        processes.append(process)

        selected_LLM = selected_LLM + 1
        if selected_LLM == len(profile_of_LLMs):
            selected_LLM = 0

    for process in processes:
        process.start()
            
    for parent_conn in parent_connections:
        doc = parent_conn.recv()
        relevant_docs.append(doc)    

    for process in processes:
        process.join()
    
    #print('relevant_docs: ', relevant_docs)
    return relevant_docs

Defining and Using an Agent

For the agent, LangChain's ReAct agent is defined. Here, the Kyobo Book Centre search API is used to look up book information and show the results.
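Conceptually, a ReAct agent alternates between LLM reasoning steps and tool calls, dispatching each model-chosen action to a registered tool. The sketch below shows only that dispatch step with stub tools; the real Kyobo search call and the LangChain agent wiring are omitted, so the tool bodies are placeholders.

```python
from datetime import datetime

def search_books(keyword):
    # Placeholder for the Kyobo search API call (the real endpoint is not shown here)
    return f"books about {keyword}"

def get_current_time(_=""):
    # Tools like this let the agent answer time questions without hallucinating
    return datetime.now().isoformat()

# Registry of tools the ReAct agent can call by name
TOOLS = {
    "book_search": search_books,
    "current_time": get_current_time,
}

def run_tool(action, action_input):
    # One step of the agent loop: dispatch the model-chosen action to a tool
    return TOOLS[action](action_input)
```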

Korean/English Dual Search

ํ•œ์˜ ๊ฒ€์ƒ‰์„ ์œ„ํ•ด ๋จผ์ € ํ•œ๊ตญ์–ด๋กœ RAG๋ฅผ ์กฐํšŒํ•˜๊ณ , ์˜์–ด๋กœ ๋ฒˆ์—ญํ•œ ํ›„์— ๊ฐ๊ฐ์˜ ๊ด€๋ จ๋œ ๋ฌธ์„œ๋“ค(Relevant Documents)๋ฅผ ๋ฒˆ์—ญํ•ฉ๋‹ˆ๋‹ค. ๊ด€๋ จ๋œ ๋ฌธ์„œ๋“ค์— ๋Œ€ํ•ด ์งˆ๋ฌธ์— ๋”ฐ๋ผ ๊ด€๋ จ์„ฑ์„ ๋น„๊ตํ•˜์—ฌ ๊ด€๋ จ๋„๊ฐ€ ๋†’์€ ๋ฌธ์„œ์ˆœ์„œ๋กœ Context๋ฅผ ๋งŒ๋“ค์–ด์„œ ํ™œ์šฉํ•ฉ๋‹ˆ๋‹ค. ์ƒ์„ธํ•œ ๋‚ด์šฉ์€ ๊ด€๋ จ๋œ Blog์ธ ํ•œ์˜ ๋™์‹œ ๊ฒ€์ƒ‰ ๋ฐ ์ธํ„ฐ๋„ท ๊ฒ€์ƒ‰์„ ํ™œ์šฉํ•˜์—ฌ RAG๋ฅผ ํŽธ๋ฆฌํ•˜๊ฒŒ ํ™œ์šฉํ•˜๊ธฐ์„ ์ฐธ์กฐํ•ฉ๋‹ˆ๋‹ค.

translated_revised_question = traslation_to_english(llm=llm, msg=revised_question)

relevant_docs_using_translated_question = retrieve_from_vectorstore(query=translated_revised_question, top_k=4, rag_type=rag_type)
            
docs_translation_required = []
if len(relevant_docs_using_translated_question)>=1:
    for i, doc in enumerate(relevant_docs_using_translated_question):
        if isKorean(doc)==False:
            docs_translation_required.append(doc)
        else:
            relevant_docs.append(doc)
                                   
    translated_docs = translate_relevant_documents_using_parallel_processing(docs_translation_required)
    for i, doc in enumerate(translated_docs):
        relevant_docs.append(doc)

Internet Search

If no relevant documents were found even after querying the multiple knowledge stores with Multi-RAG, the results of a Google internet search are used. Here, assessed_score is later updated with the FAISS score during priority search. For details, see the related blog post on using dual Korean/English search and internet search with RAG.

import os
import traceback

from googleapiclient.discovery import build

google_api_key = os.environ.get('google_api_key')
google_cse_id = os.environ.get('google_cse_id')

api_key = google_api_key
cse_id = google_cse_id

relevant_docs = []
try:
    service = build("customsearch", "v1", developerKey=api_key)
    result = service.cse().list(q=revised_question, cx=cse_id).execute()
    print('google search result: ', result)

    if "items" in result:
        for item in result['items']:
            api_type = "google api"
            excerpt = item['snippet']
            uri = item['link']
            title = item['title']
            confidence = ""
            assessed_score = ""

            doc_info = {
                "rag_type": 'search',
                "api_type": api_type,
                "confidence": confidence,
                "metadata": {
                    "source": uri,
                    "title": title,
                    "excerpt": excerpt,
                },
                "assessed_score": assessed_score,
            }
            relevant_docs.append(doc_info)  # append inside the loop so every item is kept
except Exception:
    err_msg = traceback.format_exc()
    print('error message: ', err_msg)

Priority Search (๊ด€๋ จ๋„ ๊ธฐ์ค€ ๋ฌธ์„œ ์„ ํƒ)

When Multi-RAG, dual-language search, and internet search produce many relevant documents, only some of them are used in RAG, in order of relevance. FAISS similarity search is used for this; it yields a quantitative relevance score, which keeps unrelated documents out of the context.

selected_relevant_docs = []
if len(relevant_docs)>=1:
    selected_relevant_docs = priority_search(revised_question, relevant_docs, bedrock_embeddings)

def priority_search(query, relevant_docs, bedrock_embeddings):
    excerpts = []
    for i, doc in enumerate(relevant_docs):
        # use the translated excerpt when present (e.g. internet-search results have none)
        if doc['metadata'].get('translated_excerpt'):
            content = doc['metadata']['translated_excerpt']
        else:
            content = doc['metadata']['excerpt']
        
        excerpts.append(
            Document(
                page_content=content,
                metadata={
                    'name': doc['metadata']['title'],
                    'order':i,
                }
            )
        )  

    embeddings = bedrock_embeddings
    vectorstore_confidence = FAISS.from_documents(
        excerpts,  # documents
        embeddings  # embeddings
    )            
    rel_documents = vectorstore_confidence.similarity_search_with_score(
        query=query,
        k=top_k
    )

    docs = []
    for i, document in enumerate(rel_documents):

        order = document[0].metadata['order']
        name = document[0].metadata['name']
        assessed_score = document[1]

        relevant_docs[order]['assessed_score'] = int(assessed_score)

        if assessed_score < 200:
            docs.append(relevant_docs[order])    

    return docs

Kendra์˜ ์„ฑ๋Šฅ ํ–ฅ์ƒ

Kendra's RAG performance can be improved by following the RAG implementation with Kendra. Well-organized documents such as Kendra FAQs are used, and relevant documents are selected by relevance to form the context. See kendra-document.md for registering documents in Kendra, and the related blog post on enhanced RAG with Claude on Amazon Bedrock and Amazon Kendra for details.

Code Generation

New code is generated using existing code stored in RAG. rag-code-generation describes how code is summarized in Korean, stored in RAG, and searched. Here, ordinary documents and code references are stored in one RAG and used together.
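The flow of wrapping a code snippet as a RAG document can be sketched as follows; the summarization prompt and the metadata fields are illustrative assumptions, not the exact ones used by rag-code-generation.

```python
def summarize_code_for_rag(llm_invoke, code, source_path):
    """Summarize a code snippet in Korean and wrap it as a RAG document.
    llm_invoke: callable that sends a prompt to the LLM and returns text."""
    prompt = "다음 코드가 하는 일을 한국어로 2~3문장으로 요약해 주세요.\n\n" + code
    summary = llm_invoke(prompt)
    return {
        "page_content": summary,      # what gets embedded and searched
        "metadata": {
            "source": source_path,
            "excerpt": code,          # the original code returned as the reference
            "doc_type": "code",       # distinguishes code from plain documents
        },
    }
```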

Embedding

Embeddings are created with BedrockEmbeddings. 'amazon.titan-embed-text-v1' stands for Titan Embeddings G1 - Text and supports 8k tokens.

bedrock_embeddings = BedrockEmbeddings(
    client=boto3_bedrock,
    region_name = bedrock_region,
    model_id = 'amazon.titan-embed-text-v1' 
)

Knowledge Store

Here, OpenSearch, Faiss, and Kendra are used as knowledge stores.

Storing the Conversation in Memory

lambda-chat-ws uses the userId of the incoming message to check whether a conversation history (memory_chain) is stored in map_chain. If there is no chat history, memory_chain is initialized with ConversationBufferWindowMemory as below.

map_chain = dict() 

if userId in map_chain:
    print('memory exist. reuse it!')        
    memory_chain = map_chain[userId]
        
else: 
    memory_chain = ConversationBufferWindowMemory(memory_key="chat_history", output_key='answer', return_messages=True, k=10)
    map_chain[userId] = memory_chain
        
    allowTime = getAllowTime()
    load_chat_history(userId, allowTime)

msg = general_conversation(connectionId, requestId, chat, text)

def general_conversation(connectionId, requestId, chat, query):
    if isKorean(query)==True :
        system = (
            "๋‹ค์Œ์˜ Human๊ณผ Assistant์˜ ์นœ๊ทผํ•œ ์ด์ „ ๋Œ€ํ™”์ž…๋‹ˆ๋‹ค. Assistant์€ ์ƒํ™ฉ์— ๋งž๋Š” ๊ตฌ์ฒด์ ์ธ ์„ธ๋ถ€ ์ •๋ณด๋ฅผ ์ถฉ๋ถ„ํžˆ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค. Assistant์˜ ์ด๋ฆ„์€ ์„œ์—ฐ์ด๊ณ , ๋ชจ๋ฅด๋Š” ์งˆ๋ฌธ์„ ๋ฐ›์œผ๋ฉด ์†”์งํžˆ ๋ชจ๋ฅธ๋‹ค๊ณ  ๋งํ•ฉ๋‹ˆ๋‹ค."
        )
    else: 
        system = (
            "Using the following conversation, answer friendly for the newest question. If you don't know the answer, just say that you don't know, don't try to make up an answer. You will be acting as a thoughtful advisor."
        )
    
    human = "{input}"
    
    prompt = ChatPromptTemplate.from_messages([("system", system), MessagesPlaceholder(variable_name="history"), ("human", human)])
    
    history = memory_chain.load_memory_variables({})["chat_history"]
                
    chain = prompt | chat    
    try: 
        isTyping(connectionId, requestId)  
        stream = chain.invoke(
            {
                "history": history,
                "input": query,
            }
        )
        msg = readStreamMsg(connectionId, requestId, stream.content)    
                            
        msg = stream.content
        print('msg: ', msg)
    except Exception:
        err_msg = traceback.format_exc()
        print('error message: ', err_msg)        
            
        sendErrorMessage(connectionId, requestId, err_msg)    
        raise Exception ("Not able to request to LLM")

    return msg

New dialog turns are added to chat_memory as below.

memory_chain.chat_memory.add_user_message(text) 
memory_chain.chat_memory.add_ai_message(msg)

Stream Handling

The stream can deliver messages to the WebSocket client in the way shown below. For details, see the related blog post on implementing a streaming Korean chatbot with Amazon Bedrock.

def readStreamMsg(connectionId, requestId, stream):
    msg = ""
    if stream:
        for event in stream:
            msg = msg + event

            result = {
                'request_id': requestId,
                'msg': msg
            }
            sendMessage(connectionId, result)
    print('msg: ', msg)
    return msg

sendMessage(), which sends messages to the client, is shown below. It uses boto3's post_to_connection to deliver the message to API Gateway, the WebSocket endpoint.

def sendMessage(id, body):
    try:
        client.post_to_connection(
            ConnectionId=id, 
            Data=json.dumps(body)
        )
    except: 
        raise Exception ("Not able to send a message")

Permissions for Using S3 as a Data Source

Permissions for logs are required.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "cloudwatch:GenerateQuery",
                "logs:*"
            ],
            "Resource": "*",
            "Effect": "Allow"
        }
    ]
}

For development and testing, read permission on all S3 buckets is granted so that additional buckets can be registered in Kendra.

{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Action": [
				"s3:Describe*",
				"s3:Get*",
				"s3:List*"
			],
			"Resource": "*",
			"Effect": "Allow"
		}
	]
}

Implemented in CDK, this looks as follows.

const kendraLogPolicy = new iam.PolicyStatement({
    resources: ['*'],
    actions: ["logs:*", "cloudwatch:GenerateQuery"],
});
roleKendra.attachInlinePolicy( // add kendra policy
    new iam.Policy(this, `kendra-log-policy-for-${projectName}`, {
        statements: [kendraLogPolicy],
    }),
);
const kendraS3ReadPolicy = new iam.PolicyStatement({
    resources: ['*'],
    actions: ["s3:Get*", "s3:List*", "s3:Describe*"],
});
roleKendra.attachInlinePolicy( // add kendra policy
    new iam.Policy(this, `kendra-s3-read-policy-for-${projectName}`, {
        statements: [kendraS3ReadPolicy],
    }),
);    

Kendra File-Size Quota

As shown in the Quota Console under File size, files uploaded to Kendra are limited to 50MB. This can be raised with a quota increase request; even then, the text extracted from a single file is limited to 5MB. Separately, after converting the answer (msg) into Korean speech, the audio stored in S3 can be shared as a URI through a CloudFront URL.
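Given this quota, a document pipeline may want to skip oversized files instead of failing the sync job; a minimal sketch:

```python
import os

KENDRA_FILE_LIMIT = 50 * 1024 * 1024  # 50MB default per-file quota

def is_ingestable(path):
    # Skip files over the Kendra per-file quota instead of failing the sync job
    return os.path.getsize(path) <= KENDRA_FILE_LIMIT
```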

Reading Results Aloud

Amazon Polly reads the result aloud in Korean, using start_speech_synthesis_task.

from urllib import parse  # for URL-encoding the object name below

def get_text_speech(path, speech_prefix, bucket, msg):
    ext = "mp3"
    polly = boto3.client('polly')
    try:
        response = polly.start_speech_synthesis_task(
            Engine='neural',
            LanguageCode='ko-KR',
            OutputFormat=ext,
            OutputS3BucketName=bucket,
            OutputS3KeyPrefix=speech_prefix,
            Text=msg,
            TextType='text',
            VoiceId='Seoyeon'        
        )
        print('response: ', response)
    except Exception:
        err_msg = traceback.format_exc()
        print('error message: ', err_msg)        
        raise Exception ("Not able to create voice")
    
    object = '.'+response['SynthesisTask']['TaskId']+'.'+ext
    print('object: ', object)

    return path+speech_prefix+parse.quote(object)

๋ฐ์ดํ„ฐ ์†Œ์Šค ์ถ”๊ฐ€

Adding S3 as a data source can be done as below; however, since languageCode is not supported there, the CLI is used instead.

const cfnDataSource = new kendra.CfnDataSource(this, `s3-data-source-${projectName}`, {
    description: 'S3 source',
    indexId: kendraIndex,
    name: 'data-source-for-upload-file',
    type: 'S3',
    // languageCode: 'ko',
    roleArn: roleKendra.roleArn,
    // schedule: 'schedule',

    dataSourceConfiguration: {
        s3Configuration: {
            bucketName: s3Bucket.bucketName,
            documentsMetadataConfiguration: {
                s3Prefix: 'metadata/',
            },
            inclusionPrefixes: ['documents/'],
        },
    },
});

An example CLI command:

aws kendra create-data-source \
--index-id azfbd936-4929-45c5-83eb-bb9d458e8348 \
--name data-source-for-upload-file \
--type S3 \
--role-arn arn:aws:iam::123456789012:role/role-lambda-chat-ws-for-korean-chatbot-us-west-2 \
--configuration '{"S3Configuration":{"BucketName":"storage-for-korean-chatbot-us-west-2", "DocumentsMetadataConfiguration": {"S3Prefix":"metadata/"},"InclusionPrefixes": ["documents/"]}}' \
--language-code ko \
--region us-west-2

OpenSearch

OpenSearch is used following the Python client documentation.

Install opensearch-py:

pip install opensearch-py

Per the index naming restrictions, index names must be lowercase and may not contain spaces or commas.
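A small helper, as a sketch, can normalize arbitrary names into valid index names; it covers only the lowercase/space/comma rules mentioned above plus a few other characters OpenSearch forbids, not the full set of restrictions.

```python
import re

def to_valid_index_name(name):
    """Normalize a string into a valid OpenSearch index name:
    lowercase, with forbidden characters (space, comma, etc.) replaced."""
    name = name.lower()
    return re.sub(r'[ ,:"*+/\\|?#<>]', '-', name)
```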

OpenSearch์˜ ์„ฑ๋Šฅ ํ–ฅ์ƒ ๋ฐฉ๋ฒ•

In addition to vector (semantic) search, lexical (keyword) search is used to raise the probability of finding relevant documents. Details are in Lexical search with OpenSearch.

OpenSearch์˜ ๋ฌธ์„œ ์—…๋ฐ์ดํŠธ

To handle updates after document creation, the index was originally checked and deleted, but the number of shards grew excessively; the approach was changed to store the document ids in metadata and delete by id. See lambda-document-manager. On a file update, the previous documents are looked up from the metadata and deleted, and the new documents are inserted.

def store_document_for_opensearch(docs, key):    
    objectName = (key[key.find(s3_prefix)+len(s3_prefix)+1:len(key)])
    metadata_key = meta_prefix+objectName+'.metadata.json'
    delete_document_if_exist(metadata_key)
    
    response = []  # returned even when add_documents fails below
    try:        
        response = vectorstore.add_documents(docs, bulk_size = 2000)
    except Exception:
        err_msg = traceback.format_exc()
        print('error message: ', err_msg)                
        #raise Exception ("Not able to request to LLM")

    print('uploaded into opensearch')
    
    return response

def delete_document_if_exist(metadata_key):
    try: 
        s3r = boto3.resource("s3")
        bucket = s3r.Bucket(s3_bucket)
        objs = list(bucket.objects.filter(Prefix=metadata_key))
        print('objs: ', objs)
        
        if(len(objs)>0):
            doc = s3r.Object(s3_bucket, metadata_key)
            meta = doc.get()['Body'].read().decode('utf-8')
            print('meta: ', meta)
            
            ids = json.loads(meta)['ids']
            print('ids: ', ids)
            
            result = vectorstore.delete(ids)
            print('result: ', result)        
        else:
            print('no meta file: ', metadata_key)
            
    except Exception:
        err_msg = traceback.format_exc()
        print('error message: ', err_msg)        
        raise Exception ("Not able to create meta file")

bulk_size for OpenSearch Embedding

์•„๋ž˜๋Š” OpenSearch์—์„œ Embedding์„ ํ• ๋•Œ bulk_size ๊ธฐ๋ณธ๊ฐ’์ธ 500์„ ์‚ฌ์šฉํ• ๋•Œ์˜ ์—๋Ÿฌ์ž…๋‹ˆ๋‹ค. ๋ฌธ์„œ๋ฅผ embeddingํ•˜๊ธฐ ์œ„ํ•ด 1840๋ฒˆ embedding์„ ํ•ด์•ผํ•˜๋Š”๋ฐ, bulk_size๊ฐ€ 500์ด๋ฏ€๋กœ ์—๋Ÿฌ๊ฐ€ ๋ฐœ์ƒํ•˜์˜€์Šต๋‹ˆ๋‹ค.

RuntimeError: The embeddings count, 1840 is more than the [bulk_size], 500. Increase the value of [bulk_size].

Increasing bulk_size to 2000 resolves it.

new_vectorstore = OpenSearchVectorSearch(
    index_name=index_name,  
    is_aoss = False,
    #engine="faiss",  # default: nmslib
    embedding_function = bedrock_embeddings,
    opensearch_url = opensearch_url,
    http_auth=(opensearch_account, opensearch_passwd),
)
response = new_vectorstore.add_documents(docs, bulk_size = 2000)

Implementing the Infrastructure with AWS CDK

The CDK implementation code explains in detail how to define the infrastructure in TypeScript.

์ง์ ‘ ์‹ค์Šต ํ•ด๋ณด๊ธฐ

Prerequisites

To use this solution, the following must be prepared in advance.

Installing the Infrastructure with CDK

Install the infrastructure with CDK following the installation guide.

Results

Multimodal and RAG

Select [General Conversation] as the "Conversation Type" and download the dice.png file.

Then select the file button below the chat window to upload it. The result looks like this.

image

After downloading fsi_faq_ko.csv and uploading it with the file icon, enter "간편조회 서비스를 영문으로 사용할 수 있나요?" (Can the simple-inquiry service be used in English?) in the chat window. The answer is "아니오" (no). The result looks like this.

image

Enter "이체를 할수 없다고 나옵니다. 어떻게 해야 하나요?" (It says I cannot make a transfer. What should I do?) in the chat window and check the result.

image

Enter "간편조회 서비스를 영문으로 사용할 수 있나요?" in the chat window. Because "영문뱅킹에서는 간편조회서비스 이용불가" (the simple-inquiry service is unavailable in English banking), a more detailed explanation is obtained.

image

Enter "공동인증서 창구발급 서비스는 무엇인가요?" (What is the in-branch issuance service for joint certificates?) in the chat window and check the result.

image

Using the Agent

Go back in the chat window and select "1-2 Agent". Entering "여행 관련 도서 추천해줘." (Recommend travel-related books.) queries the Kyobo Book Centre API for documents related to travel and shows the results, as below.

image

Entering "서울의 오늘 날씨 알려줘" (Tell me today's weather in Seoul) retrieves and shows the weather information as below.

image

Asked for the time, an LLM gives its training cutoff or a completely unrelated hallucinated value. With the agent, the current time is looked up as below. Check the behavior with "오늘 날짜 알려줘." (What is today's date?) and "현재 시간은?" (What time is it now?).

image

Trying to Induce Wrong Answers

A question combining mismatched terms was asked: "엔씨의 Lex 서비스는 무엇인지 설명해줘." (Explain NC's Lex service.)

image

Ask "Amazon Varco 서비스를 Manufactoring에 적용하는 방법 알려줘." (How can the Amazon Varco service be applied to manufacturing?) and check the response.

image

Korean/English Dual Search

"Amazon์˜ Athena ์„œ๋น„์Šค์— ๋Œ€ํ•ด ์„ค๋ช…ํ•ด์ฃผ์„ธ์š”."๋กœ ๊ฒ€์ƒ‰ํ• ๋•Œ ํ•œ์˜ ๋™์‹œ ๊ฒ€์ƒ‰์„ ํ•˜๋ฉด ์˜์–ด ๋ฌธ์„œ์—์„œ ๋‹ต๋ณ€์— ํ•„์š”ํ•œ ๊ด€๋ จ๋ฌธ์„œ๋ฅผ ์ถ”์ถœํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

image

Without dual-language search, the result is as below: the same question can lean heavily on the OpenSearch results and produce a wrong answer.

image

Prompt Engineering Result Examples

Translation

Enter "아마존 베드락을 이용하여 주셔서 감사합니다. 편안한 대화를 즐기실수 있으며, 파일을 업로드하면 요약을 할 수 있습니다." (Thank you for using Amazon Bedrock. You can enjoy a comfortable conversation, and uploading a file produces a summary.) and check the translation result.

image

Extracted Topic and Sentiment

Enter the review "식사 가성비 좋습니다. 위치가 좋고 스카이라운지 바베큐 / 야경 최곱니다. 아쉬웠던 점 · 지하주차장이 비좁습니다.. 호텔앞 교통이 너무 복잡해서 주변시설을 이용하기 어렵습니다. / 한강나가는 길 / 주변시설에 나가는 방법등.. 필요합니다." and check the result.

image

Information Extraction

Enter "John Park. Solutions Architect | WWCS Amazon Web Services Email: john@amazon.com Mobile: +82-10-1234-5555" and check that the email address is extracted.

image

Removing PII (Personally Identifiable Information)

An example of PII removal is shown below. Enter "John Park, Ph.D. Solutions Architect | WWCS Amazon Web Services Email: john@amazon.com Mobile: +82-10-1234-4567" to obtain text with the name, phone number, and address removed. The prompt follows PII.

image

Correcting Sentence Errors

Enter a sentence containing errors: "To have a smoth conversation with a chatbot, it is better for usabilities to show responsesess in a stream-like, conversational maner rather than waiting until the complete answer."

image

Enter the Korean counterpart, a sentence with deliberate typos saying that, for a smooth conversation with a chatbot, it is better to show the answer as a stream than to wait until the complete answer is available, and check the result.

image

Complex Question (step-by-step)

Enter "I have two pet cats. One of them is missing a leg. The other one has a normal number of legs for a cat to have. In total, how many legs do my cats have?" and check the result.

image

Enter the same question in Korean, "내 고양이 두 마리가 있다. 그중 한 마리는 다리가 하나 없다. 다른 한 마리는 고양이가 정상적으로 가져야 할 다리 수를 가지고 있다. 전체적으로 보았을 때, 내 고양이들은 다리가 몇 개나 있을까?", and check the result.

image

Extracting Date/Time

Select "Timestamp Extraction" from the menu and enter "지금은 2023년 12월 5일 18시 26분이야" (It is now 18:26 on December 5, 2023); the prompt extracts the time as shown below.

noname

The actual result message is as follows.

<result>
<year>2023</year>
<month>12</month>
<day>05</day>
<hour>18</hour>
<minute>26</minute>
</result>

Child Conversation (Few-Shot Example)

Answers should match the conversation partner. For example, asking "산타가 크리스마스에 선물을 가져다 줄까?" (Will Santa bring presents at Christmas?) in [General Conversation] gives the answer below.

image

Switch to [9. Child Conversation (few shot)] and ask the same question. The answer is now appropriate for the conversation partner.

image

Cleaning Up Resources

When the infrastructure is no longer needed, all resources can be deleted as follows.

  1. Go to the API Gateway console and delete "rest-api-for-stream-chatbot" and "ws-api-for-stream-chatbot".

  2. Go to the Cloud9 console and delete everything with the command below.

cdk destroy --all

Conclusion

LLM์„ ์‚ฌ์šฉํ•œ Enterprise์šฉ application์„ ๊ฐœ๋ฐœํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ๊ธฐ์—…์ด ๊ฐ€์ง„ ๋‹ค์–‘ํ•œ ์ •๋ณด๋ฅผ ํ™œ์šฉํ•˜์—ฌ์•ผ ํ•ฉ๋‹ˆ๋‹ค. ์ด๋ฅผ ์œ„ํ•ด Fine-tuning์ด๋‚˜ RAG๋ฅผ ํ™œ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. Fine-tuning์€ ์ผ๋ฐ˜์ ์œผ๋กœ RAG๋ณด๋‹ค ์šฐ์ˆ˜ํ•œ ์„ฑ๋Šฅ์„ ๊ธฐ๋Œ€ํ•  ์ˆ˜ ์žˆ์œผ๋‚˜, ๋‹ค์–‘ํ•œ application์—์„œ ํ™œ์šฉํ•˜๊ธฐ ์œ„ํ•ด์„œ ๋งŽ์€ ๋น„์šฉ๊ณผ ์‹œํ–‰์ฐฉ์˜ค๊ฐ€ ์žˆ์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. RAG๋Š” ๋ฐ์ดํ„ฐ์˜ ๋น ๋ฅธ ์—…๋ฐ์ดํŠธ ๋ฐ ๋น„์šฉ๋ฉด์—์„œ ํ™œ์šฉ๋„๊ฐ€ ๋†’์•„์„œ, Fine-tuning๊ณผ RAG๋ฅผ ๋ณ‘ํ–‰ํ•˜์—ฌ ํ™œ์šฉํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์ƒ๊ฐํ•ด ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์—ฌ๊ธฐ์—์„œ๋Š” RAG์˜ ์„ฑ๋Šฅ์„ ํ–ฅ์ƒ์‹œํ‚ค๋ฆฌ ์œ„ํ•ด ๋‹ค์–‘ํ•œ ๊ธฐ์ˆ ์„ ํ†ตํ•ฉํ•˜๊ณ , ์ด๋ฅผ ํ™œ์šฉํ•  ์ˆ˜ ์žˆ๋Š” Korean Chatbot์„ ๋งŒ๋“ค์—ˆ์Šต๋‹ˆ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด ๋‹ค์–‘ํ•œ RAG ๊ธฐ์ˆ ๋“ค์„ ํ…Œ์ŠคํŠธํ•˜๊ณ  ์‚ฌ์šฉํ•˜๋Š” ์šฉ๋„์— ๋งž๊ฒŒ RAG ๊ธฐ์ˆ ์„ ํ™œ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.