Attention Mechanism: The Core of Modern AI

์–ดํ…์…˜(attention)์€ ์—ฐ์† ๋ฐ์ดํ„ฐ ์•ˆ์˜ ๊ฐ ์š”์†Œ๊ฐ€ ๋‹ค๋ฅธ ์š”์†Œ์— ๋Œ€ํ•ด ๊ฐ€์ง€๋Š” ์ค‘์š”๋„๋ฅผ ๊ฒฐ์ •ํ•  ์ˆ˜ ์žˆ๊ฒŒ ํ•ด์ฃผ๋Š” ๋จธ์‹ ๋Ÿฌ๋‹ ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค. RNN ๋“ฑ์˜ ์ธ๊ณต์ง€๋Šฅ ์ฒด๊ณ„์—์„œ hard weight์„ soft weight์œผ๋กœ ๋ณด์กฐํ•˜๋Š” ๋ฐฉ์‹์œผ๋กœ ์ฒ˜์Œ ๋“ฑ์žฅํ–ˆ์Šต๋‹ˆ๋‹ค. ํ˜„์žฌ๋Š” RNN์„ ์™„์ „ํžˆ ๋Œ€์ฒดํ•˜์—ฌ ํŠธ๋žœ์Šคํฌ๋จธ์Šค(transformers)๋ผ๋Š” ๋”ฅ๋Ÿฌ๋‹ ๊ตฌ์กฐ๋กœ ๋ฐœ์ „ํ–ˆ์Šต๋‹ˆ๋‹ค.

  • ๋ณ‘๋ ฌํ™”(parallelization): ํ•œ ๋ฒˆ์˜ ์ฒ˜๋ฆฌ ๊ณผ์ •์—์„œ ๋ฌธ์žฅ์˜ ์ „์ฒด์ ์ธ ๋งฅ๋ฝ๊ณผ ์˜๋ฏธ๋ฅผ ๋™์‹œ์— ํŒŒ์•…ํ•  ์ˆ˜ ์žˆ๊ฒŒ ํ•ฉ๋‹ˆ๋‹ค.

  • ๊ธด ๊ฑฐ๋ฆฌ ์˜์กด์„ฑ(long-range dependencies): ์„œ๋กœ ๋ฉ€๋ฆฌ ๋–จ์–ด์ง„ ํ† ํฐ๋“ค์ด ์˜๋ฏธ์ ์œผ๋กœ ์—ฐ๊ฒฐ๋˜๊ฒŒ ํ•ฉ๋‹ˆ๋‹ค.

  • ์ดˆ๊ธฐ์˜ ์–ดํ…์…˜์€ ์—”์ฝ”๋”-๋””์ฝ”๋” ๊ตฌ์กฐ์˜ ๋ณ€ํ˜•์œผ๋กœ ์ ‘๊ทผ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ํ˜„์žฌ๋Š” ์ด์™€๋Š” ๋…๋ฆฝ๋œ ๊ฐœ๋…์œผ๋กœ ๋ฐœ์ „ํ–ˆ์œผ๋ฉฐ, ์–ดํ…์…˜์„ ๊ณ„์‚ฐํ•˜๋Š” ์ธต์„ ์–ดํ…์…˜ ํ—ค๋“œ๋ผ๊ณ  ํ•ฉ๋‹ˆ๋‹ค.

RNN์€ ์—ฐ์† ๋ฐ์ดํ„ฐ์˜ ์ˆœ์„œ์— ์ง€๋‚˜์น˜๊ฒŒ ์˜์กดํ•˜๋ฉฐ ์ด๋ฅผ ์ƒํƒœ์˜ ๊ฐœ๋…์œผ๋กœ ์ ‘๊ทผํ–ˆ์ง€๋งŒ, ์–ดํ…์…˜ ๋ฉ”์ปค๋‹ˆ์ฆ˜์€ ์ด๋ฅผ ์–ดํ…์…˜์ด๋ผ๋Š” ์ƒˆ๋กœ์šด ๊ด€์ ์œผ๋กœ ํ•ด๊ฒฐํ–ˆ์Šต๋‹ˆ๋‹ค.

์–ดํ…์…˜์˜ ์ดˆ๊ธฐ ์—ฐ๊ตฌ๋Š” ํ† ํฐ ๊ฐ„์˜ ๋น„๋Œ€์นญ์  ๊ด€๊ณ„๋ฅผ ๋Œ€์นญ์  ์—ฐ์‚ฐ์œผ๋กœ๋งŒ ๋‹ค๋ค„์•ผ ํ•œ๋‹ค๋Š” ํ•œ๊ณ„๋กœ ์ธํ•ด ํฐ ์–ด๋ ค์›€์„ ๊ฒช์—ˆ์Šต๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ๋ฌธ์ œ๋Š” ์…€ํ”„ ์–ดํ…์…˜ ๋ชจ๋ธ์˜ ๋„์ž…์œผ๋กœ ํ•ด๊ฒฐ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

Self-Attention Mechanism

์…€ํ”„ ์–ดํ…์…˜ ๋ฉ”์ปค๋‹ˆ์ฆ˜(self-attention mechanism)์€ "Highly Parallelizable Self-attention" (2016)์—์„œ decomposable attention์ด๋ผ๋Š” ๊ฐœ๋…์œผ๋กœ ์ฒ˜์Œ ์†Œ๊ฐœ๋œ ์–ดํ…์…˜ ๋ฉ”์ปค๋‹ˆ์ฆ˜์˜ ๋ณ€ํ˜•์ž…๋‹ˆ๋‹ค. ์ดํ›„ 1๋…„ ๋’ค์— ๋“ฑ์žฅํ•œ ํ˜์‹ ์ ์ธ ๋”ฅ๋Ÿฌ๋‹ ๊ตฌ์กฐ์ธ ํŠธ๋žœ์Šคํฌ๋จธ์Šค์˜ ํ•ต์‹ฌ ์š”์†Œ๊ฐ€ ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

Formulation

์–ดํ…์…˜ ๋ฉ”์ปค๋‹ˆ์ฆ˜์€ ์ฟผ๋ฆฌ์™€ ํ‚ค ์‚ฌ์ด์˜ ์—ฐ๊ด€์„ฑ์„ ๋ถ„์„ํ•˜๋Š” ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค ๋ฐฉ์‹์œผ๋กœ ์ž‘๋™ํ•ฉ๋‹ˆ๋‹ค. ์ด ์„ค๊ณ„๋Š” ์ฟผ๋ฆฌ์™€ ํ‚ค ๋ฒกํ„ฐ๊ฐ€ ๋น„๋Œ€์นญ์ ์ด๊ณ  ๊ด€๊ณ„์ ์ธ ์—ญํ• ์„ ์ˆ˜ํ–‰ํ•˜๋ฉด์„œ๋„, ํ† ํฐ ๋ฒกํ„ฐ, ํ‚ค ํ–‰๋ ฌ, ์ฟผ๋ฆฌ ํ–‰๋ ฌ ๊ฐ„์˜ ๋Œ€์นญ์  ์—ฐ์‚ฐ์„ ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ•ฉ๋‹ˆ๋‹ค.


  • ์ž…๋ ฅ XXX๋Š” ์—ฐ์†๋œ ๋ฐ์ดํ„ฐ์ž…๋‹ˆ๋‹คโ€”์–ดํ…์…˜ ๋ชจ๋ธ์€ ์ด ๋ฐ์ดํ„ฐ์—์„œ ์š”์†Œ๋“ค ๊ฐ„์˜ ๊ด€๊ณ„๋ฅผ ํ•™์Šตํ•  ์ˆ˜ ์žˆ์–ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.

  • ์…€ํ”„ ์–ดํ…์…˜์€ ๋‹ค์Œ ์„ธ ๊ฐ€์ง€ ์—ฐ์ง ๊ฐ€์ค‘์น˜(soft weight)๋กœ ๊ตฌ์„ฑ๋ฉ๋‹ˆ๋‹ค:

    • ์ฟผ๋ฆฌ์— ๋Œ€ํ•œ ๊ฐ€์ค‘์น˜ WQW^QWQ๋Š” "๋ฌด์—‡์„ ์ฐพ์•„์•ผ ํ•˜๋‚˜"์— ๋Œ€ํ•œ ์ •๋ณด๋ฅผ ํ•™์Šตํ•ฉ๋‹ˆ๋‹ค.

    • ํ‚ค์— ๋Œ€ํ•œ ๊ฐ€์ค‘์น˜ WKW^KWK๋Š” "๋ฌด์—‡์„ ๊ฐ€์ง€๊ณ  ์žˆ๋Š”๊ฐ€"์— ๋Œ€ํ•œ ์ •๋ณด๋ฅผ ํ•™์Šตํ•ฉ๋‹ˆ๋‹ค.

    • ๊ฐ’์— ๋Œ€ํ•œ ๊ฐ€์ค‘์น˜ WVW^VWV๋Š” ์‹ค์ œ ์ถœ๋ ฅ๊ฐ’์— ๋Œ€ํ•œ ๊ฐ€์ค‘์น˜์ž…๋‹ˆ๋‹ค.

The weight matrices $W^Q$, $W^K$, and $W^V$ are applied with the same shape to all sequential data, because their dimensions are structured to be compatible with the embedding dimension and the head dimension.
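As a quick check of that claim, the sketch below (with assumed example sizes `d_model` and `d_k`, not values from the post) shows that one set of projection matrices works for sequences of any length, because the matrices act only on the embedding axis:

```python
import numpy as np

d_model, d_k = 8, 4                      # assumed embedding and head dims
rng = np.random.default_rng(0)
W_Q = rng.normal(size=(d_model, d_k))    # one set of weights, shared
W_K = rng.normal(size=(d_model, d_k))    # across every position and
W_V = rng.normal(size=(d_model, d_k))    # every input sequence

for seq_len in (3, 7):                   # sequence length can vary freely
    X = rng.normal(size=(seq_len, d_model))
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V
    print(Q.shape, K.shape, V.shape)     # -> (seq_len, d_k) each
```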


  1. The self-attention layer multiplies the input $X$ by each soft weight to compute the following three matrices:

    1. $Q = X \cdot W^Q$

    2. $K = X \cdot W^K$

    3. $V = X \cdot W^V$

  2. ์–ดํ…์…˜ ์›์ ์ˆ˜(raw attention score)๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๊ณ„์‚ฐ๋ฉ๋‹ˆ๋‹ค: rawย attentionย scorei,j=Qiโ‹…Kjtranspose\text{raw attention score}{i,j} = Q_i \cdot K{j}^{\text{transpose}}rawย attentionย scorei,j=Qiโ€‹โ‹…Kjtranspose

  3. ๊ทธ๋ผ๋””์–ธํŠธ ํญ์ฃผ๋ฅผ ๋ฐฉ์ง€ํ•˜๊ธฐ ์œ„ํ•ด ์ฟผ๋ฆฌ/ํ‚ค์˜ ์ฐจ์› dk=embeddingย sizenumberย ofย attentionย headd_k = \frac{\text{embedding size}}{\text{number of attention head}}dkโ€‹=numberย ofย attentionย headembeddingย sizeโ€‹๋กœ ์Šค์ผ€์ผ ๋‹ค์šด์„ ์ˆ˜ํ–‰ํ•ฉ๋‹ˆ๋‹ค: S=QKtransposedkS = \frac{QK^{\text{transpose}}}{\sqrt{d_k}}S=dkโ€‹โ€‹QKtransposeโ€‹

  4. A softmax function converts the scores into a probability distribution: $A = \text{softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)$

  5. Finally, multiplying by the value matrix $V$ yields the final self-attention output (a worked sketch of all five steps follows this list): $\text{self attention output}(x) = A \cdot V$
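Putting steps 1 through 5 together, here is a minimal single-head NumPy sketch of scaled dot-product self-attention. It assumes small toy dimensions and random weights, and it omits masking and the multi-head split:

```python
import numpy as np

def self_attention(X, W_Q, W_K, W_V):
    """Single-head scaled dot-product self-attention (steps 1-5)."""
    d_k = W_Q.shape[1]                       # query/key dimension
    Q = X @ W_Q                              # step 1: queries
    K = X @ W_K                              # step 1: keys
    V = X @ W_V                              # step 1: values
    S = (Q @ K.T) / np.sqrt(d_k)             # steps 2-3: scaled raw scores
    S = S - S.max(axis=-1, keepdims=True)    # shift for numerical stability
    A = np.exp(S) / np.exp(S).sum(axis=-1, keepdims=True)  # step 4: softmax
    return A @ V                             # step 5: weighted sum of values

seq_len, d_model, d_k = 5, 8, 4              # assumed toy dimensions
rng = np.random.default_rng(0)
X = rng.normal(size=(seq_len, d_model))      # a toy input sequence
W_Q = rng.normal(size=(d_model, d_k))
W_K = rng.normal(size=(d_model, d_k))
W_V = rng.normal(size=(d_model, d_k))
print(self_attention(X, W_Q, W_K, W_V).shape)  # -> (5, 4)
```

Each row of the output is a mixture of the value vectors, weighted by how strongly that position's query matched every key.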

