Bag-of-Words


Last updated 3 months ago

단어 κ°€λ°© λͺ¨λΈ(Bag-of-Words Model)은 μžμ—°μ–΄ 처리(NLP)μ—μ„œ 널리 μ“°μ΄λŠ” κ°„λ‹¨ν•˜κ³  직관적인 λ‘œμ§μž…λ‹ˆλ‹€. 글씨 자료λ₯Ό μˆ«μžν˜•μœΌλ‘œ λŒ€ν‘œν•˜κΈ° μœ„ν•΄ μ“°μž…λ‹ˆλ‹€. μ΄λŠ” λ¬Έλ§₯ λΆ„λ₯˜, 뢄석, 정보 νšŒμˆ˜μ— 자주 μ“°μ΄λŠ” μž‘μ—…μž…λ‹ˆλ‹€.

  • 단어 κ°€λ°© λͺ¨λΈμ€ 문법, μ„œμˆœ, λ¬Έλ§₯을 λ¬΄μ‹œν•˜κ³  λ‹¨μ–΄μ˜ κ°€λ°©μœΌλ‘œμ¨ 글씨λ₯Ό λŒ€ν‘œν•©λ‹ˆλ‹€.

  • Each document is converted into a vector of word counts or frequencies.

  • κ΅¬ν˜„μ΄ κ°„λ‹¨ν•˜κ³  μ‰½μŠ΅λ‹ˆλ‹€. 그리고 μž‘κ±°λ‚˜ 쀑간 크기의 데이터셋에 무리 없이 잘 μž‘λ™ν•©λ‹ˆλ‹€.

  • It is effective for tasks such as text classification and sentiment analysis.

κ΅¬ν˜„μ΄ κ°„λ‹¨ν•˜κ³  μ‰½μŠ΅λ‹ˆλ‹€. 그리고 μž‘κ±°λ‚˜ 쀑간 크기의 데이터셋에 무리 없이 잘 μž‘λ™ν•©λ‹ˆλ‹€.

How Does the Bag-of-Words Model Work?

  • Tokenization splits a piece of text into individual word tokens.

  • Vocabulary/dictionary creation builds a dictionary of every unique word in the corpus.

  • Vectorization assigns each word in the vocabulary its own dimension and fills in an appropriate value for each document.

    • μ΄λ•Œ, μ‘΄μž¬μ— κ΄€ν•΄μ„œ 값을 μ£ΌλŠ” 것을 쑴재 벑터화(presence vectorization), λ‹¨μ–΄μ˜ λΉˆλ„μˆ˜μ— λ§žμ€ 카운트 값을 μ£ΌλŠ” 것을 단어 λΉˆλ„μˆ˜(word’s frequency)μž…λ‹ˆλ‹€.

Check out the … I wrote myself!

Implementation Code
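A minimal from-scratch sketch of the model described above, supporting both presence vectorization and word-frequency counting (the function and variable names are illustrative, not from the original post):

```python
from collections import Counter

def build_vocabulary(documents):
    """Collect the sorted set of unique tokens across all documents."""
    return sorted({token for doc in documents for token in doc.lower().split()})

def vectorize(document, vocabulary, mode="frequency"):
    """Map a document onto the vocabulary dimensions.

    mode="frequency": each dimension holds the word's count.
    mode="presence":  each dimension holds 1 if the word occurs, else 0.
    """
    counts = Counter(document.lower().split())
    if mode == "presence":
        return [1 if counts[word] else 0 for word in vocabulary]
    return [counts[word] for word in vocabulary]

docs = ["the cat sat on the mat", "the dog sat"]
vocab = build_vocabulary(docs)  # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
print(vectorize(docs[0], vocab))                   # [1, 0, 1, 1, 1, 2]
print(vectorize(docs[0], vocab, mode="presence"))  # [1, 0, 1, 1, 1, 1]
```

In practice, scikit-learn's CountVectorizer implements the same idea with sparse matrices and configurable preprocessing (its binary=True option corresponds to presence vectorization).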