Kakao's recent release of Khaiii has taken a little of the spotlight away from mecab, but mecab still performs on par with Khaiii, so this post walks through installing it.

 

1. First, download the latest version from the mecab homepage.

    At the time of writing, mecab-0.996-ko-0.9.2.tar.gz is the latest file.

 

https://bitbucket.org/eunjeon/mecab-ko/downloads/

 


2. Save the downloaded file to a directory of your choice and extract it.

tar -zxvf mecab-*-ko-*.tar.gz

3. Move into the extracted directory and run configure, make, and make install.

cd mecab-0.996-ko-0.9.2
./configure
make
make check
sudo make install

4. Checking the installed version at this point fails with an error.

mecab --version

5. Reload the shared library cache and check the version again; this time it is displayed correctly.

sudo ldconfig
mecab --version
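
The error in step 4 is presumably the usual symptom of a freshly installed shared library that the dynamic linker cannot find yet, which is what sudo ldconfig fixes (this is an assumption, since the error text itself is not reproduced here). A minimal sketch for checking this from Python with the standard library's ctypes.util.find_library; the helper name library_visible is made up for illustration.

```python
from ctypes.util import find_library


def library_visible(name):
    """Return the resolved library file name, or None if it cannot be found.

    On Linux, find_library consults the cache that ldconfig maintains.
    """
    return find_library(name)


if __name__ == "__main__":
    # "mecab" corresponds to libmecab.so. None suggests ldconfig has not been
    # run yet, or the library lives somewhere the linker does not search.
    print("libmecab:", library_visible("mecab"))
```

If this prints None before sudo ldconfig and a file name afterwards, the cache was the problem.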

** Alternatively, the command below installs everything in one go.

pip install python-mecab-ko

 

6. Next, install the Korean dictionary.

First, download the latest version of the dictionary and extract it in the same way.

At the time of writing, the latest version is mecab-ko-dic-2.1.1-20180720.tar.gz.

https://bitbucket.org/eunjeon/mecab-ko-dic/downloads/

tar -zxvf mecab-ko-dic-2.1.1-20180720.tar.gz
 


 

7. Move into the extracted directory and build it with make.

cd mecab-ko-dic-2.1.1-20180720
./configure
make
sudo make install

8. Test it. As with Khaiii, run the command below and then type the sentence you want to analyze.

mecab -d /usr/local/lib/mecab/dic/mecab-ko-dic

์„ค์น˜๊ฐ€ ์ž˜ ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

 

9. Check that it works from Python.

import mecab
mecab = mecab.MeCab()

mecab.morphs('영등포구청역에 있는 맛집 좀 알려주세요.')
# ['영등포구청역', '에', '있', '는', '맛집', '좀', '알려', '주', '세요', '.']

 

That's all for this post.

Lately, the BERT craze set off by Google has been no joke.

On top of that, LG CNS released KorQuAD, a Korean counterpart to the SQuAD dataset, along with a leaderboard,

and many of Korea's leading AI teams have been competing on it.

 

If you are curious about BERT, the article below is a good read.

http://www.aitimes.kr/news/articleView.html?idxno=13117

 

Back when the leaderboard only had three entries, I figured I could at least take 4th place with BERT...

but I kept putting it off because of project work, and now I am barely hanging on at 22nd.

https://korquad.github.io/

 


(While writing this post I got pushed down to 23rd... T_T)

 

This time I picked up a few more ideas to try, so I am going to attempt to climb the KorQuAD leaderboard.

 

One of them is to replace the multilingual tokenizer that BERT uses by default.

Korean already has several morphological analyzers, such as konlpy and mecab, and Kakao has now released one called Khaiii.

Khaiii is a deep-learning (CNN) based morphological analyzer, reported to match or slightly outperform Mecab, which had been the strongest performer so far.
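
One way such a replacement could work (an assumption; the post does not spell out the exact approach) is to split the sentence into morphemes first and then apply BERT-style greedy longest-match WordPiece inside each morpheme. The sketch below uses a toy vocabulary and hand-written morphemes taken from the mecab output shown earlier as stand-ins for a real vocabulary file and a real analyzer call.

```python
def wordpiece(token, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one token into sub-word pieces."""
    pieces, start = [], 0
    while start < len(token):
        end = len(token)
        match = None
        while start < end:
            piece = token[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces carry the ## prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits, so the whole token is unknown
        pieces.append(match)
        start = end
    return pieces


def tokenize(morphemes, vocab):
    """Apply WordPiece to each morpheme produced by the analyzer."""
    return [p for m in morphemes for p in wordpiece(m, vocab)]


# Toy vocabulary and hand-written morphemes (stand-ins for real analyzer output).
vocab = {"맛", "##집", "좀", "알려", "주", "세요", "."}
morphemes = ["맛집", "좀", "알려", "주", "세요", "."]
print(tokenize(morphemes, vocab))
```

Because the sub-word split never crosses a morpheme boundary, particles and endings stay separate from stems, which is the motivation for trying an analyzer in front of the tokenizer.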

 

So I will start by documenting how to install Khaiii.

My current environment is Linux, so the steps below focus on Linux.

 

The basic build environment is as follows.

Ubuntu(Mint Linux) 18.04

Python 3.6

gcc 7.xx

 

1. Clone the Khaiii git repository.

git clone https://github.com/kakao/khaiii.git

2. Create a build directory and move into it.

cd khaiii
mkdir build
cd build

3. Run CMake. (Go make yourself a cup of coffee.)

cmake ..

4. Run make. (Go make yourself another cup of coffee.)

make all

 If the build succeeds, the following files are created under the build directory.

  • bin: directory
    • khaiii: executable
  • lib: directory
    • libkhaiii.so: shared library (libkhaiii.dylib on macOS)
    • libkhaiii.so.X
    • libkhaiii.so.X.Y
  • test: directory
    • khaiii: test program

5. make resource: to run the khaiii binary created in the bin directory, you first need to build the resources the program uses.

make resource

6. Test it.

./bin/khaiii --rsc-dir=./share/khaiii

        After you run the command, nothing seems to happen once the "...PoS tagger opened" message appears.

       Type the sentence you want to test and press Enter, and the morphological analysis result is printed.

7. Verify that everything works correctly.

ctest

8. Build the Python binding with make.

make package_python

9. Run pip install.

cd package_python
pip install .

10. Test the Python binding.

      Finally, check that the Python binding works.

      Run the source below with python.

      ** Note: run it from outside the current directory (./package_python).

           If you run it from inside that directory, Python picks up the local build files, which causes library dependency errors.

from khaiii import KhaiiiApi
api = KhaiiiApi()
for word in api.analyze('안녕, 세상.'):
    print(word)

If the analysis results are printed, the binding is working.
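
For downstream use it is often handier to have plain (morpheme, tag) pairs than printed lines. The sketch below assumes, to the best of my understanding of the Khaiii Python binding, that each analyzed word exposes a morphs list whose items have lex and tag attributes. The stand-in objects and their tags are illustrative only, so that the example runs without Khaiii installed; to_pairs is a made-up helper name.

```python
from types import SimpleNamespace


def to_pairs(words):
    """Flatten analyzed words into (lex, tag) pairs, one per morpheme."""
    return [(m.lex, m.tag) for w in words for m in w.morphs]


# Stand-in objects so the sketch runs without Khaiii; with Khaiii installed,
# pass the result of api.analyze(...) instead. The tags are illustrative.
fake_words = [
    SimpleNamespace(
        lex="세상.",
        morphs=[
            SimpleNamespace(lex="세상", tag="NNG"),
            SimpleNamespace(lex=".", tag="SF"),
        ],
    )
]
print(to_pairs(fake_words))
```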

 

Thank you.

Installing Anaconda itself is straightforward, so I will skip it.

Here are some tips that have proven quite useful in day-to-day work.


1. Package management

    a. Install packages

          > conda install numpy scipy pandas

    b. Remove a package

          > conda remove [package name]

    c. Update a package

          > conda update [package name]

    d. Update all packages

          > conda update --all

    e. List installed packages

          > conda list

    f. Search when you are not sure of the package name

          > conda search "*beautiful*"

                  



2. ํ™˜๊ฒฝ ๊ด€๋ฆฌ

    ๊ฐ€. ๊ฐ€์ƒํ™˜๊ฒฝ ์ƒ์„ฑ

          > conda create -n [๊ฐ€์ƒํ™˜๊ฒฝ๋ช…] [๊ธฐ๋ณธ์„ค์น˜ํ•  ํŒจํ‚ค์ง€] 

             '๊ธฐ๋ณธ์„ค์น˜ํ•  ํŒจํ‚ค์ง€'๋Š” ์—ฌ๋Ÿฌ ํŒจํ‚ค์ง€๋ฅผ ํ•œ๊บผ๋ฒˆ์— ์ง€์ •ํ•  ์ˆ˜ ์žˆ๋‹ค.

    ๋‚˜. ํŒจํ‚ค์ง€์˜ ํŠน์ • ๋ฒ„์ „์„ ์„ค์น˜ํ•˜์—ฌ ๊ฐ€์ƒํ™˜๊ฒฝ ์ƒ์„ฑ

          > conda create -n my_env python=3.6

    ๋‹ค. ๊ฐ€์ƒํ™˜๊ฒฝ ํ™œ์„ฑํ™”

          > source activate [๊ฐ€์ƒํ™˜๊ฒฝ๋ช…]

    ๋ผ. ๊ฐ€์ƒํ™˜๊ฒฝ ๋น„ํ™œ์„ฑํ™”

          > source deactivate


3. Saving and sharing virtual environments

    a. Export a virtual environment

          > conda env export > [file name].yaml

             You can back this file up or hand it to someone else to share the same environment.

    b. Create a virtual environment from a shared yaml file

          > conda env create -f [file name].yaml

    c. List virtual environments

          > conda env list

             The currently active environment is marked with a * in front of its path.

    d. Remove a virtual environment

          > conda env remove -n [env name]
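
When scripting around these commands, the * marker in the conda env list output is enough to find the active environment. A minimal sketch under that assumption follows; the function name active_env and the sample text (environment names and paths) are made up for illustration.

```python
def active_env(env_list_output):
    """Return the name (or path, if unnamed) of the environment marked with '*'."""
    for line in env_list_output.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the comment header
        fields = line.split()
        if "*" in fields:
            # Unnamed environments have no name column, so fall back to the path.
            return fields[0] if fields[0] != "*" else fields[-1]
    return None


# Illustrative output; environment names and paths are made up.
sample = """# conda environments:
#
base                     /home/user/anaconda3
my_env                *  /home/user/anaconda3/envs/my_env
"""
print(active_env(sample))
```

In a real script the text would come from running conda env list, for example through subprocess.run with capture_output=True and text=True.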

              





import os
import zipfile

import requests
import tqdm

# Download the dataset. It's small, only about 6 MB.
if not os.path.exists('./ml-1m'):
    url = 'http://files.grouplens.org/datasets/movielens/ml-1m.zip'
    response = requests.get(url, stream=True)
    total_length = response.headers.get('content-length')
    bar = tqdm.tqdm_notebook(total=int(total_length))
    with open('./ml-1m.zip', 'wb') as f:
        for data in response.iter_content(chunk_size=4096):
            f.write(data)
            # Advance by the bytes actually received; the last chunk may be short.
            bar.update(len(data))
    # Extract the archive into the current directory.
    with zipfile.ZipFile('./ml-1m.zip', 'r') as zip_ref:
        zip_ref.extractall('.')

