Lately, the BERT craze that followed Google's release has been no joke.

On top of that, LG CNS released KorQuAD, a Korean-language counterpart to the SQuAD dataset, together with a leaderboard,

and many of Korea's leading AI development teams jumped in.

 

If you are curious about BERT, the article below is a good place to start.

http://www.aitimes.kr/news/articleView.html?idxno=13117

 

Back when the leaderboard only went up to 3rd place, I told myself I would use BERT and at least take 4th...

Then I kept putting it off because of project work, and now I am barely holding on at 22nd.

https://korquad.github.io/

 


(While writing this post I checked again, and I have slipped to 23rd... T_T)

 

This time I picked up a few more ideas to try, and I want to use them to climb the KorQuAD leaderboard.

 

One of those ideas is to replace the multilingual tokenizer that BERT uses by default.

There are already several options for Korean morphological analysis, such as KoNLPy and MeCab, and Kakao has now released an analyzer called Khaiii.

Khaiii is a deep-learning (CNN) based morphological analyzer, and it is reported to perform on par with, or slightly better than, MeCab, which had been the strongest performer so far.

 

So, as a first step, I am going to write down how to install Khaiii as I go.

My current environment is Linux, so the steps below focus on Linux.

 

The basic build environment is as follows.

Ubuntu(Mint Linux) 18.04

Python 3.6

gcc 7.xx
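Before starting, it may be worth confirming that the tools used in the steps below are actually on the PATH. This is only a rough sketch: the tool list (git, cmake, make, gcc) is my reading of the commands used in this post, and `check_build_env` is a helper name made up for illustration.

```python
import platform
import shutil


def check_build_env(tools=("git", "cmake", "make", "gcc")):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    # The post was written against Python 3.6; other versions are untested here.
    print("Python:", platform.python_version())
    for tool, path in check_build_env().items():
        print("{}: {}".format(tool, path or "NOT FOUND"))
```

Anything reported as NOT FOUND needs to be installed before the build steps.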

 

1. Clone the Khaiii git repository.

git clone https://github.com/kakao/khaiii.git

2. Create a build directory and move into it.

cd khaiii
mkdir build
cd build

3. Run CMake. (Go make yourself a cup of coffee.)

cmake ..

4. Run make. (Go make yourself another cup of coffee.)

make all

If the build succeeds, the following files are created under the build directory.

  • bin: directory
    • khaiii: executable
  • lib: directory
    • libkhaiii.so: shared library (libkhaiii.dylib on macOS)
    • libkhaiii.so.X
    • libkhaiii.so.X.Y
  • test: directory
    • khaiii: test program

5. make resource: to run the khaiii binary created in the bin directory, you first have to build the resources that the program uses.

make resource

6. Try it out.

./bin/khaiii --rsc-dir=./share/khaiii

After you run the command, the message "...PoS tagger opened" appears and then nothing else seems to happen.

At that point, type the sentence you want to test and press Enter, and the morphological analysis result is printed.
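If typing sentences by hand gets tedious, the same binary can probably be driven from a script. The sketch below assumes the program reads sentences from standard input and exits when the input ends, which matches the interactive behavior described above but is not something I verified separately. `khaiii_command` and `analyze_with_cli` are made-up helper names; the paths are the ones from step 6.

```python
import subprocess


def khaiii_command(binary="./bin/khaiii", rsc_dir="./share/khaiii"):
    """Build the argument list for the command shown in step 6."""
    return [binary, "--rsc-dir={}".format(rsc_dir)]


def analyze_with_cli(sentence, **kwargs):
    """Send one sentence to the khaiii binary on stdin and return its stdout.

    Assumption: the binary stops reading at end of input and then exits.
    """
    result = subprocess.run(
        khaiii_command(**kwargs),
        input=sentence + "\n",
        stdout=subprocess.PIPE,
        universal_newlines=True,  # text mode; this spelling works on Python 3.6
        check=True,               # raise if the binary exits with an error
    )
    return result.stdout
```

Called from inside the build directory, `print(analyze_with_cli("..."))` with any test sentence should show the same analysis the interactive session prints.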

7. Verify that everything works correctly.

ctest

8. Run make to build the Python binding.

make package_python

9. Run pip install.

cd package_python
pip install .

10. Test the Python binding.

Finally, check that the Python binding works by running the source below with python.

** Note: run it from outside the current directory (./package_python). If you run it from inside that directory, the build files there get used, which leads to a library dependency problem.

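from khaiii import KhaiiiApi

# Load the analyzer with its default resources.
api = KhaiiiApi()
# Each item is one analyzed word together with its morphemes.
for word in api.analyze('안녕, 세상.'):
    print(word)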

If the analysis result for each word is printed, the binding is working.
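Since the goal is to swap out the tokenizer, what I mostly need from Khaiii is a flat list of morphemes rather than the printed words. Here is a minimal sketch, assuming each analyzed word exposes a `morphs` list whose items have `lex` and `tag` attributes (worth checking against the installed version). `to_morphemes` and `analyze_text` are names made up for illustration.

```python
def to_morphemes(words):
    """Flatten analyzed words into (lex, tag) pairs, one pair per morpheme."""
    return [(m.lex, m.tag) for w in words for m in w.morphs]


def analyze_text(text):
    """Analyze text with Khaiii and return its morphemes as (lex, tag) pairs.

    khaiii is imported lazily so this file can be loaded without it installed.
    """
    from khaiii import KhaiiiApi
    return to_morphemes(KhaiiiApi().analyze(text))
```

The `lex` values from `analyze_text` could then be joined with spaces and handed to a subword tokenizer, which is roughly the experiment I have in mind for KorQuAD.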

 

Thank you.
