GiNZA (1) - Installation

In this post we try out GiNZA, a library for natural language processing.

Overview of GiNZA

GiNZA is a natural language processing library for Japanese. Typical uses include the following.

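  • Information extraction
    Extracts information that matches given conditions from large volumes of natural-language text.
  • Natural language understanding
    Infers from an utterance which task the speaker is asking for.
  • Preprocessing for deep learning
    Converts natural-language text into input data for deep learning models.
    The same conversion is also used at inference time in deep-learning-based NLP.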

Installation

To install GiNZA, run the following command.

[Google Colaboratory]

# Install GiNZA
!pip install ginza==4.0.5

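If a log like the following is printed, the installation succeeded. The final line should begin with "Successfully installed" and include ginza-4.0.5.

[Output]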

Collecting ginza==4.0.5
Downloading ginza-4.0.5.tar.gz (20 kB)
Collecting spacy<3.0.0,>=2.3.2
Downloading spacy-2.3.7-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (10.4 MB)
|████████████████████████████████| 10.4 MB 5.5 MB/s
Collecting ja_ginza<4.1.0,>=4.0.0
Downloading ja_ginza-4.0.0.tar.gz (51.5 MB)
|████████████████████████████████| 51.5 MB 16 kB/s
Collecting SudachiPy>=0.4.9
Downloading SudachiPy-0.5.4.tar.gz (86 kB)
|████████████████████████████████| 86 kB 5.0 MB/s
Collecting SudachiDict-core>=20200330
Downloading SudachiDict-core-20210802.tar.gz (9.1 kB)
Collecting thinc<7.5.0,>=7.4.1
Downloading thinc-7.4.5-cp37-cp37m-manylinux2014_x86_64.whl (1.0 MB)
|████████████████████████████████| 1.0 MB 43.1 MB/s
Requirement already satisfied: blis<0.8.0,>=0.4.0 in /usr/local/lib/python3.7/dist-packages (from spacy<3.0.0,>=2.3.2->ginza==4.0.5) (0.4.1)
Requirement already satisfied: preshed<3.1.0,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from spacy<3.0.0,>=2.3.2->ginza==4.0.5) (3.0.5)
Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /usr/local/lib/python3.7/dist-packages (from spacy<3.0.0,>=2.3.2->ginza==4.0.5) (2.0.5)
Requirement already satisfied: wasabi<1.1.0,>=0.4.0 in /usr/local/lib/python3.7/dist-packages (from spacy<3.0.0,>=2.3.2->ginza==4.0.5) (0.8.2)
Requirement already satisfied: srsly<1.1.0,>=1.0.2 in /usr/local/lib/python3.7/dist-packages (from spacy<3.0.0,>=2.3.2->ginza==4.0.5) (1.0.5)
Requirement already satisfied: numpy>=1.15.0 in /usr/local/lib/python3.7/dist-packages (from spacy<3.0.0,>=2.3.2->ginza==4.0.5) (1.19.5)
Requirement already satisfied: plac<1.2.0,>=0.9.6 in /usr/local/lib/python3.7/dist-packages (from spacy<3.0.0,>=2.3.2->ginza==4.0.5) (1.1.3)
Requirement already satisfied: requests<3.0.0,>=2.13.0 in /usr/local/lib/python3.7/dist-packages (from spacy<3.0.0,>=2.3.2->ginza==4.0.5) (2.23.0)
Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /usr/local/lib/python3.7/dist-packages (from spacy<3.0.0,>=2.3.2->ginza==4.0.5) (1.0.5)
Requirement already satisfied: setuptools in /usr/local/lib/python3.7/dist-packages (from spacy<3.0.0,>=2.3.2->ginza==4.0.5) (57.4.0)
Requirement already satisfied: catalogue<1.1.0,>=0.0.7 in /usr/local/lib/python3.7/dist-packages (from spacy<3.0.0,>=2.3.2->ginza==4.0.5) (1.0.0)
Requirement already satisfied: tqdm<5.0.0,>=4.38.0 in /usr/local/lib/python3.7/dist-packages (from spacy<3.0.0,>=2.3.2->ginza==4.0.5) (4.62.3)
Requirement already satisfied: importlib-metadata>=0.20 in /usr/local/lib/python3.7/dist-packages (from catalogue<1.1.0,>=0.0.7->spacy<3.0.0,>=2.3.2->ginza==4.0.5) (4.8.1)
Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata>=0.20->catalogue<1.1.0,>=0.0.7->spacy<3.0.0,>=2.3.2->ginza==4.0.5) (3.6.0)
Requirement already satisfied: typing-extensions>=3.6.4 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata>=0.20->catalogue<1.1.0,>=0.0.7->spacy<3.0.0,>=2.3.2->ginza==4.0.5) (3.7.4.3)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.13.0->spacy<3.0.0,>=2.3.2->ginza==4.0.5) (3.0.4)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.13.0->spacy<3.0.0,>=2.3.2->ginza==4.0.5) (1.24.3)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.13.0->spacy<3.0.0,>=2.3.2->ginza==4.0.5) (2.10)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.13.0->spacy<3.0.0,>=2.3.2->ginza==4.0.5) (2021.5.30)
Collecting sortedcontainers~=2.1.0
Downloading sortedcontainers-2.1.0-py2.py3-none-any.whl (28 kB)
Collecting dartsclone~=0.9.0
Downloading dartsclone-0.9.0-cp37-cp37m-manylinux1_x86_64.whl (473 kB)
|████████████████████████████████| 473 kB 34.1 MB/s
Requirement already satisfied: Cython in /usr/local/lib/python3.7/dist-packages (from dartsclone~=0.9.0->SudachiPy>=0.4.9->ginza==4.0.5) (0.29.24)
Building wheels for collected packages: ginza, ja-ginza, SudachiDict-core, SudachiPy
Building wheel for ginza (setup.py) ... done
Created wheel for ginza: filename=ginza-4.0.5-py3-none-any.whl size=15895 sha256=05809609d54412282a3aca677bf262183e269cff4ea2eaa92b35fe485ed4104c
Stored in directory: /root/.cache/pip/wheels/ba/a9/a2/c1165c004f6dcb415b7a7d145aa4511b5024b5fb1f2eb0c0ea
Building wheel for ja-ginza (setup.py) ... done
Created wheel for ja-ginza: filename=ja_ginza-4.0.0-py3-none-any.whl size=51530813 sha256=f6d2a89bfa1850b356ef103dca344535c6472c5615cf8c7d3e0bf2f0bb964e42
Stored in directory: /root/.cache/pip/wheels/a8/f5/4a/5d4877342f912e0b7209d8a65e7ce39fe2c1a3c2511d59acfb
Building wheel for SudachiDict-core (setup.py) ... done
Created wheel for SudachiDict-core: filename=SudachiDict_core-20210802-py3-none-any.whl size=71418512 sha256=5d1f8362b682fef55a4ca03c8d4a590ed74a910a220d1d51947aac52c6d9a1c6
Stored in directory: /root/.cache/pip/wheels/91/e8/21/e80d212743835d87bb5e7eca81b6abef6d8cb67a294007a837
Building wheel for SudachiPy (setup.py) ... done
Created wheel for SudachiPy: filename=SudachiPy-0.5.4-cp37-cp37m-linux_x86_64.whl size=872116 sha256=de7e07689323107cc31919c53b64a7bad008113ae8e094daabcab14dbf5a8957
Stored in directory: /root/.cache/pip/wheels/6b/5b/8b/ce1f543c9e9af590fdc62e8344fda5a3950c60c0d21c83174e
Successfully built ginza ja-ginza SudachiDict-core SudachiPy
Installing collected packages: thinc, sortedcontainers, dartsclone, SudachiPy, spacy, SudachiDict-core, ja-ginza, ginza
Attempting uninstall: thinc
Found existing installation: thinc 7.4.0
Uninstalling thinc-7.4.0:
Successfully uninstalled thinc-7.4.0
Attempting uninstall: sortedcontainers
Found existing installation: sortedcontainers 2.4.0
Uninstalling sortedcontainers-2.4.0:
Successfully uninstalled sortedcontainers-2.4.0
Attempting uninstall: spacy
Found existing installation: spacy 2.2.4
Uninstalling spacy-2.2.4:
Successfully uninstalled spacy-2.2.4
Successfully installed SudachiDict-core-20210802 SudachiPy-0.5.4 dartsclone-0.9.0 ginza-4.0.5 ja-ginza-4.0.0 sortedcontainers-2.1.0 spacy-2.3.7 thinc-7.4.5

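From the menu, select "Runtime → Restart runtime" to restart Google Colaboratory. As the log shows, the installation replaced the preinstalled spaCy and thinc, and the restart ensures the new versions are the ones that get loaded.

Tokenization

To confirm that the installation works, let's tokenize a sentence.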
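
After the restart, it can be reassuring to confirm which versions are actually installed. The following is a minimal sketch using only the standard library. The helper name `installed_versions` is hypothetical, and the package names are taken from the "Successfully installed" line of the log above. On Python 3.7 (the version shown in the log) it assumes the `importlib_metadata` backport is available, as the log suggests.

```python
try:
    # Standard library on Python 3.8 and later
    from importlib.metadata import PackageNotFoundError, version
except ImportError:
    # Backport package, assumed to be present on Python 3.7
    from importlib_metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Package names taken from the pip log; adjust the list as needed.
for name, ver in installed_versions(["ginza", "ja-ginza", "spacy", "SudachiPy"]).items():
    print(f"{name}: {ver or 'not installed'}")
```

If any entry is reported as not installed, or spaCy still shows the old version, rerunning the install cell and restarting the runtime again is a reasonable first step.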

[Google Colaboratory]

import spacy

# Load the Japanese GiNZA model installed above
nlp = spacy.load('ja_ginza')
doc = nlp('リモートワークできない職場なのでフリーランスになりました。')

# Print each token on its own line
for token in doc:
    print(token)

The output is as follows.

[Output]

リモートワーク
でき
ない
職場
な
の
で
フリーランス
に
なり
まし
た
。
The sentence was split into tokens as expected.

Starting with the next post, we will use GiNZA to perform morphological analysis.
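
One simple way to double-check a tokenization result is to confirm that the tokens, joined in order, reproduce the input text. The sketch below assumes the tokenizer is non-destructive (it splits the text without dropping or altering characters other than whitespace). The helper name `tokens_cover_text` is hypothetical and the sample strings are illustrative only.

```python
def strip_whitespace(s):
    """Remove all whitespace, including the full-width space used in Japanese text."""
    return "".join(s.split())


def tokens_cover_text(text, tokens):
    """Return True if the tokens, joined in order, reproduce the text (ignoring whitespace)."""
    return strip_whitespace("".join(tokens)) == strip_whitespace(text)


# Illustrative strings; no GiNZA model is needed for this check itself.
print(tokens_cover_text("自然言語処理", ["自然", "言語", "処理"]))  # True
print(tokens_cover_text("自然言語処理", ["自然", "処理"]))          # False: a token is missing
```

With the `doc` object from the tokenization cell, the same check would be `tokens_cover_text(doc.text, [token.text for token in doc])`.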