Transformers (2) - Installation / Text Classification

In this article, we install Transformers and try out text classification.

Installing Transformers

To install Transformers, run the following command.

The execution environment is assumed to be Google Colaboratory.

[Google Colaboratory]

!pip install transformers[ja]==4.4.2

If you see output like the following, the installation succeeded.

[Output]

Collecting transformers[ja]==4.4.2
Downloading transformers-4.4.2-py3-none-any.whl (2.0 MB)
|████████████████████████████████| 2.0 MB 13.1 MB/s
Requirement already satisfied: filelock in /usr/local/lib/python3.7/dist-packages (from transformers[ja]==4.4.2) (3.0.12)
Requirement already satisfied: packaging in /usr/local/lib/python3.7/dist-packages (from transformers[ja]==4.4.2) (21.0)
Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.7/dist-packages (from transformers[ja]==4.4.2) (2019.12.20)
Requirement already satisfied: importlib-metadata in /usr/local/lib/python3.7/dist-packages (from transformers[ja]==4.4.2) (4.8.1)
Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.7/dist-packages (from transformers[ja]==4.4.2) (1.19.5)
Collecting tokenizers<0.11,>=0.10.1
Downloading tokenizers-0.10.3-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (3.3 MB)
|████████████████████████████████| 3.3 MB 56.6 MB/s
Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.7/dist-packages (from transformers[ja]==4.4.2) (4.62.2)
Collecting sacremoses
Downloading sacremoses-0.0.45-py3-none-any.whl (895 kB)
|████████████████████████████████| 895 kB 55.7 MB/s
Requirement already satisfied: requests in /usr/local/lib/python3.7/dist-packages (from transformers[ja]==4.4.2) (2.23.0)
Collecting fugashi>=1.0
Downloading fugashi-1.1.1-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (490 kB)
|████████████████████████████████| 490 kB 60.5 MB/s
Collecting ipadic<2.0,>=1.0.0
Downloading ipadic-1.0.0.tar.gz (13.4 MB)
|████████████████████████████████| 13.4 MB 212 kB/s
Collecting unidic>=1.0.2
Downloading unidic-1.0.3.tar.gz (5.1 kB)
Collecting unidic-lite>=1.0.7
Downloading unidic-lite-1.0.8.tar.gz (47.4 MB)
|████████████████████████████████| 47.4 MB 79 kB/s
Requirement already satisfied: wasabi<1.0.0,>=0.6.0 in /usr/local/lib/python3.7/dist-packages (from unidic>=1.0.2->transformers[ja]==4.4.2) (0.8.2)
Requirement already satisfied: plac<2.0.0,>=1.1.3 in /usr/local/lib/python3.7/dist-packages (from unidic>=1.0.2->transformers[ja]==4.4.2) (1.1.3)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests->transformers[ja]==4.4.2) (3.0.4)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests->transformers[ja]==4.4.2) (1.24.3)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests->transformers[ja]==4.4.2) (2021.5.30)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests->transformers[ja]==4.4.2) (2.10)
Requirement already satisfied: typing-extensions>=3.6.4 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata->transformers[ja]==4.4.2) (3.7.4.3)
Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata->transformers[ja]==4.4.2) (3.5.0)
Requirement already satisfied: pyparsing>=2.0.2 in /usr/local/lib/python3.7/dist-packages (from packaging->transformers[ja]==4.4.2) (2.4.7)
Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers[ja]==4.4.2) (1.15.0)
Requirement already satisfied: joblib in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers[ja]==4.4.2) (1.0.1)
Requirement already satisfied: click in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers[ja]==4.4.2) (7.1.2)
Building wheels for collected packages: ipadic, unidic, unidic-lite
Building wheel for ipadic (setup.py) ... done
Created wheel for ipadic: filename=ipadic-1.0.0-py3-none-any.whl size=13556723 sha256=868f02ff1674a53b19d8180bdf7a9473a57ab3be9cb1551291fdcb03bfb98bc5
Stored in directory: /root/.cache/pip/wheels/33/8b/99/cf0d27191876637cd3639a560f93aa982d7855ce826c94348b
Building wheel for unidic (setup.py) ... done
Created wheel for unidic: filename=unidic-1.0.3-py3-none-any.whl size=5506 sha256=8977d77c05cd5ec120e9f048171cc7fe4804c6191cb8c3dc209a1eae232c6aa8
Stored in directory: /root/.cache/pip/wheels/23/30/0b/128289fb595ef4117d2976ffdbef5069ef83be813e88caa0a6
Building wheel for unidic-lite (setup.py) ... done
Created wheel for unidic-lite: filename=unidic_lite-1.0.8-py3-none-any.whl size=47658836 sha256=05af03831b7babce0692a95538db2a5616e083838fd8a98d6336cdf8e6f9a009
Stored in directory: /root/.cache/pip/wheels/de/69/b1/112140b599f2b13f609d485a99e357ba68df194d2079c5b1a2
Successfully built ipadic unidic unidic-lite
Installing collected packages: tokenizers, sacremoses, unidic-lite, unidic, transformers, ipadic, fugashi
Successfully installed fugashi-1.1.1 ipadic-1.0.0 sacremoses-0.0.45 tokenizers-0.10.3 transformers-4.4.2 unidic-1.0.3 unidic-lite-1.0.8
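To confirm from Python which version actually ended up installed, you can query the package metadata. This is a minimal sketch assuming Python 3.8+ for `importlib.metadata` (the Colab runtime in the log above is Python 3.7, where the `importlib_metadata` backport offers the same functions); the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is not installed.

    Hypothetical helper for illustration; it only reads package metadata.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Right after the pip command above, transformers should report 4.4.2
    print("transformers:", installed_version("transformers"))
    # fugashi is one of the Japanese tokenization extras pulled in by [ja]
    print("fugashi:", installed_version("fugashi"))
```

If a package prints `None`, it was not installed into the current runtime.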

Text Classification

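Text classification is the task of assigning a piece of text to one of a set of predefined categories.

Here, we classify sentences as either "positive" or "negative".

Create a pipeline by specifying 'sentiment-analysis' as the task name, then pass any sentence to that pipeline. Because no model is specified, the pipeline downloads a default English sentiment model the first time it runs (the "Downloading" lines in the output below).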
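Roughly speaking, for a two-category model the pipeline tokenizes the sentence, runs the model to obtain one raw score (logit) per category, turns the logits into probabilities with softmax, and returns the most probable category as `label` with its probability as `score`. The plain-Python sketch below reproduces only that final step. The logit values and the `{0: 'NEGATIVE', 1: 'POSITIVE'}` mapping are illustrative assumptions, not values taken from the real model, and `logits_to_prediction` is a hypothetical name.

```python
import math
from typing import Dict, List


def logits_to_prediction(logits: List[float], id2label: Dict[int, str]) -> Dict[str, object]:
    """Convert raw per-class logits into a {'label', 'score'} dict via softmax (hypothetical helper)."""
    # Subtract the largest logit before exponentiating for numerical stability
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # The most probable class becomes the label; its probability becomes the score
    best = max(range(len(probs)), key=lambda i: probs[i])
    return {"label": id2label[best], "score": probs[best]}


# Assumed label mapping and made-up logits for a two-class sentiment model
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
print(logits_to_prediction([-4.0, 4.3], id2label))  # label is 'POSITIVE', score close to 1
```

The larger the gap between the two logits, the closer the score gets to 1, which is why confident predictions show scores like 0.99.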

[Google Colaboratory]

from transformers import pipeline

# Prepare the text classification pipeline
nlp = pipeline('sentiment-analysis')

# Input text
text = 'This movie was very interesting.'

# Run inference
print(nlp(text))

[Output]

Downloading: 100%
629/629 [00:00<00:00, 10.1kB/s]
Downloading: 100%
268M/268M [00:06<00:00, 45.4MB/s]
Downloading: 100%
232k/232k [00:00<00:00, 266kB/s]
Downloading: 100%
48.0/48.0 [00:00<00:00, 850B/s]
[{'label': 'POSITIVE', 'score': 0.9997621178627014}]

The sentence was judged 'POSITIVE' with a score above 0.99 (over 99%).
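Since `score` is the confidence the model assigns to the reported label, one simple way to use it in an application is to accept a prediction only when it clears a threshold. This is a minimal sketch; the helper name `confident_label` and the 0.99 default threshold are assumptions chosen for illustration, and the input is the result shown above.

```python
from typing import Dict, List, Optional


def confident_label(results: List[Dict[str, object]], threshold: float = 0.99) -> Optional[str]:
    """Return the top result's label if its score is at least `threshold`, else None.

    Hypothetical helper; the 0.99 default is an arbitrary illustrative choice.
    """
    if not results:
        return None
    # For a single input sentence the pipeline returns a one-element list of dicts
    top = results[0]
    if float(top["score"]) >= threshold:
        return str(top["label"])
    return None


# Result copied from the output above
positive = [{'label': 'POSITIVE', 'score': 0.9997621178627014}]
print(confident_label(positive))          # POSITIVE
print(confident_label(positive, 0.9999))  # None
```

Returning `None` instead of a label lets the caller treat low-confidence sentences separately, for example by sending them for manual review.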


Let's change the sentence slightly and run the classification again.

[Google Colaboratory]

from transformers import pipeline

# Prepare the text classification pipeline
nlp = pipeline('sentiment-analysis')

# Input text
text = 'This movie was very silly.'

# Run inference
print(nlp(text))

[Output]

[{'label': 'NEGATIVE', 'score': 0.9983341693878174}]

The sentence was judged 'NEGATIVE' with a score above 0.99 (over 99%).
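Instead of rebuilding the pipeline for every sentence, you can create it once and pass it a list of sentences; it then returns one `{'label', 'score'}` dict per input, in order. The sketch below wraps that call in a hypothetical helper `classify_all` that accepts any callable behaving this way, so it can be tried with a stand-in function (an assumption for offline testing). With Transformers installed, pass the object returned by `pipeline('sentiment-analysis')` instead.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Result = Dict[str, object]


def classify_all(
    classifier: Callable[[List[str]], List[Result]], texts: Sequence[str]
) -> List[Tuple[str, str, float]]:
    """Classify all texts in one call and pair each text with its label and score (hypothetical helper)."""
    results = classifier(list(texts))
    return [(text, str(r["label"]), float(r["score"])) for text, r in zip(texts, results)]


if __name__ == "__main__":
    texts = ["This movie was very interesting.", "This movie was very silly."]

    # Stand-in classifier so the helper can run without downloading a model (assumption)
    def fake_classifier(batch: List[str]) -> List[Result]:
        return [
            {"label": "POSITIVE" if "interesting" in t else "NEGATIVE", "score": 0.99}
            for t in batch
        ]

    for text, label, score in classify_all(fake_classifier, texts):
        print(f"{label}\t{score:.4f}\t{text}")

    # With Transformers installed, use the real pipeline instead:
    # from transformers import pipeline
    # nlp = pipeline('sentiment-analysis')
    # print(classify_all(nlp, texts))
```

Creating the pipeline once also avoids reloading the model for each sentence.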

Next time, we will try question answering.