In this post, we install Transformers and use it for text classification.
Installing Transformers
To install Transformers, run the following command.
The execution environment is assumed to be Google Colaboratory.
[Google Colaboratory]
!pip install transformers[ja]==4.4.2
If output like the following is displayed, the installation succeeded.
[Output]
Collecting transformers[ja]==4.4.2
  Downloading transformers-4.4.2-py3-none-any.whl (2.0 MB)
     |████████████████████████████████| 2.0 MB 13.1 MB/s
Requirement already satisfied: filelock in /usr/local/lib/python3.7/dist-packages (from transformers[ja]==4.4.2) (3.0.12)
Requirement already satisfied: packaging in /usr/local/lib/python3.7/dist-packages (from transformers[ja]==4.4.2) (21.0)
Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.7/dist-packages (from transformers[ja]==4.4.2) (2019.12.20)
Requirement already satisfied: importlib-metadata in /usr/local/lib/python3.7/dist-packages (from transformers[ja]==4.4.2) (4.8.1)
Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.7/dist-packages (from transformers[ja]==4.4.2) (1.19.5)
Collecting tokenizers<0.11,>=0.10.1
  Downloading tokenizers-0.10.3-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (3.3 MB)
     |████████████████████████████████| 3.3 MB 56.6 MB/s
Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.7/dist-packages (from transformers[ja]==4.4.2) (4.62.2)
Collecting sacremoses
  Downloading sacremoses-0.0.45-py3-none-any.whl (895 kB)
     |████████████████████████████████| 895 kB 55.7 MB/s
Requirement already satisfied: requests in /usr/local/lib/python3.7/dist-packages (from transformers[ja]==4.4.2) (2.23.0)
Collecting fugashi>=1.0
  Downloading fugashi-1.1.1-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (490 kB)
     |████████████████████████████████| 490 kB 60.5 MB/s
Collecting ipadic<2.0,>=1.0.0
  Downloading ipadic-1.0.0.tar.gz (13.4 MB)
     |████████████████████████████████| 13.4 MB 212 kB/s
Collecting unidic>=1.0.2
  Downloading unidic-1.0.3.tar.gz (5.1 kB)
Collecting unidic-lite>=1.0.7
  Downloading unidic-lite-1.0.8.tar.gz (47.4 MB)
     |████████████████████████████████| 47.4 MB 79 kB/s
Requirement already satisfied: wasabi<1.0.0,>=0.6.0 in /usr/local/lib/python3.7/dist-packages (from unidic>=1.0.2->transformers[ja]==4.4.2) (0.8.2)
Requirement already satisfied: plac<2.0.0,>=1.1.3 in /usr/local/lib/python3.7/dist-packages (from unidic>=1.0.2->transformers[ja]==4.4.2) (1.1.3)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests->transformers[ja]==4.4.2) (3.0.4)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests->transformers[ja]==4.4.2) (1.24.3)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests->transformers[ja]==4.4.2) (2021.5.30)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests->transformers[ja]==4.4.2) (2.10)
Requirement already satisfied: typing-extensions>=3.6.4 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata->transformers[ja]==4.4.2) (3.7.4.3)
Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata->transformers[ja]==4.4.2) (3.5.0)
Requirement already satisfied: pyparsing>=2.0.2 in /usr/local/lib/python3.7/dist-packages (from packaging->transformers[ja]==4.4.2) (2.4.7)
Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers[ja]==4.4.2) (1.15.0)
Requirement already satisfied: joblib in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers[ja]==4.4.2) (1.0.1)
Requirement already satisfied: click in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers[ja]==4.4.2) (7.1.2)
Building wheels for collected packages: ipadic, unidic, unidic-lite
  Building wheel for ipadic (setup.py) ... done
  Created wheel for ipadic: filename=ipadic-1.0.0-py3-none-any.whl size=13556723 sha256=868f02ff1674a53b19d8180bdf7a9473a57ab3be9cb1551291fdcb03bfb98bc5
  Stored in directory: /root/.cache/pip/wheels/33/8b/99/cf0d27191876637cd3639a560f93aa982d7855ce826c94348b
  Building wheel for unidic (setup.py) ... done
  Created wheel for unidic: filename=unidic-1.0.3-py3-none-any.whl size=5506 sha256=8977d77c05cd5ec120e9f048171cc7fe4804c6191cb8c3dc209a1eae232c6aa8
  Stored in directory: /root/.cache/pip/wheels/23/30/0b/128289fb595ef4117d2976ffdbef5069ef83be813e88caa0a6
  Building wheel for unidic-lite (setup.py) ... done
  Created wheel for unidic-lite: filename=unidic_lite-1.0.8-py3-none-any.whl size=47658836 sha256=05af03831b7babce0692a95538db2a5616e083838fd8a98d6336cdf8e6f9a009
  Stored in directory: /root/.cache/pip/wheels/de/69/b1/112140b599f2b13f609d485a99e357ba68df194d2079c5b1a2
Successfully built ipadic unidic unidic-lite
Installing collected packages: tokenizers, sacremoses, unidic-lite, unidic, transformers, ipadic, fugashi
Successfully installed fugashi-1.1.1 ipadic-1.0.0 sacremoses-0.0.45 tokenizers-0.10.3 transformers-4.4.2 unidic-1.0.3 unidic-lite-1.0.8
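Rather than reading through the whole log, the installed version can also be checked from Python. The following is a minimal sketch, not part of the original walkthrough: the helper name `installed_version` is hypothetical, and the fallback import assumes the `importlib-metadata` backport is available on Python 3.7 (the log above shows it already present in that environment).

```python
# Sketch: look up the installed version of a package without importing it.
try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:
    import importlib_metadata as metadata  # backport, assumed present on Python 3.7


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing.

    `installed_version` is a hypothetical helper written for this example.
    """
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string if Transformers is installed, otherwise None.
print(installed_version("transformers"))
```

If the pinned install above succeeded, this should report 4.4.2. A result of None would suggest the package was installed into a different environment than the one running the notebook.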
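Text classification
Text classification is the task of assigning a text to one of a set of predefined categories.
Here, we classify sentences as either "positive" or "negative".
We create a pipeline with the task name 'sentiment-analysis' and pass an arbitrary sentence to it.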
[Google Colaboratory]
from transformers import pipeline

# Prepare the text classification pipeline
nlp = pipeline('sentiment-analysis')

# Input text
text = 'This movie was very interesting.'

# Run inference
print(nlp(text))
[Output]
Downloading: 100% 629/629 [00:00<00:00, 10.1kB/s]
Downloading: 100% 268M/268M [00:06<00:00, 45.4MB/s]
Downloading: 100% 232k/232k [00:00<00:00, 266kB/s]
Downloading: 100% 48.0/48.0 [00:00<00:00, 850B/s]
[{'label': 'POSITIVE', 'score': 0.9997621178627014}]
The sentence was judged 'POSITIVE' with a score above 0.99 (over 99%).
Next, we change the sentence slightly and run text classification again.
[Google Colaboratory]
from transformers import pipeline

# Prepare the text classification pipeline
nlp = pipeline('sentiment-analysis')

# Input text
text = 'This movie was very silly.'

# Run inference
print(nlp(text))
[Output]
[{'label': 'NEGATIVE', 'score': 0.9983341693878174}]
The sentence was judged 'NEGATIVE' with a score above 0.99 (over 99%).
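The pipeline returns a list of dictionaries, each with a 'label' and a 'score' between 0 and 1. As a rough sketch of how such results might be interpreted in code, the example below formats the score as a percentage and flags results at or above a threshold. The helper name `describe` and the 0.99 threshold are assumptions made for illustration, and the input data is copied from the two outputs shown in this post instead of calling the model again.

```python
def describe(results, threshold=0.99):
    """Format pipeline-style results as readable strings.

    `results` is a list of {'label': str, 'score': float} dicts, the shape
    printed by the examples in this post. `describe` and `threshold` are
    hypothetical names chosen for this sketch.
    """
    lines = []
    for r in results:
        # A result counts as "confident" when its score reaches the threshold.
        status = "confident" if r["score"] >= threshold else "uncertain"
        lines.append("{}: {:.2%} ({})".format(r["label"], r["score"], status))
    return lines


# Results copied from the outputs shown earlier in this post.
positive = [{'label': 'POSITIVE', 'score': 0.9997621178627014}]
negative = [{'label': 'NEGATIVE', 'score': 0.9983341693878174}]

print(describe(positive))  # ['POSITIVE: 99.98% (confident)']
print(describe(negative))  # ['NEGATIVE: 99.83% (confident)']
```

Passing the return value of `nlp(text)` directly to `describe` should work the same way, since it has the same list-of-dictionaries shape.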
Next time, we will cover question answering.