Social Media Analysis with scikit-learn

July 7, 2023

Social Media Analysis

This post shows an example of using scikit-learn to run sentiment analysis on social media comments and posts and then plot the results.

The code below generates sample data, runs sentiment analysis on it, and plots the results.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

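# Generate sample data
np.random.seed(42)
comments = [
    "この映画は素晴らしいです！",  # "This movie is wonderful!"
    "最悪の映画でした...",  # "That was the worst movie..."
    "絶対におすすめできません。",  # "I absolutely cannot recommend it."
    "興奮しました！",  # "It was thrilling!"
    "つまらないストーリーでした。",  # "The story was boring."
    "感動的な結末でした。",  # "The ending was moving."
]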

labels = [1, 0, 0, 1, 0, 1]  # 1: positive, 0: negative

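# Pad the dataset to 100 samples (6 + 94) with a placeholder comment and a random label
for _ in range(94):
    comment = "サンプルコメント"  # "sample comment" (placeholder text)
    sentiment = np.random.randint(2)  # random label: 0 or 1
    comments.append(comment)
    labels.append(sentiment)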

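# Convert the text into numeric features.
# The default tokenizer splits on word boundaries, which unsegmented Japanese
# text lacks, so each whole sentence would become a single token.
# Character 1-grams and 2-grams are used instead.
vectorizer = CountVectorizer(analyzer="char", ngram_range=(1, 2))
X = vectorizer.fit_transform(comments)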

# Split the dataset into training (80%) and test (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.2, random_state=42)

# Train a logistic regression model and predict on the test set
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Compute the accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

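# Compute the confusion matrix; labels=[0, 1] keeps it 2x2 even if
# the test set happens to contain only one class
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])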
print("Confusion Matrix:")
print(cm)

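# Plot the confusion matrix
class_names = ['Negative', 'Positive']  # display names; avoids overwriting the numeric `labels` list
plt.imshow(cm, interpolation='nearest', cmap=plt.cm.Blues)
plt.title("Confusion Matrix - Sentiment Analysis")
plt.colorbar()
tick_marks = np.arange(len(class_names))
plt.xticks(tick_marks, class_names, rotation=45)
plt.yticks(tick_marks, class_names)
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
# Write each count in its cell: white text on dark cells, black text on light cells
threshold = cm.max() / 2
for i in range(len(class_names)):
    for j in range(len(class_names)):
        plt.text(j, i, str(cm[i, j]), ha='center', va='center',
                 color='white' if cm[i, j] > threshold else 'black')
plt.tight_layout()
plt.show()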

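This code generates 94 additional samples, giving a dataset of 100 in total. All 94 share the same placeholder text and carry a randomly assigned label, so they contain no real sentiment signal, and the resulting accuracy says little about how the model would perform on real posts.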
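To see why the padding rows add no signal, here is a minimal sketch (assuming scikit-learn and NumPy, and trimmed to two of the original sentences) that rebuilds the padded data and checks two things: how `CountVectorizer`'s default analyzer treats an unsegmented Japanese sentence, and how many distinct feature vectors the 94 placeholder comments produce. The variable names `default_analyzer`, `padding_rows`, and `n_unique` are illustrative only.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Rebuild a trimmed version of the padded dataset
np.random.seed(42)
comments = ["この映画は素晴らしいです！", "最悪の映画でした..."]
labels = [1, 0]
for _ in range(94):
    comments.append("サンプルコメント")
    labels.append(np.random.randint(2))

# The default token_pattern relies on word boundaries, so an unsegmented
# Japanese sentence comes out as one single token
default_analyzer = CountVectorizer().build_analyzer()
print(default_analyzer("この映画は素晴らしいです！"))  # ['この映画は素晴らしいです']

# Every placeholder comment maps to the same feature vector,
# so those rows differ only in their random labels
X = CountVectorizer(analyzer="char", ngram_range=(1, 2)).fit_transform(comments)
padding_rows = X[2:].toarray()
n_unique = len(np.unique(padding_rows, axis=0))
print("distinct padding vectors:", n_unique)  # 1
print("padding labels (0s, 1s):", labels[2:].count(0), labels[2:].count(1))
```

Because identical inputs with random labels cannot be separated by any classifier, real comments would be needed for the accuracy figure to mean anything.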

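The code then trains a logistic regression model on 80% of the data, predicts the labels of the remaining 20%, and computes the accuracy and the confusion matrix. Finally, it plots the confusion matrix as a color map.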
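The two metrics are directly related: correct predictions sit on the diagonal of the confusion matrix, so accuracy equals the diagonal sum divided by the total count. A small check with hypothetical labels for five test comments:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels for five test comments (1: positive, 0: negative)
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]

# Rows are true labels, columns are predicted labels
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
print(cm.tolist())  # [[1, 1], [1, 2]]

# Accuracy = correct predictions (diagonal) / all predictions
acc_from_cm = np.trace(cm) / cm.sum()
print(acc_from_cm, accuracy_score(y_true, y_pred))  # 0.6 0.6
```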

The plot shows the confusion matrix with the predicted label (Negative, Positive) on the x-axis and the true label (Negative, Positive) on the y-axis.

Each cell shows how many comments fall into that combination of true and predicted label.
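The plotting loop in the code above assumes a 2x2 matrix. `confusion_matrix` sizes its output from the labels it actually sees unless told otherwise, so a small test split that contains only one class produces a 1x1 matrix. Passing `labels=[0, 1]` fixes both the size and the row/column order. A sketch with a hypothetical all-negative test split:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical test split in which every comment happens to be negative
y_true = [0, 0, 0]
y_pred = [0, 0, 0]

# Without labels: only the classes present are used, giving a 1x1 matrix
small = confusion_matrix(y_true, y_pred)
print(small.shape)  # (1, 1)

# With labels=[0, 1]: always 2x2, rows/columns ordered Negative, Positive
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
print(cm.tolist())  # [[3, 0], [0, 0]]
```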

[Output]

This plot shows the confusion matrix used to visualize the sentiment-analysis results.

The x-axis shows the predicted label (Negative, Positive) and the y-axis shows the true label (Negative, Positive). Each cell shows the number of comments classified into that combination of labels.

For example, the top-left cell counts comments whose true label is Negative and whose predicted label is also Negative.

Likewise, the bottom-right cell counts comments whose true label is Positive and whose predicted label is also Positive.
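These cell positions follow from the layout `confusion_matrix` returns for `labels=[0, 1]`: index 0 is Negative, so the matrix reads `[[TN, FP], [FN, TP]]` (true negatives, false positives, false negatives, true positives) and can be unpacked with `ravel()`. A minimal sketch with hypothetical labels:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels (1: positive, 0: negative)
y_true = [0, 1, 0, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

# Layout for labels=[0, 1]: [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(tn, fp, fn, tp)  # 2 1 1 2
```

Here the top-left cell (TN) and the bottom-right cell (TP) are the correct predictions; the other two cells are errors.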

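This plot shows how well the sentiment model's predictions agree with the true labels: the diagonal cells count correct predictions, and the off-diagonal cells count misclassifications.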