Alzheimer's Disease Diagnosis with scikit-learn

July 19, 2023

Diagnosing Alzheimer's Disease

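Many features play a role in diagnosing Alzheimer's disease. To keep this example simple, we use only two: age and one biochemical marker (for example, the level of a specific protein in the blood).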

The steps below show how to solve this as a classification problem with scikit-learn.

First, import the required libraries.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, accuracy_score
import seaborn as sns

Next, create a synthetic dataset.

We generate data for 1,000 patients, each with two features: age and protein level.

Each patient's diagnosis (Alzheimer's or healthy) is also assigned at random, independently of the two features.


np.random.seed(0)
X = np.random.randint(60, 100, (1000, 2))  # Column 0: age, column 1: protein level (both random integers 60-99)
y = np.random.choice([0, 1], 1000)  # 0: Healthy, 1: Alzheimer's (random, unrelated to X)
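A quick sanity check on the generated arrays helps confirm what `np.random.randint` actually produces. Its upper bound is exclusive, so both columns take integer values from 60 to 99. The sketch below regenerates the same data with the same seed and prints the shape, the per-column range, and the class counts.

```python
import numpy as np

# Regenerate the synthetic data exactly as above (same seed, same calls)
np.random.seed(0)
X = np.random.randint(60, 100, (1000, 2))  # columns: age, protein level
y = np.random.choice([0, 1], 1000)         # 0: healthy, 1: Alzheimer's

# Shape: 1000 patients x 2 features
print("X shape:", X.shape)
# randint's upper bound is exclusive, so every value lies in [60, 99]
print("min per column:", X.min(axis=0), "max per column:", X.max(axis=0))
# Class counts; the labels are random, so each class is roughly half
labels, counts = np.unique(y, return_counts=True)
print("class counts:", dict(zip(labels.tolist(), counts.tolist())))
```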

Split the data into a training set (80%) and a test set (20%).

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
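Because the split is random, the share of Alzheimer's cases in the training and test sets can drift slightly from the overall share. `train_test_split` accepts a `stratify` argument that preserves the class ratio in both parts. The sketch below applies it to the same synthetic data and compares the positive-class fraction in the full set and in each split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

np.random.seed(0)
X = np.random.randint(60, 100, (1000, 2))
y = np.random.choice([0, 1], 1000)

# stratify=y keeps the 0/1 ratio (almost) identical in the train and test parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

print("overall fraction of class 1:", y.mean())
print("train fraction of class 1:  ", y_train.mean())
print("test fraction of class 1:   ", y_test.mean())
```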

Train a random forest classifier with 100 trees.

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
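After `fit`, a `RandomForestClassifier` exposes a `feature_importances_` attribute (impurity-based importances that sum to 1) and a `predict_proba` method that returns per-class probabilities. The sketch below retrains the same model and prints both. With labels that are pure noise, the importances only describe how the trees happened to split, not any real link to the diagnosis.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

np.random.seed(0)
X = np.random.randint(60, 100, (1000, 2))
y = np.random.choice([0, 1], 1000)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Impurity-based importance for each feature; the values sum to 1
for name, imp in zip(["age", "protein"], clf.feature_importances_):
    print(f"{name}: {imp:.3f}")

# Class probabilities for the first five test patients; columns follow clf.classes_ ([0, 1])
proba = clf.predict_proba(X_test[:5])
print(proba)
```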

Predict on the test set and evaluate the accuracy.

y_pred = clf.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred)}")
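An accuracy figure is easier to interpret next to a baseline. scikit-learn's `DummyClassifier` with `strategy="most_frequent"` always predicts the majority class of the training set. The sketch below compares it with the random forest on the same split. Because the labels here are random, the forest should not be expected to do meaningfully better than the baseline.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

np.random.seed(0)
X = np.random.randint(60, 100, (1000, 2))
y = np.random.choice([0, 1], 1000)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Baseline: always predict the most common class in y_train
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

base_acc = accuracy_score(y_test, baseline.predict(X_test))
rf_acc = accuracy_score(y_test, rf.predict(X_test))
print(f"baseline accuracy:      {base_acc:.3f}")
print(f"random forest accuracy: {rf_acc:.3f}")
```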

Finally, plot the confusion matrix as a heatmap.

cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt="d")
plt.title("Confusion Matrix")
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.show()
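For a binary problem, `confusion_matrix` returns a 2×2 array whose rows are the true labels and whose columns are the predicted labels, both ordered [0, 1]. Calling `.ravel()` unpacks it as `tn, fp, fn, tp`, from which accuracy, sensitivity (recall for the Alzheimer's class), and specificity follow directly. The sketch below uses a small hand-made pair of label arrays, not the article's data, so every number can be checked by hand.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Small hand-made example (not the article's data): 0 = healthy, 1 = Alzheimer's
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])
y_pred = np.array([0, 1, 0, 0, 1, 0, 1, 1, 0, 0])

cm = confusion_matrix(y_true, y_pred)  # rows: true label, columns: predicted label
print(cm)  # [[4 1]
           #  [2 3]]

tn, fp, fn, tp = cm.ravel()
accuracy = (tp + tn) / cm.sum()   # 7 / 10
sensitivity = tp / (tp + fn)      # recall for class 1: 3 / 5
specificity = tn / (tn + fp)      # recall for class 0: 4 / 5
print(f"accuracy={accuracy:.2f} sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
# accuracy=0.70 sensitivity=0.60 specificity=0.80
```

In a diagnostic setting, sensitivity (how many true Alzheimer's cases are caught) is often reported alongside accuracy, since accuracy alone can hide many missed cases.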

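This code trains a random forest classifier on randomly generated data, computes its prediction accuracy on the test data, and displays the confusion matrix. Because the labels are assigned at random and carry no information about the features, the test accuracy should come out close to chance level (about 50%).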
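When the labels are independent of the features, the forest has nothing to learn. As a contrast, the sketch below makes the labels depend on the features and runs the same train/test/evaluate steps. The rule used (risk rising with age and protein level, plus Gaussian noise so the classes overlap) is entirely hypothetical and chosen only for illustration; it is not a clinical model, and the protein scale is arbitrary.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 1000
age = rng.integers(60, 100, n)        # years, 60-99
protein = rng.integers(60, 100, n)    # hypothetical arbitrary units, 60-99
X = np.column_stack([age, protein])

# Hypothetical rule: the risk score rises with age and protein level; noise blurs the boundary
score = (age - 80) + (protein - 80) + rng.normal(0, 5, n)
y = (score > 0).astype(int)           # 1: Alzheimer's, 0: healthy

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
acc = accuracy_score(y_test, clf.predict(X_test))
print(f"Accuracy with feature-dependent labels: {acc:.3f}")
```

With a real relationship between features and labels, the same pipeline should score well above the roughly 50% seen on random labels.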

[Output]

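In real applications, what matters most is choosing informative features and preprocessing them appropriately, then selecting and tuning the model.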
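One common way to tune a model is a cross-validated grid search with `GridSearchCV`. The sketch below is an assumption-laden illustration: it swaps in `make_classification` to produce synthetic two-feature data with a learnable signal (not patient data), and the grid values for `n_estimators` and `max_depth` are arbitrary. The search runs 5-fold cross-validation on the training set only, and the held-out test set is used once at the end.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data with two informative features (not real patient data)
X, y = make_classification(n_samples=1000, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Small, illustrative grid; a real search would usually cover more values
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                      cv=5, scoring="accuracy")
search.fit(X_train, y_train)  # 5-fold CV on the training set only

print("best params:", search.best_params_)
print(f"mean CV accuracy: {search.best_score_:.3f}")
# With refit=True (the default), the best model is retrained on all of X_train
test_acc = search.score(X_test, y_test)
print(f"test accuracy: {test_acc:.3f}")
```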