AnyTrading - Bitcoin Investment with Reinforcement Learning: Daily Candles, Part 2

August 18, 2020

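This is the second investment simulation using daily (1-day candle) Bitcoin data.

Reinforcement learning parameters

The source code is an adaptation of the code from earlier posts, so it is omitted here; only the reinforcement learning parameters are listed below.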

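Learning algorithm (same as last time)
PPO2
Number of most recent data points referenced (same as last time)
50
Training data
[2017-07-13 to 2018-05-11] daily data (same as last time)
Validation data
[2018-08-19 to 2019-06-15] daily data (shifted by 100 days)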
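
As a quick sanity check on the date ranges above, the following minimal sketch uses only Python's standard library to compute how many days each period spans and how far the validation period starts after the training period ends. The variable names are illustrative and are not taken from the original source, which is omitted in this post.

```python
from datetime import date

# Date ranges as listed in the parameters above
train_start, train_end = date(2017, 7, 13), date(2018, 5, 11)
valid_start, valid_end = date(2018, 8, 19), date(2019, 6, 15)

# Number of days spanned by each period (end date minus start date)
train_days = (train_end - train_start).days
valid_days = (valid_end - valid_start).days

# Gap between the last training day and the first validation day
gap_days = (valid_start - train_end).days

print(train_days, valid_days, gap_days)  # 302 300 100
```

The 100-day gap is consistent with the "shifted by 100 days" note, on the assumption that the previous validation period began where the training data ends.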

Investment results

The results of the run are as follows.

[Console output]

info: {'total_reward': 1225670000.0, 'total_profit': 1.0301971417446745, 'position': 0}
info: {'total_reward': 2672370000.0, 'total_profit': 0.9421943631409182, 'position': 0}
info: {'total_reward': -996930000.0, 'total_profit': 0.6864210748446301, 'position': 1}
info: {'total_reward': -5494530000.0, 'total_profit': 0.5774381110281871, 'position': 1}
info: {'total_reward': 7793260000.0, 'total_profit': 1.902538947650555, 'position': 0}
info: {'total_reward': -2393360000.0, 'total_profit': 0.828989892185724, 'position': 1}
info: {'total_reward': -815410000.0, 'total_profit': 0.7578885784881692, 'position': 0}
info: {'total_reward': 839630000.0, 'total_profit': 0.9834818344678051, 'position': 1}
info: {'total_reward': 2794730000.0, 'total_profit': 1.1658569300925843, 'position': 1}
info: {'total_reward': -5354750000.0, 'total_profit': 0.5246427962755218, 'position': 0}
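
Each console line is the text `info: ` followed by a Python dict literal, so the values can be pulled out programmatically. The sketch below is one possible way to do that with the standard library; `parse_info` is a hypothetical helper written for this example, not part of AnyTrading, and only the first two lines are included to keep it short.

```python
import ast

# Two of the console lines above, copied verbatim
lines = [
    "info: {'total_reward': 1225670000.0, 'total_profit': 1.0301971417446745, 'position': 0}",
    "info: {'total_reward': 2672370000.0, 'total_profit': 0.9421943631409182, 'position': 0}",
]

def parse_info(line):
    """Return the dict that follows the 'info: ' prefix (hypothetical helper)."""
    prefix = "info: "
    if not line.startswith(prefix):
        raise ValueError(f"unexpected line: {line!r}")
    # literal_eval parses a Python literal without executing arbitrary code
    return ast.literal_eval(line[len(prefix):])

records = [parse_info(line) for line in lines]
for no, rec in enumerate(records, start=1):
    print(no, rec["total_reward"], rec["total_profit"])
```

Note that the two metrics need not move together: in the second line the total reward is positive while total_profit is below 1.0.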

[Output image]

The investment results (total reward) are summarized in the table below.

No.	Total reward (last time)	Total reward (this time)
①	737,820,000 yen	1,225,670,000 yen
②	4,451,760,000 yen	2,672,370,000 yen
③	4,724,240,000 yen	-996,930,000 yen
④	-3,133,420,000 yen	-5,494,530,000 yen
⑤	7,880,400,000 yen	7,793,260,000 yen
⑥	2,833,180,000 yen	-2,393,360,000 yen
⑦	2,268,160,000 yen	-815,410,000 yen
⑧	1,437,600,000 yen	839,630,000 yen
⑨	-3,185,920,000 yen	2,794,730,000 yen
⑩	-5,817,080,000 yen	-5,354,750,000 yen

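Counting a positive total reward as a win, the 10 trained models finished with 5 wins and 5 losses, which is a mediocre result.

Trained models No.①, No.②, No.⑤, and No.⑧ have now won twice in a row.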
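
The win/loss tally and the list of repeat winners can be reproduced from the table with a few lines of Python. This is a minimal sketch that assumes a "win" means a positive total reward; the dictionary layout and variable names are illustrative only.

```python
# Total rewards (yen) from the table above: model No. -> (last time, this time)
rewards = {
    1: (737_820_000, 1_225_670_000),
    2: (4_451_760_000, 2_672_370_000),
    3: (4_724_240_000, -996_930_000),
    4: (-3_133_420_000, -5_494_530_000),
    5: (7_880_400_000, 7_793_260_000),
    6: (2_833_180_000, -2_393_360_000),
    7: (2_268_160_000, -815_410_000),
    8: (1_437_600_000, 839_630_000),
    9: (-3_185_920_000, 2_794_730_000),
    10: (-5_817_080_000, -5_354_750_000),
}

# Assumption: a "win" is a positive total reward
wins_now = [no for no, (_, now) in rewards.items() if now > 0]
losses_now = [no for no, (_, now) in rewards.items() if now <= 0]

# Models that won both last time and this time
won_both = [no for no, (prev, now) in rewards.items() if prev > 0 and now > 0]

print(len(wins_now), len(losses_now))  # 5 5
print(won_both)  # [1, 2, 5, 8]
```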