{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Using External Data & Machine Learning\n",
"\n",
"[Open in Colab](https://colab.research.google.com/github/MiniXC/simple-back/blob/master/docs/intro/data_sources.ipynb)\n",
"\n",
"The tutorial so far only showed you how to build a strategy using price data, which in reality is nigh impossible to make profitable if you don't have the resources big players have. This tutorial will show you how to use external data ([/r/worldnews headlines](https://www.kaggle.com/aaron7sun/stocknews) on Kaggle) to predict the S&P500. This won't be a good strategy, but hopefully it will give you the tools to come up with one."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Warning:**\n",
"\n",
"The following API is very likely to be reworked, and this tutorial is a work in progress."
],
"text/plain": [
" Date News\n",
"0 2016-07-01 A 117-year-old woman in Mexico City finally re...\n",
"1 2016-07-01 IMF chief backs Athens as permanent Olympic host\n",
"2 2016-07-01 The president of France says if Brexit won, so...\n",
"3 2016-07-01 British Man Who Must Give Police 24 Hours' Not...\n",
"4 2016-07-01 100+ Nobel laureates urge Greenpeace to stop o..."
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.read_csv(worldnews_url).head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## DataProvider\n",
"We now have our news dataset. To help with caching and preventing time leaks, you can extend the `DataProvider` class and implement its `get` method."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"from simple_back.data_providers import DataProvider"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"class WorldnewsProvider(DataProvider):\n",
"    def __init__(self, url, debug=True):\n",
"        super().__init__(debug=debug)\n",
"        if self.in_cache(url):\n",
"            self.df = self.get_cache(url)\n",
"        else:\n",
"            # load from the url passed in, not from a global variable\n",
"            self.df = pd.read_csv(url)\n",
"            self.df['Date'] = pd.to_datetime(self.df['Date'])\n",
"            # index by date, newest rows first\n",
"            self.df = self.df.set_index('Date').sort_index(ascending=False)\n",
"            self.set_cache(url, self.df)\n",
"\n",
"    @property\n",
"    def name(self):\n",
"        return \"Reddit /r/worldnews\"\n",
"\n",
"    def dates(self, symbol=None):\n",
"        return self.df.index\n",
"\n",
"    def get(self, datetime, symbol):\n",
"        # placeholder: only show which arguments the provider receives\n",
"        print(datetime, symbol)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note:**\n",
"\n",
"We set `debug` to `True`. This disables caching while still allowing you to implement it.\n",
"Caching can be very annoying when developing, so we recommend you only set this to `False` when your data provider is done."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"news = WorldnewsProvider(worldnews_url)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are two ways of accessing a provider's data in a backtest: by calling the provider (`news()`) or by getting a specific symbol from the provider (`news['somesymbol']`). If we don't need the provider to fetch information under different names (e.g. sentiment for different stocks), we can just ignore `symbol`."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2020-06-23 11:11:27.890553+00:00 somesymbol\n"
]
}
],
"source": [
"news['somesymbol']"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2020-06-23 11:11:27.890553+00:00 None\n"
]
}
],
"source": [
"news()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note:**\n",
"\n",
"At the moment, our data provider is not part of a backtest, which is why its date is set to the current time. This enables you to use a data provider outside of backtests for real-time strategies."
]
},
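{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we will get to the actual data: we always return all headlines from the most recent day strictly before the requested date, so a backtest never sees headlines from the current day or from the future."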
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"class WorldnewsProvider(DataProvider):\n",
"    def __init__(self, url, debug=True):\n",
"        super().__init__(debug=debug)\n",
"        if self.in_cache(url):\n",
"            self.df = self.get_cache(url)\n",
"        else:\n",
"            # load from the url passed in, not from a global variable\n",
"            self.df = pd.read_csv(url)\n",
"            self.df['Date'] = pd.to_datetime(self.df['Date'])\n",
"            # index by date, newest rows first\n",
"            self.df = self.df.set_index('Date').sort_index(ascending=False)\n",
"            self.set_cache(url, self.df)\n",
"\n",
"    @property\n",
"    def name(self):\n",
"        return \"Reddit /r/worldnews\"\n",
"\n",
"    def dates(self, symbol=None):\n",
"        return self.df.index\n",
"\n",
"    def get(self, datetime, symbol):\n",
"        latest_date = None\n",
"        # the index is sorted newest first, so the first match is the\n",
"        # most recent day strictly before the requested date\n",
"        for date in self.dates():\n",
"            # compare plain dates on both sides\n",
"            if date.date() < datetime.date():\n",
"                latest_date = date\n",
"                break\n",
"        return self.df.loc[latest_date]"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"news = WorldnewsProvider(worldnews_url)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" News\n",
"Date \n",
"2016-07-01 A 117-year-old woman in Mexico City finally re...\n",
"2016-07-01 IMF chief backs Athens as permanent Olympic host\n",
"2016-07-01 The president of France says if Brexit won, so...\n",
"2016-07-01 British Man Who Must Give Police 24 Hours' Not...\n",
"2016-07-01 100+ Nobel laureates urge Greenpeace to stop o..."
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"news().head()"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" News\n",
"Date \n",
"2009-12-19 b'Sarah Palin kicked out of hospital fundraise...\n",
"2009-12-19 b\"General Electric is using England's draconia...\n",
"2009-12-19 b\"'A young woman walks into a bar, drinks too ...\n",
"2009-12-19 b'Drug giant GE Healthcare uses UK libel law t...\n",
"2009-12-19 b'George Orwell put fish and chips first among..."
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from datetime import datetime\n",
"# the backtester will later set current_datetime as follows\n",
"news.current_datetime = datetime(2009, 12, 20, 0, 0)\n",
"news().head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Target DataProvider\n",
"If we want to use this data to train a machine learning model, we will also need `target` values to predict. We can best do this by extending `DailyPriceProvider`. This type of provider is tied to events (such as `open` and `close`) instead of times and is meant to provide a value on every trading day."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"from simple_back.data_providers import DailyPriceProvider, YahooFinanceProvider\n",
"from dateutil.relativedelta import relativedelta"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"class TargetDataProvider(DailyPriceProvider):\n",
"    def __init__(self, worldnews_url, debug=True):\n",
"        self.prices = YahooFinanceProvider(debug=debug)\n",
"        self.news = WorldnewsProvider(worldnews_url, debug=debug)\n",
"        super().__init__(debug=debug)\n",
"\n",
"    @property\n",
"    def name(self):\n",
"        return \"Target Price Change\"\n",
"\n",
"    def get(self, symbol, date, event):\n",
"        x = []\n",
"        y = []\n",
"        if isinstance(date, slice):\n",
"            # a slice was requested: walk backwards one day at a time\n",
"            start_date = date.start\n",
"            date = date.stop\n",
"            while date > self.news.dates().min() and (start_date is None or date > start_date):\n",
"                self.prices.set_date_event(date, event)\n",
"                try:\n",
"                    yesterday_df = self.prices[symbol].iloc[-1].copy()\n",
"                    if len(yesterday_df) > 0:\n",
"                        # label the day by the sign of close - open\n",
"                        change = yesterday_df['close'] - yesterday_df['open']\n",
"                        if change > 0:\n",
"                            change = 'positive'\n",
"                        else:\n",
"                            change = 'negative'\n",
"                        x.append(self.news(datetime.combine(date, datetime.min.time())))\n",
"                        y.append(change)\n",
"                except ValueError:\n",
"                    pass\n",
"                date = date - relativedelta(days=1)\n",
"        else:\n",
"            # a single date was requested: return one (headlines, label) pair\n",
"            self.prices.set_date_event(date, event)\n",
"            yesterday_df = self.prices[symbol].iloc[-1].copy()\n",
"            x = self.news(datetime.combine(date, datetime.min.time()))\n",
"            change = yesterday_df['close'] - yesterday_df['open']\n",
"            if change > 0:\n",
"                change = 'positive'\n",
"            else:\n",
"                change = 'negative'\n",
"            y = change\n",
"        return x, y"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"target = TargetDataProvider(worldnews_url)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"from datetime import date\n",
"\n",
"x, y = target['^GSPC', date(2014,12,28)]"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"( News\n",
" Date \n",
" 2014-12-27 Boy, 14, escapes ISIS by volunteering to suici...\n",
" 2014-12-27 Britain has surpassed France as the world's 5t...\n",
" 2014-12-27 N. Korea calls Obama 'monkey,' blames U.S. for...\n",
" 2014-12-27 \"The DNA of every animal in world history will...\n",
" 2014-12-27 Sweden to scrap new election: confirmed,\n",
" 'positive')"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x.head(), y"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's now get all the data. We have to set `debug` to `False`, otherwise every price would be downloaded repeatedly."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"target = TargetDataProvider(worldnews_url, debug=False)\n",
"x, y = target['^GSPC']"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"( News\n",
" Date \n",
" 2016-07-01 A 117-year-old woman in Mexico City finally re...\n",
" 2016-07-01 IMF chief backs Athens as permanent Olympic host\n",
" 2016-07-01 The president of France says if Brexit won, so...\n",
" 2016-07-01 British Man Who Must Give Police 24 Hours' Not...\n",
" 2016-07-01 100+ Nobel laureates urge Greenpeace to stop o...\n",
" 2016-07-01 Brazil: Huge spike in number of police killing...\n",
" 2016-07-01 Austria's highest court annuls presidential el...\n",
" 2016-07-01 Facebook wins privacy case, can track any Belg...\n",
" 2016-07-01 Switzerland denies Muslim girls citizenship af...\n",
" 2016-07-01 China kills millions of innocent meditators fo...\n",
" 2016-07-01 France Cracks Down on Factory Farms - A viral ...\n",
" 2016-07-01 Abbas PLO Faction Calls Killer of 13-Year-Old ...\n",
" 2016-07-01 Taiwanese warship accidentally fires missile t...\n",
" 2016-07-01 Iran celebrates American Human Rights Week, mo...\n",
" 2016-07-01 U.N. panel moves to curb bias against L.G.B.T....\n",
" 2016-07-01 The United States has placed Myanmar, Uzbekist...\n",
" 2016-07-01 S&P revises European Union credit rating t...\n",
" 2016-07-01 India gets $1 billion loan from World Bank for...\n",
" 2016-07-01 U.S. sailors detained by Iran spoke too much u...\n",
" 2016-07-01 Mass fish kill in Vietnam solved as Taiwan ste...\n",
" 2016-07-01 Philippines president Rodrigo Duterte urges pe...\n",
" 2016-07-01 Spain arrests three Pakistanis accused of prom...\n",
" 2016-07-01 Venezuela, where anger over food shortages is ...\n",
" 2016-07-01 A Hindu temple worker has been killed by three...\n",
" 2016-07-01 Ozone layer hole seems to be healing - US &...,\n",
" 'positive')"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x[0], y[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now get a pair of the data we want to use for prediction and the target change for any given day. We will use this data to train a classifier. For ease of use, we will use TF-IDF to convert the documents to vectors. We will then train an XGBoost classifier on them."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"%%capture\n",
"!pip install xgboost"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.feature_extraction.text import TfidfVectorizer\n",
"from sklearn.preprocessing import label_binarize\n",
"from sklearn.metrics import f1_score, confusion_matrix, accuracy_score, precision_score, recall_score"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"def to_tfidf(x):\n",
"    # join each day's headlines into a single document\n",
"    x = [' '.join(day['News']) for day in x]\n",
"    return TfidfVectorizer().fit_transform(x)"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"x_tfidf = to_tfidf(x)"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [],
"source": [
"y_target = label_binarize(y, classes=['negative','positive']).flatten()"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(1367, 1367)"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(x), len(y)"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"x_train, x_test = x_tfidf[:1000], x_tfidf[-367:]\n",
"y_train, y_test = y_target[:1000], y_target[-367:]"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [],
"source": [
"import xgboost as xgb\n",
"import numpy as np\n",
"import seaborn as sns\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0]\ttrain-error:0.38700\teval-error:0.50136\n",
"Multiple eval metrics have been passed: 'eval-error' will be used for early stopping.\n",
"\n",
"Will train until eval-error hasn't improved in 10 rounds.\n",
"[1]\ttrain-error:0.37300\teval-error:0.46049\n",
"[2]\ttrain-error:0.34400\teval-error:0.45504\n",
"[3]\ttrain-error:0.32700\teval-error:0.47139\n",
"[4]\ttrain-error:0.32700\teval-error:0.45777\n",
"[5]\ttrain-error:0.31700\teval-error:0.44414\n",
"[6]\ttrain-error:0.32200\teval-error:0.44959\n",
"[7]\ttrain-error:0.32500\teval-error:0.45777\n",
"[8]\ttrain-error:0.31300\teval-error:0.46866\n",
"[9]\ttrain-error:0.31800\teval-error:0.45504\n",
"[10]\ttrain-error:0.31400\teval-error:0.45777\n",
"[11]\ttrain-error:0.31000\teval-error:0.46049\n",
"[12]\ttrain-error:0.31100\teval-error:0.46321\n",
"[13]\ttrain-error:0.30300\teval-error:0.45777\n",
"[14]\ttrain-error:0.30500\teval-error:0.46321\n",
"[15]\ttrain-error:0.29500\teval-error:0.46321\n",
"Stopping. Best iteration:\n",
"[5]\ttrain-error:0.31700\teval-error:0.44414\n",
"\n"
]
}
],
"source": [
"# read in data\n",
"dtrain = xgb.DMatrix(x_train, label=y_train)\n",
"dtest = xgb.DMatrix(x_test, label=y_test)\n",
"# specify parameters via map\n",
"param = {\n",
" 'max_depth':3,\n",
" 'eta':0.01,\n",
" 'objective': 'binary:logistic',\n",
" 'booster': 'gbtree',\n",
" 'verbosity': 1,\n",
" 'subsample': .9,\n",
"}\n",
"num_round = 100\n",
"metrics = [(dtrain, 'train'), (dtest, 'eval')]\n",
"bst = xgb.train(param, dtrain, num_round, evals=metrics, early_stopping_rounds=10)\n",
"# make prediction\n",
"preds = bst.predict(dtest)"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(0.6780303030303031, 0.5367847411444142)"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"f1_score(y_test, np.round(preds)), accuracy_score(y_test, np.round(preds))"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(0.549079754601227, 0.8861386138613861)"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"precision_score(y_test, np.round(preds)), recall_score(y_test, np.round(preds))"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"sns.heatmap(confusion_matrix(y_test, np.round(preds), normalize='pred'), annot=True, fmt='.2f')\n",
"plt.xlabel('Predicted')\n",
"plt.ylabel('Actual');"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prediction Data Provider & Backtesting"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that the majority class, 'positive', occurs much more frequently (not too surprising with a market index) and is predicted correctly 54% of the time. The downside is only predicted correctly 40% of the time, which means our strategy should only use buy signals instead of sell signals."
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [],
"source": [
"class PredictionDataProvider(DailyPriceProvider):\n",
"    def __init__(self, worldnews_url, debug=True):\n",
"        self.prices = YahooFinanceProvider(debug=debug)\n",
"        self.news = WorldnewsProvider(worldnews_url, debug=debug)\n",
"        super().__init__(debug=debug)\n",
"\n",
"    @property\n",
"    def name(self):\n",
"        return \"Predict Price Change\"\n",
"\n",
"    def train(self, x, y):\n",
"        # hold out the last 20% of the rows for early stopping\n",
"        total_len = x.shape[0]\n",
"        test_size = int(x.shape[0]*0.2)\n",
"        x_train, x_test = x[:total_len-test_size], x[-test_size:]\n",
"        y_train, y_test = y[:total_len-test_size], y[-test_size:]\n",
"        # read in data\n",
"        dtrain = xgb.DMatrix(x_train, label=y_train)\n",
"        dtest = xgb.DMatrix(x_test, label=y_test)\n",
"        # specify parameters via map\n",
"        param = {\n",
"            'max_depth': 3,\n",
"            'eta': 0.01,\n",
"            'objective': 'binary:logistic',\n",
"            'booster': 'gbtree',\n",
"            'verbosity': 0,\n",
"            'subsample': .9,\n",
"        }\n",
"        num_round = 100\n",
"        metrics = [(dtrain, 'train'), (dtest, 'eval')]\n",
"        bst = xgb.train(param, dtrain, num_round, evals=metrics, early_stopping_rounds=10, verbose_eval=False)\n",
"        self.bst = bst\n",
"\n",
"    def predict(self, x):\n",
"        return self.bst.predict(x)\n",
"\n",
"    def get(self, symbol, date, event):\n",
"        x, y = [], []\n",
"        if isinstance(date, slice):\n",
"            # collect (headlines, label) pairs, walking backwards one day at a time\n",
"            start_date = date.start\n",
"            date = date.stop\n",
"            while date > self.news.dates().min() and (start_date is None or date > start_date):\n",
"                self.prices.set_date_event(date, event)\n",
"                try:\n",
"                    yesterday_df = self.prices[symbol].iloc[-1].copy()\n",
"                    if len(yesterday_df) > 0:\n",
"                        change = yesterday_df['close'] - yesterday_df['open']\n",
"                        if change > 0:\n",
"                            change = 'positive'\n",
"                        else:\n",
"                            change = 'negative'\n",
"                        x.append(self.news(datetime.combine(date, datetime.min.time())))\n",
"                        y.append(change)\n",
"                except ValueError:\n",
"                    pass\n",
"                date = date - relativedelta(days=1)\n",
"        else:\n",
"            # predictions need a training window, so only slices are supported\n",
"            raise ValueError()\n",
"        self.vec = TfidfVectorizer()\n",
"        x = self.vec.fit_transform([' '.join(day['News']) for day in x])\n",
"        y = label_binarize(y, classes=['negative','positive']).flatten()\n",
"        self.train(x, y)\n",
"        current_news = self.news(datetime.combine(date+relativedelta(days=1), datetime.min.time()))\n",
"        x = self.vec.transform([' '.join(current_news['News'])])\n",
"        return self.predict(xgb.DMatrix(x))"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0.5199328], dtype=float32)"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"target = PredictionDataProvider(worldnews_url, debug=False)\n",
"target['^GSPC', :date(2015,1,1)]"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [],
"source": [
"from simple_back.backtester import BacktesterBuilder\n",
"\n",
"builder = (\n",
"    BacktesterBuilder()\n",
"    .balance(10_000)\n",
"    .calendar('NYSE')\n",
"    .compare(['^GSPC']) # strategies to compare with\n",
"    .live_progress() # show a progress bar using tqdm\n",
"    .live_plot() # we assume we are running this in a Jupyter Notebook\n",
"    .data(target)\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"bt = builder.build()\n",
"for day, event, b in bt['2009-1-1':'2015-1-1']:\n",
"    pred = b.data['Predict Price Change']['^GSPC',:day][0]\n",
"    b.add_metric('pred', pred)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This backtest takes quite some time, but once it has completed, all our predictions are cached. This notebook is very poorly optimized: for example, we vectorize the training data anew on each iteration."
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"bt = builder.build()\n",
"for day, event, b in bt['2009-1-1':'2015-1-1']:\n",
"    pred = b.data['Predict Price Change']['^GSPC',:day][0]\n",
"    if event == 'open':\n",
"        if pred > .5:\n",
"            b.long('^GSPC', percent=1)\n",
"        if pred < .5:\n",
"            b.short('^GSPC', percent=1)\n",
"    if event == 'close':\n",
"        b.pf.liquidate()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Our backtest performs worse than the S&P500, but correctly predicts some drawdowns at the end of 2011."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}