Evaluating Startup Predictions with Backtesting and Portfolio Simulation

7 Aug 2024

Authors:

(1) Mark Potanin, Corresponding author (potanin.m.st@gmail.com);

(2) Andrey Chertok, (a.v.chertok@gmail.com);

(3) Konstantin Zorin, (berzqwer@gmail.com);

(4) Cyril Shtabtsovsky, (cyril@aloniq.com).

Abstract and 1. Introduction

2 Related works

3 Dataset Overview, Preprocessing, and Features

3.1 Successful Companies Dataset and 3.2 Unsuccessful Companies Dataset

3.3 Features

4 Model Training, Evaluation, and Portfolio Simulation and 4.1 Backtest

4.2 Backtest settings

4.3 Results

4.4 Capital Growth

5 Other approaches

5.1 Investors ranking model

5.2 Founders ranking model and 5.3 Unicorn recommendation model

6 Conclusion

7 Further Research, References and Appendix

4 Model Training, Evaluation, and Portfolio Simulation

The model’s architecture is visualized in Figure 1.

4.1 Backtest

The backtest spans from 2016-01-01 to 2022-01-01, with the model retrained every 3 months. The retraining interval is a hyperparameter that can be tuned depending on the time/accuracy trade-off. In each iteration of the backtest, the time window under consideration is defined by its start and end dates. For example, the first iteration considers the window with a start date of 2016-01-01 and an end date of 2016-04-01. Companies that attracted a Round B or C during this window are selected as its "test" set.
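To make the schedule concrete, the sketch below generates the quarterly (start, end) windows with pandas. The exact boundary handling is our assumption, chosen so the count matches the 25 prediction windows reported later in this section:

```python
# A minimal sketch of the quarterly backtest schedule described above.
# Generating 25 window starts (and letting the last window extend one
# quarter past 2022-01-01) is our assumption to match the reported count.
import pandas as pd

starts = pd.date_range("2016-01-01", "2022-01-01", freq="3MS")
windows = [(s, s + pd.DateOffset(months=3)) for s in starts]

print(len(windows))   # 25
print(windows[0])     # (Timestamp('2016-01-01 ...'), Timestamp('2016-04-01 ...'))
```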

The model is trained on the dataset described in Section 3. However, the entire dataset cannot be used for training, since it would be incorrect to train on companies founded in the future in order to predict the success of companies in the past. Therefore, only companies founded before the start of the current time window (i.e., before 2016-01-01 in the first iteration) are considered for training. Additionally, a company's success event (IPO/ACQ/unicorn) may occur in the future relative to the current window, so only companies whose success event occurred before the start of the current time window are considered.
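A minimal sketch of this leakage-free filtering, assuming a pandas DataFrame with hypothetical columns founded_on and success_date (these names are ours, not the paper's schema):

```python
import pandas as pd

def training_slice(companies: pd.DataFrame, window_start: pd.Timestamp) -> pd.DataFrame:
    """Build a training set that uses no information from the future
    relative to window_start."""
    # Only companies founded before the window start are eligible.
    train = companies[companies["founded_on"] < window_start]
    # Per the text, successful companies are considered only if their
    # success event (IPO/ACQ/unicorn) also happened before window_start;
    # successes that lie in the future are dropped from training.
    future_success = train["success_date"].notna() & (train["success_date"] >= window_start)
    return train[~future_success]
```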

This approach is designed to ensure the integrity of the backtesting process, avoiding any influence from future events. However, the drawback of this approach is the limited number of training examples at the beginning of the backtest (i.e., in the first iterations in 2016-2017). Consequently, the predictive power of the model is lower at the beginning of the backtest compared to the end. The backtest yields an array of test companies with a score assigned to them, indicating the level of success predicted by the model.

The model is retrained every 3 months during the backtest, resulting in a total of 25 prediction windows. A sorted list of predictions is generated for each window. Finally, the predictions from all windows are compiled into one table, representing the complete backtest of predictions for the period from 2016-01-01 to 2022-01-01. This table is then passed to the portfolio optimization algorithm.
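Continuing the sketches above, one possible shape for the expanding-window loop follows. The classifier family and the FEATURES and round_b_or_c_date columns are placeholders of ours, since this section does not fix them:

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier  # placeholder model choice

FEATURES = ["num_founders", "total_funding_usd"]  # placeholder feature columns

all_preds = []
for start, end in windows:                 # quarterly windows from the schedule above
    train = training_slice(companies, start)
    model = GradientBoostingClassifier()   # retrained from scratch each window
    model.fit(train[FEATURES], train["label"])

    # Test set: companies that attracted a Round B or C inside this window.
    in_window = (companies["round_b_or_c_date"] >= start) & (companies["round_b_or_c_date"] < end)
    test = companies[in_window]

    preds = test[["uuid", "name"]].copy()
    preds["score"] = model.predict_proba(test[FEATURES])[:, 1]
    preds["window_start"] = start
    all_preds.append(preds)

# The complete backtest table covering 2016-01-01 to 2022-01-01.
backtest_table = pd.concat(all_preds, ignore_index=True)
```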

We decided to construct a monthly portfolio based on the backtest results. Accordingly, we step through the backtest with a window of 1 month, covering the periods from 2016-01-01 to 2016-02-01, from 2016-02-01 to 2016-03-01, and so on, adding or removing companies in our portfolio every month. Using one period as an example, say from 2018-01-01 to 2018-02-01, we describe the process of selecting companies for the portfolio. We take a slice of the backtest predictions in this period and sort them by score, which represents the model’s assessment of the success of each company. Since the size of our portfolio is limited, for instance to 30 companies, there is no need to fill it entirely in the first months. The logic for adding companies is as follows (a code sketch follows this list):

• Each month, we select the top 3 companies from the sorted list of predictions, subject to a cut-off threshold on the predicted score. The choice of the optimal threshold is an empirical task and requires careful consideration. As the training dataset grows over time, the model becomes more confident in its predictions, so it makes sense to increase the threshold as the backtest progresses. One way to do this is to define the threshold as a function of the training-set size and other relevant factors.

• Every month, we verify the current portfolio:

– success: if the company achieved a success event (IPO/ACQ/unicorn) during the month, it is removed from the active portfolio and marked with this flag.

– longtime: if the company has not attracted any rounds within the last 730 days (2 years, a configurable parameter), it is removed from the portfolio and marked with this flag.

– still_in: if the company is still in the portfolio at the end of the backtest, it is marked with this flag. These are companies that were added to the portfolio recently (in 2021-2022), for which we cannot yet make a decision on their success.
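A sketch of one monthly step, combining the growing threshold and the verification flags above. The threshold schedule and all field names are illustrative assumptions, not the paper's exact implementation:

```python
import pandas as pd

MAX_PORTFOLIO = 30    # portfolio size limit from the text
TOP_K = 3             # companies added per month
STALE_DAYS = 730      # "longtime" cut-off (2 years, configurable)

def score_threshold(train_size: int) -> float:
    """One possible monotone schedule: the cut-off grows with the size of
    the training set, mirroring the model's growing confidence."""
    return min(0.9, 0.5 + 0.05 * train_size / 10_000)

def monthly_update(portfolio: list[dict], month_preds: pd.DataFrame,
                   month_end: pd.Timestamp, train_size: int) -> list[dict]:
    kept = []
    # 1. Verify current holdings and flag exits.
    for pos in portfolio:
        if pos["success_date"] is not None and pos["success_date"] < month_end:
            pos["exit_reason"] = "success"      # IPO/ACQ/unicorn this month
        elif (month_end - pos["last_series_date"]).days > STALE_DAYS:
            pos["exit_reason"] = "longtime"     # no rounds for two years
        else:
            kept.append(pos)                    # remains active ("still_in" at the end)
    # 2. Add up to TOP_K new companies above the threshold, respecting capacity.
    threshold = score_threshold(train_size)
    added = 0
    for _, row in month_preds.sort_values("score", ascending=False).iterrows():
        if added == TOP_K or len(kept) >= MAX_PORTFOLIO or row["score"] < threshold:
            break  # the list is sorted, so nothing further passes the threshold
        kept.append({"uuid": row["uuid"], "score": row["score"], "added": month_end,
                     "success_date": None, "last_series_date": month_end,
                     "exit_reason": None})
        added += 1
    return kept
```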

The result is a dataset that simulates our venture fund over the period 2016-2022, with companies added to (and filtered out of) the portfolio every month. The resulting dataset contains the following fields:

• uuid - a unique company identifier

• name - the name of the company

• enter_series_date - the date of the round in which the fund entered the company

• enter_series_value - the valuation of the company at the time of entry (if available)

• score - the company score at the time of entry (if available)

• added - the date when the company was added to the portfolio

• last_series_date - the date of the last round of funding, which could be an IPO, acquisition, or the round in which the company became a unicorn

• last_series_value - the valuation of the company at the time of the last round of funding, if available

• exit_reason - the reason for the fund’s exit from the company (if applicable)

• expired - the date when the fund exited the company (due to success or expiration of the holding period)
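For concreteness, one way to type such a record; the field names follow the list above, while the types are our assumptions:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class PortfolioRecord:
    uuid: str                            # unique company identifier
    name: str                            # company name
    enter_series_date: date              # round in which the fund entered
    enter_series_value: Optional[float]  # valuation at entry, if available
    score: Optional[float]               # model score at entry, if available
    added: date                          # date added to the portfolio
    last_series_date: date               # last round (IPO, ACQ, or unicorn round)
    last_series_value: Optional[float]   # valuation at the last round, if available
    exit_reason: Optional[str]           # "success" / "longtime" / "still_in"
    expired: Optional[date]              # date the fund exited the company
```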

The reader may wonder why we retrain the model every 3 months while building the portfolio with a one-month interval. Essentially, for the first iteration the training set includes all companies founded before 2016-01-01, and the test set consists of companies that received Round B or C funding during the period from 2016-01-01 to 2016-04-01. We make predictions and add them to the overall table. Then the training data is extended to 2016-04-01, the test period becomes 2016-04-01 to 2016-07-01, and so on. In the end, we obtain a complete test table covering the period from 2016-01-01 to 2022-01-01.

After that, we go through this table with a one-month step, simulating our venture fund’s behavior and assembling the portfolio. Collecting all predictions first and then stepping through them to construct the portfolio is simply a computational convenience; we do not look into the future in any way.

This paper is available on arxiv under CC 4.0 license.