Authors:
(1) Mark Potanin, Corresponding author (potanin.m.st@gmail.com);
(2) Andrey Chertok, (a.v.chertok@gmail.com);
(3) Konstantin Zorin, (berzqwer@gmail.com);
(4) Cyril Shtabtsovsky, (cyril@aloniq.com).
Table of Links
3 Dataset Overview, Preprocessing, and Features
3.1 Successful Companies Dataset and 3.2 Unsuccessful Companies Dataset
4 Model Training, Evaluation, and Portfolio Simulation and 4.1 Backtest
5 Other approaches
5.2 Founders ranking model and 5.3 Unicorn recommendation model
7 Further Research, References and Appendix
4.4 Capital Growth
Traditional metrics utilized in machine learning may not be directly transferable to the AI investor due to changes in data availability over time and class imbalance in the dataset. Therefore, we assess the model’s performance based on the presence of well-known companies in the resulting portfolio and the financial growth of the companies. In this subsection, we focus on the latter assessment.
To calculate the PnL of the companies, we need each company's valuation at the entry and exit rounds. The valuation of companies that exited because they stayed in the portfolio too long (the "longtime" exit reason) is set to zero. For companies marked as STILL_IN, we use their last known valuation, since they are the youngest companies in the portfolio. The PnL is divided into realized and unrealized components. The unrealized PnL reflects the current cumulative valuation of the portfolio, incorporating the presently known rounds. In contrast, the realized PnL is the cumulative sum obtained by exiting successful companies, together with the resulting capital growth. Results with exit reasons and valuations are presented in Table 2. Unfortunately, valuation data was not available for all companies. The column 'Used in Capital Growth' shows whether a company was used to calculate the PnL.
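The valuation rules above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the `Position` fields, the exit labels "SUCCESS" and "LONGTIME", the unit stake per company, and the choice to measure PnL as stake times the valuation multiple minus one are all assumptions made for the example. The split used here (closed positions count as realized, STILL_IN positions as unrealized) is one possible reading of the description.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Position:
    name: str
    entry_valuation: Optional[float]  # valuation at the entry round, if known
    last_valuation: Optional[float]   # exit valuation or last known valuation
    exit_reason: str                  # hypothetical labels: "SUCCESS", "LONGTIME", "STILL_IN"


def position_pnl(p: Position, stake: float = 1.0) -> Optional[float]:
    """PnL of one position for a fixed stake; None if valuation data is missing."""
    if p.entry_valuation is None or p.entry_valuation <= 0:
        return None
    # Companies that exited because they stayed too long are valued at zero.
    exit_val = 0.0 if p.exit_reason == "LONGTIME" else p.last_valuation
    if exit_val is None:
        return None
    return stake * (exit_val / p.entry_valuation - 1.0)


def portfolio_pnl(positions: List[Position], stake: float = 1.0) -> Tuple[float, float, List[str]]:
    """Return (realized, unrealized, names of companies used in the calculation)."""
    realized, unrealized, used = 0.0, 0.0, []
    for p in positions:
        pnl = position_pnl(p, stake)
        if pnl is None:
            continue  # corresponds to 'Used in Capital Growth' being false
        used.append(p.name)
        if p.exit_reason == "STILL_IN":
            unrealized += pnl  # marked at the last known valuation
        else:
            realized += pnl
    return realized, unrealized, used
```

Companies without an entry or exit valuation are skipped rather than imputed, which mirrors the role of the 'Used in Capital Growth' column.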
We present the cumulative PnL and the current portfolio size over time in Figure 2, with a step size of 1 month. The sharp rise in the middle of 2021 corresponds to the exit from Revolut. The companies that remained in the portfolio at the end of 2021 are all marked as STILL_IN. Overall, the PnL graph shows a positive trend, indicating the financial growth of the portfolio over time.
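A curve of this kind can be assembled by stepping through the months and, at each step, summing the PnL of positions already exited and counting the positions still open. The sketch below is an assumption about how such a series might be built, not the procedure used for Figure 2: the dictionary keys `entry`, `exit`, and `pnl` are hypothetical, and only realized PnL is accumulated.

```python
from datetime import date
from typing import Dict, Iterator, List, Tuple


def month_range(start: date, end: date) -> Iterator[date]:
    """Yield the first day of every month from start to end, inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def monthly_curve(positions: List[Dict], start: date, end: date) -> List[Tuple[date, float, int]]:
    """Return (month, cumulative realized PnL, open portfolio size) per month."""
    curve = []
    for month in month_range(start, end):
        # PnL of every position that has exited on or before this month start.
        cumulative = sum(
            p["pnl"] for p in positions
            if p["exit"] is not None and p["exit"] <= month
        )
        # Positions entered by now and not yet exited.
        size = sum(
            1 for p in positions
            if p["entry"] <= month and (p["exit"] is None or p["exit"] > month)
        )
        curve.append((month, cumulative, size))
    return curve
```

A single large exit shows up in such a series as a step in the cumulative value and a simultaneous drop of one in the portfolio size.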
To evaluate the algorithm via conventional machine learning metrics, we employ time-series cross-validation with a 1-year test window, spanning the years from 2016 to 2022. Within each test window, we focus on companies that secured B or C funding rounds during the given year and subsequently achieved success. To avoid leaking future information, the training dataset for each fold comprises only companies whose success or failure status was known before the start of the test window. Standard binary classification metrics can then be used to evaluate the model, and Recall is of particular interest to us: minimizing False Negatives (FN) matters more than minimizing False Positives (FP), because a false negative means a successful company was missed. Finally, in Table 1 we present metrics that have been averaged across 6 folds for a comprehensive evaluation of our predictive model's performance:
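The fold construction described above can be illustrated with a short sketch. It is written under stated assumptions and is not the authors' code: the record keys `round_year` (year of the B or C round), `status_known_year` (year the success or failure status became known, or None if still unknown), and `label` are hypothetical names chosen for the example.

```python
from typing import Dict, Iterator, List, Tuple


def yearly_folds(records: List[Dict], first_year: int, last_year: int) -> Iterator[Tuple[int, List[Dict], List[Dict]]]:
    """Yield (test_year, train, test) using a 1-year test window per fold."""
    for test_year in range(first_year, last_year + 1):
        # Train only on companies whose outcome was known before the test window.
        train = [
            r for r in records
            if r["status_known_year"] is not None and r["status_known_year"] < test_year
        ]
        # Test on companies whose B or C round falls inside the test window.
        test = [r for r in records if r["round_year"] == test_year]
        if train and test:
            yield test_year, train, test


def recall(y_true: List[int], y_pred: List[int]) -> float:
    """Recall = TP / (TP + FN); returns 0.0 when there are no positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0
```

Filtering the training set on the year the status became known, rather than on the round year, is what keeps outcomes from inside or after the test window out of training. Per-fold recall values can then be averaged across folds.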
This paper is available on arXiv under CC 4.0 license.