The Role of Autoencoders and Centroid Analysis in Predicting Startup Outcomes

8 Aug 2024

Authors:

(1) Mark Potanin, Corresponding author (potanin.m.st@gmail.com);

(2) Andrey Chertok (a.v.chertok@gmail.com);

(3) Konstantin Zorin (berzqwer@gmail.com);

(4) Cyril Shtabtsovsky (cyril@aloniq.com).

Abstract and 1. Introduction

2 Related works

3 Dataset Overview, Preprocessing, and Features

3.1 Successful Companies Dataset and 3.2 Unsuccessful Companies Dataset

3.3 Features

4 Model Training, Evaluation, and Portfolio Simulation and 4.1 Backtest

4.2 Backtest settings

4.3 Results

4.4 Capital Growth

5 Other approaches

5.1 Investors ranking model

5.2 Founders ranking model and 5.3 Unicorn recommendation model

6 Conclusion

7 Further Research, References and Appendix

5 Other approaches

5.1 Investors ranking model

Investors can be scored by the frequency, size, and sector of their investments. An investor can also serve as an indicator of a company's potential success or failure. The scoring was carried out in three stages:

  1. Using an autoencoder model with several input modalities, we created a vector representation for each investor.

  2. Based on expert estimates, we selected a group of top investors and computed the centroid of this group in the vector space.

  3. We ranked investors by their distance from the centroid.
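Stages 2 and 3 can be illustrated with a minimal sketch. It assumes the investor embeddings from stage 1 already exist as plain lists of floats. The names `centroid`, `score_investors`, and `rank_investors`, the toy 2-D vectors, and the investor labels are all hypothetical. The paper does not state which distance metric was used or how distance is converted to a score, so the Euclidean distance and the mapping `1 / (1 + distance)` below are assumptions chosen only so that a closer investor receives a higher score.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def score_investors(embeddings, top_names):
    """Score every investor by closeness to the centroid of the top group.

    ASSUMPTION: Euclidean distance and score = 1 / (1 + distance).
    The score lies in (0, 1]; 1.0 means the investor sits on the centroid.
    """
    center = centroid([embeddings[name] for name in top_names])
    return {
        name: 1.0 / (1.0 + math.dist(vector, center))
        for name, vector in embeddings.items()
    }


def rank_investors(scores):
    """Investor names sorted from highest score (closest) to lowest."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical 2-D embeddings; real autoencoder embeddings would be wider.
embeddings = {
    "investor_a": [1.0, 1.0],
    "investor_b": [1.0, 3.0],
    "investor_c": [4.0, 6.0],
    "investor_d": [1.0, 2.0],
}
top_names = ["investor_a", "investor_b"]  # stand-in for the expert-selected group

scores = score_investors(embeddings, top_names)
ranking = rank_investors(scores)
```

Here the centroid of the top group is `[1.0, 2.0]`, so `investor_d` ranks first and `investor_c`, the farthest point, ranks last. Another metric, such as cosine distance, could be substituted in `score_investors` without changing the rest of the sketch.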

A higher score means the investor lies closer to the centroid of the top investors. Results are presented in Table 4. If a company's lead investor has a low score, this may indicate that the company should be excluded from consideration.

Example: the investment firm 14W has a score of 0.9 and invests in IT companies, including unicorns (for example, the European travel management startup TravelPerk).
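The use of a low lead-investor score as an exclusion signal can be sketched as a simple filter. This is an illustration only: the function `screen_companies`, the threshold `min_score=0.5`, and all company and investor names and scores are hypothetical, and the paper does not specify a cutoff. Companies whose lead investor has no score are kept, on the assumption that a missing score is not evidence against them.

```python
def screen_companies(lead_investor, investor_scores, min_score=0.5):
    """Split companies into (kept, excluded) by their lead investor's score.

    ASSUMPTION: min_score is an arbitrary illustrative threshold.
    A company is excluded only when its lead investor has a known score
    that falls below the threshold.
    """
    kept, excluded = [], []
    for company, investor in lead_investor.items():
        score = investor_scores.get(investor)
        if score is not None and score < min_score:
            excluded.append(company)
        else:
            kept.append(company)
    return kept, excluded


# Hypothetical inputs: investor scores and each company's lead investor.
investor_scores = {"investor_x": 0.9, "investor_y": 0.2}
lead_investor = {
    "company_1": "investor_x",
    "company_2": "investor_y",
    "company_3": "investor_z",  # no score available for this investor
}

kept, excluded = screen_companies(lead_investor, investor_scores)
```

With these inputs, `company_2` is excluded because its lead investor scores below the threshold, while `company_1` and `company_3` remain under consideration.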

This paper is available on arXiv under a CC 4.0 license.