- Industry: Finance
- Team size: 3 experts
- Technologies: Data Crawling, fastText, gensim, kNN, Clusterization, Text Classification
- Services: A visual representation of groups of companies based on similarity and criteria chosen by the user
- Expertise delivered: Development from scratch, support
This project was developed for a leading financial data provider that covers the global venture capital, private equity, and public markets. We remained its business partner for over 13 years, starting with a small team and growing into a primary full-cycle technology provider. The client's product manager analyzed the market and found that competitors already offered some kind of Market Map solution. We were tasked with perfecting this idea and developing it as a separate service with a visual interface, flexible functionality, and additional features.
The client has a database with descriptions of companies and their verticals, and each company is assigned a number of keywords. For a Market Map solution, we need to take these descriptions, verticals, and keywords and cluster them. The clustering must be repeated many times, because each company operates in several different areas. Each industry and vertical has its own hierarchy with multiple levels, and these hierarchies must be adjustable to the level of detail required.
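To illustrate what an adjustable hierarchy could mean in practice, here is a minimal sketch that cuts a nested hierarchy at a chosen depth. The level names, the company names, and the `groups_at_depth` helper are all hypothetical; the client's real data model is not described in this case study.

```python
# Hypothetical toy hierarchy: industry -> vertical -> sub-vertical -> companies.
# All names are invented for illustration.
HIERARCHY = {
    "Aircraft": {
        "Drones": {
            "Delivery drones": ["SkyParcel", "AeroDrop"],
            "Survey drones": ["MapWing"],
        },
        "Commercial jets": {
            "Regional jets": ["JetNorth"],
        },
    },
    "Healthcare": {
        "Telemedicine": {
            "Remote diagnostics": ["MediLink", "CareScan"],
        },
    },
}


def companies_under(node):
    """Collect every company name below a node (a dict level or a leaf list)."""
    if isinstance(node, list):
        return list(node)
    names = []
    for child in node.values():
        names.extend(companies_under(child))
    return names


def groups_at_depth(tree, depth, prefix=()):
    """Cut the hierarchy at `depth` levels and return {path: companies}.

    depth=1 groups by industry only; larger values give finer groups.
    Branches shallower than `depth` are returned at their deepest level.
    """
    groups = {}
    for name, child in tree.items():
        path = prefix + (name,)
        if depth <= 1 or isinstance(child, list):
            groups[path] = companies_under(child)
        else:
            groups.update(groups_at_depth(child, depth - 1, path))
    return groups


coarse = groups_at_depth(HIERARCHY, 1)  # one group per industry
fine = groups_at_depth(HIERARCHY, 3)    # one group per sub-vertical
```

Raising or lowering `depth` changes how finely the same set of companies is split, which is the kind of control the requirement describes.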
We need to turn company descriptions into vectors and cluster them with Machine Learning algorithms. This is where the first challenge comes in: each group must be named correctly. Manual processing would take too much time, so picking and tuning the right ML algorithms is the only practical option. The second challenge is choosing the right hierarchy among industry codes, sectors, and verticals. Additionally, not all companies have a full or completely accurate description, which makes them hard to group correctly.
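The idea of turning descriptions into vectors and grouping them can be sketched in a few lines. This is a simplified illustration, not the team's pipeline: a bag-of-words vector stands in for a learned embedding such as fastText, and plain k-means is assumed as the clustering algorithm, which the case study does not name. The sample descriptions are invented.

```python
import math
from collections import Counter

# Invented sample descriptions, for illustration only.
DESCRIPTIONS = [
    "drone delivery logistics drone",
    "delivery drone fleet logistics",
    "online banking payments app",
    "mobile banking payments card",
]


def vectorize(descriptions):
    """Turn each description into a unit-length bag-of-words vector.

    Stand-in for a learned embedding (e.g. fastText) in a real system.
    """
    vocab = sorted({w for d in descriptions for w in d.lower().split()})
    vectors = []
    for d in descriptions:
        counts = Counter(d.lower().split())
        vec = [float(counts[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, k, iterations=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        centroids.append(
            max(vectors, key=lambda v: min(distance(v, c) for c in centroids))
        )
    labels = []
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda i: distance(v, centroids[i])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for i in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == i]
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return labels


labels = kmeans(vectorize(DESCRIPTIONS), k=2)
```

Here the two drone descriptions end up in one cluster and the two banking descriptions in the other. With sparse or inaccurate descriptions the vectors carry less signal, which is why such companies are harder to place.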
The project was taken on by our DataDev team, which works on ML solutions. The team consists of DataDev Team Lead Yuriy Batora and two Data Scientists, Oleksii Shashliuk and Denys Stupak.
“There is a large number of companies doing different things, which can be split by both industries and verticals. ‘Industries’ is a basic classification, like Healthcare, Aircraft, or Banking. ‘Verticals’ is a logical classification of businesses with custom parameters, created by a customer. For example, a client may be searching for drones, which is a sub-industry related to Aircraft, so drones or quadcopters are a ‘vertical’ of Aircraft. Each industry has a certain number of verticals. So, our Market Map Solution allows searching not only by industries but by verticals too, which broadens the selection of businesses significantly.”
– Oleksii Shashliuk, Data Scientist
When users look for investment opportunities by industry alone, they can miss less obvious companies that are categorized by vertical. The main business value of Market Maps is that they group businesses by what they actually do at the moment, not by the industry they represent.
“Basically, the solution consists of three Machine Learning tasks: vectorization, clusterization, and naming the resulting clusters. For vectorization, we tried fastText, Word2Vec, and Doc2Vec. We settled on fastText as the best option. The problem with clusterization is deciding on the right number of clusters; we dealt with it through trial and error. Our current way of clusterization is great, but there is room to grow here. The naming of the clusters is a problem because we have around 600,000 keywords and 400,000 of them are used only once.”
– Oleksii Shashliuk, Data Scientist
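The naming problem mentioned in the quote can be illustrated with a small sketch. The case study does not say how the team names clusters, so the approach below is an assumption: keywords that appear only once across the whole dataset are ignored, and each cluster is named after its most frequent remaining keywords. The `name_clusters` function, its parameters, and the sample data are hypothetical.

```python
from collections import Counter

# Invented keyword lists, one per company, with hypothetical cluster labels.
KEYWORDS = [
    ["drones", "logistics", "last-mile"],
    ["drones", "logistics", "quadcopters"],
    ["drones", "mapping"],
    ["banking", "payments", "neobank"],
    ["banking", "payments"],
]
LABELS = [0, 0, 0, 1, 1]


def name_clusters(company_keywords, labels, min_global_count=2, top_n=2):
    """Name each cluster after its most frequent non-rare keywords.

    Keywords used by fewer than `min_global_count` companies overall are
    skipped, since a one-off keyword cannot describe a whole group.
    """
    global_counts = Counter(kw for kws in company_keywords for kw in set(kws))
    per_cluster = {}
    for kws, label in zip(company_keywords, labels):
        counter = per_cluster.setdefault(label, Counter())
        counter.update(
            kw for kw in set(kws) if global_counts[kw] >= min_global_count
        )
    names = {}
    for label, counter in per_cluster.items():
        # Sort by frequency (descending), then alphabetically for stable output.
        ranked = sorted(counter.items(), key=lambda item: (-item[1], item[0]))
        names[label] = " / ".join(kw for kw, _ in ranked[:top_n]) or "unnamed"
    return names


cluster_names = name_clusters(KEYWORDS, LABELS)
```

With this toy data the clusters are named "drones / logistics" and "banking / payments", while one-off keywords such as "neobank" never reach a name. When most keywords in a dataset are singletons, a filter like this leaves far fewer candidates to choose from, which is one reason naming is hard.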
Yuriy, Denys, and Oleksii applied Artificial Intelligence best practices to solve these challenges. They developed a solution that handles all three Machine Learning tasks, and as the project grows, updates are planned to optimize the processes.
The Market Map Solution was developed over 9 months and is now a fully functional service with more than 2 million companies in the database. The project is in its fourth iteration and is supported by our DataDev team. Companies are refreshed daily: for example, when 1,000 new companies emerge, the DataDev team sorts them into the existing hierarchies. A complete rebuild of the hierarchies takes place after a certain amount of time has passed or after a certain number of companies and industries have been added, which keeps the classification accurate. Users access the Market Map service through a website interface and can customize their requests as much as they want, including editing the results for their convenience.
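The daily refresh described above, where new companies are sorted into existing groups without a full rebuild, can be sketched with a nearest-neighbour vote. kNN appears in the project's technology list, but how the team applies it is not stated, so this is an assumption. The two-dimensional vectors, the labels, and the `assign_cluster` helper are hypothetical.

```python
import math
from collections import Counter

# Toy 2-D vectors for already-classified companies (real embeddings would
# have many more dimensions). Labels are invented group names.
KNOWN_VECTORS = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.05], [0.1, 0.9], [0.2, 0.8]]
KNOWN_LABELS = ["Drones", "Drones", "Drones", "Neobanks", "Neobanks"]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_cluster(new_vector, known_vectors, known_labels, k=3):
    """Give a new company the majority label of its k most similar neighbours."""
    ranked = sorted(
        range(len(known_vectors)),
        key=lambda i: cosine(new_vector, known_vectors[i]),
        reverse=True,
    )
    votes = Counter(known_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


first = assign_cluster([0.7, 0.3], KNOWN_VECTORS, KNOWN_LABELS)
second = assign_cluster([0.15, 0.95], KNOWN_VECTORS, KNOWN_LABELS)
```

In this sketch the first new company lands in "Drones" and the second in "Neobanks". Assigning to existing groups is cheap, but the groups themselves drift as data accumulates, which is consistent with the periodic full rebuild mentioned above.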
36% of the users who ran a Companies and Deals (C&D) search also ran this feature. Market Maps were created in 12% of all C&D searches. Corporate users were especially receptive: among them, market maps were created in 17.5% of C&D searches.
“This is excellent and very helpful. I think we can turn around mapping for each of the reps on either Top 20 or current 9 touch cadence targets.”
– Payroll services company
“We work with an application of SaaS company. We want to show them the ecosystem, who the players are. We want to use your market maps to get a good starting place and add companies to it… My main usage in this is to slice it up by segments and then I’ll move companies around. And then I’d download.”
– Consulting firm
“Visually it is very appealing. If we were diligencing an opportunity we could use it to get a feel for the direct competitors of the company and also a couple sectors over see if there any players with deep pockets and the ability to quickly transition into the map of the company we’re looking at, this would be a helpful tool.”
– VC Firm