- Industry: Finance, Investments, Technology
- Technologies: Data Crawling, Word2Vec (Doc2Vec), gensim, Clusterization, and Text Classification
- Partnership period: 6 months
- Team size: 3 experts
- Software products: Provide functionality for crawling and aggregating news for trend detection purposes
- Expertise delivered: Web development services, Custom software development, implementation, and support
The client is a leading financial data provider that covers the global venture capital, private equity, and public markets. We have remained their business partner for over 13 years, starting with a small team and growing into a primary full-cycle technology provider.
At the time the client approached us for assistance with the project under review, one of our teams had already been crawling financial news for this company for a while. Over this period, the number of information sources crawled had grown from approximately 100 to 2,000-3,000. It became impossible for our team to process such a vast amount of information by relying only on human experts, so it was decided to automate the process. While working on the solution, we saw that it had clear business value for our client and could thus be implemented as an additional microservice-enabled capability for them.
The goal was to develop a new microservice with a visual interface on the website that could help the client’s customers find trends to invest in. Extending the client’s offering with a News Trend Detection Service was expected to help acquire new customers and win more trust from existing ones.
The biggest challenge was the volume of information to be processed: around 50,000 articles per day, or roughly 18 million per year. The right set of Machine Learning algorithms had to be selected and further refined to handle this volume properly. From a technical perspective, the team needed more powerful servers to analyze this amount of information, as Machine Learning algorithms require a lot of processing power.
The ML-oriented DataDev team took over this project. It consists of three experts: Team Lead Yuriy Batora and two Data Scientists, Oleksii Shashliuk and Denys Stupak. Initially, Yuriy Batora and a software developer took the initiative to find an ML solution for the large amount of data collected by our partner, since everyone understood the great potential in automatically processing this unused information. The first demo was created in three months. After the project was approved, the DataDev team was formed to handle the development.
“Our first move was to create software for the classification of news, grouping them into various topics, and deciding which of the news items were suitable for their further processing by human experts. At that moment we had around 20 specialists for the task, and it was important to reduce the workload on them. But while exploring the Machine Learning technology, we realized that it was able to interpret the context of news items and classify them in much smaller groups.”
– Yuriy Batora, Team Lead
By using the clusterization method and adjusting parameters, the team managed to train a model to group news items in a specific way. As a result, trend detection became possible: our Machine Learning algorithms could now group topics for a certain time span. As a topic became more popular, news on that topic was grouped, the related keywords were extracted, and a description for a new “trend” that used these keywords was created.
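The keyword step described above can be illustrated with a minimal sketch. It assumes a group of articles has already been formed, and it uses plain word frequency as a stand-in for whatever keyword extraction the team actually used. The headlines, the stopword list, and the function names are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real pipeline would use a much fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "for", "on",
             "is", "are", "with", "as", "by"}


def top_keywords(articles, k=3):
    """Return the k most frequent non-stopword tokens across a group of articles."""
    counts = Counter()
    for text in articles:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(k)]


def describe_trend(articles, k=3):
    """Build a short trend description from the extracted keywords."""
    return "Trend: " + ", ".join(top_keywords(articles, k))


# Made-up headlines standing in for one group of related news items.
group = [
    "Insect protein startup raises funding",
    "Insect protein farms attract new funding",
    "Investors back insect protein for animal feed",
]

print(describe_trend(group))  # Trend: insect, protein, funding
```

Frequency counting is only the simplest option here; it shows how a description can be assembled from a group's shared vocabulary.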
The next step was to run a trend through another set of ML algorithms to determine whether the trend was a declining, constant, or rising one. For example, this is how the research team was able to easily spot trends that became 1,000% more popular over the past month and take action as quickly as possible.
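To make the arithmetic behind a figure like "1,000% more popular" concrete, here is a small sketch that compares article counts for a topic across two periods. The project used ML algorithms for this step; the fixed-threshold rule below is a simplified stand-in, and the `tolerance` value and the counts are assumptions made for illustration.

```python
def percent_change(previous, current):
    """Percentage growth in article count from one period to the next."""
    if previous <= 0:
        raise ValueError("previous count must be positive")
    return (current - previous) / previous * 100.0


def classify_trend(previous, current, tolerance=10.0):
    """Label a trend as rising, declining, or constant.

    `tolerance` is a hypothetical band (in percent) inside which the
    trend is treated as constant.
    """
    change = percent_change(previous, current)
    if change > tolerance:
        return "rising"
    if change < -tolerance:
        return "declining"
    return "constant"


# A topic that went from 20 to 220 articles grew by 1,000%,
# which means it is now 11 times as frequent as before.
print(percent_change(20, 220))   # 1000.0
print(classify_trend(20, 220))   # rising
```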
“We had used Word2Vec, and later Doc2Vec — ML-based technologies for text vectorization. Both the technologies convert text into a mathematical vector that represents the essence of the text. Word2Vec leverages Machine Learning in the form of Neural Networks to describe human speech in multiple dimensions. After Word2Vec and Doc2Vec do the magic, Clusterization Machine Learning algorithm groups the results.”
– Yuriy Batora, Team Lead
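The "vectorize, then cluster" flow from the quote can be sketched with the standard library alone. This is not the team's implementation: simple word-count vectors stand in for the Doc2Vec embeddings that gensim would supply, and a greedy similarity-threshold grouping stands in for the clustering algorithm, which the source does not name. The headlines, the `threshold` value, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Stand-in for Doc2Vec: a sparse word-count vector for one document."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0


def cluster(texts, threshold=0.3):
    """Greedy grouping: a document joins the first cluster whose first
    member is similar enough; otherwise it starts a new cluster."""
    vectors = [vectorize(t) for t in texts]
    clusters = []  # each cluster is a list of document indices
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Made-up headlines for illustration.
headlines = [
    "Blockchain real estate platform launches",
    "Insect protein startup raises funding",
    "New blockchain real estate marketplace opens",
    "Insect protein farms attract funding",
]

print(cluster(headlines))  # [[0, 2], [1, 3]]
```

Word counts only capture shared vocabulary, whereas learned embeddings such as Doc2Vec can also group texts that use different words for the same subject, which is why the team relied on them.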
As mentioned previously, Machine Learning requires a lot of computing power. Initially, the project team used PCs, but then switched to the servers the client was using at the time. However, it still took up to 24 hours to process a batch of news, which was too long. Because of this, the DataDev project team asked the client to purchase a GPU farm, a graphics card-based system capable of running complex algorithms quickly. This cut the processing time from almost a day to two to three hours, while also creating more opportunities for timely adjustments and adaptation of other algorithms.
The News Trend Detection Service, called Emerging Spaces, was completed within six months. It was the first project for our client that provided their customers with info upon request. Here is what the system’s user interface looks like:
Emerging Spaces now processes around 60,000 news topics a day, and this number is constantly rising. The workload for manual data processing has decreased by 50%, which has improved the precision and speed of news analysis for human experts. Combined with another ML-based service that helps researchers contextualize news faster, Emerging Spaces has increased the overall efficiency of analysts dramatically.
Since the clients of a financial data provider are interested in discovering new trends in which to invest, we have added this functionality to the platform. The service that our team created has become a popular feature among the platform’s customers. At present, they can receive more holistic, precise, and valuable business insights. Currently, the DataDev team continues to support and expand the project, constantly receiving feedback from the users.
“I’m looking for investment opportunities and going thematically by the way you split up, it almost feels like you’ve tailored it to the stuff we care about.”
– VC Firm
“I just noticed your new Emerging Spaces. I didn’t know this industry exists! It was interesting to look through. Blockchain Real Estate, who knew! It’s interesting because it flips the thought process into this is interesting because it will grow. You see the biggest deals in fintech like oh yes they’ll be a lot of movement, but the insect foods there’s more opportunity. I’m going to look into investing in one of those companies now. This report says it’s going to become a $30B dollar industry in the next 10 years and right now it’s only $21M capital invested.”
– Director of Research