Summary

  • Industry: Finance, Investments, Technology.
  • Location: Headquartered in the U.S.
  • Technologies: Data Crawling, Word2Vec (Doc2Vec), gensim, Clustering, and Text Classification.
  • Project Duration: 6 months
  • Size of the project team: 3 experts
  • Software product: Functionality for crawling and aggregating news for trend detection purposes.
  • Expertise delivered: Custom Software Development, implementation, and support.

Challenge

One of SPD Technology’s major strategic accounts, a US-headquartered global SaaS provider of data, research, and technology, needed to further automate its data collection and processing.

The company’s searchable investment database holds information on millions of companies and deals and hundreds of thousands of investors and funds. To detect both promising investment trends and investment-worthy business entities within them, the client needs to monitor a large number of financial and other websites and cluster the information they publish.

By the time the need to automate the process more fully arose, the company had been our client for several years, and we had already been crawling a limited number of news sources for them for some time.

However, the number of information sources crawled had grown from approximately 100 to 2,000-3,000, making it impossible to process them manually. Together with the client, we decided to develop a microservice-based capability (the Emerging Spaces news trend detection service) that could efficiently process the required number of news sources and the related volumes of information (around 50,000 articles per day, or around 18 million per year). The new functionality needed a visual interface so that it could become part of the client’s platform, where their B2B customers could use it to find attractive investment trends.

Expanding the functionality of the client’s platform with the news trend detection service was also regarded as a move to gain new clients, as well as win more trust from the existing ones.

Solution

To implement the project, SPD Technology assembled a three-person project team consisting of two Data Scientists (one of whom also served as Team Lead) and one ML software engineer.

One of the biggest challenges our project team faced was the sheer volume of information the functionality under development was intended to process. We needed to find the right combination of Machine Learning algorithms to handle the target volumes of data. In addition, the process had to involve a great deal of human-in-the-loop learning, as the results delivered by the ML algorithms had to be continually verified by human experts so that the algorithms could be further improved.

As the project required a great deal of computing power, our experts switched from their PCs to the client’s servers. However, it still took us up to 24 hours to train the model on the daily batch of around 50,000 news items, which was far too slow.

Because of this, our project team asked the client to purchase a GPU farm, a graphics card-based system capable of running complex algorithms quickly. This cut the training time from almost a day to two to three hours, while also creating more opportunities for timely adjustments and for adapting other algorithms.

It took our project team a total of 3 months to deliver a product demo and hand it over for integration to our product development team, which develops the client’s platform. While developing the solution, our project team worked closely with the stakeholders on the client’s side, who approved every interim deliverable before our team proceeded.

Delivering the complete solution took our project team around 6 months.

Technical Solution

We used the following tech stack to implement the project:

  • Word2Vec
  • Doc2Vec
  • Python
  • Keras

Initially, our experts created software for classifying news items, grouping them into topics, and identifying the news items that are suitable for further processing by human experts.

By using clustering and adjusting its parameters, our project team managed to train a model to group news items in a specific way. As a result, trend detection became possible: our Machine Learning algorithms could now group the topics for a certain time period.
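
To illustrate the grouping step, here is a minimal Python sketch. It is not the project's code: it replaces the Doc2Vec embeddings with plain bag-of-words counts and uses a greedy single-pass grouping, so that it runs on the standard library alone. The function names and the similarity threshold of 0.3 are assumptions made for the example.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a document embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(texts, threshold=0.3):
    """Greedy single-pass grouping.

    Each text joins the most similar existing group if the similarity to
    that group's summed vector reaches the threshold (an assumed value);
    otherwise it starts a new group. Returns lists of text indices.
    """
    groups = []  # each group: {"centroid": Counter, "members": [int, ...]}
    for index, text in enumerate(texts):
        vector = embed(text)
        best, best_sim = None, threshold
        for group in groups:
            sim = cosine(vector, group["centroid"])
            if sim >= best_sim:
                best, best_sim = group, sim
        if best is None:
            groups.append({"centroid": Counter(vector), "members": [index]})
        else:
            best["centroid"].update(vector)
            best["members"].append(index)
    return [group["members"] for group in groups]


headlines = [
    "battery startup raises funding round",
    "battery startup closes new funding",
    "central bank raises interest rates",
    "bank raises rates again",
]
groups = cluster(headlines)
```

Adjusting the threshold changes how readily items merge into one topic, which is the kind of parameter tuning the paragraph above refers to.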

As a topic grew more popular, the algorithms grouped the news items that were related to this topic, extracted the related keywords, and created a description for the new trend that used these keywords.
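
The keyword and description step can be sketched in a similar way. The example below is a simplified illustration under our own assumptions, not the production algorithm: it ranks the words of one group by a smoothed TF-IDF-style score against all collected headlines and joins the top keywords into a short description. The stopword list, function names, and description template are hypothetical.

```python
import math
from collections import Counter

# Tiny illustrative stopword list (an assumption for this example).
STOPWORDS = {"a", "an", "the", "of", "for", "in", "on", "to", "and"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [word for word in text.lower().split() if word not in STOPWORDS]


def top_keywords(cluster_docs, all_docs, k=3):
    """Rank the words of one group by term frequency times smoothed IDF."""
    doc_freq = Counter()
    for doc in all_docs:
        doc_freq.update(set(tokenize(doc)))
    term_freq = Counter()
    for doc in cluster_docs:
        term_freq.update(tokenize(doc))
    n_docs = len(all_docs)
    scores = {
        term: tf * math.log((1 + n_docs) / (1 + doc_freq[term]))
        for term, tf in term_freq.items()
    }
    # Sort by score (descending), breaking ties alphabetically.
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return ranked[:k]


def describe(keywords):
    """Build a short trend description from the extracted keywords."""
    return "Emerging trend: " + ", ".join(keywords)


all_docs = [
    "solid state battery startup raises funding",
    "solid state battery plant opens",
    "bank raises rates",
    "bank cuts rates",
]
keywords = top_keywords(all_docs[:2], all_docs)
description = describe(keywords)
```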

The next step our experts took was to run each trend through another set of ML algorithms to determine whether it was declining, constant, or rising. This is how, for example, the research team could easily identify trends that had become 1,000% more popular over the past month and needed attention as quickly as possible.
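
As a worked example of the arithmetic behind such a figure: a trend that is 1,000% more popular has 11 times its previous volume. The sketch below is a simple rule-based stand-in for the ML algorithms mentioned above, not the method used in the project. It computes the month-over-month change and labels the trend; the 10% tolerance band and the function names are assumptions.

```python
def growth_percent(previous, current):
    """Percentage change in article count between two periods."""
    if previous <= 0:
        raise ValueError("previous count must be positive")
    return (current - previous) / previous * 100.0


def classify_trend(previous, current, tolerance=10.0):
    """Label a trend as rising, declining, or constant.

    Changes within +/- tolerance percent (an assumed band) count as constant.
    """
    change = growth_percent(previous, current)
    if change > tolerance:
        return "rising"
    if change < -tolerance:
        return "declining"
    return "constant"


# 40 articles last month, 440 this month: 11 times the volume, i.e. +1,000%.
change = growth_percent(40, 440)
label = classify_trend(40, 440)
```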

Result

On budget and within an optimal timeframe, the client received a cutting-edge solution that fully met their business needs.
Currently, the Emerging Spaces trend detection service processes around 60,000 news topics per day, and this number is constantly on the rise.

Implementing the solution has reduced the workload related to manual data processing by 50%, which has improved the precision and speed of news analysis by human experts. Combined with another ML-based service that helps researchers contextualize news items faster, Emerging Spaces has dramatically increased the overall efficiency of news analysis.

The service that our SPD Technology project team built has become a popular feature among the users of the client’s platform. These users can now receive more holistic, precise, and valuable business insights to find promising trends to invest in.

Currently, our project team continues to support and expand the project, constantly receiving feedback from the solution’s users.