Content aggregation and management is a fully automatic process that performs web crawling, contextual analysis and text clustering in order to present similar articles, blog posts, multimedia content and related social media comments together. Our crawler can automatically identify news sources and extract content (titles, text, multimedia) from them.
With a focus on big data crawling, our search index is one of the largest in Europe. The interface is user-friendly and offers intuitive filters and reporting functions.
Content clustering builds on our technical and R&D capabilities to deliver deeper insights. It operates on top of the large text repository and delivers groups of similar texts (clusters) that correspond to how an event is reported by different digital media sources.
Palo analyses thousands of texts of various types from various sources and organizes them into related clusters using an innovative statistical framework based on techniques widely used in science and engineering. Clustering has proven to be a useful technique for information retrieval because it uncovers interesting information kernels and distributions in the underlying data. It can therefore ease the burden on users of information retrieval systems, helping them browse and quickly find the information they need. In our case, we combine text and data from various sources and types and process them based on weighted characteristic features, so that the user can read the whole coverage of a topic, with articles arranged at different levels of a hierarchy. The more the input items have in common, the more closely they are related in the analysis and the more fully they cover the topic.
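To make the idea of clustering on weighted features more concrete, here is a minimal Python sketch. It is an illustration only and not Palo's actual framework: the field weights, the similarity threshold, the sample articles and the function names (`weighted_vector`, `cosine`, `cluster`) are all assumptions chosen for the example. Each article becomes a weighted bag of words, and a single greedy pass assigns it to the most similar existing cluster or starts a new one.

```python
import math
import re
from collections import Counter

# Hypothetical weights: terms in the title count more than terms in the body.
FIELD_WEIGHTS = {"title": 3.0, "body": 1.0}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def weighted_vector(doc):
    """Build a term vector in which each field contributes with its weight."""
    vec = Counter()
    for field, weight in FIELD_WEIGHTS.items():
        for token in tokenize(doc.get(field, "")):
            vec[token] += weight
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(docs, threshold=0.3):
    """Greedy single-pass clustering.

    Each document joins the most similar cluster whose similarity reaches
    the (assumed) threshold; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of document indices.
    """
    clusters = []
    for index, doc in enumerate(docs):
        vec = weighted_vector(doc)
        best, best_sim = None, threshold
        for candidate in clusters:
            sim = cosine(vec, candidate["centroid"])
            if sim >= best_sim:
                best, best_sim = candidate, sim
        if best is None:
            clusters.append({"centroid": Counter(vec), "members": [index]})
        else:
            best["centroid"].update(vec)  # accumulate term weights
            best["members"].append(index)
    return [c["members"] for c in clusters]


# Invented sample articles, for illustration only.
docs = [
    {"title": "Earthquake hits coastal city",
     "body": "A strong earthquake struck the coastal city early on Monday."},
    {"title": "Central bank raises interest rates",
     "body": "The central bank raised interest rates by a quarter point."},
    {"title": "Coastal city earthquake damage assessed",
     "body": "Officials assessed damage after the earthquake in the coastal city."},
    {"title": "Interest rates rise again",
     "body": "Markets reacted as the bank raised interest rates."},
]

print(cluster(docs))  # [[0, 2], [1, 3]]
```

In this sketch the two earthquake reports and the two interest-rate reports end up in separate clusters because they share heavily weighted title terms. A production system would likely add stop-word removal, term weighting such as TF-IDF, and a hierarchical step to obtain the different levels of coverage described above.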