Short name: Scaling NewSum
Long name: Scaling NewSum – Big Data Text Clustering & Summarization using N-Gram graphs
Call: F4Fp-SME-COD1 (see call details)
Proposal number: F4Fp-SME-COD1-06
SUMMARY REMARKS & TESTBEDS
NewSum is a set of commercial software products developed after years of research on multi-document, multi-lingual (i.e. applicable to many languages) summarization. It a) collects information from multiple heterogeneous textual sources (utilizing crawling software components), b) groups information referring to the same topic (clustering components) and c) automatically extracts a summary for each topic (summarization components).
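As a rough illustration of the grouping step (b), the sketch below builds character n-gram graphs for short texts and groups texts whose graphs overlap. It is not NewSum's implementation: the function names, the n-gram size, the window size, the containment-style similarity and the greedy grouping rule are all assumptions chosen for brevity.

```python
from collections import Counter


def ngram_graph(text, n=3, window=3):
    """Build a character n-gram graph (hypothetical helper).

    Nodes are character n-grams; an undirected edge links two n-grams
    that start within `window` positions of each other. The Counter maps
    each edge to its co-occurrence count.
    """
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    edges = Counter()
    for i, g in enumerate(grams):
        for h in grams[i + 1:i + 1 + window]:
            edges[tuple(sorted((g, h)))] += 1
    return edges


def containment_similarity(g1, g2):
    """Share of the smaller graph's edges that also occur in the other graph."""
    if not g1 or not g2:
        return 0.0
    shared = len(g1.keys() & g2.keys())
    return shared / min(len(g1), len(g2))


def group_by_topic(texts, threshold=0.3):
    """Greedy grouping (an assumption, not the NewSum algorithm).

    Each text joins the first cluster whose representative (its first
    member) is similar enough; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of indices into `texts`.
    """
    graphs = [ngram_graph(t) for t in texts]
    clusters = []
    for i, g in enumerate(graphs):
        for cluster in clusters:
            if containment_similarity(g, graphs[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

Because the graph is built from characters rather than words, the same code can be applied to texts in different languages without a tokenizer, which is one reason n-gram graphs suit a multi-lingual setting. The threshold of 0.3 is arbitrary and would need tuning on real data.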
The objective of this project is to collect information that will allow NewSum to improve the quality of the services it offers and reach its strategic business goals. A set of experiments will be designed and run on the Tengu big data testbed with the goal of measuring and evaluating a) the accuracy of candidate clustering components, b) the effectiveness (summary quality) of alternative summarization components and c) the overall speed across a range of input data sizes. The outcome of the Stage 1 experiments will inform the design of the Stage 2 experiments, which will examine the scalability of the system in a cross-lingual summarization setting.
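Measurements (a) and (c) could be collected with a small harness like the one sketched here. The proposal does not name specific metrics, so clustering purity is used only as one common example of an accuracy measure, and the function names and the shape of the inputs are assumptions. Summary quality (b) usually needs reference summaries or human judgment and is not covered by this sketch.

```python
import time
from collections import Counter, defaultdict


def purity(predicted, gold):
    """Clustering purity: the fraction of items that carry the majority
    gold label of their predicted cluster.

    `predicted` and `gold` are equal-length sequences of cluster ids and
    reference topic labels (hypothetical input format).
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not predicted:
        return 0.0
    labels_by_cluster = defaultdict(Counter)
    for cluster_id, label in zip(predicted, gold):
        labels_by_cluster[cluster_id][label] += 1
    majority_total = sum(max(c.values()) for c in labels_by_cluster.values())
    return majority_total / len(predicted)


def time_over_sizes(component, documents, sizes):
    """Run `component` on growing prefixes of `documents` and return a
    dict mapping each input size to elapsed wall-clock seconds."""
    timings = {}
    for size in sizes:
        start = time.perf_counter()
        component(documents[:size])
        timings[size] = time.perf_counter() - start
    return timings
```

For example, `time_over_sizes(cluster_fn, docs, [1000, 10000, 100000])` would give one timing per input size for a candidate clustering function `cluster_fn` (a placeholder name), which can then be compared across candidates.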