Data Summarization Model for User Action Log Files
GENTILI, Eleonora; MILANI, Alfredo; POGGIONI, Valentina
2012
Abstract
In recent years we have seen impressive growth and diffusion of applications shared and used by huge numbers of users around the world, such as social networks, web portals, and e-learning platforms. Such systems generally produce a large amount of data, normally stored in raw format in log file systems and databases. To prevent unmanageable growth of the storage space required and a breakdown of data usability, such data can be condensed and summarized to improve reporting performance and reduce system load. Data summarization reduces the space required to store software data but, as a side effect, decreases its informative capability because information is lost. This work studies the problem of summarizing data obtained from the log systems of applications with many users. In particular, it proposes a model that represents raw data as temporal events collected in time sequences, provides methods that reduce data size by collapsing the descriptions of several events into a single descriptor or a smaller set of descriptors, and poses the optimal summarization problem.
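
As a rough illustration of the idea of collapsing several event descriptions into fewer descriptors, the following minimal Python sketch groups log events by fixed time window and action type. It is not the model proposed in the paper: the `Event` fields, the `summarize` function, the window length, and the sample log are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One raw log entry (hypothetical schema, not taken from the paper)."""
    timestamp: int  # seconds from an arbitrary origin
    user: str
    action: str


def summarize(events, window):
    """Collapse events into one descriptor per (time window, action) pair."""
    counts = Counter((e.timestamp // window, e.action) for e in events)
    return [
        {"window_start": w * window, "action": a, "count": n}
        for (w, a), n in sorted(counts.items())
    ]


# Made-up sample log: five raw events.
log = [
    Event(5, "alice", "login"),
    Event(20, "bob", "login"),
    Event(42, "alice", "view_page"),
    Event(70, "bob", "view_page"),
    Event(95, "alice", "view_page"),
]

# With a 60-second window the five events collapse into three descriptors.
summary = summarize(log, window=60)
for descriptor in summary:
    print(descriptor)
```

In this sketch the summary is smaller than the raw log, but user identities and exact timestamps can no longer be recovered, which is one simple instance of the trade-off between space and informative capability described in the abstract.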