Pr. Francisco Herrera


Dept. of Computer Science and Artificial Intelligence

Research group "Soft Computing and Intelligent Information Systems"

University of Granada

Plenary Session: The bridge between Big Data and Smart Data: Big Data Preprocessing


Big Data applications have been emerging in recent years, and researchers from many disciplines are aware of the significant advantages of extracting knowledge from this type of problem.

The term smart data is increasingly used to refer to the challenge of transforming raw data into quality data that can later be processed to obtain valuable insights. Quality data is the foundation of good data analytics. Big data are characterized by high dimensionality and large sample size, and they bring with them noise, spurious correlations …, so they must be processed to obtain smart data, that is, quality data. A big data practitioner must make data quality a priority in order to rely on data to take actions, make decisions, or predict outcomes.

Data preprocessing is the area of knowledge extraction that covers the tasks that transform the original data into valuable, quality data. Data preprocessing techniques adapt the data to fulfill the input demands of each data mining algorithm. It is an often neglected but major step in the data mining process. Data preprocessing includes data preparation methods for cleaning, transforming, or managing imperfect data (missing values and noisy data), and data reduction techniques, which aim to reduce the complexity of the data and include feature selection, instance selection, and discretization.
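
As a minimal illustration of two of the tasks named above, the following Python sketch fills missing values with the mean of the observed ones and then discretizes the result into equal-width bins. The function names, the toy data, and the choice of mean imputation and equal-width binning are assumptions made for this example; they are not the specific methods presented in the talk.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def discretize_equal_width(values, bins):
    """Map each value to a bin index in [0, bins - 1] using equal-width intervals."""
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when all values are identical.
    width = (hi - lo) / bins or 1.0
    # The maximum value would land in bin `bins`, so clamp it to the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


# Hypothetical toy feature with two missing entries.
raw = [1.0, None, 3.0, 10.0, None, 6.0]
clean = impute_mean(raw)                       # missing values -> 5.0
codes = discretize_equal_width(clean, bins=3)  # bin index per value
```

Here the cleaning step removes the imperfection (missing values) and the reduction step lowers the complexity of the feature from a continuous range to three symbols.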

In this talk we present the connection between big data and smart data through big data preprocessing. Designing data preprocessing algorithms for big data requires redesigning existing methods and adapting them to new paradigms such as MapReduce and the directed acyclic graph model used by Apache Spark. We will discuss the current approaches, presenting real case studies and some research challenges.
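
To give a rough idea of what such a redesign involves, the sketch below emulates the MapReduce pattern in plain Python (no Hadoop or Spark is used): each partition is summarized independently in a map step, and the partial summaries are merged pairwise in a reduce step to obtain per-column means that ignore missing values. The partition layout, function names, and data are hypothetical and only illustrate the pattern.

```python
from functools import reduce


def map_partition(rows):
    """Map step: summarize one partition as (per-column sums, per-column counts)."""
    n_cols = len(rows[0])
    sums = [0.0] * n_cols
    counts = [0] * n_cols
    for row in rows:
        for j, v in enumerate(row):
            if v is not None:  # skip missing values
                sums[j] += v
                counts[j] += 1
    return sums, counts


def reduce_stats(a, b):
    """Reduce step: merge two partial summaries column by column."""
    return (
        [x + y for x, y in zip(a[0], b[0])],
        [x + y for x, y in zip(a[1], b[1])],
    )


# Hypothetical data set split into two partitions of two rows each.
partitions = [
    [[1.0, None], [3.0, 4.0]],
    [[None, 8.0], [5.0, 6.0]],
]
sums, counts = reduce(reduce_stats, map(map_partition, partitions))
means = [s / c for s, c in zip(sums, counts)]
```

Because the merge is associative, the partial summaries can be combined in any order, which is the property that lets a statistic of this kind be computed without any single node seeing the whole data set.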


Francisco Herrera (SM'15) received his M.Sc. in Mathematics in 1988 and Ph.D. in Mathematics in 1991, both from the University of Granada, Spain. He is currently a Professor in the Department of Computer Science and Artificial Intelligence at the University of Granada. 

He has supervised 40 Ph.D. students. He has published more than 300 journal papers that have received more than 50000 citations (Google Scholar, H-index 115). He is coauthor of the books "Genetic Fuzzy Systems" (World Scientific, 2001), "Data Preprocessing in Data Mining" (Springer, 2015), "The 2-tuple Linguistic Model. Computing with Words in Decision Making" (Springer, 2015), "Multilabel Classification. Problem analysis, metrics and techniques" (Springer, 2016), and "Multiple Instance Learning. Foundations and Algorithms" (Springer, 2016).

He currently acts as Editor in Chief of the international journals "Information Fusion" (Elsevier) and "Progress in Artificial Intelligence" (Springer). He serves as an editorial board member of a dozen journals.

He has received several honors and awards: ECCAI Fellow 2009, IFSA Fellow 2013, 2010 Spanish National Award on Computer Science ARITMEL to the "Spanish Engineer on Computer Science", International Cajastur "Mamdani" Prize for Soft Computing (Fourth Edition, 2010), IEEE Transactions on Fuzzy Systems Outstanding 2008 and 2012 Paper Award (bestowed in 2011 and 2015, respectively), 2011 Lotfi A. Zadeh Prize Best Paper Award of the International Fuzzy Systems Association, 2013 AEPIA Award to a scientific career in Artificial Intelligence, 2014 XV Andalucía Research Prize Maimónides, and Andalucía Medal 2017 (by the regional government of Andalucía). He was selected as a 2014 Thomson Reuters Highly Cited Researcher (in the fields of Computer Science and Engineering).