TY - JOUR
T1 - Feature construction as a bi-level optimization problem
AU - Hammami, Marwa
AU - Bechikh, Slim
AU - Louati, Ali
AU - Makhlouf, Mohamed
AU - Said, Lamjed Ben
N1 - Publisher Copyright: © 2020, Springer-Verlag London Ltd., part of Springer Nature.
PY - 2020/9/1
Y1 - 2020/9/1
N2 - Feature selection and construction are important preprocessing techniques in data mining. They not only reduce dimensionality but also improve classification accuracy and efficiency. While feature selection consists of selecting a subset of relevant features from the original feature set, feature construction generates new high-level features, called constructed features, each of which is a combination of a subset of the original features. Based on these definitions, feature construction can be seen as a bi-level optimization problem in which the feature subset is defined first and the corresponding (near) optimal combination of the selected features is then found. Motivated by this observation, we propose in this paper a bi-level evolutionary approach for feature construction. The basic idea of our algorithm, named bi-level feature construction genetic algorithm (BFC-GA), is to evolve an upper-level population for the task of feature selection while optimizing the feature combinations at the lower level by evolving a follower population. For each upper-level individual (feature subset), a whole lower-level population is optimized to find the corresponding (near) optimal feature combination (constructed feature). In this way, BFC-GA can output a set of optimized constructed features that could be very informative to the considered classifier. A detailed experimental study has been conducted on a set of commonly used datasets with varying dimensions. Statistical analysis of the results shows that our bi-level feature construction approach is competitive with, and outperforms, many state-of-the-art algorithms.
AB - Feature selection and construction are important preprocessing techniques in data mining. They not only reduce dimensionality but also improve classification accuracy and efficiency. While feature selection consists of selecting a subset of relevant features from the original feature set, feature construction generates new high-level features, called constructed features, each of which is a combination of a subset of the original features. Based on these definitions, feature construction can be seen as a bi-level optimization problem in which the feature subset is defined first and the corresponding (near) optimal combination of the selected features is then found. Motivated by this observation, we propose in this paper a bi-level evolutionary approach for feature construction. The basic idea of our algorithm, named bi-level feature construction genetic algorithm (BFC-GA), is to evolve an upper-level population for the task of feature selection while optimizing the feature combinations at the lower level by evolving a follower population. For each upper-level individual (feature subset), a whole lower-level population is optimized to find the corresponding (near) optimal feature combination (constructed feature). In this way, BFC-GA can output a set of optimized constructed features that could be very informative to the considered classifier. A detailed experimental study has been conducted on a set of commonly used datasets with varying dimensions. Statistical analysis of the results shows that our bi-level feature construction approach is competitive with, and outperforms, many state-of-the-art algorithms.
KW - Bi-level optimization
KW - Data classification
KW - Evolutionary algorithms
KW - Feature construction
UR - http://www.scopus.com/inward/record.url?scp=85079758771&partnerID=8YFLogxK
U2 - 10.1007/s00521-020-04784-z
DO - 10.1007/s00521-020-04784-z
M3 - Article
AN - SCOPUS:85079758771
SN - 0941-0643
VL - 32
SP - 13783
EP - 13804
JO - Neural Computing and Applications
JF - Neural Computing and Applications
IS - 17
ER -