TY - JOUR
T1 - Smooth quantile regression and distributed inference for non-randomly stored big data
AU - Wang, Kangning
AU - Jia, Jiaojiao
AU - Polat, Kemal
AU - Sun, Xiaofei
AU - Alhudhaif, Adi
AU - Alenezi, Fayadh
N1 - Publisher Copyright:
© 2022 Elsevier Ltd
PY - 2023/4/1
Y1 - 2023/4/1
N2 - In recent years, many distributed algorithms for big data quantile regression have been proposed. However, they all rely on the assumption that the data are stored in a random manner. This assumption seldom holds in practice, and its violation can seriously degrade their performance. Moreover, the non-smooth quantile loss brings inconvenience in both computation and theory. To address these issues, we first propose a convex and smooth quantile loss that converges uniformly to the quantile loss. We then construct a novel pilot-sample surrogate smooth quantile loss, which enables communication-efficient distributed quantile regression and overcomes the non-randomly distributed nature of big data. In theory, the estimation consistency and asymptotic normality of the resulting distributed estimator are established. These theoretical results guarantee that the new method is adaptive to situations where the data are stored in any arbitrary way, and that it works as well as if all the data were pooled on a single machine. Numerical experiments on both synthetic and real data verify the good performance of the new method.
AB - In recent years, many distributed algorithms for big data quantile regression have been proposed. However, they all rely on the assumption that the data are stored in a random manner. This assumption seldom holds in practice, and its violation can seriously degrade their performance. Moreover, the non-smooth quantile loss brings inconvenience in both computation and theory. To address these issues, we first propose a convex and smooth quantile loss that converges uniformly to the quantile loss. We then construct a novel pilot-sample surrogate smooth quantile loss, which enables communication-efficient distributed quantile regression and overcomes the non-randomly distributed nature of big data. In theory, the estimation consistency and asymptotic normality of the resulting distributed estimator are established. These theoretical results guarantee that the new method is adaptive to situations where the data are stored in any arbitrary way, and that it works as well as if all the data were pooled on a single machine. Numerical experiments on both synthetic and real data verify the good performance of the new method.
KW - Big data
KW - Communication efficiency
KW - Distributed algorithm
KW - Quantile regression
UR - https://www.scopus.com/pages/publications/85144014600
U2 - 10.1016/j.eswa.2022.119418
DO - 10.1016/j.eswa.2022.119418
M3 - Article
AN - SCOPUS:85144014600
SN - 0957-4174
VL - 215
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 119418
ER -