Big data cleaning modeling of the operating status of fully mechanized coal mining equipment in coal mines

Abstract: The operating-status data of fully mechanized coal mining equipment are large in volume and contain noise and missing values. To address these problems, a MapReduce-based big data cleaning model for the operating status of fully mechanized coal mining equipment was established. The model uses two cooperating MapReduce jobs. The first MapReduce job corrects noise points and missing values in the data and outputs multiple cleaned data files. The second MapReduce job sorts the records in these cleaned files by collection date and time and merges them into a single output data file. Experimental results show that the model can effectively eliminate noisy data and fill in missing data, giving a good data cleaning effect.
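The abstract does not say how noise points are detected or how missing values are repaired, so the following is only a minimal in-memory sketch of the two-job design under stated assumptions. Records are assumed to be CSV lines `date,time,sensor_id,value` with ISO dates and zero-padded times. A reading outside a hypothetical plausibility range (`VALID_MIN`, `VALID_MAX`) is treated as noise, and noisy or missing readings are replaced by the mean of the same sensor's valid readings. The helpers `run_job`, `map_clean`, `reduce_clean`, `map_sort`, `reduce_merge`, `clean_pipeline` and the sensor names are hypothetical; this simulates the map, shuffle and reduce phases in plain Python rather than using the Hadoop API.

```python
import zlib
from collections import defaultdict
from statistics import mean

# Hypothetical record format: "date,time,sensor_id,value", ISO date and
# zero-padded time (e.g. "2019-03-01,08:00:00,pump_pressure,30.5").
# Hypothetical plausibility range: readings outside it are treated as noise.
VALID_MIN, VALID_MAX = 0.0, 100.0


def parse_value(text):
    """Return the reading as a float, or None if it is missing or noisy."""
    try:
        value = float(text)
    except ValueError:
        return None  # empty or non-numeric field: missing value
    # Out-of-range (and NaN) readings fail this test and count as noise.
    return value if VALID_MIN <= value <= VALID_MAX else None


def run_job(lines, mapper, reducer, num_reducers):
    """In-memory stand-in for one MapReduce job; returns one output 'file' per reducer."""
    groups = defaultdict(list)
    for line in lines:                        # map phase
        for key, val in mapper(line):
            groups[key].append(val)
    outputs = [[] for _ in range(num_reducers)]
    for key in sorted(groups):                # shuffle: keys reach reducers sorted
        part = zlib.crc32(key.encode()) % num_reducers  # deterministic hash partitioner
        outputs[part].extend(reducer(key, groups[key]))  # reduce phase
    return outputs


# Job 1: key by sensor so one reducer sees a sensor's whole series.
def map_clean(line):
    date, time, sensor, value = line.split(",")
    yield sensor, (date, time, value)


def reduce_clean(sensor, rows):
    parsed = [(d, t, parse_value(v)) for d, t, v in rows]
    valid = [v for _, _, v in parsed if v is not None]
    if not valid:
        return  # nothing to impute from: drop this sensor's records
    fill = round(mean(valid), 2)  # assumed repair rule: mean of the sensor's valid readings
    for d, t, v in parsed:
        yield f"{d},{t},{sensor},{fill if v is None else v}"


# Job 2: key by collection date and time, then merge with a single reducer.
def map_sort(line):
    date, time, _, _ = line.split(",")
    yield f"{date} {time}", line


def reduce_merge(key, lines):
    yield from sorted(lines)  # deterministic order for records sharing a timestamp


def clean_pipeline(lines, num_reducers=3):
    cleaned_files = run_job(lines, map_clean, reduce_clean, num_reducers)  # several files
    job2_input = [rec for part in cleaned_files for rec in part]
    (single_file,) = run_job(job2_input, map_sort, reduce_merge, 1)        # one file
    return single_file


# Toy input for illustration only.
raw = [
    "2019-03-01,08:00:02,pump_pressure,31.5",
    "2019-03-01,08:00:01,pump_pressure,",        # missing value
    "2019-03-01,08:00:00,shearer_current,999",   # out of range: noise
    "2019-03-01,08:00:00,pump_pressure,30.5",
    "2019-03-01,08:00:01,shearer_current,42.0",
]
for record in clean_pipeline(raw):
    print(record)
```

Keying the first job by sensor lets each reducer repair a sensor's readings using that sensor's own history, and several reducers naturally yield several cleaned files. Routing the second job through a single reducer is what produces one chronologically sorted file, at the cost of funnelling every record through one task.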

     
