Analysis of Machine Learning in Earth Science Using Network Visualization and Natural Language Processing

Published in American Geophysical Union Fall Meeting, 2019

Recommended citation: Zheng L, Albayrak R, Teng W, Khayat M, Pham L (2019). "Analysis of Machine Learning in Earth Science Using Network Visualization and Natural Language Processing." American Geophysical Union. https://agu.confex.com/agu/fm19/meetingapp.cgi/Paper/602898

Machine learning (ML) is increasingly used in Earth science research. Its benefits include efficiency, reduced human error, and the ability to extract hidden patterns from data. However, ML practitioners and Earth scientists often lack knowledge of each other's domains, which is a barrier to timely and effective implementation. Earth science in particular faces challenges in generating sample data: in traditional ML problems such as face recognition or stock prediction, data are abundant and the ground truth needed for labeling is readily available. Earth science data also vary more in format (e.g., HDF5) and image resolution, and are not standardized across instruments, even within a given Earth science discipline. Previous studies have outlined the specific challenges that Earth science faces with ML, while others have focused on mining information efficiently from existing publications. Resources such as Scikit-Learn offer decision trees for choosing appropriate ML algorithms, but applying them to Earth science subjects is much more complex. In the current study, we propose a methodology and tool that use natural language processing (NLP) to aid the implementation of ML in Earth science. Our work comprises three main parts: (1) analyzing existing publications related to ML and Earth science using NLP; (2) extracting from the publications information on ML models and Earth science subjects; and (3) visualizing the extracted relationships as a network graph. The resulting network graph should help Earth science communities apply optimal ML algorithms and guide data preparation through visualization of similar studies. The network graph and the analysis of document similarity will be the basis of our next step, which is to develop a decision tree for selecting optimal ML methodologies for specified Earth science applications.
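
As a rough illustration of the three parts described above, the following Python sketch matches terms in a few texts, counts which ML models and Earth science subjects appear together, and prints the resulting weighted edges of a network graph. It is not the authors' implementation: the abstract does not specify the NLP pipeline, vocabularies, or graph tooling, so plain keyword matching stands in for the NLP step. The term lists, the toy documents, and the function names (`find_terms`, `build_edges`, `cosine_similarity`) are all hypothetical. A bag-of-words cosine similarity is included as one simple way document similarity might be measured.

```python
from collections import Counter
from itertools import product
import math
import re

# Hypothetical vocabularies; the abstract does not say which terms were used.
ML_TERMS = ["random forest", "neural network", "support vector machine"]
EARTH_TERMS = ["precipitation", "soil moisture", "land cover"]

# Toy stand-ins for publication abstracts (invented for illustration only).
DOCS = [
    "A neural network is trained to estimate precipitation from satellite data.",
    "Random forest and neural network models are compared for soil moisture retrieval.",
    "Land cover classification with a random forest.",
]


def find_terms(text, vocabulary):
    """Return the vocabulary terms that occur in text (case-insensitive)."""
    lowered = text.lower()
    return [term for term in vocabulary if term in lowered]


def build_edges(docs):
    """Count how often each (ML model, Earth science subject) pair shares a document."""
    edges = Counter()
    for doc in docs:
        models = find_terms(doc, ML_TERMS)
        subjects = find_terms(doc, EARTH_TERMS)
        for pair in product(models, subjects):
            edges[pair] += 1
    return edges


def cosine_similarity(doc_a, doc_b):
    """Cosine similarity of two documents based on raw word counts."""
    a = Counter(re.findall(r"[a-z]+", doc_a.lower()))
    b = Counter(re.findall(r"[a-z]+", doc_b.lower()))
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Each edge links an ML model to an Earth science subject; the weight is the
# number of documents in which both were found.
edges = build_edges(DOCS)
for (model, subject), weight in sorted(edges.items()):
    print(f"{model} -- {subject} (weight {weight})")
```

The edge counts could be passed to a graph library for layout and drawing, and a real pipeline would likely replace substring matching with tokenization and entity extraction. Both choices are left open here because the abstract does not describe them.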

