Text Analytics for Reconnaissance (TAR)

The semi-supervised GAN structure with knowledge distillation

As an intermediate stage in TAR’s natural language processing pipeline for automated post-seismic recovery reports, my semi-supervised GAN serves as a sentence classifier.
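How a GAN can double as a classifier is easiest to see in the loss. The sketch below assumes the common K+1-class formulation of semi-supervised GANs: the Discriminator outputs K logits for the real sentence classes plus one extra logit for "fake". This formulation, the helper names, and the example logits are illustrative assumptions, not this project's code. Labeled sentences get ordinary cross-entropy, unlabeled real sentences are only pushed away from the fake class, and Generator samples are pushed toward it.

```python
import math


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


# Assumed convention: logits has K + 1 entries and the last one is "fake".

def supervised_loss(logits, label):
    """Cross-entropy for a labeled real sentence, label in range(K)."""
    return logsumexp(logits) - logits[label]


def unsupervised_real_loss(logits):
    """-log P(real) for an unlabeled real sentence, where P(real) = 1 - P(fake)."""
    return logsumexp(logits) - logsumexp(logits[:-1])


def fake_loss(logits):
    """-log P(fake) for a sample produced by the Generator."""
    return logsumexp(logits) - logits[-1]


def discriminator_loss(labeled, unlabeled, generated):
    """labeled: list of (logits, label); unlabeled and generated: lists of logits."""
    sup = sum(supervised_loss(l, y) for l, y in labeled) / len(labeled)
    unsup = sum(unsupervised_real_loss(l) for l in unlabeled) / len(unlabeled)
    fake = sum(fake_loss(l) for l in generated) / len(generated)
    return sup + unsup + fake


def classify(logits):
    """At inference time, drop the fake logit and return the best real class."""
    real = logits[:-1]
    return max(range(len(real)), key=real.__getitem__)


logits = [2.0, 0.0, 0.0, -1.0]  # K = 3 sentence classes + 1 fake class
print(classify(logits))  # → 0
print(supervised_loss(logits, 0))
```

Because unlabeled sentences contribute through the real/fake term, the classifier can learn from unlabeled news and social media text in addition to the labeled set.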

Because text is discrete and therefore non-differentiable, the Generator does not output discrete tokens. Instead, it learns continuous vector representations of sentences in the same embedding manifold as the Discriminator, an idea inspired by knowledge distillation as introduced in Hinton et al.’s “Distilling the Knowledge in a Neural Network”. As shown in the first illustration, the Generator’s output (smooth vector representations of words) is routed directly to the Discriminator’s convolutional layers, bypassing its embedding layer.
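The routing can be illustrated with a minimal, dependency-free forward pass. Everything here is hypothetical: `ToyDiscriminator`, `toy_generator`, the layer sizes, the CNN body (one convolution with max-over-time pooling), and the K+1 output layer are assumptions chosen for illustration, not the project's actual architecture. The point is the two entry points: real sentences arrive as token ids and pass through the embedding layer, while the Generator's continuous word vectors enter the convolutional layer directly.

```python
import math
import random


class ToyDiscriminator:
    """Hypothetical CNN sentence discriminator: embedding -> conv -> max-pool -> K+1 logits."""

    def __init__(self, vocab_size, emb_dim, n_filters, width, n_classes, seed=0):
        rng = random.Random(seed)

        def rand():
            return rng.uniform(-0.1, 0.1)

        self.width = width
        # Embedding layer: one emb_dim vector per token id (used for real text only).
        self.embedding = [[rand() for _ in range(emb_dim)] for _ in range(vocab_size)]
        # Each convolutional filter spans `width` consecutive word vectors.
        self.filters = [[[rand() for _ in range(emb_dim)] for _ in range(width)]
                        for _ in range(n_filters)]
        # Output layer (assumed): K real sentence classes + 1 "fake" class.
        self.out = [[rand() for _ in range(n_filters)] for _ in range(n_classes + 1)]

    def embed(self, token_ids):
        return [self.embedding[t] for t in token_ids]

    def conv_features(self, vectors):
        """1-D convolution over word vectors, ReLU, then max-over-time pooling."""
        assert len(vectors) >= self.width, "sentence shorter than filter width"
        feats = []
        for f in self.filters:
            best = 0.0  # starting at 0 makes the running max act as a ReLU
            for i in range(len(vectors) - self.width + 1):
                s = sum(w * x
                        for k in range(self.width)
                        for w, x in zip(f[k], vectors[i + k]))
                best = max(best, s)
            feats.append(best)
        return feats

    def forward(self, token_ids=None, vectors=None):
        # Real sentences arrive as token ids and go through the embedding layer;
        # Generator output is already continuous, so it skips that layer.
        if vectors is None:
            vectors = self.embed(token_ids)
        feats = self.conv_features(vectors)
        return [sum(w * h for w, h in zip(row, feats)) for row in self.out]


def toy_generator(noise, seq_len, emb_dim, seed=1):
    """Hypothetical Generator: fixed random linear map + tanh -> seq_len word vectors."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1, 1) for _ in noise] for _ in range(seq_len * emb_dim)]
    flat = [math.tanh(sum(w * z for w, z in zip(row, noise))) for row in weights]
    return [flat[i * emb_dim:(i + 1) * emb_dim] for i in range(seq_len)]


D = ToyDiscriminator(vocab_size=50, emb_dim=8, n_filters=4, width=3, n_classes=3)
real_logits = D.forward(token_ids=[4, 17, 2, 33, 9])
fake_vectors = toy_generator([0.3, -0.7, 1.2], seq_len=5, emb_dim=8)
fake_logits = D.forward(vectors=fake_vectors)
print(len(real_logits), len(fake_logits))  # → 4 4
```

This sketch only runs the forward pass. In an autograd framework the same split is what makes adversarial training possible: no argmax or sampling over the vocabulary sits between the Generator and the Discriminator, so the Discriminator's loss can backpropagate into the Generator.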

Abstract: Post-hazard reconnaissance for natural disasters (e.g., earthquakes) is important for understanding the performance of the built environment, speeding up recovery, enhancing resilience, and making informed decisions related to current and future hazards. In this study, natural language processing (NLP) is used to increase the accuracy and efficiency of natural hazard reconnaissance through automation. The study focuses on (1) automated collection of data (news and social media) hosted on the Pacific Earthquake Engineering Research (PEER) Center server, (2) automatic generation of reconnaissance reports, and (3) the use of social media to extract post-hazard information such as recovery time. The results obtained are encouraging for the further development and wider use of various NLP methods in natural hazard reconnaissance.