When Sandy made landfall in New York and along the New Jersey shoreline, the storm had already devastated islands in the Caribbean and moved up the eastern seaboard, causing damage in coastal states along the way. In much the same way, the tweet stream about the storm ebbed and flowed and moved along with it. By the time the storm struck New York and New Jersey, the Twitter data stream was already heavy with tweets about its impact and its approach to the Northeast.
Finding sense in a tweet storm is sometimes like trying to hold back the storm itself. The magnitude of the data that emerges in social media is matched only by the number of questions asked about information needs, situational information, and the integrity of that information.
Accurate analysis of data using standardized statistical methods in scientific studies is critical to determining the validity of empirical research [source]. But in the emerging paradigm of social media use during disasters, there are few documented good practices for data collection and analysis. What facts can be derived from the data? Is the data ‘good’ enough to analyze? What types of questions or statistics can be applied in a manner that would allow ongoing empirical research comparing future events against past events?
Researchers are only beginning to explore these questions. There is much more work to be done.
But today, we are very pleased to release the report Analysis of Twitter Data during Hurricane Sandy. The report provides a unique snapshot of the tweets emerging in the initial days just before and after the storm made landfall in New York. The study focused as much on how data should be handled during collection, in order to preserve data integrity, as on the analysis itself.
One of the most interesting statistics we found was that the top 4 publishing modes accounted for 80% of the geocoded tweets, and the top 8 accounted for 90%. These figures should be kept in mind when considering any type of device-specific content program. Another statistic showed a drop in geocoded information. This did not come as a surprise, but it was useful to see it charted. The tweet traffic remained the same, but the amount of geocoded information dropped by half, which could be attributed in part to users turning off GPS to extend battery life.
What we did confirm was that partnering with Statistics without Borders in future events is a wise move. The Statistics without Borders volunteers' responsiveness and willingness to study the data and return results were great. They are one of our partners in the Digital Humanitarian Network, and we look forward to working with them on the next event. Also, special gratitude to Joanna Lane from NY VOST, who provided expertise, guidance, and direction for this report, and to Cathy Furlong for putting together such a great team.
Special thanks to the following for contributing their time and dialogue to the preparation of this report:
Team selection: Cathy Furlong, Statistics without Borders
GIS and heat map results: Paige Stover, Statistics without Borders
Network relationships: Joshua Saxe, Statistics without Borders
Analytics and data considerations: Tim B. Gravelle, Statistics without Borders
Additional guidance and recommendations: Joanna Lane, NY VOST
TweetTracker developed by Shamanth Kumar, Fred Morstatter, and Dr. Huan Liu, Arizona State University DMML Lab, under a grant from the Office of Naval Research
Storm surge data acquired from AccuWeather