We've collected 3,165 product recalls issued by the Food and Drug Administration (FDA) from January 2012 until August 2016, reflecting a subset of the regulatory actions taken by the FDA during that time. Also included in the app is a database of 7,260 articles published by the food and beverage industry digital magazine foodbusinessnews.net.
To create the app, all data was first read into R as plain text, then preprocessed to remove numbers, convert all letters to lower case, and omit stop words. At this point, both the articles and the recall descriptions were ready for searching. To categorize the articles into easily understandable topics, Latent Dirichlet Allocation was performed using the 'lda' package in R. Lastly, for visualization, plot.ly was used to produce interactive time series plots of word frequency over time.
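The preprocessing and word-frequency steps can be illustrated with a minimal sketch. It is written in Python rather than R and is not the app's actual code. The stop-word list, the function names `preprocess` and `monthly_frequency`, and the sample records are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch;
# a real pipeline would use a full list from an NLP library).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "due", "was", "is"}

def preprocess(text):
    """Lowercase the text, remove numbers, and drop stop words."""
    text = text.lower()
    text = re.sub(r"\d+", " ", text)       # remove numbers
    tokens = re.findall(r"[a-z]+", text)   # keep alphabetic tokens only
    return [t for t in tokens if t not in STOP_WORDS]

def monthly_frequency(docs, term):
    """Count how often `term` appears per month.

    `docs` is an iterable of (date, text) pairs, where date is a
    'YYYY-MM-DD' string (an assumed format for this sketch).
    """
    counts = Counter()
    for date, text in docs:
        month = date[:7]                   # 'YYYY-MM'
        counts[month] += preprocess(text).count(term)
    return dict(sorted(counts.items()))

# Invented sample records, not taken from the FDA data.
sample = [
    ("2012-01-15", "Salmonella found; salmonella recall"),
    ("2012-02-03", "Undeclared milk"),
    ("2012-02-20", "Salmonella in 3 lots"),
]
print(monthly_frequency(sample, "salmonella"))
```

The per-month counts returned by a function like this are the kind of series that would then be passed to an interactive time series plot.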
With a workflow centered on R's NLP capabilities, RShiny's web framework, and plot.ly's interactive graphics, an analyst who wants to better understand their industry but is short on time can create a powerful social listening tool that synthesizes large amounts of data into useful graphics, helping them form a high-level understanding of the business landscape.