Data Analysis


Word Clouds in the Wild: Over the summer of 2020 and the winter of 2021, my partner and I worked with Professor Eric Alexander of the Computer Science department on a qualitative data visualization project called ‘Word Clouds in the Wild’, in which we examined commonly used data visualization techniques, specifically word clouds generated with common tools such as Voyant and Wordle. The broader questions the project sought to answer were ‘What are word clouds, and when is something classified as a word cloud?’ and ‘Why do word clouds matter?’. We implemented modifications to current tools like Wordle and Voyant to overcome their underlying biases, with the goal of creating a platform capable of superior topic modeling and gist-forming techniques. This was also my first time working with the professor on a research paper for a national conference on Human-Computer Interaction. This is ongoing work that will continue in iterations over the next several years.
The post-it Miro board for our word cloud feature hierarchies
A cool word cloud that we discovered during the coding process
A graphic on the website showing the racial gap across struck jurors for black defendants
A graphic on the website showing the state struck rates over time by race and gender

Biases and the U.S. Jury Selection System: In this project for my Data Science course, my partner and I analyzed a dataset of peremptory strikes in the Fifth Circuit Court District of Mississippi from 1992 through 2017. The dataset covers 418 trials that District Attorney Doug Evans’ office prosecuted over those 26 years, and the database includes more than 115,000 pages of court records and jury selection transcripts. The data collected in this investigation is a very limited sample of the entire jury selection system in the United States. Nonetheless, it is considered one of the most complete datasets available and can shed light on the impact of race on jury selection, and hopefully help the transition towards a less biased system. This was a fun way to delve into data science backed by visual graphics, and it was also a helpful introduction to Shiny applications, which has been very useful in later projects.
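
To give a sense of the kind of calculation behind a strike-rate comparison, here is a minimal sketch in Python. It is not the code from the project (which was a Shiny application); the record layout, the group labels "A" and "B", and the toy records are all made up for illustration and do not come from the dataset.

```python
from collections import defaultdict


def strike_rates(jurors):
    """Return the share of jurors struck within each group.

    `jurors` is an iterable of (group, was_struck) pairs. This layout is a
    hypothetical simplification, not the schema of the real dataset.
    """
    struck = defaultdict(int)
    total = defaultdict(int)
    for group, was_struck in jurors:
        total[group] += 1
        if was_struck:
            struck[group] += 1
    return {group: struck[group] / total[group] for group in total}


# Made-up toy records for illustration only (not real data).
toy_jurors = [
    ("A", True), ("A", True), ("A", False), ("A", False),
    ("B", True), ("B", False), ("B", False), ("B", False),
]

rates = strike_rates(toy_jurors)
# The "gap" is simply the difference between the two groups' strike rates.
gap = rates["A"] - rates["B"]
print(rates, gap)
```

Comparing rates rather than raw counts matters here because the groups in a jury pool are usually different sizes.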

Access the website here.

County-level Investigation of Factors Associated with COVID-19 Cases and Deaths: The ongoing pandemic has greatly affected my daily life, my views on current legislation, and how we function as a society. As of June 2021, the United States alone had 33.4 million total confirmed cases and 597 thousand deaths. The pandemic has highlighted existing racial health inequities, flaws in our healthcare system, a lack of PPE and dwindling hospital capacity, and the politicization of public health practices such as wearing face masks and getting vaccinated. In this interactive Shiny application, we investigated the relationship between COVID-19 deaths and cases (per capita, cumulative, and new since the previous month) and the frequency of mask usage, vaccine hesitancy, political affiliation, total hospital beds, and regional differences by county. This analysis not only shed light on certain inequalities exposed by the pandemic, but also showed how public health and political affiliation have become intertwined, and how seemingly unrelated factors may play a role in how a county is affected by COVID-19. It was great to work on something substantial and informative while also giving users a good experience.
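
The three views of the case and death counts mentioned above (cumulative, per capita, and new since the previous month) are related by simple arithmetic. The sketch below shows one way to derive them in Python; it is not the code from the Shiny application, and the function names, the per-100,000 scaling, and the county numbers are assumptions chosen for illustration.

```python
def per_capita(count, population, per=100_000):
    """Scale a raw count to a rate per `per` residents.

    The per-100,000 default is an assumption; any base could be used.
    """
    return count * per / population


def new_since_previous(cumulative):
    """Turn a cumulative monthly series into month-over-month new counts.

    The first month has no predecessor, so its new count equals its
    cumulative value.
    """
    new_counts = []
    previous = 0
    for value in cumulative:
        new_counts.append(value - previous)
        previous = value
    return new_counts


# Hypothetical county with made-up numbers (not real data).
cumulative_cases = [10, 40, 100]
population = 50_000

monthly_new = new_since_previous(cumulative_cases)
latest_rate = per_capita(cumulative_cases[-1], population)
print(monthly_new, latest_rate)
```

Working per capita keeps large and small counties comparable, while the month-over-month differences show when a county's counts were actually rising.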

Access the website here.

Time series plot constructed from the data, with a user-selectable y-axis variable
A map of how COVID-19 cases and deaths change over time from February 2020 to the date the project was turned in. Users have the option to facet by any categorical variable or filter by a single state.