Saturday, February 24, 2018

Data Update #2

Lead

When the RMS Titanic sank in 1912, 62% of first-class passengers survived while only 25% of third-class passengers did. This suggests that passengers' socioeconomic status played a major role in their final moments aboard the Titanic.

Excel Workbook: Raw Data & Small Slice

My Excel workbook is divided into two sheets: the original raw data and a smaller slice. The smaller slice shows the total number of passengers in each class, split into two coded categories: 0 for died and 1 for survived. Below the totals is each class's survival rate, which is the share of passengers in that class who survived.
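The same slice can also be built outside Excel. The sketch below is only an illustration of the counting described above: it uses a handful of made-up rows rather than the real manifest, and it assumes each record reduces to a (passenger class, survived code) pair. The function name `survival_slice` is hypothetical.

```python
from collections import Counter

# Hypothetical toy rows as (passenger class, survived code) pairs.
# These are NOT real Titanic records; they only illustrate the calculation.
rows = [
    (1, 1), (1, 1), (1, 0),
    (2, 1), (2, 0),
    (3, 0), (3, 0), (3, 0), (3, 1),
]


def survival_slice(rows):
    """Count died (0) and survived (1) per class and compute the survival share."""
    counts = Counter(rows)  # tallies each (class, survived) combination
    table = {}
    for pclass in sorted({p for p, _ in rows}):
        died = counts[(pclass, 0)]
        survived = counts[(pclass, 1)]
        total = died + survived
        table[pclass] = {
            "died": died,
            "survived": survived,
            "total": total,
            "share_survived": survived / total,
        }
    return table


slice_table = survival_slice(rows)
for pclass, stats in slice_table.items():
    print(f"Class {pclass}: {stats['survived']} of {stats['total']} survived "
          f"({stats['share_survived']:.0%})")
```

With the real data, the rows would come from the raw-data sheet instead of the toy list.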

Original Data Set

The original dataset can be found here. The main source of information for the dataset is the Encyclopedia Titanica.

Supporting Article

A JSTOR Daily article by Nashwa Khan, published on June 2, 2016 and titled "What the Titanic Reveals About Class and Life Expectancy," discusses the link between socioeconomic status and the lower survival rate for third-class passengers aboard the Titanic. The article highlights how, as a passenger's socioeconomic status declined, so did their likelihood of survival. It suggests using the sinking of the Titanic as a lens for analyzing the relationship between socioeconomic status and life expectancy.

Monday, February 12, 2018

Data Update #1

1. What dataset will you use for your final report?

The dataset I will be using for my final report is the passenger manifest from the RMS Titanic. The Titanic dataset includes a variety of information about the passengers aboard the Titanic. It is important to note that this dataset does not include any information about the crew of the Titanic.

2. Describe the dataset. What kind of data does it contain?

The dataset contains passengers' names, sex, age, fare (in British pounds), and passenger class. It also records whether or not each passenger survived the sinking. Other information that is included for some but not all passengers is ticket number, home destination, cabin, boat number, and body identification number.

3. Is there anything about your data that you don’t understand? (i.e. what a column heading means). How will you find this out?

There were two variables that I was unable to understand at first: one column heading was titled sibsp and another was titled parch. However, I was able to find a key to the dataset that provided variable descriptions. Sibsp is the number of siblings and/or spouses a passenger had aboard the Titanic, and parch is the number of parents and/or children a passenger had aboard.

4. What are some questions you hope to answer with your data? List at least three. (you don’t need the answers at this point)

There are a few questions I would like to answer with this dataset:

  • What percentage of women survived compared with the percentage of men?
  • Which passenger class had the highest survival rate? Which passenger class had the lowest?
  • Of the cities from which passengers embarked, which had the highest survival rate? Which city had the lowest?
  • What percentage of first-class women survived compared with third-class women? How does that compare with the survival rates of first-class and third-class men?
  • How many passengers had siblings or spouses aboard with them?
  • How many passengers had parents or children aboard with them?
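As a rough sketch of how several of these questions could be answered in code, the example below groups passengers and computes a survival rate per group. The records are made up for illustration and are not real passengers. The sibsp and parch fields follow the dataset key described above, while the field names sex, pclass, and survived, and the helper `survival_rate_by`, are assumptions.

```python
from collections import defaultdict

# Hypothetical toy records, NOT real Titanic passengers.
# Field names other than sibsp and parch are assumed for illustration.
passengers = [
    {"sex": "female", "pclass": 1, "survived": 1, "sibsp": 1, "parch": 0},
    {"sex": "female", "pclass": 3, "survived": 0, "sibsp": 0, "parch": 2},
    {"sex": "female", "pclass": 3, "survived": 1, "sibsp": 0, "parch": 0},
    {"sex": "male", "pclass": 1, "survived": 0, "sibsp": 0, "parch": 0},
    {"sex": "male", "pclass": 3, "survived": 0, "sibsp": 2, "parch": 0},
    {"sex": "male", "pclass": 1, "survived": 1, "sibsp": 0, "parch": 1},
]


def survival_rate_by(records, *keys):
    """Return {group: share survived}, where a group is a tuple of the key values."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for record in records:
        group = tuple(record[key] for key in keys)
        totals[group] += 1
        survivors[group] += record["survived"]  # survived is coded 0 or 1
    return {group: survivors[group] / totals[group] for group in totals}


# Women versus men
rate_by_sex = survival_rate_by(passengers, "sex")

# First-class versus third-class women and men
rate_by_sex_class = survival_rate_by(passengers, "sex", "pclass")

# Passengers travelling with siblings/spouses or with parents/children
with_sibsp = sum(1 for p in passengers if p["sibsp"] > 0)
with_parch = sum(1 for p in passengers if p["parch"] > 0)

for group, rate in sorted(rate_by_sex_class.items()):
    print(group, f"{rate:.0%}")
```

The same helper could group by the port of embarkation if that column were added to each record.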