Monday, February 12, 2018

Data Update #1

1. What dataset will you use for your final report?

The dataset I will be using for my final report is the passenger manifest from the RMS Titanic. It includes a variety of information about the passengers aboard the ship. It is important to note that this dataset does not include any information about the Titanic's crew.

2. Describe the dataset. What kind of data does it contain?

The dataset contains each passenger's name, sex, age, fare (in British pounds), and passenger class. It also records whether or not each passenger survived the sinking. Other information that is included for some but not all passengers is ticket number, home destination, cabin, lifeboat number, and body identification number.

3. Is there anything about your data that you don’t understand (e.g., what a column heading means)? How will you find this out?

There were two variables that I did not understand at first: one column heading was titled sibsp and another was titled parch. However, I found a key to the dataset that provides variable descriptions. The sibsp column is the number of siblings and/or spouses aboard the Titanic, and parch is the number of parents and/or children aboard the Titanic.

4. What are some questions you hope to answer with your data? List at least three. (You don’t need the answers at this point.)

There are a few questions I would like to answer with this dataset:

  • What percentage of women survived, compared with the percentage of men?
  • Which passenger class had the highest survival rate? Which passenger class had the lowest?
  • Of the cities from which passengers embarked, which had the highest survival rate? Which city had the lowest?
  • What percentage of first-class women survived, compared with third-class women? How does this compare to the survival rates of first-class and third-class men?
  • How many passengers had siblings or spouses aboard with them?
  • How many passengers had parents or children aboard with them?
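Each of these questions comes down to grouping passengers and averaging the survival flag, or counting rows. A minimal sketch in plain Python follows, assuming the data is exported as a CSV whose columns are named survived (0 or 1), sex, pclass, embarked, sibsp, and parch. Those column names, and the idea of reading the file with `csv.DictReader`, are assumptions for illustration; the helper names (`survival_rate_by`, `count_with_relatives`, `report`) are hypothetical.

```python
import csv
from collections import defaultdict


def load_passengers(path):
    """Read the passenger CSV into a list of dicts, one per passenger.

    Assumes a header row with (hypothetical) columns such as
    survived, sex, pclass, embarked, sibsp, parch.
    """
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def survival_rate_by(rows, *keys):
    """Return {group: fraction survived}, grouping on one or more columns.

    Rows with a blank value in any grouping column are skipped
    (e.g. passengers with no recorded port of embarkation).
    """
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for row in rows:
        values = tuple(row[k] for k in keys)
        if any(v.strip() == "" for v in values):
            continue
        # Use a plain value for single-column groups, a tuple otherwise.
        group = values[0] if len(values) == 1 else values
        totals[group] += 1
        survivors[group] += int(row["survived"])
    return {g: survivors[g] / totals[g] for g in totals}


def count_with_relatives(rows, column):
    """Count passengers whose sibsp (or parch) value is at least 1."""
    return sum(1 for row in rows if row[column].strip() and int(row[column]) > 0)


def report(rows):
    """Gather the answers to the planned questions into one dict."""
    by_class = survival_rate_by(rows, "pclass")
    return {
        "by_sex": survival_rate_by(rows, "sex"),
        "by_class": by_class,
        "best_class": max(by_class, key=by_class.get),
        "worst_class": min(by_class, key=by_class.get),
        "by_port": survival_rate_by(rows, "embarked"),
        "by_class_and_sex": survival_rate_by(rows, "pclass", "sex"),
        "with_sibsp": count_with_relatives(rows, "sibsp"),
        "with_parch": count_with_relatives(rows, "parch"),
    }
```

With a local copy of the data, `report(load_passengers(path))` would answer every question in the list; the first-class versus third-class comparison by sex reads straight from the `by_class_and_sex` entry. If the real file uses different column names, only the strings passed to the helpers need to change.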



6 comments:

  1. Hi Hannah,

    This is a really cool dataset to explore. Just curious: were the first-class or third-class areas of the ship in more or less dangerous locations? If so, that's gonna be a really interesting question to answer and explore even further. Great job :)

  2. I like the dataset you’ve chosen. I would never have thought about gathering data from a list of passengers aboard the RMS Titanic. I’ll definitely follow up on this, as the questions you have listed are great and will be very informative.

  3. I think that this is one of the most creative ideas for a dataset. It raises a ton of interesting questions, and the way you set up the response is clear and suggests this will be a very interesting post to keep learning from.

  4. You chose a really neat dataset to work with! It will be interesting to see what revealing information you might find in this, that hasn't really been explored yet.

  5. Gah, I wish I had chosen this dataset!!! Definitely an intriguing set of data to explore, and you have great questions. The class survival rate would be the most interesting question to explore for me, personally. I look forward to seeing the answers to your questions.

  6. I agree with everyone else. This is a cool dataset, and I'm honestly a bit jealous that I didn't get to it first. Other than that, these are great questions.

    As a side note, I feel like the "sibsp" and "parch" headings of the original dataset would be easier to understand if slashes were used (sib/sp, for example).
