Data science | updated 12/30/2017

1. What is data science?

1.1. Data science lifecycles

Via Mason and Wiggins (2010):

OSEMN model: Obtain, Scrub, Explore, Model, iNterpret
Alternative terms: Acquire, Clean, Analyze, Apply, Wrangle
Skills & tools:
  • Plain text
  • CSV
  • JSON
  • XML/HTML
  • Query DB
  • Query API
  • REST
  • Encoding
  • Filter data
  • Extract data
  • Extract values
  • Replace values
  • Handle NULL, missing data
  • Convert formats
  • Summary stats
  • Visualization
  • Clustering
  • Classification
  • Regression
  • Dimension reduction
  • Conclusion
  • Implications
  • Communication
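
A minimal Python sketch of how the five OSEMN stages might chain together, using only the standard library. The inline hours/score data and every function name here are hypothetical illustrations, not taken from Mason and Wiggins; the "model" step is a plain least-squares line standing in for the regression item in the list above.

```python
import csv
import io
import statistics

# Hypothetical inline data standing in for a downloaded CSV file.
RAW_CSV = """hours,score
1,52
2,55
3,
4,64
5,70
"""


def obtain(text):
    """Obtain: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def scrub(rows):
    """Scrub: drop rows with missing values, convert the rest to floats."""
    clean = []
    for row in rows:
        if any(value is None or value.strip() == "" for value in row.values()):
            continue
        clean.append({key: float(value) for key, value in row.items()})
    return clean


def explore(rows, column):
    """Explore: summary statistics for one column."""
    values = [row[column] for row in rows]
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }


def model(rows, x_col, y_col):
    """Model: ordinary least-squares fit of y = slope * x + intercept."""
    xs = [row[x_col] for row in rows]
    ys = [row[y_col] for row in rows]
    x_mean, y_mean = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def interpret(slope, intercept):
    """iNterpret: restate the fitted numbers in plain language."""
    return (
        f"Each extra hour is associated with about {slope:.1f} more points "
        f"(baseline {intercept:.1f})."
    )


rows = scrub(obtain(RAW_CSV))
summary = explore(rows, "score")
slope, intercept = model(rows, "hours", "score")
print(summary)
print(interpret(slope, intercept))
```

Each stage is a separate function so that any one of them can be swapped out (for example, reading JSON instead of CSV in `obtain`) without touching the others.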

1.1.1. Obtain

1.1.2. Scrub

1.1.3. Explore

See notes on data visualization.

https://medium.com/@eytanadar/banning-exploration-in-my-infovis-class-9578676a4705

1.1.4. Model

See notes on models, statistics, machine learning, and text analytics.

1.1.5. iNterpret

1.2. What is a data scientist?

1.2.1. Responsibilities

Per Sharda et al. (2014, p. 300):

1.2.2. Skills

Per Sharda et al. (2014, p. 299):

SOFT

HARD

2. Data science tools

The most common tools are R, Python, Bash, SQL (on MySQL), Spark, Excel, and Tableau; see the 2016 Data Science Salary Survey and the 2016 Stack Overflow Developer Survey.

2.1. Why command line for data science?

Per Janssens (2015):

2.2. Workflow management tools

3. Sources

3.1. Cited

Janssens, J. (2015). Data science at the command line: Facing the future with time-tested tools. Sebastopol, CA: O'Reilly.

Mason, H. & Wiggins, C. (2010). A taxonomy of data science [blog post]. dataists. Retrieved from http://www.dataists.com/2010/09/a-taxonomy-of-data-science/

Sharda, R., Delen, D., & Turban, E. (2014). Business intelligence: A managerial perspective on analytics (3rd ed.). New York City, NY: Pearson.

3.2. References

3.3. Read

3.4. Unread