Data Scientist Using R (1) - Introduction

Hui Lin

3/29/2017

Introduction

Outline

What is data science?

“For a long time I have thought I was a statistician, interested in inferences from the particular to the general. But as I have watched mathematical statistics evolve, I have had cause to wonder and to doubt. … All in all, I have come to feel that my central interest is in data analysis, which I take to include, among other things: procedures for analyzing data, techniques for interpreting the results of such procedures, ways of planning the gathering of data to make its analysis easier, more precise or more accurate, and all the machinery and results of (mathematical) statistics which apply to analyzing data.”

What is data science?

“This coupling of scientific discovery and practice involves the collection, management, processing, analysis, visualization, and interpretation of vast amounts of heterogeneous data associated with a diverse array of scientific, translational, and interdisciplinary applications.”

What is a data scientist?

What is a data scientist?

Here is a list of definitions for a “data scientist”:

%&^%$*(^)…..

What is “hard-core pornography”?

“I know it when I see it.” (Potter Stewart)

Brief History

Driving Forces

  1. The formal theories of math/stat
  2. Accelerating developments in computers and display devices
  3. The challenge, in many fields, of more and ever larger bodies of data
  4. The emphasis on quantification in an ever wider variety of disciplines

Is it science? Totally?

There are diverse views as to what makes a science, but three constituents will be judged essential by most, viz:
(a1) intellectual content,
(a2) organization in an understandable form,
(a3) reliance upon the test of experience as the ultimate standard of validity.

Is it science? Totally?

“Science is knowledge which we understand so well that we can teach it to a computer.”

What questions can data science answer?

Types of Questions

Types of Learning

Types of Algorithm

Data Scientist Skill Set

General process of data analytics?

Discussion