Course Outline

Introduction

  • The Data Science Process
  • Roles and responsibilities of a Data Scientist

Preparing the Development Environment

  • Libraries, frameworks, languages and tools
  • Local development
  • Collaborative web-based development

Data Collection

  • Different Types of Data
    • Structured 
      • Local databases
      • Database connectors
      • Common formats: xlxs, XML, Json, csv, ...
    • Un-Structured
      • Clicks, censors, smartphones
      • APIs
      • Internet of Things (IoT)
      • Documents, pictures, videos, sounds
  • Case study: Collecting large amounts of unstructured data continuosly

Data Storage

  • Relational databases
  • Non-relational databases
  • Hadoop: Distributed File System (HDFS)
  • Spark: Resilient Distributed Dataset (RDD)
  • Cloud storage

Data Preparation

  • Ingestion, selection, cleansing, and transformation
  • Ensuring data quality - correctness, meaningfulness, and security
  • Exception reports

Languages used for Preparation, Processing and Analysis

  • R language
    • Introduction to R
    • Data manipulation, calculation and graphical display
  • Python
    • Introduction to Python
    • Manipulating, processing, cleaning, and crunching data

Data Analytics

  • Exploratory analysis
    • Basic statistics
    • Draft visualizations
    • Understand data 
  • Causality
  • Features and transformations
  • Machine Learning
    • Supervised vs unsurpevised
    • When to use what model
  • Natural Language Processing (NLP)

Data Visualization

  • Best Practices
  • Selecting the right chart for the right data
  • Color pallets
  • Taking it to the next level
    • Dashboards
    • Interactive Visualizations
  • Storytelling with data

Summary and Conclusion

Requirements

  • A general understanding of database concepts
  • A basic understanding of statistics
 35 Hours

Testimonials (2)

Related Categories