Online Course – Certified Professional Specialization in NoSQL, Big Data, and Spark from IBM Institute

Jumpstart your big data career. Master the fundamentals of NoSQL, big data, and Apache Spark with market-ready, practical skills in machine learning and data engineering.

Offered by: Coursera

Professional Certificate

Beginners

No prior knowledge required

Time to complete the course

7-day free trial

No unnecessary risks

Skills you will acquire in the course

  • Cloud databases
  • MongoDB
  • Cassandra
  • NoSQL
  • Cloudant
  • Machine learning
  • Machine learning pipelines
  • Data Engineer
  • Spark ML
  • Apache Spark
  • Big data
  • Spark SQL
  • Apache Hadoop

What you will learn in the course

Roles for which the course is suitable

  • Data Engineer
  • Software developer
  • Information Systems Architect
  • Data Scientist
  • IT Manager

The specialization – a series of 3 courses

Data engineers and professionals with NoSQL skills are in high demand in the data management industry. This specialization is designed for those interested in developing foundational skills for working with Big Data, Apache Spark, and NoSQL databases. Three information-packed courses cover popular databases like MongoDB and Apache Cassandra, as well as the widely used Apache Hadoop toolkit for Big Data, and of course, the Apache Spark analytics engine for large-scale data processing.

Overview

  • Overview of the different categories of NoSQL ("Not only SQL") databases
  • Practical work with several databases, including:
    • IBM Cloudant
    • MongoDB
    • Cassandra
  • Data management tasks, such as:
    • Creating and replicating databases
    • Inserting data
    • Updating
    • Deleting
    • Querying
    • Indexing
    • Aggregating
    • Sharding data
  • Basic knowledge of Big Data technologies such as:
    • Hadoop
    • MapReduce
    • HDFS
    • Hive
    • HBase
  • More in-depth knowledge of Apache Spark, including:
    • Spark DataFrames
    • Spark SQL
    • PySpark
    • Spark API
    • Scaling with Kubernetes
  • Working with Spark Structured Streaming and Spark ML to perform ETL (Extract, Transform, Load) processing and machine learning tasks.

Hands-on Learning Project

The emphasis in this specialization is on hands-on learning. Therefore, each course includes hands-on labs to practice and apply the NoSQL and Big Data skills learned during class.

First course
  • Working with multiple NoSQL databases – MongoDB, Apache Cassandra and IBM Cloudant to perform a variety of tasks:
    • Creating a database
    • Adding documents
    • Data queries
    • Using the HTTP API
    • Performing Create, Read, Update & Delete (CRUD) operations
    • Limiting and sorting records
    • Indexing
    • Aggregation
    • Replication
    • Using the CQL shell
    • Keyspace operations
    • Additional operations on tables
Second course
  • Launching a Hadoop cluster using Docker and running MapReduce jobs.
  • Exploring Spark using Jupyter notebooks on a Python kernel.
  • Building Spark skills using DataFrames, Spark SQL, and scaling jobs using Kubernetes.
Third course
  • Using Spark for ETL processing.
  • Training and running machine learning models using IBM Watson.

This specialization is suitable for beginners in the field of NoSQL and Big Data, whether you already work as, or are preparing to become, a data engineer, software developer, information systems architect, data scientist, or IT manager.

Details of the courses that make up the specialization

Introduction to NoSQL databases

  • Course 1
    • 18 hours
    • 4.6 (293 ratings)

Course details

What you’ll learn:
  • Distinguish between the four main categories of NoSQL databases.
  • Describe the characteristics, benefits, limitations, and applications of popular tools for processing big data.
  • Perform common tasks using MongoDB, including create, read, update, and delete (CRUD) operations.
  • Perform Keyspace, Table, and CRUD operations in Cassandra.
Skills you will acquire
  • Cloud databases
  • MongoDB
  • Cassandra
  • NoSQL
  • Cloudant

Introduction to Big Data with Spark and Hadoop

  • Course 2
    • 19 hours
    • 4.4 (377 ratings)

Course details

What you’ll learn:
  • Explain the impact of big data, including use cases, tools, and processing methods.
  • Describe the Apache Hadoop architecture, ecosystem, methods, and related user applications, including Hive, HDFS, HBase, Spark, and MapReduce.
  • Implement Spark programming fundamentals, including parallel programming fundamentals for DataFrames, datasets, and Spark SQL.
  • Use Spark RDDs and datasets, optimize Spark SQL using Catalyst and Tungsten, and use Spark's development and runtime environment options.
Skills you will acquire
  • Big data
  • Spark SQL
  • Spark ML
  • Apache Hadoop
  • Apache Spark

Machine Learning with Apache Spark

  • Course 3
    • 15 hours
    • 4.5 (79 ratings)

Course details

What you’ll learn:
  • Describe ML, explain its role in data engineering, summarize generative AI, discuss the uses of Spark, and analyze ML pipelines and model persistence.
  • Evaluate ML models, distinguish between regression, classification, and clustering models, and compare data engineering pipelines to ML pipelines.
  • Build data analysis processes using Spark SQL, and perform regression, classification, and clustering using Spark ML.
  • Connect to Spark clusters, build ML pipelines, perform feature extraction and transformation, and save models.
Skills you will acquire
  • Machine learning
  • Machine learning pipelines
  • Data Engineer
  • Spark ML
  • Apache Spark