Online Course – University of Alberta Professional Certificate Specialization in Reinforcement Learning

Master reinforcement learning concepts. Implement a complete RL solution and understand how to leverage AI tools to solve real-world problems.

Offered by: Coursera

Professional Certificate

Intermediate level

No prior knowledge required

Time to complete the course

7-day free trial

No unnecessary risks

Skills you will acquire in the course

  • Function approximation
  • Artificial Intelligence (AI)
  • Machine learning
  • Reinforcement learning
  • Intelligent systems

What you will learn in the course

Roles for which the course is suitable

  • Game Developer (AI)
  • Customer interaction systems developer
  • Smart Assistant Developer
  • Recommendation system developer
  • Supply Chain Manager
  • Industrial control developer
  • Developer in the finance sector
  • Oil and Gas Pipeline Manager
  • Industrial control systems developer

Specialization – 4-part course series

The Reinforcement Learning Specialization consists of 4 courses that explore the power of adaptive learning systems and artificial intelligence (AI). Harnessing the full potential of artificial intelligence requires adaptive learning systems. By implementing a complete RL solution from start to finish, you will learn how reinforcement learning (RL) solutions help solve real-world problems through trial-and-error interaction.

By the end of the specialization, learners will understand the foundations of much of modern probabilistic artificial intelligence (AI) and will be ready to move on to more advanced courses or to apply AI tools and ideas to real-world problems. The content focuses on “small-scale” problems in order to build an understanding of the foundations of reinforcement learning, as taught by world-renowned experts from the University of Alberta, Faculty of Science.

The tools learned in this specialization can be applied to:

  • Game development (AI)
  • Customer interaction (how a website interacts with customers)
  • Smart assistants
  • Recommendation systems
  • Supply chain management
  • Industrial control
  • Finance
  • Oil and gas pipelines
  • Industrial control systems

Applied Learning Project

Through programming assignments and quizzes, students will:

  • Build a reinforcement learning system that can make automated decisions.
  • Understand how RL relates to and fits under the broader umbrella of machine learning, deep learning, supervised and unsupervised learning.
  • Understand the space of RL algorithms (temporal-difference learning, Monte Carlo, Sarsa, Q-learning, policy gradients, Dyna, and more).
  • Understand how to formulate a task as an RL problem, and how to begin implementing a solution.

Details of the courses that make up the specialization

Fundamentals of Reinforcement Learning

Course 1

  • 15 hours
  • 4.8 (2,771 ratings)

Course Details

What you’ll learn
  • Formalize problems as Markov decision processes (MDPs)
  • Understand basic exploration methods and the exploration/exploitation tradeoff
  • Understand value functions as a general-purpose tool for optimal decision-making
  • Know how to apply dynamic programming as an efficient solution approach to an industrial control problem
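To make the dynamic programming idea above concrete, here is a minimal sketch of value iteration on a tiny MDP. The two-state MDP, its transition probabilities, its rewards, and the discount factor are all invented for illustration; they are not taken from the course materials.

```python
# Hypothetical 2-state, 2-action MDP; all numbers are made up for illustration.
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
GAMMA = 0.9  # discount factor (assumed)


def q_value(P, V, s, a, gamma):
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P, gamma=GAMMA, tol=1e-8):
    """Apply the Bellman optimality backup until the values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(q_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged value function.
    policy = {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}
    return V, policy


V, policy = value_iteration(P)
```

In this toy problem the value function plays the role described above: once it is known, acting optimally reduces to picking the action with the highest one-step lookahead value in each state.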
The skills you will acquire
  • Category: Function Approximation
  • Category: Artificial Intelligence (AI)
  • Category: Reinforcement Learning
  • Category: Machine Learning
  • Category: Intelligent Systems

Sample-based Learning Methods

Course 2

  • 22 hours
  • 4.8 (1,228 ratings)

Course Details

What you’ll learn

In this course, you will learn about several algorithms that can learn near-optimal policies through trial-and-error interaction with the environment, that is, from the agent’s own experience. Learning from actual experience is striking because it requires no prior knowledge of the environment’s dynamics, yet can still attain optimal behavior. We will cover the simple but powerful Monte Carlo methods, and temporal-difference (TD) learning methods including Q-learning. We will conclude the course by exploring how to get the best of both worlds: algorithms that combine model-based planning (similar to dynamic programming) with temporal-difference updates to dramatically speed up learning.

At the end of this course you will be able to:
  • Understand temporal-difference learning and Monte Carlo as two strategies for estimating value functions from sampled experience
  • Understand the importance of exploration when using sampled experience rather than dynamic programming sweeps within a model
  • Understand the connections between Monte Carlo, dynamic programming, and temporal-difference learning
  • Implement and apply the TD algorithm for estimating value functions
  • Implement and apply Expected Sarsa and Q-learning (two TD methods for control)
  • Understand the difference between on-policy and off-policy control
  • Understand planning with simulated experience (as opposed to classic planning strategies)
  • Implement a model-based approach to RL, called Dyna, that uses simulated experience
  • Conduct an empirical study to see the improvements in sample efficiency when using Dyna
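As an illustration of one of the TD control methods listed above, the following is a minimal sketch of tabular Q-learning. The five-state corridor environment, the reward scheme, and the hyperparameters are assumptions chosen for brevity and do not come from the course assignments.

```python
import random

N_STATES = 5          # corridor states 0..4; state 4 is terminal (assumed layout)
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # illustrative hyperparameters


def step(state, action):
    """Deterministic corridor dynamics: reward 1 only on reaching the goal."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def greedy(q_row, rng):
    """Pick a highest-valued action, breaking ties at random."""
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])


def q_learning(episodes=500, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Behaviour policy: epsilon-greedy with respect to the current Q.
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(Q[state], rng)
            next_state, reward, done = step(state, action)
            # Off-policy target: bootstrap from the best next action,
            # regardless of which action the behaviour policy takes next.
            target = reward + (0.0 if done else GAMMA * max(Q[next_state]))
            Q[state][action] += ALPHA * (target - Q[state][action])
            state = next_state
    return Q


Q = q_learning()
policy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES - 1)]
```

The agent is never given the corridor's dynamics; it only observes sampled transitions. The `max` in the target is what makes the method off-policy: it evaluates the greedy policy while the agent behaves epsilon-greedily.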
The skills you will acquire
  • Category: Function Approximation
  • Category: Artificial Intelligence (AI)
  • Category: Reinforcement Learning
  • Category: Machine Learning
  • Category: Intelligent Systems

Prediction and Control with Function Approximation

Course 3

  • 21 hours
  • 4.8 (820 ratings)

Course Details

What you’ll learn

In this course, you will learn how to solve problems with large, high-dimensional, and potentially infinite state spaces. You will see that estimating value functions can be cast as a supervised learning problem (function approximation), which allows you to build agents that carefully balance generalization and discrimination in order to maximize reward. We will begin this journey by exploring how policy evaluation or prediction methods such as Monte Carlo and TD can be extended to the function approximation setting. You will learn about feature construction techniques for RL and about representation learning using neural networks and backpropagation. We will conclude this course with an in-depth look at policy gradient methods, a way to learn policies directly without learning a value function. In this course, you will solve two continuous-state control tasks and explore the benefits of policy gradient methods in a continuous-state environment.

Prerequisites: This course builds heavily on the foundations of courses 1 and 2, and students should complete these before starting this course. Students should also be comfortable with probability and expectations, basic linear algebra, basic calculus, Python 3 (at least one year of experience), and implementing algorithms from pseudocode.

At the end of this course you will be able to:
  • Understand how to use supervised learning approaches to approximate value functions
  • Understand objectives for prediction (value estimation) under function approximation
  • Implement TD with function approximation (state aggregation) in an environment with an infinite (continuous) state space
  • Understand fixed-basis and neural network approaches to feature construction
  • Implement TD with neural network function approximation in a continuous-state environment
  • Understand the new exploration challenges that arise when moving to function approximation
  • Contrast discounted problem formulations for control with the average-reward problem formulation
  • Implement Expected Sarsa and Q-learning with function approximation on a continuous-state control task
  • Understand objectives for directly estimating policies (policy gradient objectives)
  • Implement a policy gradient method (called Actor-Critic) in a discrete-state environment
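The following minimal sketch illustrates TD with state aggregation on a continuous state space, as mentioned in the list above. The one-dimensional random-walk task, the number of bins, and the step size are assumptions made for illustration only.

```python
import random

N_BINS = 10                 # number of aggregated groups (assumed)
ALPHA, GAMMA = 0.05, 1.0    # illustrative step size; undiscounted episodic task


def bin_index(x):
    """State aggregation: map a state in [0, 1] to the index of its bin."""
    return min(int(x * N_BINS), N_BINS - 1)


def run_td(episodes=2000, seed=0):
    rng = random.Random(seed)
    w = [0.0] * N_BINS       # one weight per bin = that bin's value estimate
    for _ in range(episodes):
        x = 0.5
        while True:
            x_next = x + rng.uniform(-0.2, 0.2)   # fixed random-walk policy
            if x_next < 0.0:
                # Left exit: terminal, reward 0.
                reward, v_next, done = 0.0, 0.0, True
            elif x_next > 1.0:
                # Right exit: terminal, reward 1.
                reward, v_next, done = 1.0, 0.0, True
            else:
                reward, v_next, done = 0.0, w[bin_index(x_next)], False
            i = bin_index(x)
            # Semi-gradient TD(0): the gradient of v(x) w.r.t. w is one-hot at i,
            # so only the active bin's weight moves toward the bootstrapped target.
            w[i] += ALPHA * (reward + GAMMA * v_next - w[i])
            if done:
                break
            x = x_next
    return w


weights = run_td()
```

Each weight estimates the probability of leaving the interval on the right from anywhere inside its bin, so the estimates should rise from the left bins to the right bins. All states in a bin share one estimate, which is the generalization that function approximation trades against precision.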
The skills you will acquire
  • Category: Function Approximation
  • Category: Artificial Intelligence (AI)
  • Category: Reinforcement Learning
  • Category: Machine Learning
  • Category: Intelligent Systems

A Complete Reinforcement Learning System (Capstone)

Course 4

  • 15 hours
  • 4.7 (627 ratings)

Course Details

What you’ll learn

In this final course, you will combine your knowledge from courses 1, 2, and 3 to implement a complete RL solution to a problem. This capstone will let you see how each component (problem formulation, algorithm selection, parameter selection, and representation design) fits together into a complete solution, and how to make appropriate choices when deploying RL in the real world. This project will require you to implement both the environment that simulates your problem and a control agent with neural network function approximation. In addition, you will conduct a scientific study of your learning system to develop your ability to assess the robustness of RL agents. To use RL in the real world, it is critical to (a) properly formulate the problem as a Markov decision process, (b) select the appropriate algorithms, (c) identify which choices in your implementation will have a large impact on performance, and (d) validate the expected behavior of your algorithms. This capstone is useful for anyone who plans to use RL to solve real-world problems. To succeed in this course, you will need to have completed courses 1, 2, and 3 of this specialization or their equivalent.

At the end of this course you will be able to:
  • Complete an RL solution to a problem, from problem formulation, through appropriate algorithm selection and implementation, to an empirical study of the solution’s effectiveness.
The skills you will acquire
  • Category: Function Approximation
  • Category: Artificial Intelligence (AI)
  • Category: Reinforcement Learning
  • Category: Machine Learning
  • Category: Intelligent Systems