Online Course – University of Alberta Certified Professional Specialization in Reinforcement Learning

Master reinforcement learning concepts. Implement a complete RL solution and understand how to leverage AI tools to solve real-world problems.

Offered by: Coursera

Professional Certificate

Intermediate level

No prior knowledge required

Time to complete the course

7-day free trial

No unnecessary risks

Skills you will acquire in the course

Function approximation
Artificial Intelligence (AI)
Machine learning
Reinforcement learning
Intelligent systems

What you will learn in the course

Roles for which the course is suitable

Game Developer (AI)
Customer interaction systems developer
Smart Assistant Developer
Recommendation system developer
Supply Chain Manager
Industrial control developer
Developer in the financial sector
Oil and Gas Pipeline Manager
Industrial control systems developer

Specialization – 4-course series

The Reinforcement Learning Specialization consists of 4 courses that explore the power of adaptive learning systems and artificial intelligence (AI). Harnessing the full potential of artificial intelligence requires adaptive learning systems. You will learn how reinforcement learning (RL) solutions help solve real-world problems through trial-and-error interaction by implementing a complete RL solution from start to finish.

By the end of this specialization, learners will understand the foundations of much of modern artificial intelligence (AI) and will be ready to take more advanced courses or to apply AI tools and ideas to real-world problems. The content focuses on “small-scale” problems in order to build an understanding of the fundamentals of reinforcement learning, taught by world-renowned experts from the University of Alberta, Faculty of Science.

The tools learned in this specialization can be suitable for:

Game Development (AI)
Customer Interaction (How a Website Interacts with Customers)
Smart assistants
Recommendation systems
Supply Chain Management
Industrial control
Financial development
Oil and gas pipelines
Industrial control systems

Applied Learning Project

Through programming assignments and quizzes, students will:

Build a reinforcement learning system that can make automated decisions.
Understand how RL relates to and fits under the broader umbrella of machine learning, deep learning, supervised and unsupervised learning.
Understand the space of RL algorithms (temporal-difference learning, Monte Carlo, Sarsa, Q-learning, policy gradients, Dyna, and more).
Understand how to formulate your task as an RL problem, and how to begin implementing a solution.

Details of the courses that make up the specialization

Fundamentals of Reinforcement Learning

Course 1

15 hours
4.8 (2,771 ratings)

Course Details

What you’ll learn

Formalize problems as Markov decision processes
Understand basic exploration methods and the exploration/exploitation tradeoff
Understand value functions as a general-purpose tool for optimal decision-making
Know how to implement dynamic programming as an efficient solution approach to an industrial control problem
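
To make the dynamic programming item above concrete, here is a minimal sketch of value iteration on a toy Markov decision process. It is not taken from the course materials: the two-state MDP, its rewards, the discount factor, and the names `P`, `q_value`, and `value_iteration` are all assumptions chosen for illustration.

```python
# Hypothetical 2-state, 2-action MDP; every number here is made up.
# P[state][action] is a list of (probability, next_state, reward) tuples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
GAMMA = 0.9  # assumed discount factor


def q_value(P, V, s, a, gamma):
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P, gamma=GAMMA, tol=1e-10):
    """Sweep the Bellman optimality backup until values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(q_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Read off a greedy policy from the converged values.
    policy = {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}
    return V, policy


V, policy = value_iteration(P)
print(V, policy)
```

In this toy problem the best behavior is to move to state 1 and stay there, so the value of state 1 approaches 2 / (1 - 0.9) = 20 and the value of state 0 approaches 1 + 0.9 * 20 = 19.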

The skills you will acquire

Category: Function Approximation
Category: Artificial Intelligence (AI)
Category: Reinforcement Learning
Category: Machine Learning
Category: Intelligent Systems

Sample-based Learning Methods

Course 2

22 hours
4.8 (1,228 ratings)

Course Details

What you’ll learn

In this course, you will learn about several algorithms that can learn near-optimal policies based on trial-and-error interaction with the environment, that is, learning from the agent’s own experience. Learning from actual experience is striking because it requires no prior knowledge of the environment’s dynamics, yet can still attain optimal behavior. We will cover simple but powerful Monte Carlo methods, and temporal-difference (TD) learning methods including Q-learning. We will conclude the course by exploring how to get the best of both worlds: algorithms that combine model-based planning (similar to dynamic programming) with temporal-difference updates to dramatically speed up learning.

At the end of this course you will be able to:

Understand temporal-difference learning and Monte Carlo as two strategies for estimating value functions from sampled experience
Understand the importance of exploration when using sampled experience rather than dynamic programming sweeps within a model
Understand the connections between Monte Carlo, dynamic programming, and temporal-difference learning
Implement and apply the TD algorithm for estimating value functions
Implement and apply Expected Sarsa and Q-learning (two TD methods for control)
Understand the difference between on-policy and off-policy control
Understand planning with simulated experience (as opposed to classic planning strategies)
Implement a model-based approach to RL, called Dyna, which uses simulated experience
Conduct an empirical study to see the improvements in sample efficiency when using Dyna
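
As a rough illustration of the two TD control methods named above, the sketch below computes a Q-learning target and an Expected Sarsa target for the same transition. It is a simplified illustration, not the course's implementation: the function names, the epsilon-greedy target policy, and all numbers are assumptions.

```python
def epsilon_greedy_probs(q_row, epsilon):
    """Action probabilities of an epsilon-greedy policy over one state's Q-values."""
    n = len(q_row)
    best = max(range(n), key=lambda a: q_row[a])
    probs = [epsilon / n] * n
    probs[best] += 1.0 - epsilon
    return probs


def q_learning_target(reward, next_q_row, gamma):
    """Off-policy target: bootstrap from the greedy (max) next action value."""
    return reward + gamma * max(next_q_row)


def expected_sarsa_target(reward, next_q_row, gamma, epsilon):
    """Target that averages next action values under the epsilon-greedy policy."""
    probs = epsilon_greedy_probs(next_q_row, epsilon)
    return reward + gamma * sum(p * q for p, q in zip(probs, next_q_row))


def td_update(q_sa, target, alpha):
    """Move the current estimate a fraction alpha toward the target."""
    return q_sa + alpha * (target - q_sa)


# Hypothetical transition: reward -1, two actions available in the next state.
next_q = [1.0, 3.0]
t_q = q_learning_target(reward=-1.0, next_q_row=next_q, gamma=0.9)
t_es = expected_sarsa_target(reward=-1.0, next_q_row=next_q, gamma=0.9, epsilon=0.2)
new_q = td_update(q_sa=0.0, target=t_q, alpha=0.5)
print(t_q, t_es, new_q)
```

The only difference between the two targets is how the next state's action values are combined: Q-learning takes the maximum, while Expected Sarsa takes the expectation under the policy being evaluated. With epsilon set to 0 the two targets coincide.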

The skills you will acquire

Category: Function Approximation
Category: Artificial Intelligence (AI)
Category: Reinforcement Learning
Category: Machine Learning
Category: Intelligent Systems

Prediction and Control with Function Approximation

Course 3

21 hours
4.8 (820 ratings)

Course Details

What you’ll learn

In this course, you will learn how to solve problems with large, high-dimensional, and potentially infinite state spaces. You will see that estimating value functions can be cast as a supervised learning problem (function approximation), which allows you to build agents that carefully balance generalization and discrimination to maximize reward. We will begin this journey by exploring how policy evaluation or prediction methods such as Monte Carlo and TD can be extended to the function approximation setting. You will learn about feature construction techniques for RL and representation learning using neural networks and backpropagation. We will conclude this course with an in-depth look at policy gradient methods, a way to learn policies directly without learning a value function. In this course, you will solve two continuous-state control tasks and explore the benefits of policy gradient methods in a continuous-state environment.

Prerequisites: This course builds heavily on the foundations of courses 1 and 2, and students should complete these before starting this course. Students should also be comfortable with probability and expectations, basic linear algebra, basic calculus, Python 3.0 (at least one year of experience), and implementing algorithms from pseudocode.

At the end of this course you will be able to:

Understand how to use supervised learning approaches to approximate value functions
Understand objectives for prediction (value estimation) under function approximation
Implement TD with function approximation (state aggregation) in an environment with an infinite (continuous) state space
Understand fixed-basis and neural network approaches to feature construction
Implement TD with neural network function approximation in a continuous-state environment
Understand the new difficulties in exploration when moving to function approximation
Contrast discounted problem formulations for control with average-reward problem formulations
Implement Expected Sarsa and Q-learning with function approximation on a continuous-state control task
Understand objectives for directly estimating policies (policy gradient objectives)
Implement a policy gradient method (called Actor-Critic) in a discrete state environment
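
The idea of TD with state aggregation can be sketched as follows: a continuous state is mapped to one of a few groups, and every state in a group shares a single learned value. This is a minimal sketch under assumed settings (a one-dimensional state in [0, 1), four groups, made-up transitions); the names `group_index` and `semi_gradient_td0_update` are hypothetical and not from the course.

```python
def group_index(state, num_groups, low=0.0, high=1.0):
    """Map a continuous state in [low, high) to the index of its aggregate group."""
    frac = (state - low) / (high - low)
    return min(int(frac * num_groups), num_groups - 1)


def semi_gradient_td0_update(w, state, reward, next_state, done,
                             alpha=0.5, gamma=0.9):
    """One semi-gradient TD(0) step; w holds one value estimate per group."""
    g = group_index(state, len(w))
    # Terminal states have value 0, so nothing is bootstrapped from them.
    next_v = 0.0 if done else w[group_index(next_state, len(w))]
    td_error = reward + gamma * next_v - w[g]
    # With one-hot (aggregated) features, the gradient selects a single weight.
    w[g] += alpha * td_error
    return td_error


w = [0.0, 0.0, 0.0, 0.0]  # four groups covering [0, 1)
semi_gradient_td0_update(w, state=0.1, reward=1.0, next_state=0.6, done=False)
semi_gradient_td0_update(w, state=0.3, reward=0.0, next_state=0.05, done=False)
print(w)
```

After the first update the group containing 0.1 holds the value 0.5; the second update bootstraps from that group, so the group containing 0.3 moves to 0.5 * (0.9 * 0.5) = 0.225. Replacing the lookup table `w` with a neural network gives the neural-network variant mentioned in the list above.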

The skills you will acquire

Category: Function Approximation
Category: Artificial Intelligence (AI)
Category: Reinforcement Learning
Category: Machine Learning
Category: Intelligent Systems

A Complete Reinforcement Learning System (Capstone)

Course 4

15 hours
4.7 (627 ratings)

Course Details

What you’ll learn

In this final course, you will combine your knowledge from courses 1, 2, and 3 to implement a complete RL solution to a problem. This capstone will allow you to see how each component (problem formulation, algorithm selection, parameter selection, and representation design) fits together into a complete solution, and how to make appropriate choices when deploying RL in the real world. This project will require you to implement both the environment that simulates your problem and a control agent with neural network function approximation. In addition, you will conduct a scientific study of your learning system to develop your ability to assess the robustness of RL agents. To use RL in the real world, it is critical to (a) properly formulate the problem as a Markov decision process, (b) select the appropriate algorithms, (c) identify which choices in your implementation will have a large impact on performance, and (d) validate the expected behavior of your algorithms. This capstone is useful for anyone who plans to use RL to solve real-world problems. To succeed in this course, you will need to have completed courses 1, 2, and 3 of this specialization or their equivalent.

At the end of this course you will be able to:

Complete an RL solution to a problem, from problem formulation through selection and implementation of an appropriate algorithm to an empirical study of the solution’s effectiveness.

The skills you will acquire

Category: Function Approximation
Category: Artificial Intelligence (AI)
Category: Reinforcement Learning
Category: Machine Learning
Category: Intelligent Systems
