
Q-learning

Q-learning definition

Q-learning is a model-free reinforcement learning algorithm that allows an agent to learn the best actions to take in various scenarios. An agent is a software program that makes decisions and performs actions within a specific environment to achieve a goal.

See also: intelligent agent, supervised machine learning, unsupervised machine learning, machine learning

How Q-learning works

At its core, Q-learning involves an agent, an environment, and a Q-table. The environment presents different situations or states, and for each state, the agent can perform various actions. The Q-table helps the agent remember the quality or value (hence “Q”) of taking each action in each state based on the rewards it expects to receive.
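The Q-table described above can be sketched in a few lines of Python. The states, actions, and zero initialization here are purely illustrative, not tied to any particular library:

```python
# Illustrative toy setup: three states and two actions.
states = ["s0", "s1", "s2"]
actions = ["left", "right"]

# Initialize every Q-value to zero: the agent starts with no knowledge
# of which action is good in which state.
q_table = {s: {a: 0.0 for a in actions} for s in states}

def best_action(state):
    """Return the action with the highest Q-value in the given state."""
    return max(q_table[state], key=q_table[state].get)
```

As the agent acts and receives rewards, the entries of `q_table` are updated, and `best_action` gradually comes to reflect what the agent has learned.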

The agent starts with no prior knowledge of the environment and explores it, taking actions and observing the outcomes in terms of rewards and new states. After each action, Q-learning updates the Q-values in the table using a formula that combines the reward received for the current action with the highest expected future rewards. This update rule, central to Q-learning, is derived from the Bellman equation. It balances the immediate reward against the anticipated rewards for future actions, while a discount factor makes future rewards progressively less significant. This ensures that immediate rewards have more influence than distant ones, because the future is uncertain.
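The update described above can be sketched as a small function. The parameter names `alpha` (learning rate) and `gamma` (discount factor) and their default values are illustrative:

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One Q-learning update on table q (a dict of state -> {action: value}).

    alpha: learning rate, how strongly new information overwrites the old.
    gamma: discount factor, makes future rewards progressively less significant.
    """
    best_next = max(q[next_state].values())      # highest expected future reward
    td_target = reward + gamma * best_next       # immediate + discounted future
    q[state][action] += alpha * (td_target - q[state][action])

# Illustrative two-state, two-action table, initialized to zero.
q = {s: {a: 0.0 for a in ("left", "right")} for s in ("s0", "s1")}
```

With `gamma` below 1, rewards expected further in the future are weighted less, which is exactly the discounting described above.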

The agent continuously updates the Q-table based on its experiences. It uses a strategy that mixes exploring new actions randomly to discover their rewards and repeating known actions that have previously yielded high rewards. This exploration-exploitation balance is critical for the agent to learn effectively, avoiding getting stuck in poor behavior patterns.
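A common way to implement this balance is an epsilon-greedy strategy: with a small probability the agent explores a random action, otherwise it exploits the best-known one. A minimal sketch, with an illustrative `epsilon` value:

```python
import random

def epsilon_greedy(q_row, epsilon=0.1):
    """Pick an action from q_row (a dict of action -> Q-value).

    With probability epsilon, explore a random action; otherwise
    exploit the action with the highest known Q-value.
    """
    if random.random() < epsilon:
        return random.choice(list(q_row))    # explore
    return max(q_row, key=q_row.get)         # exploit
```

Setting `epsilon` too low risks the poor behavior patterns mentioned above; setting it too high wastes steps on actions already known to be bad.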

Over time, as the agent explores more of the environment and updates the Q-table, the Q-values converge toward stable numbers that represent the expected reward of each action in each state. Once trained, the agent can use the Q-table to choose the best action in any given state by picking the one with the highest Q-value.
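Putting the pieces together, the whole training process can be sketched on a hypothetical toy environment: a five-state corridor where reaching the rightmost state yields a reward of 1 and ends the episode. Every name and hyperparameter below is illustrative:

```python
import random

random.seed(0)

# Hypothetical corridor: states 0..4, agent starts at 0, reaching
# state 4 yields a reward of 1 and ends the episode.
N_STATES, ACTIONS = 5, ("left", "right")
GOAL = N_STATES - 1

def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    nxt = min(state + 1, GOAL) if action == "right" else max(state - 1, 0)
    done = nxt == GOAL
    return nxt, (1.0 if done else 0.0), done

# Q-table initialized to zero; illustrative hyperparameters.
q = {s: {a: 0.0 for a in ACTIONS} for s in range(N_STATES)}
alpha, gamma, epsilon = 0.5, 0.9, 0.2

for _ in range(500):                 # training episodes
    s, done = 0, False
    while not done:
        # Epsilon-greedy action choice: explore or exploit.
        a = (random.choice(ACTIONS) if random.random() < epsilon
             else max(q[s], key=q[s].get))
        s2, r, done = step(s, a)
        # Update toward immediate reward plus discounted future reward.
        q[s][a] += alpha * (r + gamma * max(q[s2].values()) - q[s][a])
        s = s2

# The greedy policy read off the trained table: best action per state.
policy = {s: max(q[s], key=q[s].get) for s in range(N_STATES - 1)}
```

After training, the greedy policy points "right" in every non-terminal state, and the Q-values for states nearer the goal are larger, reflecting the discounting of distant rewards.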

Q-learning is extremely useful in situations where the model of the environment is unknown or too complex to model directly. It allows the agent to learn from interactions with the environment without needing a predefined model, making it versatile and powerful for various decision-making tasks.