Most urban societies now face a phenomenon known as urban traffic congestion, which arises when too many vehicles use the same transportation infrastructure at the same time. Traffic congestion has several consequences, such as air pollution, reduced speeds, longer travel times, higher fuel consumption and even incidents. One feasible way to cope with growing transportation demand is to improve the existing infrastructure by means of intelligent traffic control systems. From a traffic engineering point of view, a traffic control system consists of a physical network, control devices (traffic signals, variable message signs and so forth), a model of transportation demand, and a control strategy. This paper focuses on the last of these, the control strategy, and in particular on traffic signal control.
Because of its distributed and autonomous nature, traffic signal control lends itself naturally to modeling as a multi-agent system. In this context, drivers and traffic signals are treated as distributed, autonomous and intelligent agents. Moreover, because urban traffic patterns are highly complex and the traffic environment is nonstationary, developing an optimized multi-agent system with preprogrammed agent behavior is largely impractical. Instead, the agents must acquire their knowledge through a learning mechanism by interacting with the environment.
Reinforcement Learning (RL) is a promising approach for training an agent that optimizes its behavior by interacting with its environment. At each step, the agent receives information on the current state of the environment, performs an action that may change that state, and receives a scalar reward that reflects how appropriate its behavior has been. The function that specifies which action to take in a given state is called the policy. The goal of RL is to find a policy that maximizes the long-term reward. Many RL algorithms have been proposed, and they can be divided into three groups: Actor-Only, Critic-Only and Actor-Critic methods.
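The observe, act and reward cycle described above can be sketched as a minimal loop. This is an illustration under stated assumptions, not the implementation used in this paper: `step` and `policy` are hypothetical callables standing in for the environment and the agent, the toy environment is invented for demonstration, and the long-term reward is taken to be the discounted return with a discount factor `gamma`, which is one common choice rather than something the text specifies.

```python
from typing import Callable, List, Tuple

# Hypothetical signatures (assumption): a policy maps a state to an action, and an
# environment step maps (state, action) to (next_state, scalar reward).
Policy = Callable[[int], int]
Step = Callable[[int, int], Tuple[int, float]]


def run_episode(step: Step, policy: Policy, state: int, horizon: int) -> List[float]:
    """Run the observe-act-reward loop for a fixed number of steps."""
    rewards: List[float] = []
    for _ in range(horizon):
        action = policy(state)               # the policy picks an action for the observed state
        state, reward = step(state, action)  # the action may change the environment's state
        rewards.append(reward)               # scalar feedback on the agent's behavior
    return rewards


def discounted_return(rewards: List[float], gamma: float) -> float:
    """Long-term reward as the discounted sum r_0 + gamma*r_1 + gamma^2*r_2 + ..."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def toy_step(s: int, a: int) -> Tuple[int, float]:
    # Toy environment (assumption): the action moves an integer state by +1 or -1,
    # and the reward penalizes the distance of the new state from zero.
    nxt = s + a
    return nxt, float(-abs(nxt))


def toy_policy(s: int) -> int:
    # Toy policy (assumption): always step toward zero.
    return 1 if s < 0 else -1


if __name__ == "__main__":
    rs = run_episode(toy_step, toy_policy, state=-2, horizon=3)
    print(rs, discounted_return(rs, gamma=0.9))
```

Defining the return by a backward pass keeps the computation linear in the episode length and makes the role of `gamma` explicit: values of `gamma` close to 1 weight distant rewards almost as heavily as immediate ones.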
Actor-Only methods typically work with a parameterized family of policies over which optimization procedures can be applied directly. Often, the gradient of a policy's value with respect to the policy parameters is estimated and then used to improve the policy. A drawback of Actor-Only methods is that, when no value function is learned, the improvement in performance is harder to estimate. Critic-Only methods instead first find the optimal value function and then derive an optimal policy from it. Because deriving the policy requires searching over the action space, this approach makes it difficult to use continuous actions and hence to find the true optimum. In this research, Actor-Critic reinforcement learning is applied as the learning method for truly adaptive traffic signal control. Actor-Critic methods are temporal-difference (TD) methods that use a separate memory structure to represent the policy explicitly, independent of the value function. The policy structure is known as the actor, because it is used to select actions, and the estimated value function is known as the critic, because it criticizes the actions taken by the actor; here the critic is a state-value function.
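The division of labor between actor and critic can be illustrated with a one-step actor-critic update. This is a minimal sketch under assumptions that the text does not state: the critic is a linear state-value function `V(s) = w · phi(s)`, the actor is a Gaussian policy whose mean is `theta · phi(s)` with a fixed standard deviation `sigma` (which suits a continuous action such as a phase duration), and the feature vectors, step sizes and discount factor `gamma` are hypothetical choices. It is the textbook TD(0) actor-critic, not necessarily the exact update used in this paper.

```python
import random
from typing import List, Tuple


def dot(x: List[float], y: List[float]) -> float:
    return sum(a * b for a, b in zip(x, y))


def sample_action(theta: List[float], phi_s: List[float], sigma: float,
                  rng: random.Random) -> float:
    """Actor: draw a continuous action from N(theta . phi(s), sigma^2)."""
    return rng.gauss(dot(theta, phi_s), sigma)


def actor_critic_update(
    theta: List[float], w: List[float],
    phi_s: List[float], phi_next: List[float],
    action: float, reward: float, gamma: float,
    alpha_actor: float, alpha_critic: float, sigma: float,
) -> Tuple[List[float], List[float], float]:
    """One TD(0) actor-critic step; returns (new_theta, new_w, td_error)."""
    # Critic: TD error delta = r + gamma * V(s') - V(s)
    delta = reward + gamma * dot(w, phi_next) - dot(w, phi_s)
    # Critic update: move V(s) toward the TD target
    new_w = [wi + alpha_critic * delta * f for wi, f in zip(w, phi_s)]
    # Actor update: the gradient of log N(a; mu, sigma^2) w.r.t. theta
    # is (a - mu) / sigma^2 * phi(s), scaled by the critic's TD error
    mu = dot(theta, phi_s)
    scale = (action - mu) / sigma ** 2
    new_theta = [ti + alpha_actor * delta * scale * f for ti, f in zip(theta, phi_s)]
    return new_theta, new_w, delta
```

A positive TD error means the sampled action turned out better than the critic expected, so the actor shifts its mean toward that action; a negative TD error shifts it away. Because the actor is an explicit distribution over a continuous range, no maximization over actions is needed, which is the limitation of Critic-Only methods noted above.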
In this paper, AIMSUN, a microscopic traffic simulator, is used to model the traffic environment. AIMSUN models stochastic vehicle flow using car-following, lane-changing and gap-acceptance models. The AIMSUN API was used to construct the state, execute the action and calculate the reward signal at each traffic light. The state of each agent is represented by a vector of 1 + P components, where the first component is the current phase number and P is the number of entrance streets leading into the intersection. The action of the agent is the duration of the current phase. The immediate reward is defined as the reduction in the total number of cars waiting on all entrance streets; that is, the difference between the total numbers of waiting cars at two successive decision points is used as the reward signal. The reinforcement learning controller is benchmarked against an optimized pretimed controller. The results indicate that, compared with the optimized pretimed controller, the Actor-Critic controller reduces queue length, travel time, fuel consumption and air pollution.
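The state and reward definitions above can be sketched as plain functions. This is a hedged illustration only: it does not call the AIMSUN API, the names `build_state` and `reward_signal` are hypothetical, and using the number of waiting cars on each entrance street as the remaining P state components is an assumption, since the text specifies only the first component (the phase number) and the length of the vector.

```python
from typing import List, Sequence


def build_state(phase: int, waiting_per_street: Sequence[int]) -> List[int]:
    """State vector of 1 + P components: the current phase number followed by
    one value per entrance street (assumed here to be its count of waiting cars)."""
    return [phase, *waiting_per_street]


def reward_signal(prev_waiting: Sequence[int], curr_waiting: Sequence[int]) -> int:
    """Reduction in the total number of waiting cars between two successive
    decision points; positive when the total queue went down."""
    return sum(prev_waiting) - sum(curr_waiting)


if __name__ == "__main__":
    # Hypothetical four-leg intersection (P = 4) observed at two decision points.
    before = [5, 3, 7, 2]
    after = [4, 3, 2, 1]
    print(build_state(2, after))         # [2, 4, 3, 2, 1]
    print(reward_signal(before, after))  # 7
```

Defining the reward as a difference rather than as the raw queue total means the agent is credited for the change its chosen phase duration produced between two decisions, not for congestion levels it inherited.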