Direct reinforcement learning (RL) algorithms are a subclass of machine learning algorithms that, unlike complex adaptive systems (CAS), typically don't rely on multitudes of individual agents, yet they are capable of solving difficult problems. Like complex adaptive systems, they do not require supervised learning, which makes them very attractive for inclusion in CASTrader. The reinforcement learning approach can be summarized as follows:
A reinforcement learning problem is defined by the elements S, A, T, and R, where S is the set of the environment's states; A is the set of actions available to the learning agent; R is the reward function, which determines the reward the agent extracts from the environment through its actions; and T is the state transition function. The transition and reward functions T and R are potentially unknown to the agent. The objective is to develop a policy mapping environment states to actions that maximizes the long-term return, with future rewards often discounted. The original RL framework was designed for discrete state-action spaces, but an approximation method is often used to allow generalization to unseen instances of a continuous state-action space.
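To make the S, A, T, R framework concrete, here is a minimal tabular Q-learning sketch in Python. The env object and its reset/step interface are hypothetical stand-ins, not any particular library's API; T and R are hidden inside it, which is exactly the "potentially unknown to the agent" case:

```python
import random
from collections import defaultdict

# Minimal tabular Q-learning sketch of the (S, A, T, R) framework.
# `env` is a hypothetical stand-in: anything exposing reset() -> state and
# step(action) -> (next_state, reward, done) would do; T and R live inside it.

def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    Q = defaultdict(float)  # Q[(state, action)] -> estimated long-term return
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy: mostly exploit the best-known action, sometimes explore.
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # Move Q toward the reward plus the discounted best future value.
            target = reward if done else reward + gamma * max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = next_state
    return Q  # the greedy policy reads off the argmax action for each state
```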
Here are some of the important features to keep in mind when comparing reinforcement learning methods to a CAS approach, based on my very limited knowledge of RL:
- RL is a state-based approach. Reinforcement learning algorithms are applied in situations where the state of the environment is known (e.g., the positions of the pieces on a chess board) and the sequence of steps leading up to that state forms a Markov process (the future depends only on the current state, not on the entire past history). The algorithm determines what the next action should be. Defining states is considered an art.
- RL assumes the Markov property. It's debatable whether the stock market has the Markov property or not, but CAS do not necessarily care, because they are not directly obsessed with the state of the system the way reinforcement learning algorithms typically are. Typically, each agent is focused on a substate, not the big picture; but there appear to be so many design philosophies out there that the distinction is blurred. For instance, some reinforcement learning algorithms use CAS as a sub-algorithm (pdf), and an agent in a CAS could certainly use reinforcement learning algorithms.
- CAS is naturally suited to handling novel situations (states). A CAS approach seems to be a more natural fit when novel or rare states are encountered, especially when those states act as if they are composed of building blocks of substates. CAS agents experienced with those substates as building blocks can sometimes interact to provide an answer for a never-before-encountered or rare state. It's unclear whether reinforcement learning is up to the task of handling novel states as well as CAS sometimes can. So is the stock market repetitive or novel?
- CAS approaches are probably more scalable. Because typical CAS systems don't memorize all the states and actions as reinforcement learning often does, they scale better to large problems (although neural networks can be added to give RL a compact representation of states and actions; see the sketch after this list). My impression is that reinforcement learning algorithms are more naturally suited to situations where states repeat (although advanced approaches using neural networks tackle novel situations as well).
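Here is a hypothetical sketch of that compact-representation point: instead of memorizing one Q-value per (state, action) pair, a small weight vector over hand-chosen features approximates Q, so storage no longer grows with the number of states. The feature map below is an assumption of mine, and a neural network could take its place:

```python
import numpy as np

# Linear function approximation of Q(state, action): storage is one weight per
# feature, regardless of how many distinct states the environment can produce.
# The feature map is hypothetical; it assumes state and action are numeric.

def features(state, action):
    return np.array([1.0, state, action, state * action])

w = np.zeros(4)  # fixed-size parameters replacing a potentially huge table

def q_value(state, action):
    return w @ features(state, action)

def td_update(state, action, reward, next_state, actions, alpha=0.01, gamma=0.95):
    # Semi-gradient TD update: nudge the weights toward the bootstrapped target.
    global w
    best_next = max(q_value(next_state, a) for a in actions)
    td_error = reward + gamma * best_next - q_value(state, action)
    w += alpha * td_error * features(state, action)
```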
Here is a brief survey of how researchers have applied reinforcement learning to the financial markets:
- In "Stock Trading System Using Reinforcement Learning with Cooperative Agents", the authors simulate trading stocks on the KOSPI with some success, using cooperative "buy" and "sell" agents, technical trading indicators, Q-learning with neural networks to approximate the state space, plus a "turning point" approach to further reduce storage requirements.
- Reinforcement learning for optimized trade execution. In other words, a big institution wants to unload a large block of stock at the best possible price by working the order book. The authors claim a 50% improvement in trade execution.
- In Three Automated Stock-Trading Agents: A Comparative Study, the researchers competed in a stock trading simulation on the Penn Exchange Simulator. The reinforcement learning agent was the worst of the three, but the paper shows an approach to using reinforcement learning for time-series data. The state space was modelled simply as the current price minus the exponential average price, and the action space was equally simple: the volume of shares to sell or buy (sketched after this list). The authors were impressed at how well it did considering its simplicity, and offer suggestions for its improvement.
- In Performance Functions and Reinforcement Learning for Trading Systems and Portfolios, the authors use a reinforcement learning system to demonstrate out-of-sample predictability of the S&P 500. They simulate switching between the S&P 500 and T-Bills from 1970 to 1994 and achieve what appear to be approximately 17% returns (after transaction costs) vs. 10% on a buy-and-hold of the index - not bad. The system seems to switch into T-Bills and even short the market at the most opportune times, such as before the 1974 and 1987 debacles. They use 84 inputs consisting of financial and macroeconomic data. It is interesting to note that the authors use a multi-agent (CAS) approach to achieve these results. The system is discussed further in Learning to Trade via Direct Reinforcement (2001).
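To illustrate how simple that third paper's agent really is, here is a hypothetical sketch; the smoothing factor and the linear sizing rule are assumptions of mine, not the paper's exact parameters:

```python
# State: current price minus an exponential moving average of the price.
# Action: a signed share volume (buy when the price looks cheap vs. the average).
# The alpha smoothing factor and linear sizing rule are assumptions of mine.

def make_agent(alpha=0.01, shares_per_unit=100):
    ema = None

    def act(price):
        nonlocal ema
        ema = price if ema is None else (1 - alpha) * ema + alpha * price
        state = price - ema                    # the one-dimensional state
        return int(-state * shares_per_unit)   # the action: shares to buy (+) or sell (-)

    return act
```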
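And here is a toy sketch of the direct reinforcement idea from that last paper: a recurrent position function trained by gradient ascent on realized trading return rather than on value estimates. Using a finite-difference gradient and total return (instead of their differential Sharpe ratio) is a simplification of mine:

```python
import numpy as np

# Position at time t: F_t = tanh(w . x_t + u * F_{t-1} + b), in (-1, 1).
# Profit includes a transaction cost on position changes, as in the paper.

def trading_return(params, prices, features, cost=0.001):
    w, u, b = params[:-2], params[-2], params[-1]
    position, total = 0.0, 0.0
    for t in range(1, len(prices)):
        new_position = np.tanh(w @ features[t] + u * position + b)
        total += position * (prices[t] - prices[t - 1])  # P&L on held position
        total -= cost * abs(new_position - position)     # cost of trading
        position = new_position
    return total

def train(prices, features, steps=200, lr=0.1, eps=1e-4):
    params = np.zeros(features.shape[1] + 2)  # weights w, recurrence u, bias b
    for _ in range(steps):
        grad = np.zeros_like(params)
        for i in range(len(params)):  # crude finite-difference gradient
            bump = np.zeros_like(params)
            bump[i] = eps
            grad[i] = (trading_return(params + bump, prices, features) -
                       trading_return(params - bump, prices, features)) / (2 * eps)
        params += lr * grad  # ascend: maximize the trading return directly
    return params
```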
Miscellaneous links about reinforcement learning:
- This online book is very readable (I read it last night) and goes into the gory details of reinforcement learning systems (although a lot of the info there is quite basic). A hardcopy can be bought at Amazon.
- For a less detailed overview, try this survey.
- Some very brief introductions are here and here.
- While reading the book cited above, I found that the examples are highly beneficial in understanding the concepts.
Although reinforcement learning has some issues, it is certainly a candidate for inclusion in CASTrader. At a minimum, it will provide some diversity of thinking in the agents.
Update: This paper lists some of the differences between reinforcement learning and evolutionary computation (EC), several of which mesh with the ideas above:
- No states (see above)
- No Markovian restrictions (see above)
- Straightforward hierarchical credit assignment (allows agents to be specialists or big-picture types and get rewarded for being either)
- Non-hierarchical abstract credit assignment (it's theoretically easier for the machine to find new exploits of its environment - this is somewhat akin to the novel-situations argument above)
- Metalearning potential (learning to learn, rather than needing to be guided to learn)
- EC doesn't really know what the various policies it develops are worth, nor how long it will take to find out. The author recommends the success-story algorithm as a remedy.
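To make the contrast concrete, here is a hypothetical sketch of the EC side: a simple (1+1) evolution strategy that treats a whole policy as the unit of evaluation, so it stores no per-state values and makes no Markov assumption. The episode_return function is a stand-in for any black-box evaluation of a policy:

```python
import numpy as np

# (1+1) evolution strategy over policy parameters: mutate, evaluate the whole
# episode, keep the better policy. No states, no Markov property required.

def evolve_policy(episode_return, dim, generations=200, sigma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    best = np.zeros(dim)                 # parameters of the incumbent policy
    best_score = episode_return(best)
    for _ in range(generations):
        candidate = best + sigma * rng.standard_normal(dim)  # mutate
        score = episode_return(candidate)
        if score > best_score:           # selection: keep the better policy
            best, best_score = candidate, score
    return best, best_score
```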