Reinforcement learning
Introduction
Reinforcement learning is an area of machine learning. It is based on taking actions that maximize reward in a particular situation, and it is used by various software and machines to find the best possible behavior. Reinforcement learning is different from supervised learning: there is no labeled answer for each individual input, and the model learns from the rewards its actions produce. Reinforcement learning is all about making decisions sequentially. Each decision depends on the ones before it, so feedback is given to sequences of dependent decisions and not to independent examples. This is usually modeled as a Markov Decision Process (MDP).
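For reference, here is one common textbook way to write down a Markov Decision Process and the goal of reinforcement learning. The symbols are the usual convention and are an assumption of this sketch: S is the set of states, A the set of actions, P the transition probabilities, R the reward function, gamma the discount factor, and pi a policy that maps states to actions.

```latex
\[
\mathcal{M} = (S, A, P, R, \gamma), \qquad
P(s' \mid s, a) = \Pr(s_{t+1} = s' \mid s_t = s,\, a_t = a)
\]
\[
\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r_{t+1} \right],
\qquad 0 \le \gamma < 1
\]
```

In words: the next state depends only on the current state and action, and the best policy is the one with the highest expected discounted sum of rewards.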
Main points in reinforcement learning:
- Input: The input should be an initial state from which the model will start.
- Output: There are many possible outputs, as there are a variety of solutions to a particular problem.
- Training: Training is based on the input. The model returns a state, and the user decides whether to reward or punish the model based on its output.
- The model continues to learn.
- The best solution is decided based on the maximum reward.
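The points above can be sketched as a small training loop. This is a minimal illustration only: the five-state "corridor" environment, the hyperparameters, and the choice of tabular Q-learning (one common algorithm among many) are all assumptions made for this example. The reward here comes from a `step` function instead of a human user.

```python
import random

N_STATES = 5       # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (0, 1)   # 0 = move left, 1 = move right

def step(state, action):
    """Toy environment: return (next_state, reward, done)."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0   # reward only when the goal is reached
    return next_state, reward, done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action]
    for _ in range(episodes):
        state, done = 0, False                  # input: the initial state
        while not done:
            # Explore with probability epsilon, or when both actions look equal.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Move the estimate toward reward + discounted best future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state                  # the model keeps learning
    return q

def greedy_policy(q):
    """Best solution: in each state, pick the action with the highest value."""
    return [0 if row[0] > row[1] else 1 for row in q]

q_table = train()
policy = greedy_policy(q_table)
```

After training, `policy` holds one action per state; in this toy corridor the learned behavior is to keep moving right toward the goal.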
Applications of reinforcement learning in the real world
01) Resource management in computer clusters
Designing algorithms to allocate limited resources to different tasks is challenging and usually requires human-generated heuristics. With reinforcement learning, a system can automatically learn to allocate and schedule computer resources to waiting jobs, with the objective of minimizing the average job slowdown.
Research paper : https://people.csail.mit.edu/alizadeh/papers/deeprm-hotnets16.pdf
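To make the scheduling objective concrete, here is a small sketch of the "average job slowdown" metric, where a job's slowdown is its completion time divided by its ideal duration. The single-machine setup, the job durations, and the two orderings compared are hypothetical; this is not the paper's implementation, only an illustration of the quantity a learned scheduler would try to reduce.

```python
def average_slowdown(durations):
    """Run jobs back to back on one machine, in the given order.

    Assumes every job arrives at time 0, so a job's completion time is
    the clock value at which it finishes.
    """
    clock = 0.0
    total = 0.0
    for duration in durations:
        clock += duration           # this job finishes now
        total += clock / duration   # slowdown = completion time / ideal duration
    return total / len(durations)

jobs = [8, 1, 2]                                  # hypothetical job durations
fifo = average_slowdown(jobs)                     # first come, first served
shortest_first = average_slowdown(sorted(jobs))   # shortest job first
```

Here the shortest-first order gives a lower average slowdown than first-come-first-served, which shows that the order in which jobs are scheduled matters for this objective.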
02) Traffic Light Control
In the paper linked below, a multi-agent reinforcement learning approach to traffic light control showed better results than traditional methods and shed light on the potential uses of multi-agent reinforcement learning in designing traffic systems.
Research paper : http://web.eecs.utk.edu/~itamar/Papers/IET_ITS_2010.pdf
03) Robotics
There is a large body of work on applying reinforcement learning in robotics. In one example, a robot was trained to learn policies that map raw video images to the robot's actions. The reinforcement learning component was guided policy search, which generated training data that came from the policy's own state distribution.
04) Web System Configuration
There are more than 100 configurable parameters in a web system, and the process of tuning the parameters requires a skilled operator and numerous trial-and-error tests.
Reinforcement learning can be used for autonomic reconfiguration of parameters in multi-tier web systems in VM-based dynamic environments. The reconfiguration process can be formulated as a finite MDP (Markov Decision Process).
Research paper : http://ranger.uta.edu/~jrao/papers/ICDCS09.pdf
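As a rough sketch of what a finite MDP for parameter tuning can look like: the parameter names, their allowed values, the "nudge one parameter by one step" action set, and the response-time reward below are all assumptions for illustration and are not taken from the paper.

```python
from itertools import product

# Hypothetical parameters, each restricted to a few discrete values.
PARAM_VALUES = {
    "max_clients": (50, 100, 150, 200),
    "keepalive_timeout_s": (1, 5, 10),
}
PARAM_NAMES = tuple(PARAM_VALUES)

# A state is a tuple of indices, one per parameter, so the state space is finite.
STATES = list(product(*(range(len(PARAM_VALUES[p])) for p in PARAM_NAMES)))

# An action keeps the configuration or moves one parameter one step down or up.
ACTIONS = [("keep", 0)] + [(p, d) for p in PARAM_NAMES for d in (-1, +1)]

def transition(state, action):
    """Apply an action; values are clamped to the allowed range."""
    name, delta = action
    if name == "keep":
        return state
    i = PARAM_NAMES.index(name)
    new_index = min(max(state[i] + delta, 0), len(PARAM_VALUES[name]) - 1)
    return state[:i] + (new_index,) + state[i + 1:]

def reward(response_time_ms, sla_ms=500.0):
    """Assumed reward: positive when the measured response time beats the target."""
    return (sla_ms - response_time_ms) / sla_ms

def config_of(state):
    """Translate a state back into concrete parameter values."""
    return {p: PARAM_VALUES[p][i] for p, i in zip(PARAM_NAMES, state)}
```

With the state and action sets enumerated like this, a tabular method can be applied; in a real system the response time passed to `reward` would come from measuring the running web system after each reconfiguration.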
05) Personalized Recommendations
Previous work on news recommendation faced several challenges, including the rapidly changing dynamics of news, users who get bored easily, and the fact that click-through rate cannot reflect the retention rate of users.
06) Bidding and Advertising
Researchers from Alibaba Group claimed that their distributed cluster-based multi-agent bidding solution (DCMAB) achieved promising results, and they planned to conduct a live test on the Taobao platform. The details of the implementation are left to readers to investigate.
Generally speaking, the Taobao ad platform is a place where merchants place bids in order to display ads to customers. This can be treated as a multi-agent problem because the merchants are bidding against each other and their actions are interrelated.
In the paper, merchants and customers were clustered into different groups to reduce computational complexity. The state space of the agents indicated the cost-revenue status of the agents, the action space was the bid (continuous), and the reward was the revenue generated by the customer cluster.
Research paper : https://arxiv.org/pdf/1802.09756.pdf
This article gave a brief introduction to reinforcement learning and its applications in various industries.