LWL | Psychology, Machine Learning, and AI

Application of Psychology in Artificial Intelligence and Machine Learning
Siddarth Gupta

05 September 2022
Student's Published Works


Classical and operant conditioning are the basis of learning in psychology and are central to the learning habits of humans. One of the objectives of Artificial Intelligence (AI) is to mimic human learning in order to create a fully functioning artificial mind. This research paper aims to introduce conditioning into Machine Learning (ML) algorithms to design machine intelligence with common sense and strong learning capabilities. We see how stimulus-response associations and rewards and punishments play an important role in reinforcement learning and deep learning, both subfields of machine learning. Although reinforcement learning is similar to operant conditioning, many complexities of the human mind still hinder the invention of Artificial General Intelligence, which has featured in science fiction stories for over a century. We discuss how building algorithms from logic and theoretical knowledge can help integrate common sense into machine learning. In reinforcement learning, the agent explores and exploits its environment and learns by trial and error. When we apply classical conditioning, the machine can use a loop with three parts: the first associates events that occur simultaneously, the second collects evidence that an association is important, and the third recognizes the important associations. The ability to recognize prior associations allows evidence to be gathered both for and against a given relationship, and allows more complicated associations to be built by treating occurrences of strongly associated event pairs as events in their own right.

Keywords: Psychology, AI, Machine Learning


I begin the paper with an introduction to conditioning and its types, namely classical conditioning and instrumental conditioning. I then explore machine learning and the types of machine learning that can be combined with conditioning, such as reinforcement learning. The paper aims to explore the role of reward, punishment, and stimulus in Artificial Intelligence for the development of common sense and of human-like minds and learning habits. The first part of the paper rests on the idea that humans learn from stimulus-to-stimulus association; the second part examines how reinforcement learning is similar to operant conditioning. I propose two hypotheses, which outline the project in the paper.

Hypothesis A

Classical conditioning can be used as a model of its environment to incorporate common sense partially or completely in a machine learning algorithm.

Hypothesis B

Operant conditioning can be used as a model of its principle to replicate the reward and punishment phenomenon in reinforcement learning.

Beginning with the psychology aspect, conditioning is a behavioural process whereby a certain response becomes more frequent or predictable in a given environment as a result of reinforcement (Elmer, 2020). A reinforcement is typically a stimulus or reward for the preferred response. Conditioning has two main types: classical and operant. Classical conditioning is a type of learning that happens unconsciously. When you learn through classical conditioning, an automatic conditioned response is paired with a specific stimulus (Elmer, 2020). This creates a behaviour. A famous example is Pavlov's dog experiment: a dog naturally salivates when it sees food, and after a bell is repeatedly paired with food, the bell becomes a conditioned stimulus, so the dog salivates when it hears the bell because it expects food (Elmer, 2020). Operant conditioning is a method of learning that employs rewards and punishments for behaviour (Cherry, 2020). Through operant conditioning, an association is made between a behaviour and a consequence (whether negative or positive) for that behaviour. For illustration, when lab rats press a lever while a green light is on, they receive a food pellet as a reward. When they press the lever while a red light is on, they receive a mild electric shock. As a result, they learn to press the lever when the green light is on and to avoid it when the red light is on (Cherry, 2020).

On the technological side, Artificial Intelligence (AI) is the simulation of human-level intelligence in a machine so that it can mimic human beings in real time (Education, 2022). Machine learning (ML) is the process by which a machine learns and reacts to its environment based on the data it has been provided and the data it has collected (Education, 2022a). There are various types of machine learning, for example supervised learning, unsupervised learning, and reinforcement learning (Education, 2022a). In this paper, reinforcement learning is used as the basis of the research because it is a model of the reward and punishment mechanism. In reinforcement learning, an agent (the machine or algorithm) explores and exploits its environment through trial and error to achieve the desired results (Education, 2022a). Reinforcement learning is used in autonomous driving, healthcare, and various other fields (Education, 2022a).

Classical conditioning application in Artificial intelligence

If classical conditioning is treated as a model of an agent's surroundings and stimuli, common-sense knowledge can be designed as a classical conditioning model built on stimulus-to-stimulus associations and later incorporated into reinforcement learning (Anirudh, 2019). By building algorithms of associations, we can essentially build a model of the environment and of common sense (Anirudh, 2019). The agent, which is the machine, can be conditioned to respond to the reward (here, the common-sense knowledge) by forming an association between unconditioned and conditioned stimuli. Significant associations are made through a feedback loop with three steps: the first connects events that happen at the same time, the second gathers evidence that a created connection is significant, and the third recognizes the significant connections. The capacity to detect past connections allows evidence to be gathered both for and against a particular link, and allows more sophisticated associations to be created by treating occurrences of highly related event pairs as events in their own right (Anirudh, 2019). This also creates a hierarchical model of the data.
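The three steps of the loop can be illustrated with a minimal Python sketch. It is not the implementation from the cited work: the function name, the two thresholds (min_evidence and min_ratio), and the rule used to judge significance are assumptions made only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def learn_associations(time_steps, min_evidence=3, min_ratio=0.6):
    """Return the event pairs judged significant.

    time_steps: list of sets, each holding the events seen at one time step.
    min_evidence, min_ratio: hypothetical thresholds chosen for illustration.
    """
    together = defaultdict(int)  # evidence for: both events occurred together
    apart = defaultdict(int)     # evidence against: only one of them occurred
    candidates = set()

    for events in time_steps:
        # Step 1: associate events that occur at the same time.
        for pair in combinations(sorted(events), 2):
            candidates.add(pair)
        # Step 2: gather evidence for and against every candidate association.
        for a, b in candidates:
            if a in events and b in events:
                together[(a, b)] += 1
            elif a in events or b in events:
                apart[(a, b)] += 1

    # Step 3: recognize the associations with enough supporting evidence.
    significant = set()
    for pair in candidates:
        total = together[pair] + apart[pair]
        if together[pair] >= min_evidence and together[pair] / total >= min_ratio:
            significant.add(pair)
    return significant
```

For example, if "bell" and "food" appear together in most time steps while "light" appears with them only once, only the bell and food pair should pass both thresholds.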

Instrumental learning in Reinforcement learning

Because reinforcement learning is used in situations that involve reward and punishment, instrumental conditioning is the learning style from psychology that is best suited to integration with machine learning (Poddiachyi, 2020). In principle, both follow the same idea of trial and error: the agent learns through exploration of the environment what to avoid and what to exploit in a given situation. The Law of Effect highlights two key aspects of animal learning that are replicated in RL algorithms (Poddiachyi, 2020). First, an algorithm must be selectional, which means it must try many actions and choose among them based on the results. Second, it must be associative, which means it must link specific situations (states) to the actions discovered during the selection phase.
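The selectional and associative properties can be shown with a small sketch based on the lab-rat example of green and red lights. The environment function, the reward values (+1 for a pellet, -1 for a shock, 0 for waiting), and the number of trials are assumptions for illustration, not part of any cited algorithm.

```python
def outcome(light, action):
    """Hypothetical lab-rat environment: pellet (+1), shock (-1), or nothing (0)."""
    if action == "wait":
        return 0.0
    return 1.0 if light == "green" else -1.0


def learn_policy(states, actions, trials=3):
    """Learn which action to take in each state by trial and error."""
    policy = {}
    for state in states:
        # Selectional: test every action and record its average result.
        average = {}
        for action in actions:
            results = [outcome(state, action) for _ in range(trials)]
            average[action] = sum(results) / trials
        # Associative: link this situation to the action that worked best.
        policy[state] = max(average, key=average.get)
    return policy


policy = learn_policy(["green", "red"], ["press", "wait"])
```

Here the learned policy links the green light to pressing the lever and the red light to waiting, which mirrors the behaviour the rats acquire.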


The system is implemented as four modules that reflect its key processing phases: pre-processing, recognition, association, and significance. The pre-processing module converts object movement data into predefined events from which patterns are learned. The recognition module recognizes existing event patterns and creates tokens that represent those patterns. To create new candidate patterns, the association module associates the pattern tokens according to their timing. Finally, the significance module gathers fresh evidence for each pattern and evaluates, based on classical conditioning, whether the pattern is considered to exist. The system is built around a feedback loop that connects the recognition, association, and significance modules. The modules and the data exchanged between them are depicted in Figure 1.
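The feedback part of the loop, where a recognized pattern becomes a token that can itself take part in further patterns, might look roughly like the sketch below. The function name, the representation of patterns as frozensets, and the "+" token naming are all hypothetical choices made for this illustration.

```python
def recognize(events, known_patterns):
    """Recognition step: add one token for each known pattern found in events.

    events: set of event names observed at one moment.
    known_patterns: set of frozensets of events or tokens already judged
        significant (assumed to come from the significance module).
    The loop repeats until no new token appears, so patterns built from
    earlier pattern tokens are recognized as well.
    """
    tokens = set(events)
    changed = True
    while changed:
        changed = False
        for pattern in known_patterns:
            token = "+".join(sorted(pattern))  # hypothetical token naming
            if pattern <= tokens and token not in tokens:
                tokens.add(token)
                changed = True
    return tokens
```

Feeding the returned tokens into an association step would let the system pair a token such as "bell+food" with a further event, which is how a hierarchy of patterns could be built up.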

The system incrementally takes time frames as input, each of which includes bounding boxes for every object of interest in the observed scene. The first module examines these boxes and records the spatial relationships between the objects. The module then identifies changes in these spatial relationships from one frame to the next and uses these changes as the fundamental event instances from which the recognition module recognizes patterns.
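A minimal sketch of this pre-processing step is given below. It assumes axis-aligned bounding boxes and only two spatial relationships, "overlapping" and "apart"; the actual system may use a richer set of relationships, and all names here are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box with corners (x1, y1) and (x2, y2)."""
    x1: float
    y1: float
    x2: float
    y2: float


def relation(a, b):
    """Classify the spatial relationship between two boxes."""
    overlap = a.x1 < b.x2 and b.x1 < a.x2 and a.y1 < b.y2 and b.y1 < a.y2
    return "overlapping" if overlap else "apart"


def frame_relations(frame):
    """Record the relationship of every pair of objects in one frame.

    frame: dict mapping an object name to its Box.
    """
    names = sorted(frame)
    return {
        (p, q): relation(frame[p], frame[q])
        for i, p in enumerate(names)
        for q in names[i + 1:]
    }


def events_between(prev_frame, next_frame):
    """Emit an event for each pair whose relationship changed between frames."""
    before = frame_relations(prev_frame)
    after = frame_relations(next_frame)
    return [
        (pair, before[pair], after[pair])
        for pair in sorted(before.keys() & after.keys())
        if before[pair] != after[pair]
    ]
```

If a ball moves until its box overlaps the box of a cup, the function reports a single event for that pair with the relationship changing from "apart" to "overlapping".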

System Flow Chart

Figure 1: Model System

Materials and Research Methodology

This chapter describes the research methodology. Papers related to the hypotheses were researched on the internet, particularly through Google Scholar. From them I have drawn results and conclusions and attempted to build algorithms, based on the hypotheses, that can be integrated into machine learning and deep learning processes to improve reinforcement learning.


As a result, hypothesis A is neither confirmed nor rejected. The presumed positive correlation between how faithfully the system follows the classical conditioning model and how well it performs did not hold in certain situations, but some results were in line with expectations. The three-part model did not necessarily fail, since it did learn through association in the way the classical conditioning model describes. While the data include some evidence in favour of the hypothesis, the evidence is not clear enough either to declare the hypothesis supported or to declare it disproved. This leaves room for further research and speculation through various algorithms, systems, and models. It is also found that the model can be used to teach a machine learning system common-sense knowledge partially.

Furthermore, hypothesis B is more successful than hypothesis A, as the reinforcement learning model is based on rewards and punishments. Reinforcement learning has developed from the Law of Effect, the principle underlying instrumental learning (Poddiachyi, 2020). The model has been very successful in terms of instrumental learning, but it also has some aspects that do not correspond to the operant conditioning model. The exploration phase of an RL algorithm is the selection process, and there are numerous ways to carry it out. For instance, consider the ε-greedy policy, under which an agent selects a random action with probability ε and otherwise, with probability 1 − ε, chooses greedily (the action currently estimated to give the greatest reward) (Anirudh, 2019). By steadily reducing ε during training, we attempt to resolve the exploration-exploitation dilemma, which, put simply, asks when one should stop exploring and start exploiting.

Another intriguing aspect is motivation, which determines the intensity and direction of behaviour in instrumental conditioning (Anirudh, 2019). In Thorndike's experiment, the food was left outside the box. When the cat escaped from the box, it got the food, which reinforced the actions it took to get out. Of course, an RL agent has no such real-world drive, but the reward signal essentially plays the role of motivation, and the goal is to make the agent's experience rewarding.
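The selection rule discussed above can be sketched in Python, assuming it is the usual epsilon-greedy rule with an exploration rate that decays during training. The bandit setting, the Gaussian rewards, and every parameter value (eps_start, eps_end, decay, steps) are assumptions for illustration only.

```python
import random


def epsilon_greedy(values, epsilon, rng):
    """Pick a random action with probability epsilon, else the best-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(values))
    return max(range(len(values)), key=lambda i: values[i])


def run_bandit(true_means, steps=2000, eps_start=1.0, eps_end=0.01,
               decay=0.995, seed=0):
    """Learn action values on a hypothetical multi-armed bandit."""
    rng = random.Random(seed)
    values = [0.0] * len(true_means)  # estimated reward of each action
    counts = [0] * len(true_means)    # how often each action was tried
    epsilon = eps_start
    for _ in range(steps):
        arm = epsilon_greedy(values, epsilon, rng)
        reward = rng.gauss(true_means[arm], 1.0)  # assumed noisy reward
        counts[arm] += 1
        # Incremental average of the rewards observed for this action.
        values[arm] += (reward - values[arm]) / counts[arm]
        # Steadily reduce exploration, but never below eps_end.
        epsilon = max(eps_end, epsilon * decay)
    return values, counts, epsilon
```

Early in training the agent mostly explores; as epsilon shrinks it increasingly exploits the action with the highest estimated value. How fast to decay epsilon is a design choice that this sketch does not try to settle.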

Reinforcement learning is the study of how artificial systems can solve problems that involve instrumental conditioning (Anirudh, 2019). The link between reinforcement learning and classical conditioning is perhaps less obvious. However, learning to act in a way that maximizes rewards and minimizes punishments requires the capacity to predict future rewards and punishments (Anirudh, 2019). As a result, most reinforcement-learning systems include this capability. One method for predicting future reinforcements, based on temporal differences, accounts well for behavioural (e.g., Sutton & Barto, 1990) and neural (e.g., McClure, Berns, & Montague, 2003; O’Doherty, Dayan, Friston, Critchley, & Dolan, 2003; O’Doherty et al., 2004; Schultz, 1998, 2002; Schultz, Dayan, & Montague, 1997; Schultz & Dickinson, 2000; Suri, 2002) findings on classical conditioning.
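A temporal-difference prediction of this kind can be sketched for a single conditioning trial. The sketch assumes a simple TD(0) update in which a cue (such as a bell) starts the trial and a reward (such as food) arrives at the last step; the trial length, learning rate alpha, and discount gamma are illustrative assumptions, not values from the cited studies.

```python
def td_prediction(trials=200, steps=5, alpha=0.1, gamma=1.0, reward=1.0):
    """Learn, for each time step of a trial, a prediction of future reward.

    values[t] is the predicted future reward at step t; the cue is assumed
    to occur at step 0 and the reward at the final step.
    """
    values = [0.0] * (steps + 1)  # values[steps] is the end of the trial (0)
    for _ in range(trials):
        for t in range(steps):
            r = reward if t == steps - 1 else 0.0
            # TD error: received reward plus next prediction, minus current one.
            delta = r + gamma * values[t + 1] - values[t]
            values[t] += alpha * delta
    return values[:steps]
```

After a single trial only the step just before the reward carries a prediction. With repeated trials the prediction moves back towards the cue, which is the sense in which the cue comes to predict the reward.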


As hypothesis A is a partial success rather than a complete failure, the model can be improved, and the results can have a large variety of applications, especially in self-driving cars and other fields where stimulus-to-stimulus association can be a determining factor. They may also help with the development of AI and eventually with the creation of a general AI or even a superintelligent one.

Hypothesis B can be incorporated wherever reinforcement learning is used and can directly help create an environment in which the machine learning program learns quickly.



Yechiam, E., Busemeyer, J. R., Stout, J. C., & Bechara, A. (2005). Using cognitive models to map relations between neuropsychological disorders and human decision-making deficits. Psychological Science, 16(12), 973–978.

Alonso, E., & Schmajuk, N. A. (2012, September). Special issue on computational models of classical conditioning: Guest editors' introduction. Learning & Behavior, 40(3), 231–240.

Maia, T. V. (2009). Reinforcement learning, conditioning, and the brain: Successes and challenges. Cognitive, Affective, & Behavioral Neuroscience, 9(4), 343–364. Retrieved June 10, 2022, from https://link.springer.com/content/pdf/10.3758%2FCABN.9.4.343.pdf

Cherry, K. (2020, June 4). What is operant conditioning and how does it work? Verywell Mind. Retrieved June 10, 2022, from https://www.verywellmind.com/operant-conditioning-a2-2794863

Elmer, J. (2020, January 8). Classical conditioning: How it works and how it can be applied. Healthline. Retrieved June 10, 2022, from https://www.healthline.com/health/classical-conditioning

Furze, T. A. (2013, July). The application of classical conditioning to the machine learning of a commonsense knowledge of visual events. core.ac.uk. Retrieved June 10, 2022, from https://core.ac.uk/download/pdf/53141335.pdf

Manchev, N. (2021, October). Reinforcement learning introduction: Foundations and use-cases. Domino Data Lab. Retrieved June 10, 2022, from https://blog.dominodatalab.com/introduction-to-reinforcement-learning-foundations

Poddiachyi, A. (2020, May 3). Reinforcement learning, brain, and psychology: Classical and instrumental conditioning. Medium. Retrieved June 10, 2022, from https://towardsdatascience.com/reinforcement-learning-brain-and-psychology-part-2-classical-and-instrumental-conditioning-217a4f0a989

The Unlikely Techie. (2020, July 26). Conditioning algorithms: Reinforcement learning - an introduction. The Unlikely Techie. Retrieved June 10, 2022, from https://www.unlikelytechie.com/post/conditioning-algorithms-reinforcement-learning-an-introduction

VK, Anirudh. (2019, July 11). The brains behind AI: How Pavlov's dogs & weight loss tips influenced reinforcement learning. Analytics India Magazine. Retrieved June 10, 2022, from https://analyticsindiamag.com/the-brains-behind-ai-how-pavlovs-dogs-weight-loss-tips-influenced-reinforcement-learning/

Vélez, J. I. (1970, January 1). Machine Learning Based Psychology: Advocating for a data-driven approach. International Journal of Psychological Research. Retrieved June 10, 2022, from https://www.redalyc.org/journal/2990/299067861001/html/

Scott, J. zu. (2019, September 16). Combining AI's power with self-centered human nature could be dangerous. Chatham House – International Affairs Think Tank. Retrieved June 11, 2022, from https://www.chathamhouse.org/2019/03/combining-ais-power-self-centered-human-nature-could-be-dangerous

Frankenfield, J. (2022, March 16). How artificial intelligence works. Investopedia. Retrieved June 11, 2022, from https://www.investopedia.com/terms/a/artificial-intelligence-ai.asp#:~:text=Artificial%20intelligence%20(AI)%20refers%20to,as%20learning%20and%20problem%2Dsolving.

Built In. (2021). Artificial intelligence. Retrieved June 11, 2022, from https://builtin.com/artificial-intelligence

Bajaj, P. (2022, June 3). Reinforcement learning. GeeksforGeeks. Retrieved June 11, 2022, from https://www.geeksforgeeks.org/what-is-reinforcement-learning/

Osiński, B. (2022, February 15). What is reinforcement learning? The Complete Guide. deepsense.ai. Retrieved June 11, 2022, from https://deepsense.ai/what-is-reinforcement-learning-the-complete-guide/

B, W. (1913). Apa PsycNet. American Psychological Association. Retrieved June 12, 2022, from https://psycnet.apa.org/record/1926-03227-001

Education, I. C. (2022, July 7). Artificial Intelligence (AI). What-Is-Artificial-Intelligence. https://www.ibm.com/cloud/learn/what-is-artificial-intelligence

Education, I. C. (2022a, July 6). Machine Learning. What Is Machine Learning? https://www.ibm.com/cloud/learn/machine-learning