Applying AI – Computer Vision and NLP

In Artificial Intelligence, Computer Vision, LLMs, Machine Learning, NLP by Prabhu Missier

Two of the areas where Artificial Intelligence is thriving are Computer Vision (CV) and Natural Language Processing (NLP). CV tries to replicate the human act of seeing and perceiving the world, while NLP tries to let machines communicate with humans just as any two humans would.

Computer Vision
When it comes to CV there are various ways in which Artificial Intelligence can be applied. An AI can be made to learn from static images and interpret the differences in pixel intensity. To do this the AI undergoes supervised learning: it is shown thousands or more labelled and annotated images and, once its learning phase is over, is asked to identify or interpret similar unseen images. For instance, a joint effort between Moorfields Eye Hospital in London and Google’s DeepMind resulted in an AI being trained to diagnose eye diseases. It did this using neural networks trained on several thousand eye scans annotated by eye specialists. The AI was also able to explain how it arrived at its diagnosis.
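
As a rough illustration of this kind of supervised training, here is a minimal sketch that fine-tunes a small pretrained network on a folder of labelled images. The directory layout, label set and hyperparameters are assumptions made for the example, not details of the Moorfields/DeepMind system.

```python
# A minimal sketch of supervised image classification, assuming scans are
# organised into one sub-folder per label (e.g. scans/train/healthy,
# scans/train/diseased). Paths, labels and hyperparameters are illustrative.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Hypothetical labelled dataset: ImageFolder infers each class from its folder name.
train_set = datasets.ImageFolder("scans/train", transform=transform)
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)

# A small pretrained backbone, re-headed for the number of diagnosis classes.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

model.train()
for epoch in range(3):                      # a handful of epochs for illustration
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
```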

Closer to our mundane lives is what happens at Amazon Go stores. Using object detection and tracking, the AI system follows a customer moving through the store and keeps track of what they pick up. This makes for an efficient shopping experience, obviating the need for checkout queues or shop assistants.

In countries like Estonia, AI is used to process satellite images of farmland to sanction government grants and to penalize farmers who do not follow government regulations while using those grants.

Yet another interesting area is robotics and self-driving cars. CV does not rely solely on photographic images or videos: with self-driving cars, a complete picture of the environment around the car is built up using radio waves (radar) in addition to the traditional lasers and cameras. Reinforcement learning is also used in self-driving, where the AI algorithm is rewarded for certain behaviors.
However, a deterministic approach is usually taken with self-driving AIs in situations where a decision made using probabilistic methods may not be the best way forward.
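
The reward mechanism at the heart of reinforcement learning can be shown with a toy example. Everything below, from the little track environment to the reward values and hyperparameters, is invented for illustration; real driving systems use rich simulators and neural-network function approximators rather than a small table.

```python
# A minimal sketch of the reward idea behind reinforcement learning, using
# tabular Q-learning on a made-up "track" with a goal at the far end.
import random

N_STATES, ACTIONS = 5, [0, 1]        # positions on a short track; 0 = stay, 1 = advance
q_table = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.1, 0.9, 0.1

def step(state, action):
    """Reward +1 for reaching the goal at the end of the track, 0 otherwise."""
    next_state = min(state + action, N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

for episode in range(500):
    state = 0
    while state < N_STATES - 1:
        # epsilon-greedy: mostly exploit the best known action, sometimes explore
        action = random.choice(ACTIONS) if random.random() < epsilon \
                 else max(ACTIONS, key=lambda a: q_table[state][a])
        next_state, reward = step(state, action)
        # Q-learning update: nudge the estimate towards reward + discounted future value
        best_next = max(q_table[next_state])
        q_table[state][action] += alpha * (reward + gamma * best_next - q_table[state][action])
        state = next_state
```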

Natural Language Processing
With NLP a human being should be able to communicate with a computer using normal spoken and written language. NLP has made giant strides in the last few years, not least due to the tremendous success of Large Language Models (LLMs) like OpenAI’s ChatGPT, Google’s BERT and Baidu’s ERNIE.
An LLM is trained on voluminous amounts of data. For instance, GPT-3 was trained on almost the entire Wikipedia collection in addition to several other repositories and out-of-print books. It has an amazing 175 billion parameters, and it is rumored that GPT-4 could be using around 100 trillion. These parameters are what allow a model to communicate at a human level, and broadly speaking, the more parameters, the better the fidelity of the model.

With LLMs you can generate summaries, carry on chat conversations, complete essays from just a prompt and some context, analyze the sentiment of a piece of text and much, much more.
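
One easy way to experiment with these tasks is through the Hugging Face transformers library’s pipelines. The sketch below leans on the library’s default models rather than any of the specific LLMs named above, and the example texts are made up.

```python
# A minimal sketch of common LLM-style tasks using Hugging Face pipelines.
# In practice you would pin a specific model; the defaults are used here.
from transformers import pipeline

summarizer = pipeline("summarization")
sentiment = pipeline("sentiment-analysis")
generator = pipeline("text-generation")

article = "Computer vision and NLP are two areas where AI is thriving. " * 5
print(summarizer(article, max_length=30, min_length=5)[0]["summary_text"])

print(sentiment("The new model exceeded all my expectations.")[0])

# Essay completion from a prompt plus a little context.
print(generator("AI is transforming healthcare because", max_length=40)[0]["generated_text"])
```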

NLP also covers recognizing human speech and responding to conversational prompts. Amazon’s Alexa is a good example of this.

Bias in training sets
GPT-3 and similar systems use unsupervised learning: they consume a large corpus of text, find patterns between the words and then come up with a set of probable words that can follow. These LLMs are often biased towards the datasets they have been trained on, so it is no wonder that most LLMs today have little to negligible knowledge of non-English sources of information.
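
The pattern-finding idea can be shown with a toy bigram model that simply counts which words follow which. The tiny corpus below is made up, and real LLMs learn vastly richer statistics with billions of parameters, but the goal of scoring probable next words is the same.

```python
# A toy sketch of "which words are likely to follow", using bigram counts.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def next_word_distribution(word):
    """Turn raw follow-up counts for a word into probabilities."""
    counts = bigrams[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_distribution("the"))   # {'cat': 0.5, 'mat': 0.25, 'fish': 0.25}
```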
Bias is dangerous because the results emitted by LLMs can be skewed. A model can also be deliberately misused, for instance by training it to generate fake news.
Attempts have been made to change this by creating datasets in various languages. There are efforts underway in Africa to build datasets representing the more than 2,000 languages in use there.
LLMs learn better when they are exposed to and trained on datasets from different languages. Bias certainly drops, and although languages may differ in syntax, there are semantic similarities between them; an AI can make better recommendations when it learns the variations across languages and, by extension, across countries and cultures.

Energy consumption
Training all these models on huge datasets consumes a lot of resources, since the models run on computing power spread across multiple datacenters.
In addition to bias, then, this is another problem that researchers have to contend with if AI solutions are to be earth-friendly and not cause more damage than good.
Optimizing the performance of models, both during training and during inference, is an area that needs a lot of focus. Once the initial hype surrounding ChatGPT, PaLM and other LLMs subsides, researchers need to take a really hard look at the resources being expended to achieve these results and ask whether they are worth the toll on the earth’s already limited pool of resources.
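
On the inference side there are already concrete optimizations to reach for. The sketch below uses PyTorch’s dynamic quantization, which stores the weights of linear layers as 8-bit integers instead of 32-bit floats; the toy model and sizes are purely illustrative of the kind of saving available.

```python
# A sketch of one common inference-time optimization: dynamic quantization.
import os
import torch
import torch.nn as nn

# A toy model standing in for something much larger.
model = nn.Sequential(
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

# Quantize only the Linear layers; activations stay in floating point.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

def size_mb(m):
    """Serialize the model's weights and report the file size in megabytes."""
    torch.save(m.state_dict(), "tmp.pt")
    size = os.path.getsize("tmp.pt") / 1e6
    os.remove("tmp.pt")
    return size

print(f"fp32 model: {size_mb(model):.2f} MB, int8 model: {size_mb(quantized):.2f} MB")
```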

References
https://www.amazon.com/b?ie=UTF8&node=16008589011
https://www.masakhane.io/
https://www.technologyreview.com/2019/06/06/239031/training-a-single-ai-model-can-emit-as-much-carbon-as-five-cars-in-their-lifetimes/