Google BERT NLP Technology

Google’s BERT takes NLP to much higher accuracy

As a follow-up to my earlier LinkedIn post on Google’s BERT model for NLP, I am writing this to explain BERT in more detail and share the results of our experiment.

In a recent blog post, Google announced that it has open-sourced BERT, its state-of-the-art pre-training technique for natural language processing (NLP) applications. The paper released along with the blog post is receiving accolades from across the machine learning community, because BERT broke several records for how well models can handle language-based NLP tasks.

Here are a few highlights that make BERT unique and powerful:

  • BERT stands for Bidirectional Encoder Representations from Transformers. As the name suggests, it uses a bidirectional encoder that can draw on context from both directions (the words before and after a given word), and its pre-training is unsupervised, meaning it can ingest text that is neither classified nor labeled. This is unique because previous models looked at a text sequence either from left to right, or combined separately trained left-to-right and right-to-left models. It also contrasts with context-free models such as word2vec and GloVe, which generate a single word embedding (a mathematical representation of a word) for each word in their vocabularies, regardless of the context it appears in.
  • BERT uses the Transformer, an open-source neural network architecture from Google that is based on a self-attention mechanism and well suited to NLP. The Transformer has been gaining popularity due to its training efficiency and its superior ability to capture long-distance dependencies compared to a recurrent neural network (RNN) architecture. Its attention mechanism also speeds up training, because all positions in a sequence can be processed in parallel. Unlike directional models, which read the text input sequentially (left-to-right or right-to-left), the Transformer encoder reads the entire sequence of words at once. This allows the model to learn the context of a word based on all of its surroundings (to the left and right of the word).
  • In pre-training, researchers used a masking approach to prevent the word being predicted from indirectly “seeing itself” in a multi-layer bidirectional model. A fixed percentage (15%) of the input tokens were masked, and the model was trained to predict them, which produces a deep bidirectional representation. This method is referred to as a Masked Language Model (MLM).
  • BERT builds upon recent work in pre-training contextual representations, including Semi-supervised Sequence Learning, Generative Pre-Training, ELMo, and ULMFiT. BERT is pre-trained for about 40 epochs over a 3.3-billion-word corpus made up of BooksCorpus (800 million words) and English Wikipedia (2.5 billion words). The larger model, BERT-Large, has 24 Transformer blocks, a hidden size of 1024, 16 attention heads, and 340M parameters. Training runs on Cloud TPUs, which enables quick experimentation, debugging, and tweaking of the model.
  • It enables developers to fine-tune a state-of-the-art NLP model in about 30 minutes on a single Cloud TPU (tensor processing unit, Google’s cloud-hosted accelerator hardware), or in a few hours on a single graphics processing unit (GPU).
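The self-attention idea from the Transformer bullet can be sketched in a few lines. This is a minimal, single-head version in plain Python, assuming toy 2-dimensional token vectors and identity projection matrices (both hypothetical); BERT itself uses learned weights, multiple attention heads, and many stacked layers. Because no causal mask is applied, every token attends to tokens on both sides of it, which is what makes the encoder bidirectional.

```python
import math


def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    # Plain nested-list matrix multiply: (n x k) @ (k x m) -> (n x m).
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention with no causal mask.

    x: one vector per token (seq_len x d_model).
    w_q, w_k, w_v: hypothetical projection matrices (d_model x d_k).
    Returns (outputs, weights): a new vector per token and the attention weights.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    outputs, weights = [], []
    for q_i in q:
        # Token i scores EVERY token in the sequence, to its left and to its right.
        scores = [sum(a * b for a, b in zip(q_i, k_j)) / math.sqrt(d_k) for k_j in k]
        w = softmax(scores)
        weights.append(w)
        # The output is the attention-weighted mix of all value vectors.
        outputs.append([sum(w_j * v_j[c] for w_j, v_j in zip(w, v)) for c in range(len(v[0]))])
    return outputs, weights


# Usage: three 2-dimensional "token" vectors and identity projections.
identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = self_attention(tokens, identity, identity, identity)
for row in attn:
    print([round(p, 3) for p in row])
```

Each row of weights sums to 1 and is non-zero for every position, so each token's new representation mixes information from the whole sentence at once rather than reading it word by word.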

These are just a few of the highlights that make BERT the state-of-the-art NLP model so far.
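The Masked Language Model step from the highlights can be illustrated with a short sketch. It follows the masking recipe described in the BERT paper (select 15% of positions; replace 80% of those with `[MASK]`, 10% with a random token, and leave 10% unchanged), but it assumes simple whitespace-split words instead of BERT's WordPiece tokens, and the example sentence and seed are hypothetical.

```python
import random


def mask_tokens(tokens, vocab, mask_rate=0.15, seed=0):
    """Sketch of BERT-style MLM masking.

    Selects about mask_rate of the positions; of those, roughly 80% become
    [MASK], 10% become a random vocabulary token and 10% are left unchanged.
    Returns (masked_tokens, labels); labels hold the original token at
    selected positions and None elsewhere (no loss is computed there).
    """
    rng = random.Random(seed)
    num_to_mask = max(1, round(len(tokens) * mask_rate))
    positions = rng.sample(range(len(tokens)), num_to_mask)
    masked = list(tokens)
    labels = [None] * len(tokens)
    for i in positions:
        labels[i] = tokens[i]
        r = rng.random()
        if r < 0.8:
            masked[i] = "[MASK]"
        elif r < 0.9:
            masked[i] = rng.choice(vocab)
        # else: keep the original token, so the model cannot rely on seeing [MASK]
    return masked, labels


sentence = ("the man went to the store to buy a gallon of milk "
            "because the fridge was empty and he was thirsty").split()
masked, labels = mask_tokens(sentence, vocab=sorted(set(sentence)))
print(" ".join(masked))
```

The model only has to predict the words at the selected positions, and since it cannot see them, it has to use the context on both sides to do so.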

Our Experiment:

To evaluate BERT, we compared a BERT-based named entity recognition (NER) model with an IBM Watson-based NER model. We applied both models to the same set of large, annotated, unstructured documents. The table below shows the results we achieved:

Google's BERT Comparison chart


Based on our comparison and what we have seen so far, it is fairly clear that BERT is a breakthrough and a milestone in the use of Machine Learning for Natural Language Processing.
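The post does not state which metric the comparison table reports. One common way to score NER models against annotated documents is entity-level precision, recall, and F1, where a predicted entity counts only if its span and label both match the annotation exactly. The sketch below assumes entities are stored as hypothetical `(doc_id, start_char, end_char, label)` tuples; the sample data is invented for illustration and is not from our experiment.

```python
def ner_scores(gold, predicted):
    """Entity-level precision, recall and F1 (exact span and label match)."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical annotations: (doc_id, start_char, end_char, label)
gold = {("doc1", 0, 12, "ORG"), ("doc1", 30, 41, "PERSON"),
        ("doc2", 5, 14, "DATE"), ("doc2", 20, 28, "ORG")}
predicted = {("doc1", 0, 12, "ORG"), ("doc1", 30, 41, "ORG"),
             ("doc2", 5, 14, "DATE")}
print(ner_scores(gold, predicted))
```

Here two of the three predictions are correct (precision 2/3) and two of the four annotated entities are found (recall 1/2), giving an F1 of 4/7. Running the same function over both models' outputs on the same documents keeps the comparison like-for-like.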

Deep Learning

What is Deep Learning?

Deep learning is a subset of machine learning that allows machines to perform tasks that typically require human-like intelligence. Its inspiration comes from neuroscience: deep neural networks are built from layers of interconnected nodes, loosely modeled on the way neurons in the brain are connected. Deep-learning networks are distinguished from the more commonplace neural networks by their depth, that is, the number of node layers through which data passes in a multistep process.

Earlier neural networks were shallow, composed of one input layer, one output layer, and at most one hidden layer in between. More than three layers (including input and output) qualifies as “deep” learning, so “deep”, strictly defined, means more than one hidden layer.

Neural Network

Deep learning Neural network

In deep-learning networks, each layer of nodes trains on a distinct set of features based on the previous layer’s output. The further you advance into the neural net, the more complex the features your nodes can recognize, since they aggregate and recombine features from the previous layer.

Let’s take a simple example: recognizing handwritten digits from 0 to 9. If 10 people wrote the digits, each person’s would look very different. For a human brain, identifying these digits is fairly easy. For a traditional, rule-based program it is extremely hard, so neural networks are used to mimic the way neurons in the brain interact. Multiple hidden layers allow a computer to determine which digit was written by letting the neural network build a rough hierarchy of the different features that make up the handwritten digit.

For instance, if the input is an array of values representing the individual pixels in the image of a handwritten digit, the next layer might combine these pixels into lines and shapes, the layer after that might combine those shapes into distinct features like the loops in an 8 or the upper triangle in a 4, and so on. By building a picture of these features, neural networks can determine with a very high level of accuracy which digit was written. During training, the model also learns which links between neurons are critical for making successful predictions. Over the course of several training cycles, and with the help of occasional manual tuning, the network continues to learn and generate better predictions until it reaches the desired accuracy.
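Training a digit recognizer needs image data and a larger network, so the sketch below uses XOR instead, the classic smallest problem that a network without a hidden layer cannot solve. It is a minimal illustration, not a production method: one hidden layer of sigmoid units (so "shallow" by the definition above; deeper networks stack more layers the same way), trained by plain backpropagation. The layer size, learning rate, epoch count, and seed are arbitrary assumptions. It shows the two ideas from the paragraph above: hidden units build intermediate features, and the connection weights are adjusted over many training cycles.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_xor(hidden=4, epochs=5000, lr=0.5, seed=1):
    """Train a 2-input, one-hidden-layer network on XOR with plain backpropagation."""
    rng = random.Random(seed)
    data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
    # Connection weights ("links between neurons") start random and are learned.
    w1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        # Hidden layer builds intermediate features; output layer combines them.
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
        y = sigmoid(sum(w * hj for w, hj in zip(w2, h)) + b2)
        return h, y

    def mse():
        return sum((forward(x)[1] - t) ** 2 for x, t in data) / len(data)

    history = [mse()]
    for _ in range(epochs):  # each pass over the data is one training cycle
        for x, t in data:
            h, y = forward(x)
            d_out = (y - t) * y * (1 - y)  # error signal at the output unit
            for j in range(hidden):
                d_h = d_out * w2[j] * h[j] * (1 - h[j])  # error pushed back to hidden unit j
                w2[j] -= lr * d_out * h[j]
                for i in range(2):
                    w1[j][i] -= lr * d_h * x[i]
                b1[j] -= lr * d_h
            b2 -= lr * d_out
        history.append(mse())
    return forward, history


forward, history = train_xor()
print(f"loss: {history[0]:.3f} -> {history[-1]:.3f}")
for x in ([0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]):
    print(x, round(forward(x)[1], 2))
```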

Thus, deep learning allows machines to solve complex problems even when working with data that is very diverse, unstructured, and interconnected. Deep learning networks excel at dealing with vast amounts of disparate data. In fact, the more data deep learning algorithms learn from, the better they tend to perform.


How do Machines Learn?

A good definition by TechEmergence states that “machine learning is the science of getting computers to learn and act like humans do, and improve their learning over time in autonomous fashion, by feeding them data and information in the form of observations and real-world interactions.”

From this definition, it is fairly apparent that all forms of machine learning (ML) rely on the availability of data: not just some data, but large volumes of it. Therefore, to take advantage of ML, access to large sets of well-organized data is critical. Machine learning itself spans several approaches, from a simple decision tree to multilayered neural networks, and the right one depends on the task and on the amount and type of available data.

There is no one-size-fits-all solution when it comes to a machine learning algorithm. Most times, the best solution is derived when working on real applications with real data because every organization’s data is unique. Solutions are derived by working with domain experts and creating custom neural networks.

There are three main ways to teach a machine with data: supervised learning, unsupervised learning, and semi-supervised learning.

Supervised Learning: In supervised ML, the artificial intelligence (AI) model is given data that is labeled in an organized fashion. For example, one might provide pictures of cats, each labeled as a cat. Once enough structured and labeled data is provided, the resulting AI model can recognize and respond to patterns in data without explicit instructions. Because the correct outputs are known, the accuracy of supervised learning algorithms is easy to measure, making supervised learning the most common method of machine learning today.
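A minimal supervised example is a nearest-centroid classifier: it averages the labeled examples of each class, then assigns a new example to the closest class average. The two numeric features and the cat/dog data below are hypothetical stand-ins for real image features. The `accuracy` helper shows why supervised results are easy to measure: the labels give a ground truth to compare against.

```python
import math


def train_nearest_centroid(examples):
    """Supervised training: average the feature vectors of each labeled class."""
    groups = {}
    for features, label in examples:
        groups.setdefault(label, []).append(features)
    return {label: [sum(col) / len(rows) for col in zip(*rows)] for label, rows in groups.items()}


def predict(centroids, features):
    # Pick the class whose centroid is closest (Euclidean distance).
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


def accuracy(centroids, labeled):
    # Labels make evaluation straightforward: compare predictions with the truth.
    return sum(predict(centroids, x) == y for x, y in labeled) / len(labeled)


# Hypothetical features: [ear pointiness, snout length], scaled 0..1.
train = [([0.9, 0.2], "cat"), ([0.8, 0.3], "cat"), ([0.2, 0.9], "dog"), ([0.3, 0.8], "dog")]
held_out = [([0.7, 0.2], "cat"), ([0.1, 0.7], "dog")]
model = train_nearest_centroid(train)
print(predict(model, [0.85, 0.3]), accuracy(model, held_out))
```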

Unsupervised learning: You guessed it, it’s the opposite of supervised learning. Here the AI model is given data that is not labeled. For example, one might provide pictures of animals (cats, dogs, etc.) without any labels. This method is used to identify underlying patterns or hidden structures in unlabeled data. The goal is not to derive a known right output but to explore datasets and draw inferences. Because there are no labels to check the results against, unsupervised learning is harder to evaluate, which is one reason it is used less often than supervised learning.
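k-means clustering is a classic unsupervised technique: it groups unlabeled points into k clusters by alternately assigning each point to its nearest center and moving each center to the mean of its points. The sketch below is a minimal version on hypothetical 2-D points; the number of clusters, iteration count, and seed are assumptions. Note that the resulting cluster numbers carry no meaning on their own, which is exactly why unsupervised output is harder to evaluate.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Group unlabeled points into k clusters; returns (centers, cluster index per point)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            [sum(coord) / len(cluster) for coord in zip(*cluster)] if cluster else centers[c]
            for c, cluster in enumerate(clusters)
        ]
    assignments = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
    return centers, assignments


points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
centers, assignments = kmeans(points, k=2)
print(assignments)
```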

Semi-supervised learning: This method falls somewhere between supervised and unsupervised learning. In this scenario, the model is given a small amount of labeled data and a much larger pool of unlabeled data. Labeling massive amounts of data for supervised learning is often time-consuming and expensive, so semi-supervised learning combines the best of both worlds: it aims for the accuracy associated with supervised ML while also making use of unlabeled data. This approach can improve the accuracy of the final model while reducing labeling time and cost.
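One simple semi-supervised technique is self-training (pseudo-labeling): fit a model on the few labeled examples, label the unlabeled points it is confident about, add them to the training set, and repeat. The sketch below assumes a nearest-centroid model and a hypothetical confidence rule (a point is pseudo-labeled only if it is at least `margin` closer to one class than to the runner-up); real systems use other models and confidence measures.

```python
import math


def centroids_of(examples):
    # Average the feature vectors of each labeled class.
    groups = {}
    for x, label in examples:
        groups.setdefault(label, []).append(x)
    return {label: [sum(c) / len(xs) for c in zip(*xs)] for label, xs in groups.items()}


def self_train(labeled, unlabeled, rounds=3, margin=0.5):
    """Returns (final centroids, unlabeled points that were never confidently labeled)."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        centroids = centroids_of(labeled)
        confident, unsure = [], []
        for x in pool:
            dists = sorted((math.dist(c, x), label) for label, c in centroids.items())
            # Pseudo-label only points clearly closer to one class than to the runner-up.
            if len(dists) == 1 or dists[1][0] - dists[0][0] >= margin:
                confident.append((x, dists[0][1]))
            else:
                unsure.append(x)
        if not confident:
            break
        labeled.extend(confident)
        pool = unsure
    return centroids_of(labeled), pool


labeled = [([0.0, 0.0], "a"), ([10.0, 10.0], "b")]
unlabeled = [[1.0, 1.0], [9.0, 9.0], [5.0, 5.0]]
model, leftover = self_train(labeled, unlabeled)
print(model, leftover)
```

The two clear-cut unlabeled points shift the class centers, while the ambiguous midpoint is left unlabeled rather than guessed.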

So what method should be used? Well, it depends. The structure and volume of data should inform the method and the approach that needs to be taken. Hence, there is not a one-size-fits-all solution when it comes to machine learning.

Next we will talk about deep learning, a powerful machine learning technique.

Miami Herald Article: Chatting with Chirrp: Miami company uses AI to engage with customers

Chirrp has been featured in the Miami Herald! Writer Nancy Dalberg speaks to chirrp’s leadership about chirrp’s unique technology as part of the Herald’s Biz Monday feature.

Check out the full story here or read the transcript below.

Chatting with Chirrp: Miami company uses AI to engage with customers

Partners with Accelirate to Transform Customer Experience via Chatbots

Chatbots are transforming customer experience!

Chirrp is partnering with Accelirate to offer cutting-edge enterprise chatbot solutions. Chirrp enables companies to transform their customer experience by providing human-like conversations.

With this partnership, chirrp expands its capabilities by integrating business process automation (BPA) software. To address increasing customer demand for chatbot solutions, Accelirate is looking to chirrp to provide relevant and accurate chatbot conversations. “There are many chatbot solutions out there, however, we wanted to make sure that the platform should be able to handle low-, medium- and high-complexity use cases,”  says Ahmed Zaidi, managing partner and chief automation officer of Accelirate.

Read the full press release here.

One chirrp at a time

Chirrp is an Artificial Intelligence multi-channel chatbot platform. Our mission is to enable enterprises from all industries to transform their customer experience by providing human-like conversations through our patent-pending conversational technology. Chirrp combines the convenience of chat with the intelligence of machine learning. It’s a consumer-engagement platform that learns from users’ interactions (amongst an ever-expanding list of communication channels) and provides insights into consumer behaviors, preferences, and concerns. For more information, visit our website.


It’s now called “The cognitive world”

My thoughts on IBM’s World of Watson conference and where IBM sees cognitive going.

IBM has a very clear vision of what cognitive means to it. IBM calls it “a cognitive world for enterprises”. While Facebook and Google are creating Artificial Intelligence solutions focused on consumers, IBM wants to help businesses achieve cognitive intelligence over their data, human resources, apps, and more. And unlike Facebook and Google, which keep your data and leverage it for a number of purposes, such as customer insight, targeted advertising, and machine learning, IBM does not want any of the data.

In her keynote, IBM CEO Ginni Rometty said “In the next five years, every important business decision, will be made with the assistance of IBM Watson”. She talked about the transition from Artificial Intelligence to Augmented Intelligence: “Our goal is for man and machine to exist together. This is all about extending your expertise. A teacher. A doctor. A lawyer. It doesn’t matter what you do. We will extend it”. It is a bold promise, but can IBM achieve it?

Although AI and cognitive computing have been around for years, they are getting much-needed attention now thanks to Google, Microsoft, Facebook, and Amazon. It also means they are all competing for market share. While the core Watson technology isn’t trained on customers’ proprietary data, it has received large amounts of information in various industries, from health care to weather to financial services. That data is being used to train the technology in specific domains to make Watson an expert in each domain.

With focus on enterprises and knowledge of domain data, IBM is well positioned to lead the enterprises into this cognitive world.

Mallesh Murugesan

CEO & Founder,

Playing the long game? Get a chatbot. NOW.

In the world we live in today, chatbots are handling everything from recruiting new employees to diagnosing diseases. There’s no denying that bots are having a moment, and it’s more than just “15 minutes of fame”. Moments like this in history can make or break businesses.

For example, look at the history of live chat. A decade ago it was the new way to communicate with consumers. Brands that figured it out saw higher conversions and customer satisfaction scores. Today, almost every online business offers live chat. While this form of digital engagement is expected to reach a market size of $819 million by 2020, we’ve already moved into the next iteration. Adaptability is key to success and gives businesses a competitive advantage. Let’s take a look at the current consumer landscape and its needs.

Consumer needs are evolving

According to Pew Research, millennials have surpassed the Baby Boomers as the largest American generation. They are entering the workforce, and they have dollars to spend. Their expectation is that everything is available 24/7. How does an enterprise scale and meet that standard?

One of the best ways is an effective and efficient Chatbot strategy. The evidence for this approach is clear: brands that offer consumers a solely digital experience saw a 19% increase in customer satisfaction, as outlined in this report by McKinsey. This comes as no surprise, since most consumers entering the marketplace are millennials. The world they live in is digital and instantaneous.

Brands can’t afford to wait around for consumers to engage with them. The technology that is offered by companies like, allows brands to be proactive and enables new partnerships between people and computers.

It’s time for brands to engage

CEO Mallesh Murugesan explains: “Chatbots should be used not just for customer service, but to provide offers and valuable insights.” Through machine learning, bots can enter a predictive state. In this state, the chirrp bot can reach out and warn consumers if their account doesn’t have enough funds to pay their mortgage. It can also recommend ways to invest funds based on behavior or market trends.

What’s interesting about Murugesan’s point is that when a brand implements AI, it doesn’t always make a quick buck. That’s not the point. Engagement opportunities like this provide a lot of value to the consumer, which in turn builds trust. Trust is one of the key factors a millennial takes into consideration when deciding whether or not to do business with your brand.

However, it’s not just millennials who crave this type of feature. NGDATA, an organization that helps data-rich brands drive connected experiences, conducted a survey highlighting consumers’ needs from their banks. The key findings were personalized service and being treated as an individual. Bots and AI can deliver this at scale, 24 hours a day, 7 days a week. Leaders in the industry are already taking note.

For example, initial buzz suggested that predictive tasks will be part of what Bank of America plans to do with Erica, its bot. CNBC quoted Michelle Moore, Bank of America’s head of digital banking: “Erica might send someone a predictive text: ‘Michelle, I found a great opportunity for you to reduce your debt and save you $300.’” Eleven years ago, Rabobank’s ‘Yvette’ gave people advice on saving money and showed a summary of their last bank transactions.

Improve experience while saving money

Murugesan also gives a sample scenario of consumers traveling outside the country. Financial institutions can choose how to engage in these situations. Bank A chooses to provide traditional customer service, while Bank B has upped its game with AI-based chatbot engagement solutions.

Bank A requires the consumer to call, go through the traditional IVR system, talk to a rep, and get the answers to questions. Bank B allows the consumer to message in questions and answers them instantaneously.

In addition, Bank B provides relevant notifications before and during travel, and engages with consumers at the right moment, e.g., notifying them as soon as they land with a message: “Your card is safe with us. Feel free to use it.” This improves the consumer experience, creates sales avenues, and builds brand loyalty.

The intelligence driving these types of interactions and sales at scale is possible only with a Chatbot, and is visually explained in the above Value Map created by

The results are clear: Chatbots work, and consumers want them. It’s an easy-to-integrate solution, and every brand should work it into its digital engagement strategy today. Be innovative, increase customer satisfaction, play the long game. Get a Chatbot.

View original post here

By M. Dorsett


chirrp vs. chat: how chirrp’s rich conversation is different from the rest

Our team has been giving demos to a wide audience, including investors, potential partners, and customers. One question that comes up over and over is “Why chirrp?” What makes chirrp different in a market that is growing by the minute? Let’s take a look.

When working with most chatbots, you must identify each question that your bot can handle. This format is inherently limited to a fixed number of questions asked in a fixed sequence. If a customer engages in a way that your bot wasn’t programmed to handle, it has little ability to respond correctly. A chatbot built this way understands a limited number of scenarios and is powerless to address any variation in those use cases. Even if a customer asks a question that’s in a programmed scenario, the system can’t adjust if the question is asked out of sequence. This inflexibility is why most available bots are limited, and often not sufficient to cover a significant share of your customer service needs.

One of chirrp’s most powerful capabilities comes from its patented analytical model which recognizes questions and commands that it wasn’t explicitly coded to know. Every customer response is first analyzed by a robust natural language processor (NLP), which identifies the sentence structure, breaks down word phrasing, and determines intention.

Once the system has identified the customer’s basic intention, it routes the conversation to the appropriate branch point. The user may have asked a question that the system has a direct answer to. In that case, a list of possible answers is pulled, and the answer with the highest certainty (the best likelihood of being correct) becomes chirrp’s response. For example, a potential customer chatting with a doctor’s office chirrpbot may ask “What types of injuries do you treat?” To such a question, chirrp might provide a list of services and types of doctors in the practice.
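Chirrp’s analytical model is not public, so the following is only a toy illustration of the “pull candidate answers, respond with the highest-certainty one” step. It assumes a hypothetical doctor’s-office FAQ and uses Jaccard word overlap (shared words divided by all words) as a stand-in certainty score. The `min_certainty` threshold is also an assumption; it shows one way a bot could decline to answer rather than guess.

```python
import re


def words(text):
    return set(re.findall(r"[a-z']+", text.lower()))


def best_answer(question, faq, min_certainty=0.3):
    """Pick the FAQ answer whose example question best matches `question`.

    Certainty is Jaccard word overlap, a toy stand-in for a real NLP model's
    confidence score. Returns (answer, certainty); answer is None when no
    candidate is certain enough.
    """
    q = words(question)
    scored = []
    for example, answer in faq:
        e = words(example)
        union = q | e
        scored.append((len(q & e) / len(union) if union else 0.0, answer))
    certainty, answer = max(scored, key=lambda pair: pair[0])
    return (answer if certainty >= min_certainty else None), certainty


faq = [  # hypothetical doctor's-office content
    ("what injuries do you treat", "We treat sports injuries, back pain and fractures."),
    ("what are your office hours", "We are open 8am to 6pm, Monday to Friday."),
]
print(best_answer("What types of injuries do you treat?", faq))
print(best_answer("Do you validate parking?", faq))
```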

Alternatively, the customer may ask a question that requires a follow-up question from the system before it can proceed. Often this type of question leads to a back-and-forth dialog that helps the user reach an end destination. A customer may ask a doctor’s office, “Do you accept Aetna insurance?” This could trigger a dialog around insurance and becoming a new patient. Chirrp might respond, “We accept certain types, what type of Aetna plan do you have?” or “Yes, we do. Would you like to schedule an appointment with one of our doctors?” or another question that moves the new-patient registration process forward. At each interaction, the system re-analyzes intent to determine chirrp’s next response.

By applying a robust natural language processor to every customer response, chirrp’s AI can flexibly route the conversation to the correct next step. If a user asks a question that goes outside of a sequential dialog, the NLP will identify the intent, and route the question to the correct answer (which may be under a new topic altogether), or to a new dialog, or to a later point in the same dialog. Because the NLP is applied at every point, chirrp’s platform allows for fixed answer sets AND free text answers.
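The idea of re-running intent detection on every message, so that a user can leave a sequential dialog and jump to another topic, can be sketched as a tiny router. The intents, keywords, and replies below are hypothetical, and simple keyword counting stands in for chirrp’s NLP, whose internals the post does not describe. A message with no recognizable intent is treated as a free-text answer within the current topic.

```python
import re

# Hypothetical intents and keywords; a real NLP model would replace this lookup.
INTENT_KEYWORDS = {
    "insurance": {"insurance", "aetna", "plan", "coverage"},
    "hours": {"hours", "open", "close"},
    "appointment": {"appointment", "schedule", "book"},
}

RESPONSES = {
    "insurance": "We accept certain plans. What type of Aetna plan do you have?",
    "hours": "We are open 8am to 6pm, Monday to Friday.",
    "appointment": "Sure. What day works best for you?",
}


def detect_intent(message):
    words = set(re.findall(r"[a-z]+", message.lower()))
    scores = {intent: len(words & kws) for intent, kws in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def respond(state, message):
    """state: the topic currently in progress, or None. Returns (new_state, reply)."""
    intent = detect_intent(message)  # intent is re-analyzed on EVERY turn
    if intent is None:
        # No new intent: treat the message as a free-text answer in the current topic.
        if state is None:
            return None, "Sorry, could you rephrase that?"
        return state, f"Got it. Let's continue with your {state} question."
    # A recognized intent can switch topics mid-dialog, even out of sequence.
    return intent, RESPONSES[intent]


state = None
for msg in ["Do you accept Aetna insurance?", "A PPO", "Actually, what are your hours?"]:
    state, reply = respond(state, msg)
    print(f"{msg!r} -> [{state}] {reply}")
```

A fixed-script bot would still be waiting for the plan type on the third message; here the new question is routed to the hours answer instead.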

Chirrp’s flexibility in user input also makes the system more user-friendly and conversational. Customers aren’t forced to pick from a short list of answers; they can ask and respond using free text. This also gives customers more control over the conversation: they can jump to a new topic, skip ahead if they don’t need certain information, and ask questions out of sequence.

As AI continues to mature, customers expect their interactions with bots to become more intelligent and capable. Thanks to its patented methodology, chirrp delivers flexible, responsive interactions to users across industries and channels. Using artificial intelligence, chirrp promises to push chatbot technology into our everyday lives.

– Rosanne Lush [VP of Product]

Event: You’re Invited to Digital Demo Day

Are you interested in chatbots, but not sure how they can be useful in everyday life and business?

Have you heard about chirrp, but don’t really know what it does?

If so, join Co-Founder and CEO Mallesh Murugesan today as he demos the robust capabilities of chirrp, an enterprise chatbot platform. He’ll be presenting as part of a Digital Demo Day presented by SeedInvest. Come armed with curiosity and questions to this short demonstration.

Chirrp will be presented Thursday, July 20th at 1:30-1:45PM Eastern Daylight Time.

To join, please fill out this short form. SeedInvest will email you the attendee info shortly before the meeting.

Takes Home eMerge Americas Award

Mallesh and I traveled to eMerge Americas’ annual conference where we demoed our AI-powered conversational platform chirrp. In competition with over 100 other startups, chirrp took home the top prize in Early Stage Venture! We’re so proud of our team for their vision, hard work and dedication. We’re very grateful to all our friends, partners and clients who directly and indirectly have supported and encouraged us along the way.

Here’s a quick excerpt from our press release:

Chirrp is a multichannel conversational platform that uses the power of artificial intelligence and machine learning to deliver engaging interactions to customers. Chirrp provides solutions to key challenges for enterprises such as building stronger brand loyalty, driving additional revenue through upselling, and capitalizing on predictive and prescriptive data. The platform reduces the costs of delivering customer service, support and communications, while increasing customer satisfaction and loyalty.

For more info on the conference and a list of winners in the University and Late Stage categories, check out this article released by eMerge Americas.

Thanks again to all our friends, partners and family who’ve supported us along the way. If you’re just learning about chirrp, we’d love to explore new opportunities with you as we continue to grow.