For years IBM has been using cutting-edge AI to improve the digital experiences found in the Masters app. We taught an AI model to analyze Masters video and produce highlight reels for every player, minutes after their round is complete. We built models that generate scoring predictions for every player on every hole. But I believe the “AI Commentary” solution we built this year is the most consequential work we’ve done in the history of our 25-year partnership with the Masters.
AI Commentary is a new feature that automatically adds spoken commentary to videos of every shot, by every player, on every hole. Over the course of the tournament, it will narrate the golf action in more than 20,000 videos that are accessible through the My Group feature on the Masters app. It is designed to enhance the user experience. But the reason I believe this solution is so important is not because of what it does, but how it does it.
The AI Commentary feature is a generative AI built from a large language model that was trained on a massive corpus of language data. The world’s eyes were first opened to the power of large language models last November when a chatbot application dominated news cycles. Since then, there have been countless questions about the practical applications of these powerful models that seemingly understand the complex relationships between words, sentences, and concepts. I think the AI Commentary capability in the Masters app offers some answers.
Long before millions of people started generating college essays and humorous haiku online, IBM was busy figuring out how to make large language models enterprise grade. The first requirement is domain expertise. Because large language models are trained on vast quantities of unlabeled data, they can be quickly adapted to a wide range of tasks, but only after they acquire domain-specific knowledge. In other words, a general large language model might be able to generate a passable critique of John Steinbeck’s East of Eden, but without domain expertise, it can’t tell you how a customer service representative at a specific bank should handle a customer who has overdrawn their account, or what an engineer on an oil rig should do about a high-pressure reading on one of the gauges.
The second need is closely related, and really applies to any AI model used in a corporate setting. In order for a large language model to be deployed for internal operations or in customer-facing applications, it must deliver reliable, repeatable results. It cannot be wrong, offensive, or unexplainable. In my experience, the best way to ensure this is by tapping into curated, accurate, and relevant source data from across the enterprise. “Garbage in, garbage out” has never been more true than it is right now.
In the case of AI Commentary, the large language model we started with could already recognize, summarize, and generate text. But it didn’t understand golf. And it definitely didn’t understand the Masters. For example, at Augusta National Golf Club, a sand trap is called a bunker. The rough is called the second cut. And fans are called patrons. So our team began adding both golf domain expertise and Masters domain expertise to the foundational model. It took two IBM consultants with golf knowledge just three hours to seed the training with domain-specific data. The model began learning and refining from there. (In the not-so-distant past, building an AI solution like this might have taken those same consultants months if not years.)
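IBM has not published how this vocabulary adaptation works internally, but the idea of mapping generic golf terms to Augusta National's preferred terminology can be sketched with a simple, hypothetical substitution pass (the term pairs below are the ones named in this article; the function and mapping are illustrative, not IBM's implementation):

```python
import re

# Hypothetical mapping from generic golf terms to the vocabulary
# preferred at Augusta National, per the examples in the article.
MASTERS_VOCABULARY = {
    "sand trap": "bunker",
    "rough": "second cut",
    "fans": "patrons",
}

def localize(text: str) -> str:
    """Replace generic golf terms with Masters-specific terms."""
    for generic, masters in MASTERS_VOCABULARY.items():
        # Word boundaries keep us from rewriting substrings of other words.
        text = re.sub(rf"\b{re.escape(generic)}\b", masters, text)
    return text

print(localize("The fans saw his ball roll into a sand trap beside the rough."))
```

In a real system this kind of terminology control would more likely live in the training data or prompt conventions than in a post-processing step, but the sketch shows the flavor of the domain adaptation involved.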
To produce the spoken commentary during the tournament, the model taps into Masters-approved data sources from official, trusted providers: shot data, scoring, statistics, and of course, video. The AI translates the metadata from each shot into descriptive textual elements. That text passes through two neural networks, where hundreds of millions of computations produce thousands of candidate sentences. The model then chooses the best sentence, passes it to the Watson Text-to-Speech service, aligns the audio with the action in the clip, and even varies the language and sentence structure from clip to clip.
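The generate-then-rank pattern described above can be sketched in miniature. This is a toy illustration under stated assumptions: the `Shot` schema, the templates, and the scoring function are all invented for this example. The production system generates thousands of candidates with neural networks; here two hand-written templates and a repetition penalty stand in, and the text-to-speech step is omitted:

```python
from dataclasses import dataclass

@dataclass
class Shot:
    # Assumed metadata fields; the real shot-data schema is not public.
    player: str
    hole: int
    club: str
    distance_to_pin_ft: int

def candidates(shot: Shot) -> list[str]:
    """Step 1: expand shot metadata into candidate sentences."""
    return [
        f"{shot.player} lands the approach on hole {shot.hole}, "
        f"{shot.distance_to_pin_ft} feet from the pin.",
        f"With a {shot.club}, {shot.player} leaves "
        f"{shot.distance_to_pin_ft} feet on hole {shot.hole}.",
    ]

def score(sentence: str, recent: list[str]) -> float:
    """Step 2: toy ranking that penalizes words reused from recent clips,
    mimicking the clip-to-clip variety the article describes."""
    seen = " ".join(recent)
    return -sum(word in seen for word in sentence.split())

def best_commentary(shot: Shot, recent: list[str]) -> str:
    """Step 3: pick the top-ranked sentence; a text-to-speech service
    would then voice it and align the audio with the clip."""
    return max(candidates(shot), key=lambda s: score(s, recent))
```

Feeding `best_commentary` the previous clips' sentences as `recent` nudges it toward the less repetitive phrasing, which is the essence of the variety the article attributes to the real model.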
Many people have wondered about the practical applications of large language models since the term first entered the public lexicon late last year. I believe AI Commentary in the Masters app is an example of the kinds of use cases we can expect: purpose-built AI models, built from trusted data, designed to serve up helpful, accurate information on specific subject matter. And I believe there will be thousands (if not millions) of them, because AI developers need only add the domain expertise of their industry, their company, or their department to quickly build them. There are moments when the raw capability of technology astounds us. But it’s not until you see these capabilities solve a specific problem that you begin to understand the impact they will have on your business. So as you enjoy the AI commentary feature in the Masters app this week, think about the potential for this technology to not just change the game, but change the world.
See how IBM transformed commentary at the Masters