It’s been a long time, and I should have left you some AI-related content to step to…
In short, AI is real, and I’ve spent the last few months coming up to speed. What have I learned? From an IT infrastructure perspective, we will likely see patterns similar to those of modern application development and hosting.
As with traditional applications, we’ll see a shortage of resources: infrastructure, talent, and services. We’re already well along the journey of custom model training, and because of that scarcity, much of it will happen in the cloud. Inferencing is a different story.
Over an impromptu dinner, Nvidia’s Chief Scientist Bill Dally lamented the quality gap between 1.7T-parameter models such as GPT-4 and 7B-parameter models such as Llama 2. He doesn’t believe the results from the 7B-parameter models are comparable to those of the larger ones.
Due to their sheer size, even before any custom data is added, these 1.7T-parameter models will require large numbers of high-end GPUs, such as the H100 and the recently announced Blackwell.
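To put that size gap in perspective, here’s a rough back-of-envelope sketch of the memory needed just to hold the weights, assuming 2 bytes per parameter (FP16/BF16) and ignoring the KV cache, activations, and any fine-tuning state:

```python
# Back-of-envelope memory footprint for model weights alone,
# assuming 2 bytes per parameter (FP16/BF16). KV cache, activations,
# and optimizer state are all extra.
def weight_memory_gb(params: float, bytes_per_param: int = 2) -> float:
    return params * bytes_per_param / 1e9

for name, params in [("1.7T-class model", 1.7e12), ("8B-class model", 8e9)]:
    print(f"{name}: ~{weight_memory_gb(params):,.0f} GB of weights")

# 1.7T-class model: ~3,400 GB of weights -> dozens of 80 GB H100s just to hold it
# 8B-class model:   ~16 GB of weights    -> fits comfortably in a single server
```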
However, my on-the-ground conversations suggest that customers are leaning toward the 8B-parameter models, which anything from a modern Xeon/EPYC CPU to a cloud provider’s accelerators could easily accommodate, depending on the application and load.
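For a sense of what that looks like in practice, here’s a minimal sketch of CPU-only inference with an 8B-class model using the Hugging Face transformers library. The model ID is illustrative (Llama 3 8B is gated, and any comparable 8B checkpoint would work), and a real deployment would more likely use an optimized runtime such as llama.cpp or vLLM:

```python
# Minimal sketch: CPU-only inference with an 8B-class model.
# The model ID below is an example; substitute any 8B checkpoint you can access.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # illustrative checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # ~16 GB of weights; fits in server RAM
    device_map="cpu",            # no GPU required
)

inputs = tokenizer("Summarize our Q3 infrastructure plan:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```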
I know this “short” has been longer than most, but I have a lot to process and share. Keep watching this space.