Artificial intelligence (AI) applications are becoming pervasive across industries and at companies of all sizes. In fact, most companies have a long line of AI use cases identified, in development, or already in production. As initial AI use cases meet their business objectives, early infrastructure decisions can either accelerate the deployment of subsequent use cases or stall them. Businesses should stay flexible about infrastructure so they can adapt to either outcome.
Hybrid cloud solutions are often the foundation of AI deployments. Because use cases vary greatly and tend to require more resources over time, hybrid cloud deployments offer a way to match technology choices to the growing demands of an AI solution. Above all, balancing overall cost against performance is perhaps the most important consideration when making any change to AI infrastructure.
Whether you’re in the planning stages of a new AI deployment, expanding an existing AI use case, or adding new use cases into an existing deployment, here are a few recommendations for building an AI infrastructure that scales from development to production and is reliable enough to withstand continual use without runaway costs.
1. Compute where the data lives
The location of the data should influence where it is processed. AI requires massive amounts of data to produce a desirable outcome. For example, if the data lives in a data center, you should strongly consider training models as close to it as possible. Likewise, if the data is already in the cloud, that’s probably the best place for training to take place, because moving large volumes of data in and out of clouds is typically difficult, and sometimes expensive.
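To make that tradeoff concrete, here is a minimal back-of-envelope sketch in Python. The per-gigabyte egress rate is an illustrative assumption, not a quoted price; actual rates vary by provider, region, and volume tier, so substitute your own figures.

```python
# Back-of-envelope estimate of the cost of moving training data out of a
# cloud, versus training where the data already lives. The rate below is
# an illustrative assumption, not any provider's actual pricing.

def egress_cost_usd(dataset_tb: float, rate_per_gb: float = 0.09) -> float:
    """Estimate the one-way cost of moving a dataset out of a cloud."""
    return dataset_tb * 1000 * rate_per_gb  # 1 TB ~ 1,000 GB

# A 50 TB training corpus moved out of the cloud just once:
print(f"~${egress_cost_usd(50):,.0f} per transfer")
```

Even at modest rates, repeated transfers of a multi-terabyte corpus add up quickly, which is why training next to the data is usually the cheaper default.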
2. Consider on-premises and/or cloud processing
Most companies have adopted a cloud strategy, but when launching an AI project there are some important considerations. The first business lesson many companies learn the hard way is that AI projects are iterative and never-ending. For example, model training can be very expensive in the cloud if you want to keep your neural network models continually updated with new data elements and patterns. On the other hand, there are a wide variety and growing number of AI cloud services, with some great data center options for hosting AI hardware. The trending reality is that most AI projects are taking a hybrid approach.
3. Understand that not everything runs on GPUs, nor should it
Understanding the workflow requirements is critical. CPUs are sufficient for basic AI workloads, but GPUs are better suited to deep learning workloads, which can involve multiple large datasets and scalable neural networks. The key takeaway is knowing when a CPU is good enough: sometimes faster is just faster, not more cost-effective. Note also that CPU and GPU requirements will almost certainly increase over time, so an adaptive, flexible compute environment is necessary to manage total cost of ownership in an AI environment.
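The rule of thumb above can be sketched as a simple decision helper. The workload categories and the dataset-size threshold are assumptions made for illustration; real sizing decisions depend on model architecture, batch sizes, and utilization targets.

```python
# Illustrative heuristic for matching a workload to CPU or GPU capacity.
# The categories and the 100 GB threshold are assumptions for this sketch,
# not vendor guidance; profile your actual workloads before buying hardware.

def suggest_accelerator(workload: str, dataset_gb: float) -> str:
    """Suggest a compute target for a given workload profile."""
    deep_learning = workload in {"training", "fine-tuning"}
    if deep_learning and dataset_gb >= 100:
        return "gpu"            # large-scale neural network work
    if deep_learning:
        return "cpu-or-gpu"     # small models may not keep a GPU busy
    return "cpu"                # classical ML, inference, data prep

print(suggest_accelerator("training", 500))   # gpu
print(suggest_accelerator("inference", 10))   # cpu
```

The point is not the specific thresholds but the habit: classify each workload before assuming it needs a GPU.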
4. Exercise careful planning with the final state in mind
Careful upfront planning that imagines the end state of the AI use case can avoid very costly changes later. One of the biggest issues in this category is an incomplete understanding of scalability requirements. Companies often start AI projects by purchasing a few systems at a time, which can be very problematic if they haven’t considered what the fully mature use case will need across the board (e.g., compute, storage, networking). An AI infrastructure should always start out with the final state in mind. For example, if the final state will require 50 to 100 systems, some with rear-door water cooling, you’ll need to make sure your data center can handle the power and cooling requirements that come with them. If it can’t, consider a colocation provider: colo facilities already have the density and water-cooling capabilities to host the servers and storage supplied by the customer, along with managed power and network infrastructure. And while delivery times for AI systems are often lengthy, especially when demand is high, colocation partners can be much more responsive to the urgent need for capacity.
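A quick capacity estimate can show whether the end state even fits in your facility. The per-system power draw and cooling overhead (PUE) below are illustrative assumptions; check the actual specifications of the hardware you plan to deploy and your facility's measured PUE.

```python
# Rough facility capacity estimate for a fully built-out AI cluster.
# Per-system draw (kW) and PUE (cooling/overhead multiplier) are
# illustrative assumptions, not measurements from any specific hardware.

def facility_kw(num_systems: int,
                kw_per_system: float = 10.0,
                pue: float = 1.4) -> float:
    """Total facility power in kW, including cooling overhead."""
    return num_systems * kw_per_system * pue

for n in (50, 100):
    print(f"{n} systems -> ~{facility_kw(n):,.0f} kW of facility power")
```

If the resulting number exceeds what your data center can deliver per rack or per room, that is a strong signal to evaluate colocation before the first purchase order, not after.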
The interwoven theme of these considerations is that careful planning can be one of the biggest contributors to meeting ROI expectations when launching an AI solution. Often, the required infrastructure components are the least planned-out aspect of an AI use case, perhaps because so many AI projects start out as a pilot or proof of concept running on a few small systems. The IT infrastructure decisions made up front can greatly affect the overall cost and complexity of the project for its duration.
Lastly, starting with a well-designed infrastructure solution will make a big difference in unlocking the full value of the AI use case. For any business wanting to leverage the value of AI, what truly matters is not the AI models it is developing but the dependable, flexible, and scalable infrastructure that carries the company’s AI initiatives into the future.
If you would like more in-depth insight on how to manage change and expenses when deploying an AI infrastructure, register for the AI Micro-summit webinar series on IT Infrastructure, Tuesday, Dec. 8 and Tuesday, Dec. 15 at 1:00 p.m. PT. These 30-minute sessions will focus on key issues facing IT professionals like you as you look to deploy AI and machine learning (ML) strategies for your organization.
You can also learn more about solutions to AI business challenges by speaking with your Sirius representative, or you can contact us for more information.