The primordial soup created by the scaling of computing power, hardware (e.g., GPUs and specialized chips), and advances in parallelization has produced a mesmerizing cascade of innovations across the foundation-model, middleware, and application layers. AI has had two big moments in the past few months: a Linux moment on the supplier/developer side and an iPhone moment on the demand/application side. Reflecting on the AI stack today, power is still heavily concentrated in a select few centralized, gigantic companies. The biggest and most dominant players in model development are entities with the resources to harness huge amounts of data, along with the computing, storage, and training capacity to create sophisticated AI models. But there are also rising forces from open-source, community-governed dataset contributors and foundation-model developers that challenge the oligopoly, pass ownership to a larger group of users, and unlock network effects and data sources in a decentralized way to build scalable, long-lasting artifacts.
With companies like OpenAI gaining users at an exponential rate in just a few months, it is critical that end users recognize that the economic incentives behind these closed-source foundation models, owned by a handful of moguls, are not aligned with their interests. There are data privacy, alignment, and security problems inherent in the further development of these models. Most importantly, our shared human history demonstrates that human intelligence scales through knowledge sharing, collaboration, and federation; the centralized nature of foundation-model development and of today's AI application landscape stands in sharp contrast. Knowledge collaboration will be one of the key advantages that allow generalized, synthesized AI to evolve rapidly and become more powerful. As AI pioneer Marvin Minsky argued in The Society of Mind, the future of AI will not be one algorithm written by one entity or one person but a network of different AIs, each doing different things and specialized in specific tasks, and that network of AIs will cooperate on distributed supercomputers through an incentivized mechanism. To achieve this kind of artificial super intelligence, AI needs to decentralize, starting from the underlying powerhouse (computing, networking, storage), moving to the collaborative development of AI models (training, fine-tuning, inference, federated learning), until ultimately models can be exchanged and work with each other (e.g., HuggingGPT, multi-agents).
Infrastructure: Decentralized storage, networking, and computing platforms
The computation and storage required to train and run these algorithms make the cost of AI massive, especially for new entrants who want to build their own in-house model. A single training run of GPT-3, with around 175B parameters, takes roughly 3.6K PetaFLOP/s-days (Brown et al., 2020), which amounts to about $4M on today's AWS on-demand instances (assuming 50% utilization). Even smaller language models, e.g., GPT-3 1.3B (1.3 billion parameters), require 64 Tesla V100 GPUs running for a week, a cost of roughly $32K on AWS. In real-world use cases, if ChatGPT or Bard were deployed in Google Search, the capex could top $20B. The underlying economic incentives of the centralized model are not aligned with end users, and the near-monopoly of centralized computing providers leaves buyers with little bargaining power (AWS has enjoyed an estimated 61% gross margin on commoditized computing hardware).
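For readers who want the back-of-envelope math, here is a rough sketch of where those figures come from; the per-GPU throughput and the hourly price are our own assumptions, not vendor quotes, so treat the output as an order-of-magnitude estimate.

```python
# Back-of-envelope sketch of the training-cost figures quoted above.
# The per-GPU throughput and hourly price are illustrative assumptions;
# swap in current cloud pricing to refresh the estimate.

PFLOPS_DAYS = 3_640           # training compute for GPT-3 175B (Brown et al., 2020)
V100_TFLOPS = 125             # assumed peak mixed-precision throughput of one V100
UTILIZATION = 0.5             # the 50% utilization assumption from the text
GPU_HOUR_PRICE = 3.0          # assumed on-demand $/GPU-hour; varies by provider

effective_tflops = V100_TFLOPS * UTILIZATION
gpu_days = PFLOPS_DAYS * 1_000 / effective_tflops       # PFLOP/s-days -> GPU-days
cost = gpu_days * 24 * GPU_HOUR_PRICE

print(f"~{gpu_days:,.0f} GPU-days, ~${cost / 1e6:.1f}M at these assumptions")

# The smaller GPT-3 1.3B run: 64 V100s for one week at the same hourly rate.
small_run = 64 * 7 * 24 * GPU_HOUR_PRICE
print(f"~${small_run:,.0f} for the 1.3B-parameter run")
```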
Centralized models are usually trained in highly specialized clusters with fast, homogeneous interconnects and dedicated software systems that support data parallelism as well as model and pipeline parallelism. Such dedicated clusters are expensive and hard to obtain. As more open models are released and the range of scenarios and accuracy requirements grows, the expanding development of an array of foundation models will create another spike in demand for computation. The shortage of computing resources and GPUs has even turned them into a strategic, equity-like resource: VCs are purchasing hundreds of GPUs and building out clusters specifically for their portfolio startups, and we're seeing investors trade computing resources for equity in startups. Although centralized AI mostly relies on dedicated cloud services, there is idle, unused GPU power around the world across different cycles and clusters. Decentralized storage and computing platforms can harness a great amount of heterogeneous, lower-bandwidth interconnected compute, opening the door for many hardware owners to become compute providers and unlocking new seller profiles that were previously priced out. A P2P storage and computing network fosters Bertrand-style competition that results in much lower fees for users. (There's some excellent economic breakdown here.)
There are several major challenges to identifying and utilizing these spot GPUs reliably and at scale on decentralized computing platforms, and these pain points are exactly where we're seeing new startups break in from different angles. First, the large models must be partitioned into pieces and assigned to different devices. This is already challenging in the cloud, and it becomes harder still in a decentralized setting where both the machines and the network are heterogeneous. What is more, to make the best use of all available resources once a large model has been partitioned, the resulting computation tasks must be carefully mapped to devices operating in very different environments. To allocate resources properly across heterogeneous networks, a scheduling system must also be in place, yet another gargantuan and expensive task given the interdependence of the computational tasks in foundation-model training and the geographic distribution of devices. Finally, handling preemption is key to harvesting all this idle GPU power: a job should be evicted the moment a machine is reclaimed by its owner, so the system must detect evictions and resynchronize the partially completed work.
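To make the partitioning, placement, and preemption problem concrete, here is a minimal toy sketch of a greedy placement policy over heterogeneous spot devices. The device specs, stage sizes, and scoring rule are our own illustrative assumptions, not how any production scheduler actually works.

```python
# Toy placement of partitioned model stages onto heterogeneous spot devices,
# plus re-placement when one host is preempted. Numbers are made up.
from dataclasses import dataclass, field

@dataclass
class Device:
    name: str
    mem_gb: float                 # free GPU memory
    bandwidth_gbps: float         # interconnect bandwidth to the swarm
    stages: list = field(default_factory=list)

def score(dev: Device) -> float:
    """Prefer devices with memory headroom and fast links; penalize loaded ones."""
    return dev.bandwidth_gbps * dev.mem_gb / (1 + len(dev.stages))

def assign(stage_mem: dict, devices: list) -> None:
    """Greedily place each model stage on the best device that can fit it."""
    for stage, mem in sorted(stage_mem.items(), key=lambda kv: -kv[1]):
        candidates = [d for d in devices if d.mem_gb >= mem]
        if not candidates:
            raise RuntimeError(f"no device can host {stage}")
        best = max(candidates, key=score)
        best.stages.append(stage)
        best.mem_gb -= mem

def handle_preemption(evicted: Device, devices: list, stage_mem: dict) -> None:
    """When a spot machine is reclaimed, re-place its stages elsewhere."""
    orphaned = {s: stage_mem[s] for s in evicted.stages}
    evicted.stages.clear()
    assign(orphaned, [d for d in devices if d is not evicted])

# Example: four pipeline stages of a partitioned model, three spot devices.
stages = {"stage0": 20, "stage1": 16, "stage2": 10, "stage3": 6}   # GB each
swarm = [Device("a100-home", 40, 1.0), Device("3090-lab", 24, 0.3),
         Device("v100-idle", 16, 0.5)]
assign(stages, swarm)
handle_preemption(swarm[1], swarm, stages)      # "3090-lab" gets reclaimed
for d in swarm:
    print(d.name, d.stages)
```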
Companies like Together (Compute) have innovated with a novel scheduling and preemption system to optimize the complex communication costs of training and benchmarking FMs. Companies with roots in decentralized machine learning, like BreezeML, have developed a virtual cloud manager that intelligently exploits heterogeneity in resources and pricing models within and across public clouds for distributed training. DePIN (Decentralized Physical Infrastructure Networks) projects have emerged to deliver speed and cost savings through staking and token incentives. Companies like Render Network and Bacalhau run P2P networks that let individuals contribute underutilized GPU capacity, for example to render graphics and visual effects, in return for utility tokens. Marketplace companies like ExaBits are building a coordination layer that matches GPU demand with suppliers.
Infrastructure: Decentralized training (Federated Learning-as-a-Service)
Today, data and knowledge are mainly created at the edge, and much of the world's most valuable data remains untapped (e.g., on mobile and IoT devices). For all this edge data to be used in traditional centralized training, it must flow toward a central hub, yet bandwidth limitations and the sheer volume of data can make transferring it to the cloud impractical. Connectivity (data must be transmitted over a stable connection), latency (real-time applications usually require millisecond responses), and privacy (sensitive data has to remain on-site) pose further challenges to centralized training.
Federated learning is an emerging research field for decentralized AI. A machine learning technique that trains algorithms across decentralized edge devices while holding data locally, federated learning lets each device contribute to the knowledge and training of a shared model while keeping its data on the device. Raw data never leaves the edge; only training derivatives (model updates) are sent to the cloud, and no raw data is stored centrally, which addresses both the connectivity and the security issues above. What is more, federated learning keeps the local model up to date by training on local data and sharing only the model parameters with the central server. This line of research was initiated by Google and is offered via TensorFlow Federated. Nvidia has also introduced the Clara Holoscan MGX platform, which offers federated learning-as-a-service for medical-device and computational-pharma development. We're also seeing horizontal federated-learning platforms for startups, such as FedML and DynamoFL. Companies tailored to the healthcare vertical, like Rhino Health, address the huge demand for, and strict regulation around, accurate, private, and sensitive patient data used to train models at scale.
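The core loop is simple enough to sketch. Below is a minimal federated-averaging (FedAvg-style) example in NumPy: each client trains on its own synthetic data, and only model weights travel to the server. The linear model, learning rate, and data are illustrative assumptions, not a production recipe.

```python
# Minimal federated-averaging sketch: data stays on each "device";
# only model weights are exchanged with the server.
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])

def make_client_data(n=64):
    """Synthetic local dataset that never leaves the 'device'."""
    X = rng.normal(size=(n, 2))
    y = X @ true_w + 0.1 * rng.normal(size=n)
    return X, y

def local_update(w, X, y, lr=0.1, epochs=5):
    """A client's on-device training: a few steps of gradient descent."""
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w                      # only the updated weights are shared

clients = [make_client_data() for _ in range(10)]
w_global = np.zeros(2)
for round_ in range(20):          # each round: broadcast, local train, average
    local_ws = [local_update(w_global, X, y) for X, y in clients]
    w_global = np.mean(local_ws, axis=0)      # server aggregates weights only

print("federated estimate:", np.round(w_global, 3))   # approaches [2.0, -1.0]
```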
Federated learning still needs to overcome the challenges of excessive communication, system heterogeneity, and privacy, so much of the research focuses on reducing the number of update rounds and the size of update messages. Heterogeneity has been a perennial hurdle for decentralized AI: edge devices vary greatly in storage, computation, and communication capabilities, and each operates in a different environment with a different configuration. A systematic solution to heterogeneity that does not sacrifice privacy and security remains an open and important problem.
Middleware: Collaborative Inference and Fine-Tuning of Large Models
Cost and efficiency are problems for the inference and fine-tuning of LLMs. With the release of open-source LLMs like BLOOM (176B), users can download the pre-trained weights, but the parameter scale, memory requirements, compute cost, and high-end hardware needed make it prohibitive for individuals, researchers, and practitioners to run inference or fine-tuning themselves. Existing alternatives, like RAM offloading and hosted public inference APIs, have serious drawbacks: the former is too slow for interactive interfaces, and the latter makes it difficult to access weights, logits, or attention states.
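To see why this is prohibitive, here is the rough memory arithmetic, using common rules of thumb rather than BLOOM-specific measurements, so treat the numbers as back-of-envelope estimates.

```python
# Rough memory arithmetic behind the "prohibitive hardware" claim.
# Byte counts per parameter are standard rules of thumb, not measurements.
params = 176e9                      # BLOOM-176B parameter count

inference_fp16 = params * 2 / 1e9   # 2 bytes/param for fp16 weights
# Naive full fine-tuning with Adam: weights + gradients + two optimizer
# moments, commonly estimated at ~16 bytes per parameter in mixed precision.
finetune_adam = params * 16 / 1e9

print(f"fp16 inference weights: ~{inference_fp16:,.0f} GB")   # ~352 GB
print(f"full fine-tune (Adam):  ~{finetune_adam:,.0f} GB")    # ~2,816 GB
```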
Crowdsourced, distributed training has inspired several projects aiming to democratize LLMs. Petals, for example, has built a platform that lets multiple users collaborate over the Internet to perform inference and fine-tuning of large language models. Whereas NVLink/NVSwitch setups pool GPUs inside a single expensive system, Petals offers a more affordable alternative: it splits the model and the inference process into smaller steps served across participants' machines, so it can run with far fewer resources per user and, the team reports, roughly 10x faster than offloading. The concept behind Petals resembles the Enigma network (since rebranded as Secret Network), a blockchain-based technology that has been used for fully encrypted computation.
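The core idea is easy to illustrate without any special library. The toy below (our own illustration, not Petals' actual implementation) splits a model's layers across simulated remote workers, so only activations, never the full weights, move through the client.

```python
# Toy collaborative inference: layers of a big model live on different
# "remote" workers, and a client's activations hop from worker to worker
# instead of loading the whole model locally. Everything here is simulated.
import numpy as np

rng = np.random.default_rng(1)
HIDDEN = 64

class RemoteWorker:
    """Pretend remote peer hosting a contiguous block of model layers."""
    def __init__(self, n_layers):
        # Stand-in for real transformer blocks: random linear layers.
        self.weights = [rng.normal(scale=HIDDEN**-0.5, size=(HIDDEN, HIDDEN))
                        for _ in range(n_layers)]

    def forward(self, activations):
        for w in self.weights:
            activations = np.tanh(activations @ w)   # one "layer"
        return activations                            # only activations move

# A 12-"layer" model split across three volunteer machines.
swarm = [RemoteWorker(4), RemoteWorker(4), RemoteWorker(4)]

def distributed_forward(tokens):
    """Client-side loop: send activations through each worker in sequence."""
    acts = tokens
    for worker in swarm:
        acts = worker.forward(acts)   # in reality: an RPC over the Internet
    return acts

out = distributed_forward(rng.normal(size=(1, HIDDEN)))
print("output shape:", out.shape)
```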
Marketplace: Marketplace & Protocols for Data/Model Exchange (Decentralized Data)
Users' private data and proprietary industrial data are both valuable assets and tradable commodities. Since the most recent renaissance in AI applications, everyone has been emphasizing the moats and flywheels of proprietary data, yet end users keep giving away their data and digital footprints, which become the topline profit of big companies. And while end users do not benefit from their own data, data scarcity is becoming prevalent: foundation models have scaled into the trillions of parameters, and we're running out of training data, the fuel that powers general intelligence systems. Data quality matters even more than quantity; a model's data needs to be accountable and secure for it to generate factually accurate and appropriate behavior in real-world scenarios. Decentralized AI can leverage concepts native to blockchains, such as tokens, smart contracts, and DAOs, to model these interactions and set up incentives. Indeed, we are seeing web2 developers use points to incentivize proper behavior and contributions, and companies that use utility NFTs (Apus Network) to align interests between providers, creators, and users. DataDAO, for example, manages access to and monetization of datasets that are then tokenized and stored on IPFS.
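For intuition, here is a conceptual sketch of the incentive loop described above; it is our own toy, not DataDAO's or Apus Network's actual design, and the names, prices, and balances are invented for illustration.

```python
# Conceptual sketch of a tokenized data marketplace: contributors register
# datasets, consumers pay tokens for access, and payouts flow back to the
# contributors. A real system would enforce this with smart contracts.
from collections import defaultdict

balances = defaultdict(float)          # token balances per address
datasets = {}                          # dataset_id -> (owner, price_in_tokens)

def register(owner: str, dataset_id: str, price: float) -> None:
    datasets[dataset_id] = (owner, price)

def purchase_access(buyer: str, dataset_id: str) -> bool:
    """Transfer tokens from buyer to the data contributor."""
    owner, price = datasets[dataset_id]
    if balances[buyer] < price:
        return False
    balances[buyer] -= price
    balances[owner] += price           # contributor is paid for their data
    return True

balances["model_builder"] = 100.0
register("hospital_consortium", "cardiology-notes-v1", price=25.0)
print(purchase_access("model_builder", "cardiology-notes-v1"))   # True
print(dict(balances))
```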
Homomorphic encryption has been a leap forward in personal data security. The most straightforward use case for this technology is when a data owner wants to share data or send it to the cloud for processing but does not trust the provider. A homomorphic encryption scheme allows the owner to encrypt the data, share it with a server that performs computations on it without ever decrypting it, and receive back an encrypted result. In other words, the data owner maintains control of the result, which can only be decrypted with a secret key that no one else has. Homomorphic encryption thus enables individuals to contribute data to model training without compromising privacy. Companies like SingularityNET are building a blockchain-powered marketplace that gives people broad access to AI algorithms and AI applications for modeling and application building. Their protocol also supports data exchange, sharing, and collaboration across different algorithms to support the later development of multi-AI applications.
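To show the "compute on data you cannot read" property concretely, here is a from-scratch toy of an additively homomorphic (Paillier-style) scheme. It uses tiny hard-coded primes and is for illustration only; real deployments use large keys and audited libraries.

```python
# Toy Paillier cryptosystem (additively homomorphic). NOT secure: the primes
# are tiny and hard-coded purely to demonstrate the homomorphic property.
import math, random

p, q = 1009, 1013                     # small demo primes
n = p * q
n_sq = n * n
g = n + 1                             # standard simplification for g
lam = math.lcm(p - 1, q - 1)          # Carmichael's lambda(n)
mu = pow(lam, -1, n)                  # mu = lam^-1 mod n (valid since g = n+1)

def encrypt(m: int) -> int:
    """E(m) = g^m * r^n mod n^2, with random r coprime to n."""
    while True:
        r = random.randrange(2, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n_sq) * pow(r, n, n_sq)) % n_sq

def decrypt(c: int) -> int:
    """D(c) = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    return (((pow(c, lam, n_sq) - 1) // n) * mu) % n

# Homomorphic property: the product of two ciphertexts decrypts to the sum
# of the plaintexts, so a server can aggregate encrypted contributions
# (e.g., model-update values) without ever seeing them.
a, b = 1234, 5678
c_sum = (encrypt(a) * encrypt(b)) % n_sq
assert decrypt(c_sum) == a + b
print("decrypted aggregate:", decrypt(c_sum))
```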
Model: Decentralized Collaborative Model System & Model Routing
Solving complex AI tasks that span different domains of knowledge and modalities will be a critical milestone on the path to AGI, or even ASI. The foreseeable future of AI won't be dominated by one foundation model across all modalities but by specialized smaller models that handle specialized tasks in different verticals. Collaboration between models will bring a new wave of innovation in model routing and model-to-model collaboration across task types. Auto-agents (Lilian Weng has a really good write-up on autonomous agents here) have become a popular topic across imaginative use cases (gaming, workflow, etc.), and as specialized models learn to collaborate, powerful multi-agent systems will emerge in the near future for decentralized collaboration.
HuggingGPT presents a new paradigm and framework for designing AI solutions. The framework uses an LLM (e.g., ChatGPT) as a controller that connects the various AI models in machine-learning communities (e.g., Hugging Face) to solve AI tasks, creating a strong routing layer that exploits the specialties and advantages of different models to produce the best result for each task. This kind of collaborative agent will not only bring a new level of workflow automation to enterprise use cases, by decomposing, planning, and executing tasks, but also unlock hyper-personalized assistants for the consumer world. (We're actively looking for companies commercializing HuggingGPT; please reach out if you're building in this space.)
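The routing pattern is worth spelling out. Below is a stripped-down sketch of it: a planner (an LLM in the real system, a hard-coded stub here) decomposes a request into typed subtasks, and a registry routes each subtask to a specialist model. The registry entries and planner output are mock-ups, not HuggingGPT's actual interfaces.

```python
# Toy "controller + specialist models" routing loop.
from typing import Callable, Dict, List, Tuple

# Specialist "models": stand-ins for community models a controller might call.
def caption_image(inp: str) -> str: return f"[caption of {inp}]"
def translate_text(inp: str) -> str: return f"[French translation of {inp}]"
def synthesize_speech(inp: str) -> str: return f"[audio for {inp}]"

REGISTRY: Dict[str, Callable[[str], str]] = {
    "image-to-text": caption_image,
    "translation": translate_text,
    "text-to-speech": synthesize_speech,
}

def plan(user_request: str) -> List[Tuple[str, str]]:
    """Stub planner. A real controller would prompt an LLM to turn the user's
    request into this ordered list of (task_type, input) pairs."""
    return [("image-to-text", "photo.jpg"),
            ("translation", "<prev>"),
            ("text-to-speech", "<prev>")]

def execute(user_request: str) -> str:
    """Route each planned subtask to its specialist and chain the outputs."""
    prev = user_request
    for task_type, arg in plan(user_request):
        model = REGISTRY[task_type]
        prev = model(prev if arg == "<prev>" else arg)
    return prev

print(execute("Describe photo.jpg in French and read it aloud"))
```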
Conclusion
The infrastructure, development, and derivatives of AI will become increasingly decentralized, gradually turning into a category where the web2 and web3 worlds converge with major innovations. Decentralizing AI infrastructure, development processes, and derivatives not only democratizes AI but is also poised to unlock its full potential, paving the way for a future in which strategic resources are fully capitalized on and artificial super-intelligence becomes a reality.
References:
https://www.unusual.vc/post/ai-native-infrastructure-will-be-open