The Databases that are Eating the Internet


If you consume M365 services, you may have noticed that in the past year reliability has decreased. The quick-witted among you might be saying, “So what’s changed?” Seriously though, in East US some VM purchases have been restricted because of “Over Utilization.” I had a Windows 365 Cloud PC that Microsoft simply shut off for three days. Even redeploying it did not bring it back up. This was a $60-a-month service, just shut down. For a service that is supposed to have four nines, the message is clear: something more important than your box is happening.

If you immediately knew I was going to talk about LLMs, you win a prize. The volume of data going back and forth to people’s machines in prompts and feedback is enormous. But it pales in comparison to the text and database utilization occurring on the backend servers. Most people never touch the databases used for these systems. I imagine that is going to change, so I thought I would take a look at them and what they do well.

The physical reality is that the public cloud has hit a hard ceiling. Hyperscalers are quietly throttling general-purpose workloads, restricting VM SKUs, and letting standard service availability drop because the underlying infrastructure is being cannibalized to fuel the massive compute demands of AI context engineering. When you send a prompt, background systems are frantically executing unstructured data pipelines that convert text, PDFs, and user activity into high-dimensional vectors (embeddings), where items with similar meaning land close together. Storing and querying these points at production scale requires specialized vector databases engineered for approximate nearest neighbor (ANN) search rather than exact relational lookups, and the engine you choose depends entirely on how you intend to manage these constrained cloud resources.
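To make "nearest neighbor" concrete, here is a minimal sketch of the exact, brute-force version of the search, in plain Python. The document names and three-dimensional vectors are made up for illustration; real embeddings have hundreds or thousands of dimensions. This is not how any particular engine is implemented. It only shows the full scan that an ANN index is designed to avoid.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def exact_top_k(query, vectors, k):
    """Score the query against every stored vector, then keep the best k.

    This touches all N vectors on every query, which is the cost that
    approximate indexes trade a little accuracy to avoid.
    """
    scored = [(cosine_similarity(query, vec), doc_id)
              for doc_id, vec in vectors.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]

# Hypothetical toy "embeddings" (names and values are illustrative only).
toy_vectors = {
    "invoice.pdf": [0.9, 0.1, 0.0],
    "receipt.txt": [0.8, 0.2, 0.1],
    "cat_photo_caption": [0.0, 0.1, 0.9],
}

nearest = exact_top_k([1.0, 0.0, 0.0], toy_vectors, k=2)
print(nearest)
```

With three documents the scan is free. With a few hundred million it is not, which is why these engines build indexes that return very good neighbors quickly instead of guaranteed-best neighbors slowly.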

For massive, distributed enterprise architectures, Milvus handles high-throughput, heavy batch-loading pipelines. It uses a disaggregated architecture that explicitly decouples compute from storage. By allowing query nodes to scale independently from data ingestion nodes, it is built to absorb large-scale backfills, such as a multi-million-row legacy data deduplication or catalog indexing project, without crashing the operational layer.

Where raw search speed and exact metadata filtering are the priorities, Qdrant provides a highly optimized alternative. Written in Rust, it excels at hybrid search setups where you must combine vector distance with strict, deterministic business logic, such as evaluating real-time user permissions, checking regional inventory, or isolating streaming infrastructure logs for anomaly detection. Its filtering architecture evaluates complex boolean metadata constraints inline during the vector search itself, eliminating the latency of a separate post-processing step.
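The difference between filtering after the search and filtering during it is easier to see in code. The following is a toy sketch in plain Python, not Qdrant's implementation or API; the point layout (`id`, `vector`, `payload`) and the `region` field are assumptions made up for this example.

```python
def dot(a, b):
    """Dot product, used here as a simple similarity score."""
    return sum(x * y for x, y in zip(a, b))

def post_filter_search(query, points, k, predicate):
    """Rank everything by similarity, cut to k, THEN apply the filter."""
    ranked = sorted(points, key=lambda p: dot(query, p["vector"]), reverse=True)
    return [p["id"] for p in ranked[:k] if predicate(p["payload"])]

def inline_filter_search(query, points, k, predicate):
    """Check the filter as each candidate is considered, then rank."""
    scored = []
    for p in points:
        if not predicate(p["payload"]):
            continue  # rejected before it can take a slot in the results
        scored.append((dot(query, p["vector"]), p["id"]))
    scored.sort(reverse=True)
    return [point_id for _, point_id in scored[:k]]

# Hypothetical points: two US items sit closest to the query, two EU items follow.
points = [
    {"id": 1, "vector": [1.0, 0.0], "payload": {"region": "us"}},
    {"id": 2, "vector": [0.9, 0.1], "payload": {"region": "us"}},
    {"id": 3, "vector": [0.8, 0.2], "payload": {"region": "eu"}},
    {"id": 4, "vector": [0.1, 0.9], "payload": {"region": "eu"}},
]

def is_eu(payload):
    return payload["region"] == "eu"

post_results = post_filter_search([1.0, 0.0], points, k=2, predicate=is_eu)
inline_results = inline_filter_search([1.0, 0.0], points, k=2, predicate=is_eu)
print(post_results)    # the top 2 were both "us", so nothing survives the filter
print(inline_results)  # the two best matches among the "eu" points
```

Post-filtering can hand back fewer than k results, or none at all, when the nearest neighbors fail the business rule, and the usual workaround is to over-fetch and re-query. Applying the constraint inside the search avoids that round trip.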

Conversely, Pinecone targets teams looking to bypass operational infrastructure overhead entirely by delivering a fully managed, cloud-native SaaS environment. It automates index optimization, scaling, and sharding out of the box, which simplifies deployment but forces a trade-off: you give up architectural control, accept vendor lock-in, and incur recurring usage costs, in sharp contrast to the bare-metal or self-hosted flexibility of open-source engines.

Ultimately, this infrastructure crunch is forcing a major shift in where data engineering is going. We are moving away from an era where pipelines were defined solely by moving tabular data between relational warehouses. As unstructured data becomes the primary asset, the bottleneck is no longer just about optimizing compute or basic file storage; it is about managing semantic infrastructure. Data engineers must now design pipelines that preserve the geometric and contextual meaning of information, relying on specialized vector stores to execute complex, probabilistic operations like data merging, semantic search, and fraud discovery efficiently enough to survive a resource-constrained cloud environment.