<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Future: Maurizio Morri</title>
    <description>The latest articles on Future by Maurizio Morri (@maurizio_morri_f7f4bd128c).</description>
    <link>https://future.forem.com/maurizio_morri_f7f4bd128c</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3219130%2F752d31ca-0ed3-43ae-add9-2788d8d04ff6.png</url>
      <title>Future: Maurizio Morri</title>
      <link>https://future.forem.com/maurizio_morri_f7f4bd128c</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://future.forem.com/feed/maurizio_morri_f7f4bd128c"/>
    <language>en</language>
    <item>
      <title>Coding agents as runtime systems</title>
      <dc:creator>Maurizio Morri</dc:creator>
      <pubDate>Wed, 25 Mar 2026 00:26:07 +0000</pubDate>
      <link>https://future.forem.com/maurizio_morri_f7f4bd128c/coding-agents-as-runtime-systems-3kgo</link>
      <guid>https://future.forem.com/maurizio_morri_f7f4bd128c/coding-agents-as-runtime-systems-3kgo</guid>
      <description>&lt;p&gt;One of the most technically interesting AI and coding stories from the last couple of weeks is the continued shift from code completion to full agent runtimes. OpenAI’s recent platform documentation emphasizes agent workflows with tools, logic nodes, trace grading, datasets, and the Agents SDK, which is a very different abstraction from the old “generate a function from a prompt” model. The center of gravity is moving upward, from token prediction to orchestration.&lt;/p&gt;

&lt;p&gt;That matters because the hard part of serious coding is rarely local syntax. It is state, tool use, evaluation, and recovery after failure. Once an agent can call tools, inspect outputs, branch on conditions, and feed traces back into evaluation loops, the software problem starts to look less like autocomplete and more like a distributed runtime with an LLM inside it. In that architecture, prompts matter less than execution semantics.&lt;/p&gt;
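&lt;p&gt;The loop described here can be sketched in a few lines. Everything in the sketch is illustrative: the tool, the planning policy, and the trace format are invented for the example, not taken from any real SDK.&lt;/p&gt;

```python
# Minimal sketch of an agent control loop: call tools, inspect outputs,
# branch on conditions, and record a trace for later evaluation.
# All names (run_tool, plan_next_step) are illustrative, not a real SDK.

def run_tool(name, args):
    # Stand-in for a real tool call (shell, test runner, API).
    if name == "run_tests":
        return {"passed": args.get("fixed", False)}
    return {}

def plan_next_step(state):
    # Trivial policy: keep fixing until tests pass, then stop.
    # In a real runtime, this is where the model sits.
    if state["last_result"].get("passed"):
        return None
    return ("run_tests", {"fixed": True})

def agent_loop(max_steps=5):
    state = {"last_result": {}, "trace": []}
    state["last_result"] = run_tool("run_tests", {})
    state["trace"].append(("run_tests", state["last_result"]))
    for _ in range(max_steps):
        step = plan_next_step(state)
        if step is None:
            break
        name, args = step
        state["last_result"] = run_tool(name, args)
        state["trace"].append((name, state["last_result"]))
    return state["trace"]
```

&lt;p&gt;The point of the sketch is the shape: the model decides inside plan_next_step, while the surrounding runtime owns execution, branching, and the trace that later evaluation consumes.&lt;/p&gt;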

&lt;p&gt;The deeper point is that coding AI is becoming an infrastructure problem. Reliability now depends on trace capture, reproducible tool calls, control flow, and evaluation against task level outcomes, not just benchmark accuracy on isolated code snippets. That is a much more technical and much more interesting phase of the field.&lt;/p&gt;

&lt;p&gt;This is why the story feels important. The next serious gains in AI coding may come not from slightly better code generation, but from better agent runtimes around the model.&lt;/p&gt;

&lt;p&gt;Sources&lt;/p&gt;

&lt;p&gt;&lt;a href="https://developers.openai.com/api/docs/guides/agent-builder" rel="noopener noreferrer"&gt;https://developers.openai.com/api/docs/guides/agent-builder&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://developers.openai.com/api/docs/guides/agents-sdk" rel="noopener noreferrer"&gt;https://developers.openai.com/api/docs/guides/agents-sdk&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://developers.openai.com/api/docs/guides/trace-grading" rel="noopener noreferrer"&gt;https://developers.openai.com/api/docs/guides/trace-grading&lt;/a&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>openai</category>
    </item>
    <item>
      <title>CellVoyager and the Programming Shift From Bioinformatics Scripts to Autonomous Analysis Systems</title>
      <dc:creator>Maurizio Morri</dc:creator>
      <pubDate>Thu, 19 Mar 2026 22:37:21 +0000</pubDate>
      <link>https://future.forem.com/maurizio_morri_f7f4bd128c/cellvoyager-and-the-programming-shift-from-bioinformatics-scripts-to-autonomous-analysis-systems-58ia</link>
      <guid>https://future.forem.com/maurizio_morri_f7f4bd128c/cellvoyager-and-the-programming-shift-from-bioinformatics-scripts-to-autonomous-analysis-systems-58ia</guid>
      <description>&lt;p&gt;One of the most technically interesting biology and AI stories of the last two weeks is the publication of CellVoyager in Nature Methods on March 17, 2026. What makes it notable is not just that it uses a large language model on single cell RNA sequencing data. The real story is architectural. CellVoyager is presented as an AI computational biology agent that explores datasets autonomously, generates hypotheses, runs analyses inside a live notebook environment, and returns findings that expert reviewers judged creative and scientifically sound. That is a very different software pattern from the standard bioinformatics workflow built around hand written notebooks, shell pipelines, and user directed plotting.&lt;/p&gt;

&lt;p&gt;From a programming perspective, this is important because it changes the unit of abstraction. Traditional bioinformatics software is usually function centric. You call Scanpy, Seurat, DESeq2, or a custom pipeline step, then inspect the result and decide what to do next. CellVoyager moves the abstraction boundary upward. The user no longer specifies every analysis step directly. Instead, the software system plans a sequence of operations over a high dimensional search space, executes them in a notebook, evaluates intermediate results, and decides what branch of inquiry to pursue next. In other words, this is not “AI that writes code.” It is software that treats analysis itself as an agentic control loop.&lt;/p&gt;

&lt;p&gt;That shift has serious engineering implications. Once an agent is allowed to explore biology rather than just answer questions, reliability stops being only a model quality problem and becomes a systems design problem. You need execution sandboxes, state management, reproducible notebooks, tool invocation rules, memory of prior analyses, and a mechanism for deciding whether a newly generated hypothesis is worth spending compute on. The CellVoyager paper emphasizes the live notebook environment and prior computational context, which suggests the system is not operating as a stateless chat interface but as a persistent programmatic workspace. That is much closer to the design of an autonomous data science runtime than to a chatbot with some domain knowledge.&lt;/p&gt;

&lt;p&gt;This is where the programming story becomes more interesting than the biology headline. Single cell analysis is a particularly brutal environment for agents because the search space is combinatorial. A dataset can be clustered in multiple ways, integrated across batches in multiple ways, filtered by different quality thresholds, tested for differential expression under different contrasts, and interpreted through competing annotation schemes. A human analyst handles that explosion partly through experience and partly through selective curiosity. An autonomous system has to formalize both. That means the core technical challenge is not just generating valid Python, but designing a control architecture that can search a branching analysis tree without drowning in low value computations or seductive noise.&lt;/p&gt;
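&lt;p&gt;One way to picture that control architecture is best first search over candidate analysis steps under a compute budget. The sketch below is a minimal illustration with invented step names, priorities, and costs; a real system would score branches from intermediate results rather than a lookup table.&lt;/p&gt;

```python
import heapq

# Illustrative sketch: best-first search over a branching analysis tree,
# spending a fixed compute budget on the highest-priority next steps.
# Step names, priorities, and costs are made up for the example.

def expand(step):
    # Each executed step proposes follow-ups as (priority, name, cost).
    children = {
        "cluster": [(0.9, "annotate", 2), (0.4, "recluster", 3)],
        "annotate": [(0.7, "diff_expr", 2)],
    }
    return children.get(step, [])

def explore(budget=5):
    executed = []
    # heapq is a min-heap, so negate priority for best-first order.
    frontier = [(-1.0, "cluster", 1)]
    while frontier and budget > 0:
        neg_p, step, cost = heapq.heappop(frontier)
        if cost > budget:
            continue  # too expensive for the remaining budget
        budget -= cost
        executed.append(step)
        for p, child, c in expand(step):
            heapq.heappush(frontier, (-p, child, c))
    return executed
```

&lt;p&gt;Low priority branches like the hypothetical recluster step simply never get popped before the budget runs out, which is exactly the discipline an autonomous analyst needs.&lt;/p&gt;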

&lt;p&gt;For programmers, this also highlights a broader transition already visible in coding tools. The first generation of AI assistants lived at the function level. They completed lines, wrote helpers, or explained APIs. The next generation is moving toward repository scale reasoning, planning, and tool orchestration. CellVoyager is the omics equivalent of that transition. Instead of helping with a local code task, it is trying to navigate a full analytical environment with goals, uncertainty, and iterative execution. That is why this paper feels more like a software architecture milestone than a narrow machine learning benchmark.&lt;/p&gt;

&lt;p&gt;There is also a useful lesson here about interfaces. In bioinformatics, people often talk about reproducibility as if it were just a matter of version pinning and saved notebooks. But autonomous analysis introduces a harder requirement. The system has to expose enough of its execution path that a human can audit not only what it concluded but how it searched. A notebook based design helps because it externalizes the reasoning into code, outputs, and intermediate artifacts. That does not guarantee scientific correctness, but it is a much healthier programming model than opaque text generation. If autonomous scientific software becomes real, inspectability will matter as much as intelligence.&lt;/p&gt;

&lt;p&gt;The deeper technical point is that computational biology may be one of the first places where agent engineering becomes more valuable than raw model scaling alone. Single cell datasets are large enough, messy enough, and open ended enough that success depends on orchestration, memory, execution discipline, and ranking of possible next steps. Those are classic systems problems. They sit at the boundary between programming languages, notebook runtimes, workflow engines, and AI planning. CellVoyager matters because it makes that boundary visible. The future of biological AI may look less like bigger models answering prettier prompts, and more like autonomous analytical software built on top of careful program execution.&lt;/p&gt;

&lt;p&gt;For a technical audience, that is the real takeaway. The most important programming change in biology right now may be the move from scriptable analysis to agentic analysis. Once that happens, the hard problem is no longer just which library to import. It is how to design a system that can safely decide what to do next.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>science</category>
    </item>
    <item>
      <title>Evo 2 and the Rise of Long Context Genomics</title>
      <dc:creator>Maurizio Morri</dc:creator>
      <pubDate>Fri, 13 Mar 2026 21:52:59 +0000</pubDate>
      <link>https://future.forem.com/maurizio_morri_f7f4bd128c/evo-2-and-the-rise-of-long-context-genomics-357g</link>
      <guid>https://future.forem.com/maurizio_morri_f7f4bd128c/evo-2-and-the-rise-of-long-context-genomics-357g</guid>
      <description>&lt;p&gt;&lt;strong&gt;Evo 2 and the Rise of Long Context Genomics&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One of the most technically important biology and AI stories of the past two weeks is the formal publication of Evo 2 in &lt;em&gt;Nature&lt;/em&gt; on March 4, 2026. The model is not just another biological language model with a larger parameter count. What makes it significant is the combination of scale, context length, and task breadth. According to the paper, Evo 2 was trained on 9 trillion DNA base pairs from a curated atlas spanning all domains of life, and it operates with a 1 million token context window at single nucleotide resolution. That is a very different regime from earlier sequence models that were forced to reason over much shorter windows and therefore struggled to capture regulatory interactions spread across large genomic distances. (&lt;a href="https://www.nature.com/articles/s41586-026-10176-5" rel="noopener noreferrer"&gt;Nature&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;The technical implication is easy to underestimate. In genomics, local sequence motifs matter, but many of the hardest problems are not purely local. Enhancers can act at long range. Noncoding variants can alter gene regulation far from the nearest exon. Structural and regulatory logic can be distributed across large stretches of DNA rather than packed into a short contiguous segment. A model that can process up to 1 million nucleotides at once has a chance to represent that long range dependency structure directly, rather than approximating it through handcrafted features or fragmented windows. That is why Evo 2 matters as an architecture story, not just as a dataset story. (&lt;a href="https://www.nature.com/articles/s41586-026-10176-5" rel="noopener noreferrer"&gt;Nature&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;The &lt;em&gt;Nature&lt;/em&gt; paper also makes a stronger claim than simple sequence completion. The authors report that Evo 2 can predict the functional impact of genetic variation, including noncoding pathogenic mutations and clinically significant BRCA1 variants, without task specific fine tuning. If that generalizes well, it points toward a very different computational biology workflow. Instead of building a separate supervised model for every assay, tissue, or pathogenicity benchmark, researchers could increasingly start from a single pretrained genomic foundation model and evaluate whether it already encodes enough biological structure to support downstream inference. That is a familiar pattern in natural language processing, but genomics is a much harder substrate because the alphabet is small, the syntax is implicit, and the semantics are tied to cellular context and evolution rather than human annotation. (&lt;a href="https://www.nature.com/articles/s41586-026-10176-5" rel="noopener noreferrer"&gt;Nature&lt;/a&gt;)&lt;/p&gt;
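&lt;p&gt;In code, zero shot variant scoring of this kind usually reduces to a likelihood delta between the reference and the mutated sequence. The sketch below only illustrates that pattern: the scoring function is a toy stand in invented here, and Evo 2 itself would supply the real likelihoods.&lt;/p&gt;

```python
import math

# Schematic of zero-shot variant effect scoring with a sequence model:
# score = log P(alt sequence) - log P(ref sequence).
# The "model" below is a toy stand-in that favors GC-rich sequence;
# a genomic foundation model would supply the real log-likelihoods.

def toy_log_likelihood(seq):
    # Per-base log probabilities under the toy model.
    probs = {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2}
    return sum(math.log(probs[b]) for b in seq)

def variant_score(ref_seq, pos, alt_base):
    # Substitute one base and compare sequence likelihoods.
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    return toy_log_likelihood(alt_seq) - toy_log_likelihood(ref_seq)
```

&lt;p&gt;A strongly negative score suggests the variant makes the sequence less plausible under the model, which is how a prioritization layer for wet lab validation would rank candidates.&lt;/p&gt;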

&lt;p&gt;There is also a notable systems angle here. Reporting around the model states that Evo 2 was trained using more than 2,000 NVIDIA H100 GPUs on DGX Cloud, which helps explain why the combination of trillion scale training data and million token context became feasible only recently. Long context models are expensive not only because of raw sequence length, but because memory, attention behavior, optimization stability, and data curation all become harder at scale. In practice, genomic foundation models are now becoming an HPC problem as much as a biology problem. That shift matters for who can build them, who can reproduce them, and how open the field can remain. (&lt;a href="https://phys.org/news/2026-03-evo-ai-genetic-code-domains.html" rel="noopener noreferrer"&gt;Phys.org&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;The generative side of the story is where the excitement becomes more controversial. &lt;em&gt;Nature&lt;/em&gt; also reported this month that Evo 2 can generate short genomic sequences, which is why some observers are describing it as a step toward AI driven genome design. But the same coverage is careful to note that generating plausible DNA strings is not the same as generating sequences that will function robustly inside living cells. This distinction is critical. Biological sequence space is enormous, and “looks evolutionarily plausible to a model” is still very far from “survives, expresses, regulates correctly, and remains stable in vivo.” For technical readers, this is the right place to stay disciplined. Evo 2 is a major modeling advance, but not yet a universal compiler for living systems. (&lt;a href="https://www.nature.com/articles/d41586-026-00681-y" rel="noopener noreferrer"&gt;Nature&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;What makes the model especially interesting for medicine is its potential role in variant interpretation. Clinical genomics still faces a huge bottleneck in classifying variants of uncertain significance, especially in noncoding regions where mechanistic interpretation is thin. If a long context model really captures enough regulatory grammar to score mutation effects across distant elements, it could become useful as a prioritization layer for experimental validation, especially when combined with functional assays rather than used as a standalone oracle. That is probably the healthiest way to understand this whole class of models. Their value is not that they replace molecular biology. Their value is that they can compress evolutionary and genomic regularities into a form that helps wet lab science choose better experiments. (&lt;a href="https://www.nature.com/articles/s41586-026-10176-5" rel="noopener noreferrer"&gt;Nature&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;There is a larger technical lesson here as well. Biology is beginning to look more like a long context reasoning problem. Protein models such as AlphaFold taught the field that structure could be inferred from sequence more effectively than many expected. Genomic foundation models are now asking a related but broader question: can the distributed logic of regulation, pathogenicity, and design be learned from sequence alone at enough scale? Evo 2 does not settle that question, but it makes it much harder to dismiss. The fact that a single model can cover bacteria, archaea, and eukaryotes while retaining nucleotide level resolution suggests that the field is moving beyond narrow specialist architectures toward something closer to a general biological sequence prior. (&lt;a href="https://www.nature.com/articles/s41586-026-10176-5" rel="noopener noreferrer"&gt;Nature&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;The most realistic conclusion is neither hype nor dismissal. Evo 2 is not synthetic life in a box, and it is not proof that sequence alone solves biology. But it is a serious technical milestone. It pushes genomic modeling into a regime where context length, cross domain training, and zero shot functional prediction start to converge. For computational biology, that is a meaningful shift. It suggests that the next generation of tools may be less about isolated predictors and more about shared sequence models that act as inference engines across many parts of genomics. If that trend holds, the practical future of AI in biology may depend less on bigger chatbots and more on foundation models that can read the long range grammar of life itself. (&lt;a href="https://www.nature.com/articles/s41586-026-10176-5" rel="noopener noreferrer"&gt;Nature&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;Sources&lt;/p&gt;

&lt;p&gt;Nature paper: &lt;a href="https://www.nature.com/articles/s41586-026-10176-5" rel="noopener noreferrer"&gt;https://www.nature.com/articles/s41586-026-10176-5&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Nature news: &lt;a href="https://www.nature.com/articles/d41586-026-00681-y" rel="noopener noreferrer"&gt;https://www.nature.com/articles/d41586-026-00681-y&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;PubMed entry: &lt;a href="https://pubmed.ncbi.nlm.nih.gov/41781614/" rel="noopener noreferrer"&gt;https://pubmed.ncbi.nlm.nih.gov/41781614/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Arc Institute summary: &lt;a href="https://arcinstitute.org/news/evo-2-one-year-later" rel="noopener noreferrer"&gt;https://arcinstitute.org/news/evo-2-one-year-later&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>biology</category>
      <category>programming</category>
    </item>
    <item>
      <title>Tissue Context Is Becoming the Next Foundation Model Frontier</title>
      <dc:creator>Maurizio Morri</dc:creator>
      <pubDate>Wed, 04 Mar 2026 00:45:06 +0000</pubDate>
      <link>https://future.forem.com/maurizio_morri_f7f4bd128c/tissue-context-is-becoming-the-next-foundation-model-frontier-n72</link>
      <guid>https://future.forem.com/maurizio_morri_f7f4bd128c/tissue-context-is-becoming-the-next-foundation-model-frontier-n72</guid>
      <description>&lt;p&gt;A lot of bio AI still gets described as a race to predict the right label from the right dataset. The more interesting shift showing up in recent news is that the unit of learning is moving toward tissue organization, meaning who sits next to whom, which neighborhoods exist, and how local context shapes function. That is the layer you need if you want models that explain disease mechanisms instead of only classifying cell types.&lt;/p&gt;

&lt;p&gt;Helmholtz Munich highlighted a model called Nicheformer that was trained across both dissociated single cell data and spatial transcriptomics, with the explicit goal of transferring spatial information onto dissociated data at scale. The important point is not the branding. It is the idea that you can learn a representation where a cell is defined not only by its expression profile, but also by the neighborhood it tends to occupy, which gives you a handle on tissue architecture without running spatial assays for every new study.&lt;/p&gt;
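&lt;p&gt;A minimal way to see what “defined by its neighborhood” means is to describe each cell by the cell type composition of its nearest spatial neighbors. The sketch below uses toy coordinates and labels invented for the example; Nicheformer learns this kind of context jointly with expression at scale, which the sketch does not attempt.&lt;/p&gt;

```python
import math

# Illustrative sketch of "neighborhood as representation": each cell is
# summarized by the cell-type composition of its k nearest spatial
# neighbors. Coordinates and labels are toy data for the example.

def knn_composition(cells, k=2):
    # cells: list of (x, y, cell_type) tuples from a spatial assay.
    out = []
    for i, (x, y, _) in enumerate(cells):
        dists = []
        for j, (x2, y2, t2) in enumerate(cells):
            if i != j:
                dists.append((math.hypot(x - x2, y - y2), t2))
        dists.sort()
        nearest = [t for _, t in dists[:k]]
        # Fraction of each cell type among the k nearest neighbors.
        comp = {t: nearest.count(t) / k for t in set(nearest)}
        out.append(comp)
    return out
```

&lt;p&gt;Two cells with identical expression but different neighborhood compositions get different representations, which is the extra signal a tissue aware model can exploit.&lt;/p&gt;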

&lt;p&gt;This matters because the bottleneck in many translational problems is reproducibility under new conditions. Harvard Medical School described an AI foundation model effort aimed at making stem cell therapies more robust, with a focus on learning rules that guide cell development so outcomes can be reproduced reliably and at scale. If you connect that goal to tissue aware representations, you get a clearer path from descriptive atlases to controllable differentiation protocols, because the model can learn which developmental trajectories hold up across conditions and which ones are fragile.&lt;/p&gt;

&lt;p&gt;The practical takeaway is that spatial context is becoming the missing ingredient for generalization in biology. Sequence and expression are powerful, but tissue is where constraints live. The next wave of useful bio AI will be built around representations that preserve neighborhood structure and then feed directly into design problems like cell manufacturing, perturbation selection, and mechanism grounded biomarkers.&lt;/p&gt;

&lt;p&gt;Sources&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.helmholtz-munich.de/en/newsroom/news-all/artikel/new-foundation-model-reveals-how-cells-are-organized-in-tissues" rel="noopener noreferrer"&gt;https://www.helmholtz-munich.de/en/newsroom/news-all/artikel/new-foundation-model-reveals-how-cells-are-organized-in-tissues&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.nature.com/articles/s41592-025-02814-z" rel="noopener noreferrer"&gt;https://www.nature.com/articles/s41592-025-02814-z&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://hms.harvard.edu/news/combining-biology-ai-advance-cell-therapy" rel="noopener noreferrer"&gt;https://hms.harvard.edu/news/combining-biology-ai-advance-cell-therapy&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://phys.org/news/2026-02-ai-foundation-aims-stem-cell.html" rel="noopener noreferrer"&gt;https://phys.org/news/2026-02-ai-foundation-aims-stem-cell.html&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>deeplearning</category>
      <category>machinelearning</category>
      <category>science</category>
    </item>
    <item>
      <title>Stupdf</title>
      <dc:creator>Maurizio Morri</dc:creator>
      <pubDate>Wed, 25 Feb 2026 01:42:34 +0000</pubDate>
      <link>https://future.forem.com/maurizio_morri_f7f4bd128c/stupdf-3dga</link>
      <guid>https://future.forem.com/maurizio_morri_f7f4bd128c/stupdf-3dga</guid>
      <description>&lt;p&gt;If you are tired of waiting for Adobe to release a proper PDF editor for Linux, feel free to give this one a try.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/mmorri/stupdf" rel="noopener noreferrer"&gt;https://github.com/mmorri/stupdf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It will always be free. Leave a star or a comment if you would like features added.&lt;/p&gt;

&lt;p&gt;Cheers,&lt;br&gt;
MM&lt;/p&gt;

</description>
      <category>pdf</category>
      <category>linux</category>
    </item>
    <item>
      <title>AI inference is becoming a memory engineering problem</title>
      <dc:creator>Maurizio Morri</dc:creator>
      <pubDate>Fri, 20 Feb 2026 01:20:45 +0000</pubDate>
      <link>https://future.forem.com/maurizio_morri_f7f4bd128c/ai-inference-is-becoming-a-memory-engineering-problem-1p2j</link>
      <guid>https://future.forem.com/maurizio_morri_f7f4bd128c/ai-inference-is-becoming-a-memory-engineering-problem-1p2j</guid>
      <description>&lt;p&gt;The most technical AI story right now is not a new model. It is the brutal physics of inference. Once you move past the prefill step, decoding is dominated by memory traffic. Every generated token pulls attention state back into the GPU, and that state keeps growing with context length. The industry name for that state is the KV cache, and it is quietly turning AI into a memory hierarchy design problem.&lt;/p&gt;
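&lt;p&gt;The growth is easy to quantify with a back of the envelope calculation. The dimensions below are illustrative, roughly a Llama 70B class shape, not measurements of any specific deployment.&lt;/p&gt;

```python
# Back-of-envelope KV cache size: 2 tensors (K and V) per layer, each
# holding [seq_len, n_kv_heads, head_dim] elements. The default shape is
# illustrative (a 70B-class model with grouped-query attention).

def kv_cache_bytes(seq_len, n_layers=80, n_kv_heads=8, head_dim=128,
                   bytes_per_elem=2):  # 2 bytes for fp16/bf16
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

gb = kv_cache_bytes(seq_len=128_000) / 2**30
```

&lt;p&gt;Under these assumed dimensions that works out to roughly 39 GiB for a single 128k token sequence, before any batching, which is why the cache rather than the weights often dominates serving memory plans.&lt;/p&gt;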

&lt;p&gt;The key shift is that teams are starting to treat context like a reusable asset, not a temporary byproduct. If you can retrieve multi gigabyte inference state in milliseconds instead of regenerating it in seconds, you change accelerator utilization and you change cost. HPE Labs described this explicitly in recent testing of external KV cache architectures under long context enterprise workloads, framing it as a step change rather than a marginal optimization. &lt;a href="https://www.hpe.com/us/en/newsroom/blog-post/2026/02/the-next-bottleneck-in-enterprise-ai-isnt-compute-its-context.html" rel="noopener noreferrer"&gt;https://www.hpe.com/us/en/newsroom/blog-post/2026/02/the-next-bottleneck-in-enterprise-ai-isnt-compute-its-context.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;NVIDIA is pushing the same direction at the infrastructure level by extending inference context beyond GPU memory and into NVMe class storage, with BlueField 4 positioned as a foundation for a new storage tier designed specifically for sharing context across clusters. The idea is simple but consequential. Treat KV cache more like a distributed memory object that can be persisted and reused across sessions and agents, instead of forcing every GPU to recompute the same history. &lt;a href="https://nvidianews.nvidia.com/news/nvidia-bluefield-4-powers-new-class-of-ai-native-storage-infrastructure-for-the-next-frontier-of-ai" rel="noopener noreferrer"&gt;https://nvidianews.nvidia.com/news/nvidia-bluefield-4-powers-new-class-of-ai-native-storage-infrastructure-for-the-next-frontier-of-ai&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you want the practical implication, it is that long context is no longer only a model feature. It is an infrastructure feature. Your serving stack now has to decide what stays in HBM, what spills to host memory, what spills to SSD, and when it is worth paying latency to fetch versus recompute. A good inference platform becomes a cache manager with opinions.&lt;/p&gt;
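&lt;p&gt;Concretely, a “cache manager with opinions” reduces to a fetch versus recompute decision per session. The tier latencies below are invented placeholders; the structure of the decision, not the numbers, is the point.&lt;/p&gt;

```python
# Sketch of the fetch-versus-recompute decision a serving stack now makes
# when a session's KV cache is needed again. Latencies are illustrative
# placeholders, not measurements.

TIERS = {
    "hbm": 0.0,    # already resident on the GPU: nothing to do
    "host": 0.05,  # seconds to pull from host memory
    "ssd": 0.4,    # seconds to pull from NVMe-class storage
}

def restore_plan(tier, recompute_seconds):
    # Fetch from the tier if it beats regenerating the cache via prefill.
    fetch = TIERS.get(tier)
    if fetch is None or fetch > recompute_seconds:
        return "recompute"
    return "fetch"
```

&lt;p&gt;The interesting regime is long context: recompute cost grows with history length while fetch cost stays roughly flat, so reuse wins more often as sessions get longer.&lt;/p&gt;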

&lt;p&gt;This is also why the hardware roadmap is dominated by memory bandwidth announcements. Reuters reported that Samsung shipped HBM4 chips to customers as part of the competitive race in AI memory, a reminder that the limiting reagent for many deployments is not FLOPS, it is feeding those FLOPS with enough bandwidth. &lt;a href="https://www.reuters.com/technology/samsung-electronics-says-it-has-shipped-hbm4-chips-customers-2026-02-12/" rel="noopener noreferrer"&gt;https://www.reuters.com/technology/samsung-electronics-says-it-has-shipped-hbm4-chips-customers-2026-02-12/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At the software layer, the technical story is the same. Performance is increasingly about kernel selection, batching strategy, and KV cache management under different concurrency regimes. AMD’s recent technical writeup on inference performance highlights adaptive kernel selection, using high throughput kernels for prefill and high concurrency decode, and low latency kernels for low concurrency scenarios. That is a very specific acknowledgement that serving is not one workload. It is multiple workloads that switch minute by minute depending on traffic shape. &lt;a href="https://www.amd.com/en/developer/resources/technical-articles/2026/inference-performance-on-amd-gpus.html" rel="noopener noreferrer"&gt;https://www.amd.com/en/developer/resources/technical-articles/2026/inference-performance-on-amd-gpus.html&lt;/a&gt;&lt;/p&gt;
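&lt;p&gt;The dispatch logic itself can be tiny. The regime names below mirror the description in the writeup, but the threshold and function are invented for illustration.&lt;/p&gt;

```python
# Sketch of adaptive kernel selection: the same server picks different
# kernels per phase and concurrency regime. The threshold is invented.

def select_kernel(phase, concurrent_requests):
    if phase == "prefill":
        # Prefill is compute-bound: maximize throughput.
        return "high_throughput"
    # Decode: batch-friendly kernel only when traffic is heavy.
    if concurrent_requests >= 32:
        return "high_throughput"
    return "low_latency"
```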

&lt;p&gt;If you zoom out, the industry is converging on a new mental model for inference. Training is compute heavy. Inference at scale is memory heavy. The winning stacks will be the ones that treat KV cache as first class data, move it through a real hierarchy, reuse it aggressively, and schedule kernels that match the concurrency regime instead of pretending one kernel fits all.&lt;/p&gt;

&lt;p&gt;If you are building systems, the immediate takeaway is that you should stop benchmarking only tokens per second on a short prompt. Long context and multi turn agents are stress tests for memory, not compute. Measure context length, cache reuse rate, cache miss penalties, and end to end latency under realistic traffic. The next gains are going to come from treating inference like computer architecture again, because that is what it has become.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>memory</category>
      <category>programming</category>
    </item>
    <item>
      <title>Deep Integration and the Convergence of Model Architecture and Hardware in AI</title>
      <dc:creator>Maurizio Morri</dc:creator>
      <pubDate>Sun, 02 Nov 2025 20:34:23 +0000</pubDate>
      <link>https://future.forem.com/maurizio_morri_f7f4bd128c/deep-integration-and-the-convergence-of-model-architecture-and-hardware-in-ai-4265</link>
      <guid>https://future.forem.com/maurizio_morri_f7f4bd128c/deep-integration-and-the-convergence-of-model-architecture-and-hardware-in-ai-4265</guid>
      <description>&lt;p&gt;Artificial intelligence has entered a stage where the frontier is no longer about bigger models but about more efficient coordination between architecture, data flow, and physical hardware. The next leap forward is coming from co-designed systems, where the boundaries between software optimization, neural topology, and silicon are intentionally blurred.&lt;/p&gt;

&lt;p&gt;Recent research trends show that high-performance models are increasingly dependent on architectural alignment with the underlying compute substrate. Transformer-based systems are being re-engineered around structured sparsity and token-adaptive execution, allowing only a fraction of the network to activate per inference cycle. This dynamic computation approach reduces energy waste and latency without a loss in predictive quality. It reflects a deeper shift from static, one-size-fits-all inference toward hardware-aware AI that can sense, decide, and self-optimize at runtime.&lt;/p&gt;
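&lt;p&gt;Token adaptive execution is often realized as an early exit scheme: spend more layers only on inputs that remain uncertain. The sketch below is a toy version, with an invented confidence update standing in for a learned exit classifier.&lt;/p&gt;

```python
# Toy sketch of token-adaptive execution via early exit: run more layers
# only for "hard" inputs. The confidence update is an invented stand-in
# for a learned exit classifier.

def forward_adaptive(token_difficulty, n_layers=12, exit_threshold=0.9):
    confidence = 0.0
    layers_used = 0
    for _ in range(n_layers):
        layers_used += 1
        # Easy tokens gain confidence quickly, hard tokens slowly.
        confidence += (1.0 - token_difficulty) * 0.3
        if confidence >= exit_threshold:
            break
    return layers_used
```

&lt;p&gt;Averaged over a stream of mostly easy tokens, this is where the energy and latency savings come from: cost scales with input difficulty instead of model depth.&lt;/p&gt;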

&lt;p&gt;At the hardware level, specialized accelerators such as Nvidia’s Rubin AI chips, AMD’s Instinct MI325, and Intel’s Falcon Shores prototypes are all moving toward hybrid integration. Instead of discrete GPUs separated from CPUs, these platforms blend high-bandwidth memory, programmable matrix cores, and tensor logic directly into unified chiplet assemblies. This physical proximity minimizes interconnect latency and allows models to treat memory as a continuous adaptive field rather than a fixed bottleneck.&lt;/p&gt;

&lt;p&gt;The software stack is evolving in parallel. Low-level runtimes like Triton, TVM, and OpenXLA are incorporating reinforcement-learning optimizers that tune graph compilation automatically for each hardware configuration. When a model is deployed, it no longer runs as a static computational graph but as a self-profiling entity that measures bandwidth, cache contention, and numerical precision drift in real time, then adjusts its own execution path accordingly.&lt;/p&gt;

&lt;p&gt;From a systems-level perspective, the future of AI will depend on three converging forces. The first is adaptive compute, where execution cost scales to input complexity instead of model size. The second is structural fusion, the merging of layers, kernels, and physical instructions to minimize redundant data movement. The third is semantic compression, where models preserve performance through learned representation pruning rather than parameter count. Together these principles signal a move toward neuromorphic efficiency—AI that behaves less like a program and more like an evolving circuit.&lt;/p&gt;

&lt;p&gt;One clear example is seen in modern large-scale inference clusters. Instead of replicating full models across thousands of GPUs, teams now partition the model graph into logical shards with intelligent activation routing. Tokens of similar structure or entropy are sent to specialized subnetworks optimized for that type of data. The process creates a distributed form of modular intelligence, where many smaller expert systems collaborate dynamically inside one global inference fabric.&lt;/p&gt;
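&lt;p&gt;The routing step can be reduced to a toy gate: send each token to the expert whose learned centroid is closest. Real mixture of experts gates are trained networks with load balancing; this sketch, with invented centroids, only shows the shape of the decision.&lt;/p&gt;

```python
# Toy sketch of activation routing: each token goes to the expert whose
# centroid is nearest in embedding space. Centroids are invented; real
# gating networks are learned and load-balanced.

EXPERTS = {
    "code": (1.0, 0.0),
    "prose": (0.0, 1.0),
}

def route(token_embedding):
    def dist2(a, b):
        # Squared Euclidean distance between two embeddings.
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(EXPERTS, key=lambda name: dist2(token_embedding, EXPERTS[name]))
```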

&lt;p&gt;For researchers, this convergence blurs traditional boundaries between algorithm design, compiler optimization, and hardware architecture. For engineers, it represents a new design philosophy in which AI systems become self-regulating organisms: aware of their computational environment, capable of introspection, and optimized for the physics of the chips that host them.&lt;/p&gt;

&lt;p&gt;Artificial intelligence is no longer just a mathematical abstraction. It is becoming a physical discipline—an applied science of electrons, memory, and information flow. The next generation of breakthroughs will emerge not from another order-of-magnitude increase in parameters, but from the seamless fusion of model intelligence and machine substrate.&lt;/p&gt;

&lt;p&gt;References&lt;br&gt;
Nvidia. “Rubin AI Platform and Next-Generation GPU Architecture.” Nvidia GTC 2025 Keynote. &lt;a href="https://apnews.com/article/457e9260aa2a34c1bbcc07c98b7a0555" rel="noopener noreferrer"&gt;https://apnews.com/article/457e9260aa2a34c1bbcc07c98b7a0555&lt;/a&gt;&lt;br&gt;
LeCun Y. “Energy Efficiency and the Future of Neural Computation.” Communications of the ACM, 2025. &lt;a href="https://cacm.acm.org/news/energy-efficiency-in-ai/" rel="noopener noreferrer"&gt;https://cacm.acm.org/news/energy-efficiency-in-ai/&lt;/a&gt;&lt;br&gt;
AMD. “Instinct MI325 Accelerators for AI and HPC.” AMD, 2025. &lt;a href="https://www.amd.com/en/products/accelerators/instinct-mi325" rel="noopener noreferrer"&gt;https://www.amd.com/en/products/accelerators/instinct-mi325&lt;/a&gt;&lt;br&gt;
Intel. “Falcon Shores Architecture Overview.” Intel Developer Forum, 2025. &lt;a href="https://www.intel.com/content/www/us/en/developer/articles/technical/falcon-shores-architecture.html" rel="noopener noreferrer"&gt;https://www.intel.com/content/www/us/en/developer/articles/technical/falcon-shores-architecture.html&lt;/a&gt;&lt;br&gt;
Google Research. “Dynamic Sparsity and Token-Adaptive Computation.” arXiv preprint, 2025. &lt;a href="https://arxiv.org/abs/2505.07891" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2505.07891&lt;/a&gt;&lt;/p&gt;

</description>
      <category>systemdesign</category>
      <category>ai</category>
      <category>deeplearning</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Architectural Advances in AI Inference Algorithms</title>
      <dc:creator>Maurizio Morri</dc:creator>
      <pubDate>Wed, 22 Oct 2025 17:33:16 +0000</pubDate>
      <link>https://future.forem.com/maurizio_morri_f7f4bd128c/architectural-advances-in-ai-inference-algorithms-1glm</link>
      <guid>https://future.forem.com/maurizio_morri_f7f4bd128c/architectural-advances-in-ai-inference-algorithms-1glm</guid>
      <description>&lt;p&gt;Artificial intelligence has entered a phase where architectural design and inference algorithms dominate performance gains more than raw scaling. Modern research focuses on the mathematical and algorithmic foundations that enable efficient reasoning, context compression, and adaptive decision pathways within large models. The shift is from brute-force parameter expansion to structured computation and dynamic execution.&lt;/p&gt;

&lt;p&gt;At the core of this transition are modular inference frameworks that decompose computation into specialized subroutines. Instead of executing dense transformer blocks on every token, new algorithms such as mixture-of-experts, routing transformers, and sparse activation networks compute only on relevant subspaces of the model. This selective activation reduces compute by orders of magnitude while preserving accuracy. The routing function, often a low-rank attention layer or gating network, learns to dispatch information to the appropriate module at runtime. This approach introduces conditional computation graphs where each forward pass traverses a distinct path through the model.&lt;/p&gt;
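
&lt;p&gt;The routing idea can be sketched in a few lines. The experts and gate scores below are hypothetical stand-ins for trained networks; the point is that only the top-k experts execute for a given token, so each forward pass traverses a different path.&lt;/p&gt;

```python
import math

# Minimal sketch of conditional computation with a gating network.
# Each "expert" is a tiny function; the gate scores experts per token
# and only the top-k actually run. All weights here are hypothetical.

EXPERTS = [
    lambda x: [2 * v for v in x],          # expert 0: doubling
    lambda x: [v + 1 for v in x],          # expert 1: shifting
    lambda x: [v * v for v in x],          # expert 2: squaring
    lambda x: [-v for v in x],             # expert 3: negation
]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(token, gate_scores, k=2):
    """Dispatch a token only to the top-k experts and mix their outputs."""
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: -probs[i])[:k]
    renorm = sum(probs[i] for i in top)    # renormalize over active experts
    out = [0.0] * len(token)
    for i in top:
        expert_out = EXPERTS[i](token)
        w = probs[i] / renorm
        out = [o + w * e for o, e in zip(out, expert_out)]
    return out, sorted(top)

out, active = moe_forward([1.0, 2.0], gate_scores=[3.0, 1.0, 2.5, -1.0], k=2)
print("active experts:", active)           # only 2 of 4 experts ran
```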

&lt;p&gt;Another frontier is retrieval-augmented inference, which separates parametric memory from non-parametric reasoning. During inference, the model retrieves contextually relevant information from external vector stores or symbolic databases, reducing the need to encode all knowledge within weights. Algorithms such as RePlug, MemGPT, and Atlas employ similarity search or learned retrieval mechanisms to dynamically expand context windows. This design effectively merges neural computation with database querying, achieving higher factual precision and reduced hallucination.&lt;/p&gt;
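
&lt;p&gt;Stripped to its essentials, the retrieval pattern is a nearest-neighbor lookup over an external store followed by prompt assembly. The generic sketch below is not the actual RePlug, MemGPT, or Atlas code; the embeddings and documents are invented for illustration.&lt;/p&gt;

```python
import math

# Generic sketch of retrieval-augmented inference: embed the query,
# rank stored documents by cosine similarity, and prepend the best
# matches to the prompt. Store contents here are illustrative only.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

VECTOR_STORE = [
    ([0.9, 0.1, 0.0], "Transformers use self-attention over tokens."),
    ([0.1, 0.9, 0.0], "Proteins fold into secondary structures."),
    ([0.8, 0.2, 0.1], "Attention cost grows quadratically in length."),
]

def retrieve(query_vec, k=2):
    """Return the k most similar documents to the query embedding."""
    ranked = sorted(VECTOR_STORE, key=lambda d: -cosine(query_vec, d[0]))
    return [text for _, text in ranked[:k]]

def augmented_prompt(query_vec, question):
    """Assemble retrieved context and the question into one prompt."""
    context = "\n".join(retrieve(query_vec))
    return f"Context:\n{context}\n\nQuestion: {question}"

docs = retrieve([1.0, 0.0, 0.0])
print(docs)
print(augmented_prompt([1.0, 0.0, 0.0], "How does attention scale?"))
```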

&lt;p&gt;Probabilistic and sampling-based inference methods have also evolved. Traditional beam search and temperature-based decoding are being replaced by stochastic reasoning strategies such as Monte Carlo Tree Search for text generation, contrastive decoding, and self-consistency sampling. These algorithms treat inference as a probabilistic search over reasoning trajectories rather than a single linear sequence. In recent benchmarks, such strategies have yielded large improvements in mathematical reasoning and code generation without retraining.&lt;/p&gt;
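
&lt;p&gt;Self-consistency is the simplest of these strategies to sketch: sample several stochastic answers and keep the majority. The noisy solver below is a toy stand-in for independently sampled model trajectories, not a real LLM call.&lt;/p&gt;

```python
import random
from collections import Counter

# Sketch of self-consistency decoding: treat inference as repeated
# stochastic sampling and aggregate by majority vote.

def noisy_solver(question, rng):
    # Hypothetical stochastic "reasoning path": right most of the time.
    correct = sum(question)                # ground truth for this toy task
    if rng.random() > 0.2:                 # right about 80% of the time
        return correct
    return correct + rng.choice([-1, 1])   # occasional off-by-one error

def self_consistency(question, n_samples=25, seed=0):
    """Majority vote over independently sampled answers."""
    rng = random.Random(seed)
    answers = [noisy_solver(question, rng) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

print(self_consistency((2, 3)))            # vote recovers the true sum
```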

&lt;p&gt;Efficiency remains a defining constraint. Advanced quantization techniques, including 4-bit and mixed-precision inference, allow models with billions of parameters to run on consumer hardware. Quantization-aware training and post-training calibration minimize accuracy degradation by learning scale factors that preserve variance across activations. Combined with low-rank adapters and token-level pruning, these optimizations push inference throughput closer to real-time execution for large-scale models.&lt;/p&gt;
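
&lt;p&gt;Symmetric post-training quantization can be shown in miniature: map a weight row to signed 4-bit integers using a scale derived from the row's largest magnitude. Real systems add calibration data and per-channel refinements; this sketch only illustrates the scale-factor idea.&lt;/p&gt;

```python
# Sketch of symmetric post-training quantization: each row gets one
# scale factor, weights become small signed integers, and dequantizing
# recovers an approximation whose error is bounded by half a scale step.

def quantize_row(row, bits=4):
    qmax = 2 ** (bits - 1) - 1             # 7 for signed 4-bit
    scale = max(abs(w) for w in row) / qmax or 1.0
    q = [round(w / scale) for w in row]
    return q, scale

def dequantize_row(q, scale):
    return [v * scale for v in q]

weights = [0.42, -0.91, 0.13, 0.77]
q, scale = quantize_row(weights)
restored = dequantize_row(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print("quantized:", q)
print("max reconstruction error:", round(max_err, 4))
```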

&lt;p&gt;Another active area is architectural reparameterization. Researchers are replacing static attention with continuous-time formulations, such as state-space models and implicit function representations. These systems compute attention as solutions to differential equations rather than discrete token interactions, reducing memory usage from quadratic to linear complexity. Algorithms like Mamba, Hyena, and RWKV demonstrate that sequence modeling can be reframed as dynamic state evolution, offering scalability to million-token contexts.&lt;/p&gt;
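
&lt;p&gt;The core of these formulations is a linear recurrence that runs in time proportional to sequence length with constant memory. The scalar coefficients below are toy stand-ins for the learned state matrices of models like Mamba; the point is the O(length) scan versus attention's quadratic cost.&lt;/p&gt;

```python
# Sketch of sequence modeling as state evolution: a discrete recurrence
# h_t = a*h_{t-1} + b*x_t with readout y_t = c*h_t. One pass over the
# sequence, one scalar of state, no all-pairs token interactions.

def ssm_scan(xs, a=0.5, b=1.0, c=2.0):
    """Linear-time scan over the input sequence."""
    h, ys = 0.0, []
    for x in xs:
        h = a * h + b * x       # state update: exponentially decaying memory
        ys.append(c * h)        # readout at every step
    return ys

ys = ssm_scan([1.0, 0.0, 0.0, 4.0])
print(ys)                       # early inputs fade as the state decays
```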

&lt;p&gt;Finally, graph-theoretic inference is emerging as an abstraction layer for reasoning. In these systems, tokens, images, or structured data elements become nodes in a computational graph where message passing and spectral filters replace dense attention. This paradigm generalizes transformers into topologically aware networks that can reason over structured relations such as molecules, circuits, or spatial maps.&lt;/p&gt;
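
&lt;p&gt;A single message-passing step captures the contrast with dense attention: each node aggregates only from its graph neighbors, so structure bounds computation. The graph and features below are illustrative.&lt;/p&gt;

```python
# Sketch of one message-passing round: every node averages its
# neighbors' features and mixes the result with its own state.

def message_passing_step(features, edges, mix=0.5):
    """One round of neighbor aggregation over an undirected graph."""
    n = len(features)
    neighbors = {i: [] for i in range(n)}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    updated = []
    for i in range(n):
        if neighbors[i]:
            msg = sum(features[j] for j in neighbors[i]) / len(neighbors[i])
        else:
            msg = features[i]              # isolated node keeps its state
        updated.append((1 - mix) * features[i] + mix * msg)
    return updated

# Path graph 0-1-2: information travels one hop per step.
feats = message_passing_step([1.0, 0.0, 0.0], edges=[(0, 1), (1, 2)])
print(feats)
```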

&lt;p&gt;The convergence of these algorithms signals the beginning of a post-transformer era defined by compositional reasoning, conditional computation, and dynamic memory access. The most successful systems will combine stochastic inference with modular structure and external knowledge retrieval. In this future, inference will no longer be a static forward pass but a controlled exploration of reasoning space guided by mathematics, probability, and architecture design.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>algorithms</category>
    </item>
    <item>
      <title>AI and the Future of Transportation Systems</title>
      <dc:creator>Maurizio Morri</dc:creator>
      <pubDate>Mon, 06 Oct 2025 21:14:25 +0000</pubDate>
      <link>https://future.forem.com/maurizio_morri_f7f4bd128c/ai-and-the-future-of-transportation-systems-j5n</link>
      <guid>https://future.forem.com/maurizio_morri_f7f4bd128c/ai-and-the-future-of-transportation-systems-j5n</guid>
      <description>&lt;p&gt;Transportation has always been a measure of civilization’s progress. From steam engines to jet turbines, each leap in mobility reshaped society. Now artificial intelligence is driving the next transformation — one that connects vehicles, infrastructure, and logistics into a single intelligent network.&lt;/p&gt;

&lt;p&gt;AI in transportation begins with data. Sensors embedded in roads, vehicles, and satellites collect continuous streams of information about traffic, weather, and movement. Machine learning models analyze these signals in real time to optimize flow, prevent congestion, and improve safety. Smart traffic lights, for example, can now adjust timing dynamically based on live traffic density, reducing both delays and emissions.&lt;/p&gt;
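
&lt;p&gt;A density-proportional green-time split is one simple way such a controller could work: divide a fixed cycle among approaches according to measured queue length, with a minimum green per approach. The policy and numbers below are purely illustrative, not any deployed system.&lt;/p&gt;

```python
# Hypothetical sketch of adaptive signal timing: green time scales
# with live demand instead of following a fixed schedule.

def green_times(queues, cycle=60, min_green=5):
    """Allocate cycle seconds to each approach by relative demand."""
    total = sum(queues)
    if total == 0:
        return [cycle / len(queues)] * len(queues)
    raw = [cycle * q / total for q in queues]
    # Enforce a minimum green, then rescale back to the cycle length.
    floored = [max(r, min_green) for r in raw]
    scale = cycle / sum(floored)
    return [round(t * scale, 1) for t in floored]

# A busy main road (18 queued cars) vs. a quiet side street (2 cars).
print(green_times([18, 2]))
```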

&lt;p&gt;Public transportation is evolving too. AI helps cities predict passenger demand and adjust schedules automatically. Algorithms can reroute buses to avoid delays or allocate extra vehicles during peak hours. For riders, this means shorter waits and smoother commutes. Urban planners use AI-driven simulations to test new transit designs before construction begins, ensuring resources are used efficiently.&lt;/p&gt;

&lt;p&gt;In logistics, AI orchestrates global supply chains. Predictive analytics anticipates delays caused by weather or port congestion, rerouting shipments proactively. Fleet management systems use reinforcement learning to optimize delivery routes, cutting fuel costs and improving punctuality. Autonomous trucks and drones are gradually moving from trials to deployment, promising faster and safer transport of goods.&lt;/p&gt;

&lt;p&gt;Air travel and rail systems are adopting similar intelligence. Airlines use AI to manage flight paths and maintenance schedules, minimizing delays. Rail networks employ machine learning to detect early signs of mechanical stress in tracks and carriages, preventing costly breakdowns. Each application adds precision and resilience to systems that once depended entirely on human timing.&lt;/p&gt;

&lt;p&gt;Challenges still exist. Infrastructure must adapt to handle autonomous systems safely, and cybersecurity will be crucial as vehicles and networks become interconnected. Yet the benefits are clear: fewer accidents, reduced waste, and more efficient movement of people and goods.&lt;/p&gt;

&lt;p&gt;The transportation revolution of the twenty-first century is not about faster engines but smarter ones. Artificial intelligence is turning the world’s roads, skies, and ports into coordinated, adaptive systems. The result will be a world that moves not only faster but more intelligently.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>iot</category>
      <category>smartcities</category>
    </item>
    <item>
      <title>Sparse Models and the Efficiency Revolution in AI</title>
      <dc:creator>Maurizio Morri</dc:creator>
      <pubDate>Tue, 30 Sep 2025 22:52:12 +0000</pubDate>
      <link>https://future.forem.com/maurizio_morri_f7f4bd128c/sparse-models-and-the-efficiency-revolution-in-ai-32ep</link>
      <guid>https://future.forem.com/maurizio_morri_f7f4bd128c/sparse-models-and-the-efficiency-revolution-in-ai-32ep</guid>
      <description>&lt;p&gt;The early years of deep learning were defined by scale: bigger datasets, larger models, and more compute. But as parameter counts stretched into the hundreds of billions, researchers hit a wall of cost and energy. A new paradigm is emerging to push AI forward without exponential bloat: sparse models.&lt;/p&gt;

&lt;p&gt;The principle of sparsity is simple. Instead of activating every parameter in a neural network for every input, only a small subset is used at a time. This mirrors the brain, where neurons fire selectively depending on context. By routing computation dynamically, sparse models achieve efficiency without sacrificing representational power.&lt;/p&gt;

&lt;p&gt;One leading approach is the mixture-of-experts (MoE) architecture. Here, the model contains many specialized subnetworks, or “experts,” but only a handful are called upon for a given task. Google’s Switch Transformer demonstrated that trillion-parameter MoE models could outperform dense models while using fewer active parameters per forward pass. This creates a path to scale capacity without proportional increases in computation.&lt;/p&gt;

&lt;p&gt;Sparsity is not limited to MoEs. Pruning techniques remove redundant weights after training, producing leaner networks with little loss in accuracy. Structured sparsity goes further, eliminating entire neurons or channels, which aligns better with hardware acceleration. Research into sparse attention mechanisms also enables transformers to handle long sequences more efficiently by focusing only on relevant tokens.&lt;/p&gt;
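
&lt;p&gt;Magnitude pruning, the simplest of these techniques, can be sketched directly: rank weights by absolute value and zero out everything below the cutoff. The weight vector below is illustrative.&lt;/p&gt;

```python
# Sketch of magnitude pruning: keep only the largest-magnitude
# fraction of weights and set the rest to zero.

def magnitude_prune(weights, keep_ratio=0.5):
    """Return weights with the smallest-magnitude entries zeroed out."""
    k = max(1, int(len(weights) * keep_ratio))
    threshold = sorted((abs(w) for w in weights), reverse=True)[k - 1]
    return [w if abs(w) >= threshold else 0.0 for w in weights]

w = [0.05, -0.8, 0.01, 0.6, -0.02, 0.3]
pruned = magnitude_prune(w, keep_ratio=0.5)
print(pruned)
print(sum(1 for v in pruned if v == 0.0), "of", len(w), "weights pruned")
```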

&lt;p&gt;The implications are profound. Sparse models reduce training and inference costs, lower energy consumption, and make it feasible to deploy large-capacity systems at the edge. They also open the door to modularity: experts can be added, swapped, or fine-tuned independently, creating more flexible AI ecosystems.&lt;/p&gt;

&lt;p&gt;Challenges remain in hardware support and training stability. GPUs and TPUs are optimized for dense matrix multiplications, making it harder to realize the full benefits of sparsity. New accelerators and software libraries are being developed to close this gap. Ensuring balanced training of experts is another open problem, as some experts risk being underutilized.&lt;/p&gt;

&lt;p&gt;The shift toward sparsity signals a maturation of AI. Instead of brute-force scaling, researchers are learning to use resources more intelligently. In the future, the most powerful models may not be those with the most parameters, but those that know when to stay silent.&lt;/p&gt;

&lt;p&gt;References&lt;br&gt;
&lt;a href="https://arxiv.org/abs/2101.03961" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2101.03961&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://arxiv.org/abs/1910.04732" rel="noopener noreferrer"&gt;https://arxiv.org/abs/1910.04732&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.nature.com/articles/s41586-021-03551-0" rel="noopener noreferrer"&gt;https://www.nature.com/articles/s41586-021-03551-0&lt;/a&gt;&lt;/p&gt;

</description>
      <category>deeplearning</category>
      <category>performance</category>
      <category>ai</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Sparse Models and the Future of Efficient AI</title>
      <dc:creator>Maurizio Morri</dc:creator>
      <pubDate>Mon, 29 Sep 2025 22:07:25 +0000</pubDate>
      <link>https://future.forem.com/maurizio_morri_f7f4bd128c/sparse-models-and-the-future-of-efficient-ai-4iam</link>
      <guid>https://future.forem.com/maurizio_morri_f7f4bd128c/sparse-models-and-the-future-of-efficient-ai-4iam</guid>
      <description>&lt;p&gt;Modern AI has followed a simple rule for progress: bigger is better. Scaling up the number of parameters and training data has consistently led to performance gains. But this approach comes with steep costs in compute, energy, and accessibility. Sparse models represent a different path forward, one that prioritizes efficiency without sacrificing capability.&lt;/p&gt;

&lt;p&gt;The principle is straightforward. Most parameters in a large neural network contribute little to a given task at any moment. Instead of activating every weight, sparse models selectively engage only the most relevant connections. This mimics the brain, where neurons fire sparsely rather than all at once.&lt;/p&gt;

&lt;p&gt;Implementing sparsity can take several forms. Static sparsity involves pruning redundant weights after training, reducing memory and computation needs. Dynamic sparsity, on the other hand, selects a different subset of active weights on the fly for each input. Mixture-of-Experts (MoE) models go further by partitioning the network into multiple expert subnetworks, routing each input through only a small fraction of them. Google’s Switch Transformer is a prime example, achieving massive scale while keeping per-example computation manageable.&lt;/p&gt;
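
&lt;p&gt;Switch-style top-1 routing reduces to an argmax over gate scores: each input runs through exactly one expert. The experts and scores below are hypothetical stand-ins for trained components.&lt;/p&gt;

```python
# Sketch of top-1 expert routing: a gate picks a single expert per
# input, so per-example compute stays flat as experts are added.

EXPERTS = {
    "arithmetic": lambda t: f"[math]{t}",
    "language":   lambda t: f"[lang]{t}",
    "code":       lambda t: f"[code]{t}",
}

def switch_route(token, gate_scores):
    """Run only the single highest-scoring expert."""
    name = max(gate_scores, key=gate_scores.get)
    return name, EXPERTS[name](token)

name, out = switch_route("2+2", {"arithmetic": 1.9, "language": 0.3, "code": 0.8})
print(name, "->", out)
```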

&lt;p&gt;The benefits are clear. Sparse models allow trillion-parameter architectures to be trained and deployed without proportional increases in compute. They also open possibilities for edge deployment, where hardware constraints make dense models impractical. By lowering the energy and hardware demands of AI, sparsity has the potential to democratize access to powerful systems.&lt;/p&gt;

&lt;p&gt;Challenges remain in optimizing hardware and software for sparse computation. GPUs are built for dense matrix multiplications, and sparse operations often underutilize them. New accelerators and libraries are being developed to exploit sparsity more effectively. Ensuring that pruning or routing does not harm accuracy is another ongoing area of research.&lt;/p&gt;

&lt;p&gt;Sparsity offers a vision where AI continues to grow more powerful without growing unsustainable. If dense scaling defined the last decade of AI, sparse scaling may define the next.&lt;/p&gt;

&lt;p&gt;References&lt;br&gt;
&lt;a href="https://arxiv.org/abs/2007.03085" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2007.03085&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://arxiv.org/abs/2101.03961" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2101.03961&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://arxiv.org/abs/2209.10655" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2209.10655&lt;/a&gt;&lt;/p&gt;

</description>
      <category>deeplearning</category>
      <category>performance</category>
      <category>architecture</category>
      <category>ai</category>
    </item>
    <item>
<title>ProT-Vision: New AI Tool Enhances Protein Structure Classification</title>
      <dc:creator>Maurizio Morri</dc:creator>
      <pubDate>Mon, 07 Jul 2025 20:57:38 +0000</pubDate>
      <link>https://future.forem.com/maurizio_morri_f7f4bd128c/-prot-vision-new-ai-tool-enhances-protein-structure-classification-27e5</link>
      <guid>https://future.forem.com/maurizio_morri_f7f4bd128c/-prot-vision-new-ai-tool-enhances-protein-structure-classification-27e5</guid>
      <description>&lt;p&gt;A new open source toolkit called ProT-Vision has just been released, enabling fast and interpretable classification of protein structures using AI. Designed by a team from EMBL and ETH Zurich, ProT-Vision leverages visual representation learning to identify structural patterns in protein folds, active sites, and domains.&lt;/p&gt;

&lt;h2&gt;What Makes It Different&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Converts 3D protein data into image-like grids for CNN analysis
&lt;/li&gt;
&lt;li&gt;Supports PDB and AlphaFold formats with automatic preprocessing
&lt;/li&gt;
&lt;li&gt;Pretrained models for SCOP and CATH classification
&lt;/li&gt;
&lt;li&gt;Interactive notebooks and plugins for PyMOL and ChimeraX&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Example Code&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from protvision.io import load_structure&lt;br&gt;
from protvision.model import FoldClassifier

&lt;p&gt;protein = load_structure("1CRN.pdb")&lt;br&gt;
classifier = FoldClassifier(pretrained=True)&lt;br&gt;
label = classifier.predict(protein)&lt;/p&gt;

&lt;p&gt;print("Predicted fold:", label)&lt;br&gt;
&lt;/p&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;Real-World Impact&lt;/h2&gt;

&lt;p&gt;ProT-Vision enables protein structure researchers to annotate large datasets in seconds instead of hours. Its accuracy rivals traditional structural alignment tools, while being far more scalable. Applications include drug target classification, enzyme function prediction, and evolutionary analysis.&lt;/p&gt;

&lt;p&gt;By using CNNs on voxelized structures, the tool avoids overfitting and provides saliency maps that highlight functionally relevant regions in the protein.&lt;/p&gt;

&lt;h2&gt;Availability&lt;/h2&gt;

&lt;p&gt;The toolkit is hosted on GitHub with detailed docs, Docker containers, and ready-to-use datasets. It is compatible with Linux, Windows, and macOS and requires only PyTorch and Biopython to get started.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sources&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://github.com/protvision-ai/protvision" rel="noopener noreferrer"&gt;https://github.com/protvision-ai/protvision&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.embl.org/news/science/protein-classification-ai-release-2025/" rel="noopener noreferrer"&gt;https://www.embl.org/news/science/protein-classification-ai-release-2025/&lt;/a&gt;&lt;br&gt;
&lt;a href="https://academic.oup.com/bioinformatics/article/41/6/btad212/7698231" rel="noopener noreferrer"&gt;https://academic.oup.com/bioinformatics/article/41/6/btad212/7698231&lt;/a&gt;&lt;/p&gt;

</description>
      <category>programming</category>
      <category>tech</category>
    </item>
  </channel>
</rss>
