VentureBeat

IndexCache, a new sparse attention optimizer, delivers 1.82x faster inference on long-context AI models


Processing 200,000 tokens through a large language model is expensive and slow: the longer the context, the faster the costs spiral. Researchers at Tsinghua University and Z.ai have built a technique called IndexCache that cuts up to 75% of the redundant computation in sparse attention models, delivering up to 1.82x faster time-to-first-token and 1.48x faster generation throughput at that context length.
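IndexCache's internal design isn't detailed in this summary, but the general sparse-attention idea it accelerates can be sketched: instead of attending over every cached key, each query attends over only its k highest-scoring keys, so the softmax and value aggregation cost scales with k rather than the full context length n. The sketch below (NumPy, single query vector; all function and variable names are illustrative, not the paper's API) shows the mechanic:

```python
import numpy as np

def topk_sparse_attention(q, K, V, k):
    """Illustrative top-k sparse attention for one query vector.

    Dense attention softmaxes over all n cached keys; here we keep only
    the k highest-scoring keys and attend over that subset. Note: this
    sketch still scores every key to find the top k -- a real system
    like the one the article describes would use an index structure to
    locate relevant keys without a full scan.
    """
    d = q.shape[0]
    scores = K @ q                             # (n,) similarity of q to each key
    top = np.argpartition(scores, -k)[-k:]     # indices of the k best keys
    s = scores[top] / np.sqrt(d)               # scaled scores on the subset
    w = np.exp(s - s.max())
    w /= w.sum()                               # softmax over k entries, not n
    return w @ V[top]                          # weighted sum of k value rows
```

With k equal to n this reduces to ordinary dense attention over the whole cache; shrinking k is what trades a small amount of attention mass for less computation per token.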


Read full article at VentureBeat


Related Tech Stories

Bluesky’s new app is an AI for customizing your feed
The Verge

The latest app from the team behind Bluesky is Attie, an AI assistant that lets you build your own algorithm. At the Atmosphere conference, Bluesky's former CEO Jay Graber and CTO Paul Frazee unveiled Attie, which is powered by Anthropic's Claude and built on top of Bluesky's underlying AT Protocol (atproto). Attie allows users to create custom feeds using natural language. For example, you could ask for "posts about folklore, mythology, and traditional music, especially Celtic traditions."

Read more →