High-Fidelity KV Cache Summarization Using Entropy and Low-Rank Reconstruction
(jchandra.com)
15 points | by jchandra | 2 days ago | 2 comments
vivahir215
2 days ago
Interesting approach. Curious about the latency tradeoff: OLS + SVD are much heavier than Top-K. Have you benchmarked end-to-end inference latency?
jchandra
2 days ago
[dead]
jchandra
2 days ago
[dead]