[arXiv]
ImageNet-Think-250K: A Large-Scale Synthetic Dataset for Multimodal Reasoning for Vision Language Models
Krishna Teja Chitty-Venkata, Murali Emani
[Paper]
[HuggingFace]
[arXiv]
PreLoRA: Hybrid Pre-training of Vision Transformers with Full Training and Low-Rank Adapters
Krishu K Thapa, Reet Barik, Krishna Teja Chitty-Venkata, Murali Emani, Venkatram Vishwanath
[Paper]
[arXiv]
PagedEviction: Structured Block-wise KV Cache Pruning for Efficient Large Language Model Inference
Krishna Teja Chitty-Venkata, Jie Ye, Xian-He Sun, Anthony Kougkas, Murali Emani, Venkatram Vishwanath, Bogdan Nicolae
[Paper]
[arXiv]
LExI: Layer-Adaptive Active Experts for Efficient MoE Model Inference
Krishna Teja Chitty-Venkata, Sandeep Madireddy, Murali Emani, Venkatram Vishwanath
[Paper]
[arXiv]
BaKlaVa -- Budgeted Allocation of KV cache for Long-context Inference
Ahmed Burak Gulhan, Krishna Teja Chitty-Venkata, Murali Emani, Mahmut Kandemir, Venkatram Vishwanath
[Paper]
[Bivision at ICCV 2025]
MoPEQ: Mixture of Mixed Precision Quantized Experts
Krishna Teja Chitty-Venkata, Jie Ye, Murali Emani
Binary and Extreme Quantization for Computer Vision Workshop at ICCV 2025
[Paper]
[ICIP 2025]
LangVision-LoRA-NAS: Neural Architecture Search for Variable LoRA Rank in Vision Language Models
Krishna Teja Chitty-Venkata, Murali Emani, Venkatram Vishwanath
2025 IEEE International Conference on Image Processing
[Paper]
[PMBS at SC 2025]
MoE-Inference-Bench: Performance Evaluation of Mixture of Expert Large Language and Vision Models
Krishna Teja Chitty-Venkata, Sylvia Howland, Golara Azar, Daria Soboleva, Natalia Vassilieva, Siddhisanket Raskar, Murali Emani, Venkatram Vishwanath
2025 IEEE/ACM International Workshop on Performance Modeling, Benchmarking and Simulation of High-Performance Computer Systems (PMBS)
[Paper]
[PMBS at SC 2024]
LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators
Krishna Teja Chitty-Venkata, Siddhisanket Raskar, Bharat Kale, Farah Ferdaus, Aditya Tanikanti, Ken Raffenetti, Valerie Taylor, Murali Emani, Venkatram Vishwanath
2024 IEEE/ACM International Workshop on Performance Modeling, Benchmarking and Simulation of High-Performance Computer Systems (PMBS)
[Paper] [Code]
[MDPI AI 2024]
ConVision Benchmark: A Contemporary Framework to Benchmark CNN and ViT Models
Shreyas Bangalore Vijayakumar, Krishna Teja Chitty-Venkata, Kanishk Arya, Arun K Somani
MDPI AI Journal
[Paper]
[Code]
[EuroPar 2024]
WActiGrad: Structured Pruning for Efficient Finetuning and Inference of Large Language Models on AI Accelerators
Krishna Teja Chitty-Venkata, Varuni Katti Sastry, Murali Emani, Venkatram Vishwanath, Sanjif Shanmugavelu, Sylvia Howland
European Conference on Parallel Processing
[Paper]
[JSA 2023]
A Survey of Techniques for Optimizing Transformer Inference
Krishna Teja Chitty-Venkata, Yiming Bian, Murali Emani, Venkatram Vishwanath, Arun K Somani
Journal of Systems Architecture
[Paper]
[ACM CSUR 2022]
Neural Architecture Search Survey: A Hardware Perspective
Krishna Teja Chitty-Venkata, Arun K Somani
ACM Computing Surveys
[Paper]
[IEEE Access 2022]
Neural Architecture Search for Transformers: A Survey
Krishna Teja Chitty-Venkata, Arun K Somani
IEEE Access Journal
[Paper]
[ACM HPDC 2022]
Efficient Design Space Exploration for Sparse Mixed Precision Neural Architectures
Krishna Teja Chitty-Venkata, Arun K Somani
31st International Symposium on High-Performance Parallel and Distributed Computing (HPDC)
[Paper]
[IEEE ICIP 2021]
Searching Architecture and Precision for U-net based Image Restoration Tasks
Krishna Teja Chitty-Venkata, Arun K Somani
2021 IEEE International Conference on Image Processing (ICIP)
[Paper]
[IEEE ASAP 2021]
Array-aware Neural Architecture Search
Krishna Teja Chitty-Venkata, Arun K Somani
2021 IEEE 32nd International Conference on Application-specific Systems, Architectures and Processors (ASAP)
[Paper]
[IEEE HPCC/DSS 2021]
Calibration Data-based CNN Filter Pruning for Efficient Layer Fusion
Krishna Teja Chitty-Venkata, Arun K Somani
2021 IEEE International Conference on High Performance Computing and Communications and International Conference on Data Science and Systems (HPCC/DSS)
[Paper]
[IEEE PRDC 2020]
Model Compression on Faulty Array-based Neural Network Accelerator
Krishna Teja Chitty-Venkata, Arun K Somani
2020 IEEE 25th Pacific Rim International Symposium on Dependable Computing (PRDC)
[Paper]
[IEEE ASAP 2020]
Array Aware Training/Pruning: Methods for Efficient Forward Propagation on Array-based Neural Network Accelerators
Krishna Teja Chitty-Venkata, Arun K Somani
2020 IEEE 31st International Conference on Application-specific Systems, Architectures and Processors (ASAP)
[Paper]
[IEEE ASAP 2019]
Impact of Structural Faults on Neural Network Performance
Krishna Teja Chitty-Venkata, Arun K Somani
2019 IEEE 30th International Conference on Application-specific Systems, Architectures and Processors (ASAP)
[Paper]