Issues: flashinfer-ai/flashinfer
#258 [Bug report] BatchPrefillWithPagedKVCachePyTorchWrapper failed to dispatch group_size 3 (opened May 24, 2024 by merrymercy)
#254 Qwen1.5-32B failed: BatchPrefillWithPagedKVCachePyTorchWrapper failed to dispatch group_size 5 (opened May 23, 2024 by QwertyJack)
#250 Can BatchDecodeWithPaddedKVCache be used in cascade inference? (opened May 22, 2024 by joey12300)
#249 CUDA Error: no kernel image is available for execution on the device (209) /tmp/build-via-sdist-nl8se4dx/flashinfer-0.0.4+cu118torch2.2/include/flashinfer/attention/decode.cuh: line 871 at function cudaFuncSetAttribute(kernel, cudaFuncAttributeMaxDynamicSharedMemorySize, smem_size) (opened May 16, 2024 by lucasjinreal)
#248 Circular import error when importing built-from-source flashinfer (opened May 15, 2024 by vedantroy)
#166 stack smashing detected in begin_forward when compiling directly from the repo (opened Mar 8, 2024 by mkrima)