256 Commits

Author SHA1 Message Date
goodbey857 58bc254809 feat: add PaddleOCR-vl loader support and implement retrieval router infrastructure (#23945)
Co-authored-by: Tim Baek <tim@openwebui.com>
Co-authored-by: joaoback <156559121+joaoback@users.noreply.github.com>
2026-04-24 15:19:37 +09:00
Timothy Jaeryang Baek b9fc3f367a refac 2026-04-21 15:47:32 +09:00
Timothy Jaeryang Baek 4113b15a60 chore: format 2026-04-17 14:28:18 +09:00
Timothy Jaeryang Baek 860b90fd17 refac 2026-04-17 13:47:21 +09:00
Timothy Jaeryang Baek ba83613ff2 refac 2026-04-17 13:35:35 +09:00
Timothy Jaeryang Baek 4d2f189810 feat: add RAG_RERANKING_BATCH_SIZE configuration option
Add configurable reranker batch size (env var RAG_RERANKING_BATCH_SIZE,
default 32) following the same pattern as RAG_EMBEDDING_BATCH_SIZE.

- config.py: PersistentConfig for RAG_RERANKING_BATCH_SIZE
- main.py: import, state init, pass to get_reranking_function
- colbert.py: accept batch_size param in predict() (was hardcoded 32)
- utils.py: get_reranking_function passes batch_size at call time
- retrieval.py: expose in config GET/POST endpoints and ConfigForm
- Documents.svelte: add Reranking Batch Size input in admin settings
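
A minimal sketch of the batching pattern this option controls. Only the env var name and the default of 32 come from the commit message; `score_batch` and this `predict` are hypothetical stand-ins, not the actual colbert.py code.

```python
import os

# Default of 32 comes from the commit message; the parsing below is an assumption.
RAG_RERANKING_BATCH_SIZE = max(1, int(os.environ.get("RAG_RERANKING_BATCH_SIZE", "32")))


def score_batch(pairs):
    # Hypothetical scorer standing in for a real reranking model:
    # counts the words shared by the query and the document.
    return [float(len(set(q.split()) & set(d.split()))) for q, d in pairs]


def predict(pairs, batch_size=RAG_RERANKING_BATCH_SIZE):
    """Score (query, document) pairs in slices of at most batch_size."""
    scores = []
    batches = 0
    for start in range(0, len(pairs), batch_size):
        scores.extend(score_batch(pairs[start:start + batch_size]))
        batches += 1
    return scores, batches
```

Passing the batch size at call time, rather than hardcoding it inside `predict`, lets a settings change take effect without rebuilding the reranker.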

Closes #23730
2026-04-17 08:35:45 +09:00
Timothy Jaeryang Baek 5dae600ce7 chore: format 2026-04-14 17:27:31 -05:00
Classic298 a3ea7bf043 fix(retrieval): offload Loader.load to a worker thread so file uploads stop blocking the event loop (#23705)
Loader.load() dispatches to the underlying langchain document loaders
(PyMuPDF, Unstructured, python-docx, Tika, …) which are all
synchronous and CPU/IO-bound. process_file() awaited it directly on
the event loop, so parsing a non-trivial PDF/DOCX would freeze the
entire FastAPI app for the duration of the parse — which is what users
experience as "the server hangs whenever I upload a file."

Add an `aload()` async wrapper on Loader that runs the sync load on a
worker thread via asyncio.to_thread, and update process_file() to
await it. The sync API is preserved so existing callers that already
run inside run_in_threadpool (e.g. save_docs_to_vector_db) are
unaffected.
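
A minimal sketch of the wrapper pattern described above, assuming a simplified `Loader` whose `load()` just sleeps to simulate a blocking parse. The class body and the heartbeat check are illustrative, not the project's code.

```python
import asyncio
import time


class Loader:
    """Hypothetical stand-in for the real Loader; only the wrapper pattern matters."""

    def load(self, filename: str) -> list[str]:
        # Simulates a blocking, synchronous parse.
        time.sleep(0.05)
        return [f"parsed:{filename}"]

    async def aload(self, filename: str) -> list[str]:
        # Run the sync load on a worker thread so the event loop stays free.
        return await asyncio.to_thread(self.load, filename)


async def main() -> tuple[list[str], int]:
    ticks = 0

    async def heartbeat() -> None:
        # Counts how often the event loop gets to run while the parse is in progress.
        nonlocal ticks
        while True:
            await asyncio.sleep(0.005)
            ticks += 1

    beat = asyncio.create_task(heartbeat())
    docs = await Loader().aload("report.pdf")
    beat.cancel()
    return docs, ticks


docs, ticks = asyncio.run(main())
```

The heartbeat keeps ticking while the parse runs, which is the observable difference from calling `load()` directly on the loop.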

https://claude.ai/code/session_01JSr4NZSskEUQvoJnavVXh8

Co-authored-by: Claude <noreply@anthropic.com>
2026-04-14 10:55:46 -05:00
Timothy Jaeryang Baek 4866bec0f2 refac 2026-04-14 10:55:11 -05:00
Classic298 804f9f3153 fix(retrieval): offload sync VECTOR_DB_CLIENT calls in async paths via AsyncVectorDBClient (#23706)
* fix(retrieval): offload sync VECTOR_DB_CLIENT calls in async paths via AsyncVectorDBClient

The vector DB backends (Chroma, pgvector, Qdrant, Milvus, Pinecone,
Weaviate, …) are uniformly synchronous and their methods perform
blocking network or disk I/O. Multiple async route handlers and helpers
were calling them directly on the event loop — file processing,
memories, knowledge bases, hybrid search bookkeeping — so a single
upsert/delete/search would freeze every other in-flight request for the
duration of the call.

Introduce `AsyncVectorDBClient`, a thin async facade that wraps the
existing sync client and dispatches each method through
`asyncio.to_thread`. It mirrors `VectorDBBase` exactly and forwards
*args/**kwargs so backend-specific extra parameters keep working.

Update every async-context call site (routers/retrieval, routers/files,
routers/memories, routers/knowledge, retrieval/utils,
tools/builtin) to await `ASYNC_VECTOR_DB_CLIENT` instead of calling the
sync client directly. Two helpers that were sync-only also acquire
async siblings or are awaited via `asyncio.to_thread` at their async
call site (`remove_knowledge_base_metadata_embedding`,
`get_all_items_from_collections`, `query_doc`).

The original sync `VECTOR_DB_CLIENT` is unchanged, so callers that
already run inside `run_in_threadpool` (e.g. `save_docs_to_vector_db`
and the sync `query_doc`/`get_doc` helpers) are unaffected.

https://claude.ai/code/session_01JSr4NZSskEUQvoJnavVXh8

* fix(retrieval): restore explicit AsyncVectorDBClient signatures matching VectorDBBase

Per PR review: the original *args/**kwargs forwarding lost type
safety and IDE/static-analysis support. Restore explicit signatures
that mirror VectorDBBase exactly, so:

  * Bad kwargs fail at the facade boundary instead of inside the
    worker thread (where the resulting TypeError tends to be
    swallowed by surrounding `try/except`).
  * IDE autocomplete and static analysis work as expected.
  * The stated intent ("mirror VectorDBBase exactly") now holds at
    the API contract level, not just behaviourally.

While doing this, surface a pre-existing bug in
`delete_entries_from_collection` that the stricter typing flagged:
the call passed `metadata={'hash': hash}` which is not a parameter
on `VectorDBBase.delete` nor any backend. The TypeError raised
inside the sync delete was silently swallowed by `except Exception`
so the endpoint always reported `{'status': False}` for every
request instead of actually deleting matching vectors. Replace with
`filter=...` to do what the endpoint name promises.
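
The failure mode described above can be reproduced with a small sketch. `SyncClient`, `AsyncClient`, and `delete_entries` are hypothetical stand-ins for the real backend, facade, and endpoint; only the `metadata=` versus `filter=` distinction is taken from the commit message.

```python
import asyncio


class SyncClient:
    """Hypothetical in-memory backend with an explicit delete signature."""

    def __init__(self):
        self.rows = [{"id": "a", "hash": "h1"}, {"id": "b", "hash": "h2"}]

    def delete(self, collection_name, ids=None, filter=None):
        def matches(row):
            if ids and row["id"] in ids:
                return True
            return bool(filter) and all(row.get(k) == v for k, v in filter.items())

        self.rows = [row for row in self.rows if not matches(row)]


class AsyncClient:
    """Async facade with the same explicit signature as the sync client."""

    def __init__(self, sync_client):
        self.sync = sync_client

    async def delete(self, collection_name, ids=None, filter=None):
        await asyncio.to_thread(self.sync.delete, collection_name, ids=ids, filter=filter)


async def delete_entries(client, **kwargs):
    try:
        await client.delete("docs", **kwargs)
        return {"status": True}
    except Exception:
        # A broad handler like this hides the TypeError raised by a bad kwarg.
        return {"status": False}


backend = SyncClient()
facade = AsyncClient(backend)
bad = asyncio.run(delete_entries(facade, metadata={"hash": "h1"}))
remaining_after_bad = len(backend.rows)
good = asyncio.run(delete_entries(facade, filter={"hash": "h1"}))
remaining_after_good = len(backend.rows)
```

With the unknown `metadata=` keyword the call reports failure and deletes nothing; with `filter=` the matching row is removed.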

The review's other note (no concurrency limit or backpressure on
the shared default threadpool) is intentionally not addressed here.
asyncio.to_thread on the shared executor is the right primitive for
this use case, and per-domain bounded executors would add lifecycle
complexity disproportionate to the problem. The loop is no longer
blocked, which was the actual bug.

https://claude.ai/code/session_01JSr4NZSskEUQvoJnavVXh8

* fix(retrieval): parallelize hybrid-search collection prefetch; document async facade contracts

Address PR review findings:

1. Hybrid-search prefetch was sequential
   `query_collection_with_hybrid_search` previously awaited
   `ASYNC_VECTOR_DB_CLIENT.get(name)` once per collection in a for
   loop. Each call already off-loaded to a worker thread, but
   awaiting them serially meant total prefetch latency scaled
   linearly with the number of collections. Run them concurrently
   with `asyncio.gather` so multi-collection queries actually
   benefit from the threadpool. Per-collection exception handling
   is preserved by wrapping each fetch in a small helper that
   logs and returns `(name, None)` on failure, so a single bad
   collection cannot poison the whole gather.

2. Document the thread-safety expectation explicitly
   The facade now formally states what was always implicit: the
   sync `VECTOR_DB_CLIENT` is shared across worker threads, so the
   underlying backend driver must be thread-safe. This is not a
   new exposure — `save_docs_to_vector_db` already called the sync
   client from `run_in_threadpool`. Adding a global lock here
   would defeat the responsiveness the facade exists to provide;
   backends that cannot tolerate concurrent access should grow
   their own internal serialization.

3. Document the API-surface choice and `.sync` escape hatch
   The strict `VectorDBBase` mirror was a deliberate choice (the
   previous `*args/**kwargs` revision let a `metadata=` typo
   silently break an endpoint). Document it, and call out the
   `.sync` escape hatch with an example for callers that genuinely
   need a backend-specific parameter not on `VectorDBBase`.
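
The concurrent prefetch from item 1 can be sketched as follows. `get_sync`, `fetch_one`, and `prefetch` are hypothetical names, and the dictionary stands in for a real vector DB; only the `asyncio.gather` plus `(name, None)` fallback pattern comes from the commit message.

```python
import asyncio
import logging

log = logging.getLogger(__name__)

# Hypothetical in-memory stand-in for the vector DB collections.
COLLECTIONS = {"a": ["doc1"], "b": ["doc2", "doc3"]}


def get_sync(name):
    # Blocking lookup; raises KeyError for an unknown collection.
    return COLLECTIONS[name]


async def fetch_one(name):
    # Each fetch handles its own failure so one bad collection
    # cannot fail the whole gather.
    try:
        return name, await asyncio.to_thread(get_sync, name)
    except Exception as exc:
        log.warning("failed to fetch collection %s: %r", name, exc)
        return name, None


async def prefetch(names):
    # All fetches run concurrently instead of one await per loop iteration.
    pairs = await asyncio.gather(*(fetch_one(name) for name in names))
    return dict(pairs)


result = asyncio.run(prefetch(["a", "missing", "b"]))
```

An alternative would be `asyncio.gather(..., return_exceptions=True)`, but the helper keeps the logging and the `(name, None)` shape in one place.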

https://claude.ai/code/session_01JSr4NZSskEUQvoJnavVXh8

* fix(retrieval): guard /delete against null file.hash and let HTTPException reach the client

Address PR review finding on the `metadata=` → `filter=` change in
`delete_entries_from_collection`.

The new `filter={'hash': hash}` query was correct for files that
have a hash, but did not handle `file.hash is None` (unprocessed,
failed, or legacy records). The match semantics of a null filter
value are backend-dependent — some ignore the key entirely, some
treat it as "metadata field absent" and match every such row — so
issuing the query risked deleting unrelated entries.

  * Reject `hash is None` up front with a 400 explaining the file
    has no hash to target.

  * Narrow the surrounding `except Exception` so it no longer
    swallows `HTTPException`. Without this fix the new 400 (and the
    pre-existing 404 for missing files) would be silently re-shaped
    into `{'status': False}` and the caller could not distinguish a
    bad-request input from a backend error.
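
A minimal sketch of the two changes above. The `HTTPException` class here is a local stand-in so the example runs without FastAPI, and `delete_entries` and `delete_fn` are hypothetical names rather than the endpoint's actual code.

```python
class HTTPException(Exception):
    """Local stand-in for an HTTP error type, defined here for self-containment."""

    def __init__(self, status_code, detail):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def delete_entries(file_hash, delete_fn):
    try:
        if file_hash is None:
            # Reject up front: a null filter value has backend-dependent semantics.
            raise HTTPException(400, "File has no hash to target")
        delete_fn(filter={"hash": file_hash})
        return {"status": True}
    except HTTPException:
        # Let client-facing errors propagate instead of reshaping them.
        raise
    except Exception:
        return {"status": False}
```

The ordering of the two `except` clauses matters: the narrower `HTTPException` handler must come first, or the broad handler would still catch it.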

https://claude.ai/code/session_01JSr4NZSskEUQvoJnavVXh8

---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-04-14 10:50:18 -05:00
Timothy Jaeryang Baek d1a0fbe292 refac 2026-04-13 13:36:54 -05:00
Timothy Jaeryang Baek 22cfb3c673 refac 2026-04-13 13:26:13 -05:00
Timothy Jaeryang Baek d4b90f93bd refac 2026-04-12 22:08:27 -05:00
Timothy Jaeryang Baek 27169124f2 refac: async db 2026-04-12 14:22:11 -05:00
Timothy Jaeryang Baek 51b200c67b refac 2026-04-01 05:52:03 -05:00
Timothy Jaeryang Baek 36d02aa147 refac 2026-03-31 23:12:23 -05:00
Timothy Jaeryang Baek ade617efa8 refac 2026-03-24 04:49:48 -05:00
Timothy Jaeryang Baek 9a2c60d595 refac 2026-03-21 17:12:33 -05:00
Timothy Jaeryang Baek de3317e26b refac 2026-03-17 17:58:01 -05:00
Timothy Jaeryang Baek b171b0216b refac 2026-03-17 17:54:59 -05:00
Ethan T. a229f9ea42 fix: replace bare except with except Exception (#22473)
Replace bare except clauses with except Exception to follow Python best practices and avoid catching unexpected system exceptions like KeyboardInterrupt and SystemExit.
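
A short illustration of the difference, using hypothetical helper names. `SystemExit` derives from `BaseException`, not `Exception`, so only the bare clause catches it.

```python
def run_with_bare_except(fn):
    try:
        fn()
        return "ok"
    except:  # noqa: E722 - bare except, shown only for comparison
        return "swallowed"


def run_with_except_exception(fn):
    try:
        fn()
        return "ok"
    except Exception:
        return "swallowed"


def request_exit():
    raise SystemExit(1)


def ordinary_failure():
    raise ValueError("bad input")
```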
2026-03-15 17:48:23 -05:00
Timothy Jaeryang Baek 6d9996e599 refac 2026-03-06 20:12:37 -06:00
Timothy Jaeryang Baek 73b69ae408 refac 2026-03-06 15:13:21 -06:00
Timothy Jaeryang Baek 80376a3fdc revert 2026-03-06 15:05:36 -06:00
Algorithm5838 1c1c1c3100 fix: allow clearing file upload settings (#22336) 2026-03-06 14:23:20 -06:00
Timothy Jaeryang Baek 0c2e4270bc chore: format 2026-03-01 14:10:45 -06:00
Classic298 2054ee0b73 fix: enforce ownership check on user-memory collection queries (#22109)
* fix: enforce ownership check on user-memory collection queries

Prevent authenticated users from querying other users' memory
collections via the /query/doc and /query/collection endpoints.
A new _validate_collection_access helper rejects requests for
user-memory-{UUID} collections where the UUID does not match
the requesting user. Admins bypass the check.
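
A rough sketch of what such a check could look like. The function name, the `role` argument, and the use of `PermissionError` are assumptions for illustration; the actual helper in retrieval.py may differ in signature and in how it reports a rejection.

```python
import re

# Naming scheme taken from the commit message: user-memory-{UUID}.
USER_MEMORY_PATTERN = re.compile(r"^user-memory-(?P<owner>.+)$")


def validate_collection_access(collection_name, user_id, role):
    """Raise PermissionError if a non-admin targets another user's memory collection."""
    match = USER_MEMORY_PATTERN.match(collection_name)
    if match is None:
        return  # Not a user-memory collection; this check does not apply.
    if role == "admin":
        return  # Admins bypass the ownership check.
    if match.group("owner") != user_id:
        raise PermissionError(f"access to {collection_name} denied")
```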

* Update retrieval.py

* Update retrieval.py
2026-03-01 15:03:37 -05:00
Timothy Jaeryang Baek 93bab8d822 refac 2026-03-01 13:54:44 -06:00
Timothy Jaeryang Baek 259d5ca596 refac 2026-03-01 13:49:36 -06:00
Timothy Jaeryang Baek c83a42198d refac 2026-03-01 13:37:31 -06:00
Timothy Jaeryang Baek 5ee5093259 refac
Co-Authored-By: Johannes Fahrenkrug <16358+jfahrenkrug@users.noreply.github.com>
2026-02-24 17:23:36 -06:00
Timothy Jaeryang Baek 631e30e22d refac 2026-02-21 15:35:34 -06:00
lazariv 5759917f54 feat: Adding You.com as a web search provider (#21599)
* Add ydc.py provider implementation

* Add PersistentConfig entry for you.com

* Add Youcom search function import

* Update you.com configuration

* Add you.com as a web search engine option in frontend

* Add YOUCOM_API_KEY to main.py
2026-02-21 14:51:56 -06:00
Timothy Jaeryang Baek 5d4547f934 enh: RAG_EMBEDDING_CONCURRENT_REQUESTS 2026-02-21 14:33:48 -06:00
Timothy Jaeryang Baek 4bef69cc63 refac 2026-02-19 16:03:03 -06:00
Timothy Jaeryang Baek 74988189b8 refac 2026-02-18 13:06:50 -06:00
Timothy Jaeryang Baek c653e4ec54 refac 2026-02-12 15:25:24 -06:00
Classic298 8cf32ae2a7 fix: prevent worker death during document upload by using run_coroutine_threadsafe (#21158)
* fix: prevent worker death during document upload by using run_coroutine_threadsafe

Replace asyncio.run() with asyncio.run_coroutine_threadsafe() in
save_docs_to_vector_db() to prevent uvicorn worker health check failures.

The issue: asyncio.run() creates a new event loop and blocks the thread
completely, preventing the worker from responding to health checks during
long-running embedding operations (>5 seconds default timeout).

The fix: Schedule the async embedding work on the main event loop using
run_coroutine_threadsafe(). This keeps the main loop responsive to health
check pings while the sync caller waits for the result.

Changes:
- main.py: Store main event loop reference in app.state.main_loop at startup
- retrieval.py: Use run_coroutine_threadsafe() instead of asyncio.run()
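
The scheduling pattern can be sketched as follows, assuming a toy `embed` coroutine and a `save_docs` function that runs on a worker thread. Both names are hypothetical, and the loop reference is passed as an argument here rather than read from `app.state.main_loop`.

```python
import asyncio


async def embed(texts):
    # Stand-in for an async embedding call.
    await asyncio.sleep(0.01)
    return [len(text) for text in texts]


def save_docs(texts, main_loop):
    # Runs on a worker thread: hand the coroutine to the main loop
    # and block this thread (not the loop) until the result is ready.
    future = asyncio.run_coroutine_threadsafe(embed(texts), main_loop)
    return future.result(timeout=5)


async def main():
    loop = asyncio.get_running_loop()
    # The sync caller is moved off the loop so the loop stays free to run embed().
    return await asyncio.to_thread(save_docs, ["ab", "cde"], loop)


result = asyncio.run(main())
```

Calling `future.result()` from the loop's own thread would deadlock, so this pattern only works when the sync caller is already running on a different thread.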

https://claude.ai/code/session_01UQSYvSTkXb57sFb7M85Kcw

* add env var

---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-02-12 15:22:57 -06:00
Tim Baek 258454276e fix: files settings save issue 2026-02-06 22:33:49 +04:00
Danil c5c4aef7b1 Yandex web search (#20922)
Co-authored-by: Tim Baek <tim@openwebui.com>
Co-authored-by: joaoback <156559121+joaoback@users.noreply.github.com>
2026-01-26 07:31:44 -05:00
Classic298 b272ca5e88 fix: remove invalid expunge call on Pydantic FileModel (#20931)
Files.get_file_by_id() returns a Pydantic FileModel, not an SQLAlchemy
ORM object. Calling db.expunge() on a Pydantic model fails with
UnmappedInstanceError since it lacks _sa_instance_state.
The expunge was also unnecessary because subsequent DB updates already
use fresh sessions via get_db() context manager.
Fixes #20925
2026-01-26 07:24:53 -05:00
Classic298 25fd342261 Update retrieval.py (#20930) 2026-01-26 15:29:15 +04:00
Timothy Jaeryang Baek 9af40624c5 refac 2026-01-22 18:58:00 +04:00
Timothy Jaeryang Baek 68b2872ed6 fix/refac: file batch process issue 2026-01-22 15:03:31 +04:00
Classic298 00b3583dc2 fix: fix reindex not working due to unnecessary dupe check (#20857)
* Update retrieval.py

* Update knowledge.py

* Update retrieval.py

* Update knowledge.py
2026-01-21 18:36:08 -05:00
Timothy Jaeryang Baek ecbdef732b enh: PDF_LOADER_MODE 2026-01-21 23:51:36 +04:00
Classic298 182d5e8591 fix(db): release connection before embedding in process_files_batch (#20576)
Remove Depends(get_session) from POST /process/files/batch endpoint to prevent database connections from being held during batch embedding API calls (5-60+ seconds for large batches).

The save_docs_to_vector_db() function makes external embedding API calls. Post-embedding file updates (Files.update_file_by_id) manage their own short-lived sessions internally, releasing connections promptly.
2026-01-11 23:32:56 +04:00
G30 4b4743b497 feat: enforce permissions in backend (#20471)
* feat: enforce image generation permissions in backend

* feat: enforce web search permissions in backend

* feat: enforce audio (tts/stt) permissions in backend
2026-01-08 02:48:35 +04:00
Timothy Jaeryang Baek 1d08376860 refac 2026-01-05 18:55:44 +04:00
Timothy Jaeryang Baek d3ab9f4b96 fix: failed hash in files 2026-01-05 18:21:00 +04:00