Pairs unscoped label resolution and viewer-scoped link gating in a
WorkPackagePreloadCache instead of two RequestStore keys with a
five-method save/restore protocol. Exposes one `current_cache` reader;
consumers ask the cache directly via `fetch` and `visible?`.
Extracts a `text_only?` predicate in the WP link handler so the
`context[:plain_text]` and invisible-WP guards collapse into a single
call site. `SemanticIdentifier.format` renames its parameter to
reflect that the input may or may not be semantic.
The macro preload was visibility-scoped — references to work packages the
recipient cannot see would fall through to the literal `#43` shape, even
when the same reference rendered as `DCP-1` for an author with full view
permission. Notification recipients saw misleading numeric ids for cross-
project references in journal notes.
Splits label resolution from link gating:
- `ResourceLinksMatcher.build_lookup` now does an unscoped fetch for the
primary identifier and a separate visibility-scoped id pluck. The link
handler reads `visible_to_current_user?` to decide between a navigable
anchor and a plain-text label.
- `UpdateAncestorsService#set_journal_note` writes `#display_id` so new
notes carry the semantic shape at the source; render-time resolution
heals legacy `#N` content for users with view permission.
Tradeoff: a recipient without view permission now sees the WP's semantic
identifier (e.g. `DCP-1`) as plain text rather than `#43`. The reference's
existence was already disclosed by the stored journal text; the project
identifier is the only new piece of information surfaced, and is not
treated as a secret elsewhere in the system (URLs, exports, API).
The `mentioned` and `watcher_changed` text-mailer bodies surfaced raw
journal markdown — numeric `#42` references stayed numeric in semantic
mode, and `<mention>` envelopes leaked as HTML source.
Introduces `:plain_text` as a sibling format inside the existing Plain
module. The filter chain mirrors the markdown pipeline (markdown,
sanitization, mention, pattern-matcher) and finishes with a new
`PlainTextOutputFilter` that collapses the DOM to text. The
`WorkPackages` link handler and `MentionFilter` get plain-text branches
keyed off `context[:plain_text]` so identifier resolution stays in one
place across rich and plain channels.
Closes https://community.openproject.org/wp/74762
* Scope SQL log assertions to target SELECTs in alias-fold-in spec
The N+1 guard was counting the entire QueryRecorder log, which made it
brittle against any incidental Setting/permission query that landed on
the same render path. Switch to scoped greps against the two SELECTs
we actually care about: the work_packages batched query and the
sidecar alias pluck. A regression on either now fails with a clear
message pointing at the offending source.
* Flatten matcher preload wrapping into an iterative fold
The recursive shift-and-recurse shape mutated a duplicated array and
forwarded an anonymous block at every frame, which obscured what the
loop was actually doing: wrap each opt-in matcher's preload hook
around the inner block, first matcher outermost. The iterative form
walks the matcher list once in reverse and rebinds a lambda, so the
nesting order is visible without unwinding a recursion.
* Make the WP preload cache's stringified-key invariant explicit
Both ends of the cache assumed every key was a string, but the
contract lived only in the read site's `identifier.to_s` and an
ambient confidence that the identifier column is text. Normalize at
write time too, swap the safe-navigation lookup for `dig`, and leave
a one-line note at the canonical builder so a future reader doesn't
have to grep the call sites to convince themselves a numeric input
will resolve.
The alias fold-in subquery materialised the user's full set of
visible work packages, then IN-tested against it. Threading the
already-loaded WP ids from `build_lookup` collapses that to a
literal IN-list bounded by user input, and makes the visibility
contract explicit at the method signature rather than implicit
in a `lookup.values` invariant.
The end-to-end visibility tests still pin behaviour for aliases
that target inaccessible work packages.
The lookup cache in `ResourceLinksMatcher` resolved `#42` / `#PROJ-1`
/ `#OLDPROJ-1` references without any permission check, so the link
handler could render `formatted_id` / `display_id` for work packages
the current user had no read access to. Both the main query and the
historical-alias pluck now scope through `WorkPackage.visible`,
matching `MentionFilter` and the other link handlers.
The historical-alias query-count spec now asserts statement count
rather than table-name grep — the visibility CTE references
`work_packages` in both statements, so the old regex over-matched.
Quickinfo macro emission for `##PROJ-1` / `###PROJ-1` is unchanged
and queued as a separate PR. The data-id only echoes user input and
the API endpoints enforce auth, but the Angular custom element still
bootstraps and renders a "not found" error chip for inaccessible
WPs. Fixing that cleanly needs the cache to distinguish `denied`
from `absent`.
Refs #23221.
It's like OpenProject::Cache, but it encrypts cached
values at rest. Callers that store confidential things in the cache
have been updated to use it, reducing the risk to expose secrets
to an attacker that obtained access to the contents of OpenProject's cache.
Also expose the #delete method offered by Rails caches.
Adapt the interface of existing methods to stricter follow
the upstream interface. neither #read nor #write accept passing
a block to them.
- mention_filter.rb: rename local `href_id` to `display_id` to mirror the
method name and the `data-display-id` wire attribute (the value is the
user-facing identifier, used for both the href and the data-attribute).
- auto_completes_controller.rb: trim the `displayId` doc comment that
incorrectly claimed the editor builds the mention's link URL from this
field. The URL is composed server-side at render time from `data-id` /
`data-display-id`; the editor only inserts the markdown source.
`MentionFilter#work_package_mention` and
`LinkHandlers::WorkPackages#render_work_package_macro` render
`<opce-macro-wp-quickinfo data-id="<wp.id>" data-display-id="<wp.display_id>" data-detailed="…">`.
`data-id` is the work-package id (stable across renames);
`data-display-id` is the user-facing identifier (semantic in
semantic mode, numeric string in classic). The convention matches
the wire form on `<mention>` envelopes and the
`data-type="user"`/`"group"` mention convention, where `data-id`
has always been the record id.
The link-handler fetches the work package on the semantic-mode
quickinfo branch too — the preload is already populated in semantic
mode, so this is a `RequestStore` hit, not a SELECT.
`MentionFilter` reads `data-id` and resolves via `find_by(id:)`.
Non-numeric `data-id` (parser-emitted source-typed mentions) falls
back to literal text.
`WorkPackageQuickinfoMacroComponent` reads
`dataset.displayId ?? dataset.id` so stored markdown produced before
the attribute split keeps loading: legacy
`<opce-macro-wp-quickinfo data-id="DISPLAY">` resolves via the
fallback; new shape resolves via the preferred attribute.
Backend specs lock the new shape end-to-end: the link handler test
fixture, the in-tool-links pipeline test, and the MentionFilter
spec all assert distinct `data-id` (id) and `data-display-id`
(display_id) values where they diverge, and identical values where
they don't.
In semantic mode, every `#N` reference in legacy content needs a WP
record to render the consistent `formatted_id` label users opted into.
The per-match `find_by_display_id` fallback that ships in #23203 scales
linearly with reference count: a wiki page with 50 refs runs 50 indexed
PK lookups per render, and API collection endpoints / mailer fan-out /
journal feeds compound the multiplier across records.
`PatternMatcherFilter` now primes a `RequestStore`-backed lookup once
per document via `ResourceLinksMatcher.with_preloaded_resources`. The
link handler reads from `work_package_for(identifier)` so the same path
serves numeric and semantic input. Cost is one batched `WorkPackage`
SELECT per render, plus an alias-table SELECT only when historical
identifiers are referenced. Classic mode short-circuits before any
preload — `formatted_id` collapses to the numeric form, so the matched
id alone is enough.
The save/restore around nested `format_text` is retained: custom-field
formatters re-enter the pipeline mid-render and must not clobber the
outer document's lookup.
The earlier `MAX_PRELOAD_IDENTIFIERS` cap is intentionally omitted.
Silent truncation past position N would render the (N+1)th reference
as literal text — a regression of the feature itself. Postgres handles
several-thousand-bind `IN` clauses comfortably; the right safety net,
if one is needed later, is log-and-continue, not truncate.
`PreformattedBlocks` is restored so the preload visitor and the
matcher's text-node walk share one `<pre>` / `<code>` skip.
Follow-up to https://github.com/opf/openproject/pull/23203
`version#0123` previously resolved to version 123 via the `to_i`
conversion that the link handler still uses. The earlier guard
piggybacked on `WorkPackage::SemanticIdentifier.numeric_id?`, which
treats canonical-shape strings only as numeric and so rejected
leading-zero inputs along with the semantic shapes it was added to
filter.
The check is now a digit-only regex anchored to the full identifier.
It admits any string that `to_i` parses to a positive integer (the
existing renderers all dereference through `oid`), while still
short-circuiting `version#PROJ-1` so prefixed resources don't issue
`find_by(id: 0)`.
The WP-side `numeric_id?` predicate stays intentionally stricter —
`#0123` should not resolve to WP 123 — so the two predicates diverge
on purpose.
The PreformattedBlocks extraction served two consumers
(PatternMatcherFilter and ResourceLinksMatcher) before the cache spike;
ResourceLinksMatcher's text-node walk is gone, so PatternMatcherFilter
is the only caller. Restore the inline PREFORMATTED_BLOCKS constant
matching dev's shape and remove the module.
The doc-level RequestStore-backed preload in PatternMatcherFilter +
ResourceLinksMatcher batched all `#N` references into a single
WorkPackage SELECT (with a sidecar WorkPackageSemanticAlias pluck for
historical aliases). It is removed; the WP link handler now resolves
each reference inline via WorkPackage.find_by_display_id, which already
encapsulates "numeric PK or current identifier or historical alias",
returning nil on miss so the existing literal-text fallback still kicks
in for unresolved refs.
Why drop it: the cache only changed user-visible output for one case
(bare `#1234` plain links in semantic mode, where the formatted_id
label needs the WP row). Quickinfo widgets fetch formatted_id from the
API at render time; PDF export and the editor mention path bypass this
filter entirely; autocomplete-stored mentions take a different code
path. Realistic doc shape is <10 refs per render, so per-match
find_by_display_id is fine and the document walk + IN-list batch +
RequestStore save/restore + alias-key fold-in is complexity without a
matching workload.
Tradeoffs:
- Each `#N` / `#PROJ-N` reference now issues its own SELECT, vs one
batched SELECT before.
- Historical-alias resolution loses its sidecar pluck; find_by_display_id
performs `identifier = ? OR EXISTS(alias subquery)` in a single SELECT.
- Classic mode `#N` rendering goes from zero queries (cache opted out) to
one query per reference.
Spec adjustments:
- Drop the with_preloaded_resources nesting test (mechanism is gone).
- Drop the classic-mode "query-free" test (no longer holds — this is the
most concrete cost of the spike).
- Update N+1 / mixed-refs tests to assert one SELECT per reference.
- Update the historical-alias test to assert one WP SELECT, no
alias-only SELECT.
The 500-identifier cap on doc-level work-package preload was defensive
against a workload that has never been observed: a single rendered body
containing 500+ distinct `#N` references. Work package bodies are
unbounded `text`, but real comments and descriptions sit nowhere near
that scale, and PostgreSQL handles IN-lists with thousands of bind
parameters comfortably (the protocol ceiling is ~32k).
The cost of the cap was not theoretical: once the limit was reached,
references past it silently fell through to the cache-miss path and
rendered as literal numeric ids — exactly the semantic-link breakage
this PR exists to fix. A feature that ships "render semantic display
ids in formatted text" cannot also ship "but only the first 500 per
document."
The pre-PR baseline had no preload at all (one SELECT per matched
reference), so removing the cap is strictly an improvement over the
shipped behaviour and a return to the pre-PR contract for pathological
inputs. If a real workload ever surfaces, the right answer is
render-path caching keyed on the rendered body, not a lossy truncation
that quietly hides links from the reader.
S2's splat refactor of where_display_id_in (c24e3cfab0) accepts a
flat list of values via `*values; values.flatten(1)`. Ruby's
Array#flatten doesn't unroll non-Array Enumerables — passing a Set
as a single arg leaves it boxed inside the values Array and the
subsequent map(&:to_s) stringifies the whole Set as one entry.
Splat at the call site unrolls the Set into varargs, restoring the
batched-lookup behaviour.
`lookup` is already keyed by `wp.id.to_s`, so the second index built
inside `fold_in_alias_keys` was redundant. The alias-lookup branch can
read straight from `lookup` using the plucked work_package_id.
The doc-level preload that backs the macro pipeline reads identifiers
straight out of user-pasted CKEditor content. Without a bound, a single
multi-megabyte comment could push thousands of values into one
`WHERE id IN (...)` and an alias `WHERE identifier IN (...)`, both built
in memory and shipped in one query.
`MAX_PRELOAD_IDENTIFIERS = 500` caps the per-render Set; references past
the cap render via the link handler's cache-miss fallback (numeric → bare
`#N` link, semantic → literal text).
Spec stubs the constant to 2 and asserts the WP IN-list never exceeds it.
Replace duplicated `%w(# ## ###).include?(matcher.sep)` and
`["##", "###"].include?(matcher.sep)` checks with named predicates on
the WorkPackages link handler: HASH_TRIGGERS for the `#`/`##`/`###`
family, hash_trigger? to gate the handler, and quickinfo?/detailed?
to distinguish the three macro shapes. The PDF-export subclass picks
both up via inheritance.
"Trigger" replaces "separator" in the WP context: in `#1234` the `#`
is a sigil that triggers mention recognition, not something that
separates parts of the reference. The matcher's generic `sep`
vocabulary stays — it still fits the `version#3` and `version:1.0.0`
cases that other handlers own.
Cuts comments that restate code mechanics, narrate the journey, or
pile on implementation detail. Genuine WHYs (perf rationale of the
to_i.to_s round-trip, the alias second-query reason, leading-zero
rejection, classic-mode preload skip) all stayed.
The matcher regex now accepts both numeric and semantic identifier shapes
(`\d+|PROJ-\d+`) on a single alternation branch shared by the bare WP
handler and the prefixed-resource handler. Without this guard,
`version#PROJ-1` reaches HashSeparator with `oid = "PROJ-1".to_i = 0`,
which is `present?`, so it issues `Version.find_by(id: 0)` — a guaranteed
miss against the wrong table. The same shape applies to every prefix
HashSeparator handles (document, message, meeting, project, user, group,
view), turning a single pasted comment into one wasted SELECT per prefix.
Gating `applicable?` on `numeric_id?(matcher.identifier)` short-circuits
the handler before any DB lookup; the regex still matches and the text
falls through to literal output. Regression test asserts zero
`FROM "versions"` SELECTs for `version#PROJ-1`.
The `value == value.to_i.to_s` round-trip check that filters leading-
zero ID forms ("0123") was duplicated across the WP link handler, the
PDF export macro, and the cost-query filter, and the matching guard in
`ResourceLinksMatcher#extract_work_package_identifier` was a no-op
(the regex's `\d+|[A-Z][A-Z0-9_]*-\d+` branch already gates the shape).
A new `WorkPackage::SemanticIdentifier.numeric_id?(value)` predicate
captures the canonical-numeric check at one site. It pairs with
`semantic_id?` as the WP-finder shape gate; the two answer different
questions (shape vs routing) and so are kept independent rather than
expressed as one another's negation.
The redundant `extract_work_package_identifier` guard is dropped along
with its misleading "rejecting leading-zero forms" comment — rejection
actually happens downstream in the link handler, where the new
predicate now reads as the contract it always was.
Comments shouldn't read as commit-history narration or as reviewer
shorthand. Each one now describes the present invariant in plain words:
- Drop "Pre-PR-E behaviour" / "matching pre-PR behaviour" / "matches the
pre-PR" — the spec and inline comments now state the current rule.
- Drop "load-bearing", "already in this codebase", "we fall back" — say
what the code does, not how a reviewer would describe it.
- Reword the `semantic_id?` rationale to a forward-looking constraint
(`don't tighten it`) instead of a comparative description of why this
module owns it.
The regex now declares each capture group by name, so `parse_match`
reads as a flat name-to-name map rather than "group 7 || 9 || 11"
arithmetic. Adding or removing a branch later doesn't ripple through
group numbers — historically that was the bug-window in this method.
Also bring the class-level docstring up to date with `##N`/`###N`
quickinfo macros (which were missing from the docs entirely) and the
new `#PROJ-N` semantic-id forms.