Commit Graph

21 Commits

Author SHA1 Message Date
Manuel Rüger
ac264210b5 Feature/robust comment preservation (#768)
This is based on guoweis-work's PR https://github.com/kovetskiy/mark/pull/145

* feat(confluence): add support for fetching page body and inline comments

* feat(cmd): add --preserve-comments flag to preserve inline comments

* feat(mark): implement context-aware inline comment preservation

* test(mark): add tests for context-aware MergeComments logic

* fix: remove empty else branch in MergeComments to fix SA9003

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* perf: compile markerRegex once as package-level variable

Avoids recompiling the inline comment marker regex on every call to
MergeComments, which matters for pages with many comment markers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: guard against nil comments pointer in MergeComments

Prevents a panic when GetInlineComments returns nil (e.g. on pages
where the inline comments feature is not enabled).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* test: add edge-case tests for MergeComments; fix overlapping replacement

Four new test cases:
- SelectionMissing: comment dropped gracefully when text is gone from new body
- OverlappingSelections: overlapping comments no longer corrupt the body;
  the later match (by position) wins and the earlier overlapping one is dropped
- NilComments: nil pointer returns new body unchanged
- HTMLEntities: &lt;, &gt;, &#39; selections match correctly

Also fixes the overlapping replacement bug: apply back-to-front and skip any
replacement whose end exceeds the start of an already-applied one.
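
For illustration, a minimal sketch of the back-to-front strategy described above. The `replacement` struct and `applyBackToFront` are hypothetical names, not the code from this commit; offsets are assumed to be byte offsets with an exclusive end.

```go
package main

import (
	"fmt"
	"slices"
)

// replacement is a hypothetical stand-in for the struct used in the merge step.
type replacement struct {
	start, end int // byte offsets into body; end is exclusive
	text       string
}

// applyBackToFront applies replacements from the highest offset to the lowest
// so that earlier offsets stay valid, and skips any replacement whose end
// exceeds the start of one that has already been applied.
func applyBackToFront(body string, reps []replacement) string {
	sorted := slices.Clone(reps)
	slices.SortStableFunc(sorted, func(a, b replacement) int {
		return b.start - a.start // descending by start offset
	})
	limit := len(body) // start of the most recently applied replacement
	for _, r := range sorted {
		if r.end > limit {
			continue // overlaps an already-applied replacement: drop it
		}
		body = body[:r.start] + r.text + body[r.end:]
		limit = r.start
	}
	return body
}

func main() {
	body := "hello brave new world"
	reps := []replacement{
		{start: 6, end: 15, text: "[brave new]"}, // overlaps the next one
		{start: 12, end: 21, text: "[new world]"},
	}
	fmt.Println(applyBackToFront(body, reps)) // hello brave [new world]
}
```

The later match (by position) is kept and the earlier overlapping one is dropped, matching the OverlappingSelections case.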

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: escape ref attribute value in inline comment marker XML

Use html.EscapeString on r.ref before interpolating it into the
ac:ref attribute to prevent malformed XML if the value ever contains
quotes or other special characters.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: use first occurrence when no context is available in MergeComments

Without context the old code left distance=0 for every match and
updated bestStart on each iteration, so the final result depended on
whichever occurrence was visited last (non-deterministic with respect
to the search order).

Restructure the loop to break immediately on the first match when
hasCtx is false, making the behaviour explicit and deterministic.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: log warning when overlapping inline comment marker is dropped

Previously the overlap was silently skipped. Now a zerolog Warn message
is emitted with the ref, the conflicting byte offsets, and the ref of
the already-placed marker, so users can see which comment was lost
rather than silently getting incomplete output.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: warn when inline comments are silently dropped in MergeComments

Three cases now emit a zerolog Warn instead of silently discarding:

1. Comment location != "inline": logs ref and actual location.
2. Selected text not found in new body: logs ref and selection text.
3. Overlapping replacement (existing): adds selection text to the
   already-present overlap warning for easier diagnosis.

Also adds a selection field to the replacement struct so the overlap
warning can report the dropped text.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: update markerRegex to match markers with nested tags

Replace ([^<]*) with (?s)(.*?) so the pattern:
- Matches marker content that contains nested inline tags (e.g. <strong>)
- Matches across newlines ((?s) / DOTALL mode)

The old character class [^<]* stopped at the first < inside the
marker body, causing the context-extraction step to miss any comment
whose original selection spanned formatted text.

Add TestMergeComments_NestedTags to cover this path.
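
A small sketch of the difference between the two capture groups. Both full patterns below are illustrative guesses at the marker syntax; only the change from `([^<]*)` to `(?s)(.*?)` is taken from the message above.

```go
package main

import (
	"fmt"
	"regexp"
)

// Compiled once at package level. The surrounding tag syntax is an assumption.
var (
	oldMarker = regexp.MustCompile(`<ac:inline-comment-marker ac:ref="([^"]+)">([^<]*)</ac:inline-comment-marker>`)
	newMarker = regexp.MustCompile(`(?s)<ac:inline-comment-marker ac:ref="([^"]+)">(.*?)</ac:inline-comment-marker>`)
)

func main() {
	body := "<p><ac:inline-comment-marker ac:ref=\"abc\">some <strong>bold</strong>\ntext</ac:inline-comment-marker></p>"

	// The character class stops at the "<" of <strong>, so nothing matches.
	fmt.Println(oldMarker.MatchString(body)) // false

	// The lazy dot-all group spans nested tags and the newline.
	m := newMarker.FindStringSubmatch(body)
	fmt.Printf("%q %q\n", m[1], m[2]) // "abc" "some <strong>bold</strong>\ntext"
}
```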

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: guard against empty OriginalSelection in MergeComments

strings.Index(s, "") always returns 0, so an empty escapedSelection
would spin the search loop indefinitely (or panic when currentPos
advances past len(newBody)).

Skip comments with an empty selection early, emit a Warn log, and
add TestMergeComments_EmptySelection to cover the path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: paginate GetInlineComments to avoid silently truncating results

The Confluence child/comment endpoint is paginated. The previous
single-request implementation silently dropped any comments beyond
the server's default page size.

Changes:
- Add Links (context, next) to InlineComments struct so the _links
  field from each page response is decoded.
- Rewrite GetInlineComments to loop with limit/start parameters
  (pageSize=100), accumulating all results, following the same pattern
  used by GetAttachments and label fetching.
- Add TestMergeComments_DuplicateMarkerRef to cover the deduplication
  guard added in the previous commit.
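
The pagination pattern can be sketched as follows. The `page` type, `fetchFunc`, and `fakeServer` are hypothetical stand-ins for the real request and response types; only the limit/start loop and the use of a next link are taken from the description above.

```go
package main

import "fmt"

// page is a hypothetical decoded response: one batch of results plus the
// "next" link that the server sets while more results remain.
type page struct {
	Results []string
	Next    string
}

// fetchFunc stands in for a single HTTP request with start/limit parameters.
type fetchFunc func(start, limit int) (page, error)

// fetchAll requests pages until the server stops returning a next link
// or returns an empty batch, accumulating every result.
func fetchAll(fetch fetchFunc, pageSize int) ([]string, error) {
	var all []string
	for start := 0; ; start += pageSize {
		p, err := fetch(start, pageSize)
		if err != nil {
			return nil, err
		}
		all = append(all, p.Results...)
		if p.Next == "" || len(p.Results) == 0 {
			return all, nil
		}
	}
}

// fakeServer simulates a paginated endpoint that holds total items.
func fakeServer(total int) fetchFunc {
	return func(start, limit int) (page, error) {
		var p page
		for i := start; i < start+limit && i < total; i++ {
			p.Results = append(p.Results, fmt.Sprintf("comment-%d", i))
		}
		if start+limit < total {
			p.Next = fmt.Sprintf("?start=%d&limit=%d", start+limit, limit)
		}
		return p, nil
	}
}

func main() {
	all, err := fetchAll(fakeServer(250), 100)
	fmt.Println(len(all), err) // 250 <nil>
}
```

A single request against the same fake server would return only the first 100 items, which is the truncation this commit addresses.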

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix UTF-8 safety, API compat, log verbosity

- levenshteinDistance: convert to []rune before empty-string checks so
  rune counts (not byte counts) are returned for strings with multi-byte
  characters

- Add contextBefore/contextAfter helpers that use utf8.RuneStart to
  avoid slicing in the middle of a multi-byte UTF-8 sequence when
  extracting 100-char context windows from oldBody and newBody

- Add truncateSelection helper (50 runes + ellipsis) and apply it in all
  Warn log messages that include the selected text, preventing large or
  sensitive page content from appearing in logs

- Downgrade non-inline comment log from Warn to Debug with message
  'comment ignored during inline marker merge: not an inline comment';
  page-level comments are not inline markers and are not 'lost'

- Restore original one-argument GetPageByID (expand='ancestors,version')
  and add GetPageByIDExpanded for the one caller that needs a custom
  expand value, preserving backward compatibility for API consumers

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address new PR review comments

- Remove custom min() function: shadows the Go 1.21+ built-in min for
  the entire package; the built-in handles the 3-arg call in
  levenshteinDistance identically

- Validate rune boundaries on strings.Index candidates: skip any match
  where start or end falls in the middle of a multi-byte UTF-8 rune
  to prevent corrupt UTF-8 output

- Defer preserve-comments API calls until after shouldUpdatePage is
  determined: avoids unnecessary GetPageByIDExpanded + GetInlineComments
  round-trips on no-op --changes-only runs

- Capitalize Usage string for --preserve-comments flag (util/flags.go)
  and matching README.md entry to match sentence case of surrounding flags

- Run gofmt on util/cli.go to fix struct literal field alignment

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: document --preserve-comments feature in README

Add a dedicated 'Preserving Inline Comments' section under Tricks with:
- Usage examples (CLI flag and env var)
- Step-by-step explanation of the Levenshtein-based relocation algorithm
- Limitations (deleted text, overlapping selections, new pages,
  changes-only interaction)

Also add a cross-reference NOTE near the --preserve-comments flag entry
in the Usage section.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: fix markdownlint errors in README

- Change unordered list markers from dashes to asterisks (MD004)
- Remove extra blank line before Issues section (MD012)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Extract named types for InlineComments; optimize Levenshtein search

- Introduce InlineCommentProperties, InlineCommentExtensions, and
  InlineCommentResult named types in confluence/api.go, replacing the
  anonymous nested struct in InlineComments.Results. Callers and tests
  can now construct/inspect comment objects without repeating the JSON
  shape.

- Simplify makeComments helper in mark_test.go to use the new named
  types directly, eliminating the verbose anonymous struct literal.

- Add two Levenshtein candidate-search optimisations in MergeComments:
  * Exact-context fast path: if both the before and after windows match
    exactly, take that occurrence immediately without computing distance.
  * Lower-bound pruning: skip the full O(m*n) Levenshtein computation
    for a candidate when the absolute difference in window lengths alone
    already meets or exceeds the current best distance.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Use stable sort with ref tie-breaker; fix README overlap description

- Replace slices.SortFunc with slices.SortStableFunc for the
  replacements slice, adding ref as a lexicographic tie-breaker when
  two markers resolve to the same start offset. This makes overlap
  resolution fully deterministic across runs.

- Correct the README limitation note: the *earlier* overlapping
  match (lower byte offset) is what gets dropped; the later one
  (higher byte offset, applied first in the back-to-front pass) is
  kept. The previous wording said 'the second one is dropped' which
  was ambiguous and inaccurate.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix rune-based lower-bound pruning; clarify test comment

- Use utf8.RuneCountInString instead of len() for the Levenshtein
  lower-bound pruning computation. The levenshteinDistance function
  operates on rune slices, so byte-length differences can exceed the
  true rune-length difference for multibyte UTF-8 content, causing
  valid candidates to be incorrectly skipped.

- Update TestMergeComments_SelectionMissing comment to say the comment
  is 'dropped with a warning' rather than 'silently dropped', matching
  the actual behavior.
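
A short example of why the bound has to be computed in runes. `lowerBound` is a hypothetical helper name; the length difference is a valid lower bound on edit distance only when it uses the same unit as the distance function.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// lowerBound returns a cheap lower bound on the rune-based Levenshtein
// distance between a and b: the absolute difference of their rune counts.
func lowerBound(a, b string) int {
	d := utf8.RuneCountInString(a) - utf8.RuneCountInString(b)
	if d < 0 {
		d = -d
	}
	return d
}

func main() {
	a, b := "日本語", "abc"
	// Both strings have 3 runes, so the true distance is at most 3,
	// yet the byte lengths differ by 6.
	fmt.Println(len(a)-len(b), lowerBound(a, b)) // 6 0
}
```

With a byte-based bound of 6, this candidate would be skipped whenever the current best distance is 6 or less, even though its real distance is 3.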

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add missing unit tests for helpers and MergeComments scenarios

Helper function tests:
- TestTruncateSelection: short/exact/long strings and multibyte runes
- TestLevenshteinDistance: empty strings, identical, insertions,
  deletions, substitutions, 'kitten/sitting', and a multibyte UTF-8
  case to exercise rune-based counting
- TestContextBefore / TestContextAfter: basic windowing, window larger
  than string, and a case where the raw byte offset lands mid-rune (é)
  to verify the rune-boundary correction logic

MergeComments scenario tests:
- TestMergeComments_MultipleComments: two non-overlapping comments both
  correctly applied via back-to-front replacement
- TestMergeComments_EmptyResults: non-nil InlineComments with zero
  results returns body unchanged
- TestMergeComments_NonInlineLocation: page-level comments (location
  != 'inline') are skipped; body unchanged
- TestMergeComments_NoContext: when a ref has no marker in oldBody the
  first occurrence of the selection in newBody is used
- TestMergeComments_UTF8: multibyte (Japanese) characters in both body
  and selection are handled correctly

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix three correctness issues in MergeComments

- Fix html import shadowing: alias the 'html' import as 'stdhtml' to
  avoid shadowing by the local 'html' variable used throughout
  ProcessFile. Both callers updated: stdhtml.EscapeString for the
  ref attribute, htmlEscapeText for the selection search.

- Fix selection search with quotes/apostrophes: replace
  html.EscapeString for the selection with a new htmlEscapeText helper
  that only escapes &, <, > — not ' or ". Confluence storage HTML
  often leaves quotes and apostrophes unescaped in text nodes, so
  fully-escaped selections would fail to match and inline comments
  would be silently dropped. Add TestMergeComments_SelectionWithQuotes.

- Fix duplicate-ref warnings: move seenRefs[ref]=true to immediately
  after the duplicate-check, before the search loop. Previously seenRefs
  was only set on a successful match, so multiple results for the same
  MarkerRef with no match in the new body would each emit a 'dropped'
  warning. Add TestMergeComments_DuplicateMarkerRefDropped.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Optimize levenshteinDistance to use two rolling rows instead of full matrix

Reduces memory allocation from O(m×n) to O(n) by keeping only the
previous and current rows. Also swaps r1/r2 so the shorter string is
used for column width, minimizing row allocation size.

This matters in MergeComments where levenshteinDistance is called for
every candidate match of every comment's selection in newBody — on
pages with many comments or short/common selections the number of
calls can be high.
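
A sketch of the two-row formulation, written from the description above rather than copied from the commit. It relies on the built-in min from Go 1.21+.

```go
package main

import "fmt"

// levenshtein computes the edit distance over runes while keeping only the
// previous and current rows of the dynamic-programming table.
func levenshtein(a, b string) int {
	r1, r2 := []rune(a), []rune(b)
	if len(r1) < len(r2) {
		r1, r2 = r2, r1 // the shorter string determines the row width
	}
	prev := make([]int, len(r2)+1)
	curr := make([]int, len(r2)+1)
	for j := range prev {
		prev[j] = j // distance from the empty prefix of r1
	}
	for i := 1; i <= len(r1); i++ {
		curr[0] = i
		for j := 1; j <= len(r2); j++ {
			cost := 1
			if r1[i-1] == r2[j-1] {
				cost = 0
			}
			curr[j] = min(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev // reuse the old row on the next pass
	}
	return prev[len(r2)]
}

func main() {
	fmt.Println(levenshtein("kitten", "sitting")) // 3
	fmt.Println(levenshtein("日本語", "日本"))         // 1
}
```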

Addresses thread [40] from PR review.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix test description and README algorithm doc

mark_test.go (thread [43]):
- TestMergeComments_HTMLEntities: the description incorrectly claimed
  &#39; (apostrophe) was tested; the selection '<world>' contains no
  apostrophe. Updated comment to accurately describe what is covered
  (&lt;/&gt; entity matching) and note the &#39; limitation.
- Add TestMergeComments_ApostropheSelection: verifies a selection with
  a literal apostrophe is found when the new body also has a literal
  apostrophe (the common case from mark's renderer). This exercises
  the htmlEscapeText path which intentionally does not encode ' or ".

README.md (thread [42]):
- Step 2 of the algorithm description said context was recorded
  'immediately before and after the commented selection' which is
  ambiguous. Clarified that context windows are taken around the
  <ac:inline-comment-marker> tag boundaries in the old body (not
  around the raw selection text), so the context is stable even when
  the marker wraps additional inline markup such as <strong>.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Unexport mergeComments and cap candidate evaluation

Thread [44]: MergeComments was exported but is internal-only — only
called within the mark package and tested from the same package.
Unexport it to mergeComments to avoid expanding the public API surface
unnecessarily. Add a Go doc comment describing the function contract,
HTML expectations, and the candidate cap.

Thread [45]: The candidate-scoring loop had no upper bound. For short
or common selections (e.g. 'a', 'the') on large pages the loop could
invoke levenshteinDistance thousands of times, each allocating rune
and int slices. Add a maxCandidates=100 constant and break once that
many on-rune-boundary occurrences have been evaluated. The exact-context
fast-path and lower-bound pruning already skip many candidates before
Levenshtein is called, so in practice the cap is only reached for very
common selections where the 100th candidate is unlikely to be
meaningfully better than an earlier one anyway.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* test: fix HTMLEntities description and add ApostropheEncoded limitation test

Thread #43: TestMergeComments_HTMLEntities had a misleading note claiming it
covered the &#39; apostrophe case, but the selection under test ('<world>') did
not include an apostrophe. Remove that note and add a dedicated
TestMergeComments_ApostropheEncoded test that explicitly documents the known
limitation: when a Confluence body stores an apostrophe as the numeric entity
&#39;, mergeComments cannot locate the selection (htmlEscapeText does not
encode ' to &#39;), so the comment is dropped with a warning.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix CDATA selection fallback and extract contextWindowBytes constant

Thread #46: mergeComments only searched for htmlEscapeText(selection) and
would fail for selections inside CDATA-backed macro bodies (e.g. ac:code),
where < and > are stored as raw characters rather than HTML entities. Restructure
the search loop to build a searchForms slice: the escaped form is tried first
(covers normal XML text nodes), and the raw unescaped form is appended as a
fallback when they differ. A stopSearch flag exits early on an exact context
match or when maxCandidates is reached, preserving the same performance
guarantees as before. Add TestMergeComments_CDATASelection to cover this path.

Thread #47: The context-window size 100 was repeated in four places across
mergeComments (two in the context-extraction loop and two in the scoring loop).
Extract it to const contextWindowBytes = 100 so it is easy to tune and stays
consistent everywhere.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-08 16:20:03 +02:00
Manuel Rüger
a43f5fec2e refactor: modernize Go primitives
- Replace interface{} with any (Go 1.18) across confluence/api.go,
  macro/macro.go, util/cli.go, util/error_handler.go, includes/templates.go
- Replace sort.SliceStable with slices.SortStableFunc + cmp.Compare (Go 1.21)
  in attachment/attachment.go, consistent with existing slices usage
- Replace fmt.Errorf("%s", msg) with errors.New(msg) in mark.go
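
For reference, a minimal example of the sort replacement. The `attachment` type and `sortByName` are hypothetical, chosen only to show the shape of the call.

```go
package main

import (
	"cmp"
	"fmt"
	"slices"
)

// attachment is a hypothetical element type for this example.
type attachment struct {
	Name string
	Size int
}

// sortByName is the slices equivalent of
// sort.SliceStable(items, func(i, j int) bool { return items[i].Name < items[j].Name }).
func sortByName(items []attachment) {
	slices.SortStableFunc(items, func(x, y attachment) int {
		return cmp.Compare(x.Name, y.Name)
	})
}

func main() {
	items := []attachment{{"b.png", 2}, {"a.png", 1}, {"a.png", 3}}
	sortByName(items)
	fmt.Println(items) // [{a.png 1} {a.png 3} {b.png 2}]
}
```

Elements with equal names keep their original relative order, as with sort.SliceStable.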

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-08 01:39:29 +02:00
Manuel Rüger
0859bf4d08 refactor: replace karma-go with standard error handling
2026-03-30 11:23:01 +02:00
Manuel Rüger
e160121005 feat: replace logging with zerolog
2026-03-30 11:23:01 +02:00
Manuel Rüger
7be2325340 fix: restore fallback in GetUserByName for older Confluence APIs
2026-03-26 00:52:56 +01:00
Manuel Rüger
2b62ffd822 confluence: fix NewAPI double-slash and DeletePageLabel missing status check
NewAPI: normalize baseURL by trimming the trailing slash before building
rest and json-rpc endpoints. Previously the TrimSuffix only applied to
api.BaseURL but rest/json URLs were already constructed with the raw
(potentially trailing-slash) baseURL, producing double slashes like
'http://example.com//rest/api'.

DeletePageLabel: add a non-200/non-204 status check before the type
assertion. Without this guard any error response (400, 403, 500) would
fall through to request.Response.(*LabelInfo) and either panic or return
garbage data.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-16 19:18:29 +01:00
Manuel Rüger
0d7caab5d8 fix: close response body on all paths in newErrorStatusNotOK
The 401 and 404 early-return paths returned without closing the HTTP
response body, leaking the underlying connection. Move the
defer body.Close() to the top of the function so it runs regardless
of which code path is taken.

fix: add HTTP status check to GetCurrentUser

GetCurrentUser did not validate the HTTP response status code. A
401/403/500 response was silently ignored and returned a zero-value
User pointer, causing callers (e.g. RestrictPageUpdatesCloud fallback)
to use an empty accountId.

fix: return nil on HTTP 204 from DeletePageLabel instead of panicking

DeletePageLabel accepted both 200 OK and 204 No Content as success, but
then unconditionally did request.Response.(*LabelInfo). On a 204 the
response body is empty so request.Response is nil; the type assertion
panics. Return (nil, nil) for 204 responses.

fix: paginate GetPageLabels to handle pages with >50 labels

A single request with the default page size silently truncated label
lists longer than the API default (~50). Add a pagination loop
matching the pattern used by GetAttachments.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-16 19:18:29 +01:00
Manuel Rüger
ed6ae15500 fix: add HTTP status checks to GetUserByName; remove redundant FindHomePage check
GetUserByName made two REST requests without checking the HTTP status
codes. A 401/403/500 response would silently be treated as an empty
result set and return 'user not found' instead of the real error.
Add a status check after each request.

FindHomePage had 'StatusNotFound || != StatusOK' — the first clause
is always a subset of the second, making it dead code. Simplified to
just '!= StatusOK'.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-16 19:18:29 +01:00
Manuel Rüger
b7c9229da4 fix: RestrictPageUpdatesCloud now resolves allowedUser by name
The allowedUser parameter was completely ignored; the function always
restricted edits to the currently authenticated API user via
GetCurrentUser(). Resolve the specified user via GetUserByName first
and fall back to the current user only if that lookup fails, matching
the behaviour of RestrictPageUpdatesServer which uses the parameter
directly.

fix: paginate GetAttachments to handle pages with >1000 attachments

The previous implementation fetched a single page of up to 1000
attachments. Pages with more than 1000 attachments would silently
miss some, causing attachment sync to skip or re-upload them.
Replace with a pagination loop (100 per page) that follows the
_links.next cursor until all attachments are retrieved.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-16 19:18:29 +01:00
Manuel Rüger
3e71d65f61 fix: remove unused newLabels parameter from UpdatePage
The newLabels parameter was accepted but never used in the function
body; labels are synced through the separate updateLabels/AddPageLabels
/DeletePageLabel calls. The dead parameter misled callers into thinking
labels were being set during the page update.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-16 19:18:29 +01:00
Manuel Rüger
566fd74efe fix: validate emoji rune from utf8.DecodeRuneInString
DecodeRuneInString returns utf8.RuneError for invalid UTF-8, which was
silently converted to the hex string "fffd" and sent to Confluence.
Return an error instead so the caller gets a clear diagnostic rather
than storing a replacement character as the page emoji.
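
A sketch of the validation, with `emojiCodepoint` as a hypothetical function name. It additionally checks the decoded size, an assumption that lets a genuine, correctly encoded U+FFFD (size 3) be told apart from a decoding failure (size 0 or 1).

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// emojiCodepoint returns the first rune of s as a lowercase hex string, or an
// error if s is empty or does not start with valid UTF-8.
func emojiCodepoint(s string) (string, error) {
	r, size := utf8.DecodeRuneInString(s)
	if r == utf8.RuneError && size <= 1 {
		return "", fmt.Errorf("emoji %q is empty or not valid UTF-8", s)
	}
	return fmt.Sprintf("%x", r), nil
}

func main() {
	fmt.Println(emojiCodepoint("🚀"))    // 1f680 <nil>
	_, err := emojiCodepoint("\xff")
	fmt.Println(err != nil)             // true
}
```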

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-16 19:18:29 +01:00
Manuel Rüger
9184e91268 fix: defer body.Close() before ReadAll to ensure it runs on read error
The defer was placed after io.ReadAll, so if ReadAll returned an
error the body would not be closed. Move the defer before the read.
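
A small demonstration of the ordering, using hypothetical `trackingBody` and `failingReader` types to observe whether Close runs when the read fails:

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// trackingBody records whether Close was called.
type trackingBody struct {
	io.Reader
	closed bool
}

func (b *trackingBody) Close() error { b.closed = true; return nil }

// failingReader always fails, simulating a broken connection.
type failingReader struct{}

func (failingReader) Read([]byte) (int, error) { return 0, errors.New("read failed") }

// readBody registers the Close before reading, so it runs even if ReadAll
// returns an error.
func readBody(body io.ReadCloser) ([]byte, error) {
	defer body.Close()
	return io.ReadAll(body)
}

func main() {
	b := &trackingBody{Reader: failingReader{}}
	_, err := readBody(b)
	fmt.Println(err, b.closed) // read failed true
}
```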

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-16 19:18:29 +01:00
Nikolai Emil Damm
a334c1c1cc feat: enhance Confluence link generation by utilizing base URL from API response
Signed-off-by: Nikolai Emil Damm <ndam@tv2.dk>
2026-01-06 15:25:59 +01:00
dgudim
e82c425471 Rename insecure flag to insecure-skip-tls-verify
2025-12-08 21:53:28 +01:00
Danila Gudim
b36d7aa135 feature: Add --insecure flag for ignoring tls errors
2025-12-08 21:53:28 +01:00
iyz
d789261c9a feat: use gopencils retrial option, upgrade version
2025-04-04 22:13:57 +02:00
Manuel Rüger
f24d8c8957 Fix lint issues detected by golangci v2
2025-04-02 16:42:32 +02:00
Joris Conijn
1a0e452910 feat: support emojis on pages
Define an emoji in the markdown files and get them published as page
emoji icons.
2025-02-17 17:31:04 +01:00
Kassem Sandarusi
5accce3b17 add flag for updating on change
2025-01-13 19:05:29 +01:00
Smaug123
82aebec1eb Use query param for labels
2024-10-21 13:11:42 +02:00
Manuel Rüger
dc8842106b *: Reorganize code
2024-09-29 00:13:04 +02:00