Feature/robust comment preservation (#768)

This is based on guoweis-work PR https://github.com/kovetskiy/mark/pull/145

* feat(confluence): add support for fetching page body and inline comments

* feat(cmd): add --preserve-comments flag to preserve inline comments

* feat(mark): implement context-aware inline comment preservation

* test(mark): add tests for context-aware MergeComments logic

* fix: remove empty else branch in MergeComments to fix SA9003

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* perf: compile markerRegex once as package-level variable

Avoids recompiling the inline comment marker regex on every call to MergeComments, which matters for pages with many comment markers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: guard against nil comments pointer in MergeComments

Prevents a panic when GetInlineComments returns nil (e.g. on pages where the inline comments feature is not enabled).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* test: add edge-case tests for MergeComments; fix overlapping replacement

Four new test cases:
- SelectionMissing: comment dropped gracefully when text is gone from new body
- OverlappingSelections: overlapping comments no longer corrupt the body; the later match (by position) wins and the earlier overlapping one is dropped
- NilComments: nil pointer returns new body unchanged
- HTMLEntities: <, >, ' selections match correctly

Also fixes the overlapping replacement bug: apply back-to-front and skip any replacement whose end exceeds the start of an already-applied one.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: escape ref attribute value in inline comment marker XML

Use html.EscapeString on r.ref before interpolating it into the ac:ref attribute to prevent malformed XML if the value ever contains quotes or other special characters.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: use first occurrence when no context is available in MergeComments

Without context the old code left distance=0 for every match and updated bestStart on each iteration, so the final result depended on whichever occurrence was visited last (non-deterministic with respect to the search order). Restructure the loop to break immediately on the first match when hasCtx is false, making the behaviour explicit and deterministic.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: log warning when overlapping inline comment marker is dropped

Previously the overlap was silently skipped. Now a zerolog Warn message is emitted with the ref, the conflicting byte offsets, and the ref of the already-placed marker, so users can see which comment was lost rather than silently getting incomplete output.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: warn when inline comments are silently dropped in MergeComments

Three cases now emit a zerolog Warn instead of silently discarding:
1. Comment location != "inline": logs ref and actual location.
2. Selected text not found in new body: logs ref and selection text.
3. Overlapping replacement (existing): adds selection text to the already-present overlap warning for easier diagnosis.

Also adds a selection field to the replacement struct so the overlap warning can report the dropped text.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: update markerRegex to match markers with nested tags

Replace ([^<]*) with (?s)(.*?) so the pattern:
- Matches marker content that contains nested inline tags (e.g. <strong>)
- Matches across newlines ((?s) / DOTALL mode)

The old character class [^<]* stopped at the first < inside the marker body, causing the context-extraction step to miss any comment whose original selection spanned formatted text. Add TestMergeComments_NestedTags to cover this path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: guard against empty OriginalSelection in MergeComments

strings.Index(s, "") always returns 0, so an empty escapedSelection would spin the search loop indefinitely (or panic when currentPos advances past len(newBody)). Skip comments with an empty selection early, emit a Warn log, and add TestMergeComments_EmptySelection to cover the path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: paginate GetInlineComments to avoid silently truncating results

The Confluence child/comment endpoint is paginated. The previous single-request implementation silently dropped any comments beyond the server's default page size.

Changes:
- Add Links (context, next) to InlineComments struct so the _links field from each page response is decoded.
- Rewrite GetInlineComments to loop with limit/start parameters (pageSize=100), accumulating all results, following the same pattern used by GetAttachments and label fetching.
- Add TestMergeComments_DuplicateMarkerRef to cover the deduplication guard added in the previous commit.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix UTF-8 safety, API compat, log verbosity

- levenshteinDistance: convert to []rune before empty-string checks so rune counts (not byte counts) are returned for strings with multi-byte characters
- Add contextBefore/contextAfter helpers that use utf8.RuneStart to avoid slicing in the middle of a multi-byte UTF-8 sequence when extracting 100-char context windows from oldBody and newBody
- Add truncateSelection helper (50 runes + ellipsis) and apply it in all Warn log messages that include the selected text, preventing large or sensitive page content from appearing in logs
- Downgrade non-inline comment log from Warn to Debug with message 'comment ignored during inline marker merge: not an inline comment'; page-level comments are not inline markers and are not 'lost'
- Restore original one-argument GetPageByID (expand='ancestors,version') and add GetPageByIDExpanded for the one caller that needs a custom expand value, preserving backward compatibility for API consumers

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address new PR review comments

- Remove custom min() function: shadows the Go 1.21+ built-in min for the entire package; the built-in handles the 3-arg call in levenshteinDistance identically
- Validate rune boundaries on strings.Index candidates: skip any match where start or end falls in the middle of a multi-byte UTF-8 rune to prevent corrupt UTF-8 output
- Defer preserve-comments API calls until after shouldUpdatePage is determined: avoids unnecessary GetPageByIDExpanded + GetInlineComments round-trips on no-op --changes-only runs
- Capitalize Usage string for --preserve-comments flag (util/flags.go) and matching README.md entry to match sentence case of surrounding flags
- Run gofmt on util/cli.go to fix struct literal field alignment

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: document --preserve-comments feature in README

Add a dedicated 'Preserving Inline Comments' section under Tricks with:
- Usage examples (CLI flag and env var)
- Step-by-step explanation of the Levenshtein-based relocation algorithm
- Limitations (deleted text, overlapping selections, new pages, changes-only interaction)

Also add a cross-reference NOTE near the --preserve-comments flag entry in the Usage section.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: fix markdownlint errors in README

- Change unordered list markers from dashes to asterisks (MD004)
- Remove extra blank line before Issues section (MD012)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Extract named types for InlineComments; optimize Levenshtein search

- Introduce InlineCommentProperties, InlineCommentExtensions, and InlineCommentResult named types in confluence/api.go, replacing the anonymous nested struct in InlineComments.Results. Callers and tests can now construct/inspect comment objects without repeating the JSON shape.
- Simplify makeComments helper in mark_test.go to use the new named types directly, eliminating the verbose anonymous struct literal.
- Add two Levenshtein candidate-search optimisations in MergeComments:
  * Exact-context fast path: if both the before and after windows match exactly, take that occurrence immediately without computing distance.
  * Lower-bound pruning: skip the full O(m*n) Levenshtein computation for a candidate when the absolute difference in window lengths alone already meets or exceeds the current best distance.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Use stable sort with ref tie-breaker; fix README overlap description

- Replace slices.SortFunc with slices.SortStableFunc for the replacements slice, adding ref as a lexicographic tie-breaker when two markers resolve to the same start offset. This makes overlap resolution fully deterministic across runs.
- Correct the README limitation note: the *earlier* overlapping match (lower byte offset) is what gets dropped; the later one (higher byte offset, applied first in the back-to-front pass) is kept. The previous wording said 'the second one is dropped' which was ambiguous and inaccurate.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix rune-based lower-bound pruning; clarify test comment

- Use utf8.RuneCountInString instead of len() for the Levenshtein lower-bound pruning computation. The levenshteinDistance function operates on rune slices, so byte-length differences can exceed the true rune-length difference for multibyte UTF-8 content, causing valid candidates to be incorrectly skipped.
- Update TestMergeComments_SelectionMissing comment to say the comment is 'dropped with a warning' rather than 'silently dropped', matching the actual behavior.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add missing unit tests for helpers and MergeComments scenarios

Helper function tests:
- TestTruncateSelection: short/exact/long strings and multibyte runes
- TestLevenshteinDistance: empty strings, identical, insertions, deletions, substitutions, 'kitten/sitting', and a multibyte UTF-8 case to exercise rune-based counting
- TestContextBefore / TestContextAfter: basic windowing, window larger than string, and a case where the raw byte offset lands mid-rune (é) to verify the rune-boundary correction logic

MergeComments scenario tests:
- TestMergeComments_MultipleComments: two non-overlapping comments both correctly applied via back-to-front replacement
- TestMergeComments_EmptyResults: non-nil InlineComments with zero results returns body unchanged
- TestMergeComments_NonInlineLocation: page-level comments (location != 'inline') are skipped; body unchanged
- TestMergeComments_NoContext: when a ref has no marker in oldBody the first occurrence of the selection in newBody is used
- TestMergeComments_UTF8: multibyte (Japanese) characters in both body and selection are handled correctly

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix three correctness issues in MergeComments

- Fix html import shadowing: alias the 'html' import as 'stdhtml' to avoid shadowing by the local 'html' variable used throughout ProcessFile. Both callers updated: stdhtml.EscapeString for the ref attribute, htmlEscapeText for the selection search.
- Fix selection search with quotes/apostrophes: replace html.EscapeString for the selection with a new htmlEscapeText helper that only escapes &, <, > and not ' or ". Confluence storage HTML often leaves quotes and apostrophes unescaped in text nodes, so fully-escaped selections would fail to match and inline comments would be silently dropped. Add TestMergeComments_SelectionWithQuotes.
- Fix duplicate-ref warnings: move seenRefs[ref]=true to immediately after the duplicate-check, before the search loop. Previously seenRefs was only set on a successful match, so multiple results for the same MarkerRef with no match in the new body would each emit a 'dropped' warning. Add TestMergeComments_DuplicateMarkerRefDropped.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Optimize levenshteinDistance to use two rolling rows instead of full matrix

Reduces memory allocation from O(m×n) to O(n) by keeping only the previous and current rows. Also swaps r1/r2 so the shorter string is used for column width, minimizing row allocation size. This matters in MergeComments where levenshteinDistance is called for every candidate match of every comment's selection in newBody: on pages with many comments or short/common selections the number of calls can be high.

Addresses thread [40] from PR review.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix test description and README algorithm doc

mark_test.go (thread [43]):
- TestMergeComments_HTMLEntities: the description incorrectly claimed &#39; (apostrophe) was tested; the selection '<world>' contains no apostrophe. Updated comment to accurately describe what is covered (&lt;/&gt; entity matching) and note the &#39; limitation.
- Add TestMergeComments_ApostropheSelection: verifies a selection with a literal apostrophe is found when the new body also has a literal apostrophe (the common case from mark's renderer). This exercises the htmlEscapeText path which intentionally does not encode ' or ".

README.md (thread [42]):
- Step 2 of the algorithm description said context was recorded 'immediately before and after the commented selection' which is ambiguous. Clarified that context windows are taken around the <ac:inline-comment-marker> tag boundaries in the old body (not around the raw selection text), so the context is stable even when the marker wraps additional inline markup such as <strong>.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Unexport mergeComments and cap candidate evaluation

Thread [44]: MergeComments was exported but is internal-only: it is only called within the mark package and tested from the same package. Unexport it to mergeComments to avoid expanding the public API surface unnecessarily. Add a Go doc comment describing the function contract, HTML expectations, and the candidate cap.

Thread [45]: The candidate-scoring loop had no upper bound. For short or common selections (e.g. 'a', 'the') on large pages the loop could invoke levenshteinDistance thousands of times, each allocating rune and int slices. Add a maxCandidates=100 constant and break once that many on-rune-boundary occurrences have been evaluated. The exact-context fast-path and lower-bound pruning already skip many candidates before Levenshtein is called, so in practice the cap is only reached for very common selections where the 100th candidate is unlikely to be meaningfully better than an earlier one anyway.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* test: fix HTMLEntities description and add ApostropheEncoded limitation test

Thread #43: TestMergeComments_HTMLEntities had a misleading note claiming it covered the &#39; apostrophe case, but the selection under test ('<world>') did not include an apostrophe. Remove that note and add a dedicated TestMergeComments_ApostropheEncoded test that explicitly documents the known limitation: when a Confluence body stores an apostrophe as the numeric entity &#39;, mergeComments cannot locate the selection (htmlEscapeText does not encode ' to &#39;), so the comment is dropped with a warning.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix CDATA selection fallback and extract contextWindowBytes constant

Thread #46: mergeComments only searched for htmlEscapeText(selection) and would fail for selections inside CDATA-backed macro bodies (e.g. ac:code), where < and > are stored as raw characters rather than HTML entities. Restructure the search loop to build a searchForms slice: the escaped form is tried first (covers normal XML text nodes), and the raw unescaped form is appended as a fallback when they differ. A stopSearch flag exits early on an exact context match or when maxCandidates is reached, preserving the same performance guarantees as before. Add TestMergeComments_CDATASelection to cover this path.

Thread #47: The context-window size 100 was repeated in four places across mergeComments (two in the context-extraction loop and two in the scoring loop). Extract it to const contextWindowBytes = 100 so it is easy to tune and stays consistent everywhere.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
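
The overlap rule described in the commit message (apply replacements back-to-front and skip any whose end exceeds the start of an already-applied one) can be illustrated with a small standalone sketch. The `span` type, the `applyBackToFront` function, and the bracket notation standing in for the `<ac:inline-comment-marker>` tags are hypothetical simplifications for illustration, not the code from this commit.

```go
package main

import (
	"fmt"
	"slices"
	"strings"
)

// span is a hypothetical stand-in for the commit's replacement struct:
// a half-open byte range [start, end) plus the marker ref.
type span struct {
	start, end int
	ref        string
}

// applyBackToFront wraps each span of body in a bracket marker, working from
// the highest start offset down so that earlier offsets stay valid. A span
// whose end runs past the start of the most recently applied span overlaps
// it and is dropped; its ref is returned in the second result.
func applyBackToFront(body string, spans []span) (string, []string) {
	sorted := slices.Clone(spans)
	slices.SortStableFunc(sorted, func(a, b span) int {
		if a.start != b.start {
			return b.start - a.start // descending start offset
		}
		return strings.Compare(a.ref, b.ref) // deterministic tie-break
	})

	var dropped []string
	limit := len(body) // start offset of the most recently applied span
	for _, s := range sorted {
		if s.end > limit {
			dropped = append(dropped, s.ref)
			continue
		}
		body = body[:s.start] + "[" + s.ref + ":" + body[s.start:s.end] + "]" + body[s.end:]
		limit = s.start
	}
	return body, dropped
}

func main() {
	out, dropped := applyBackToFront("hello brave new world", []span{
		{start: 0, end: 11, ref: "a"},  // "hello brave"
		{start: 6, end: 15, ref: "b"},  // "brave new", overlaps a
		{start: 16, end: 21, ref: "c"}, // "world"
	})
	fmt.Println(out)
	fmt.Println(dropped)
}
```

In this example `a` and `b` overlap; `b` starts later, is applied first, and wins, so the result is `hello [b:brave new] [c:world]` with `a` reported as dropped.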
2026-04-18 04:01:12 +00:00 · 2026-04-08 15:44:21 +02:00
parent a43f5fec2e
commit ac264210b5
6 changed files with 840 additions and 10 deletions
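
The commit message explains why the selection search uses a narrower escaper than `html.EscapeString`. The sketch below shows the difference on one input; `escapeText` and `find` are illustrative names, and the sample body is an assumption about how an apostrophe typically appears in storage HTML text nodes.

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

// textReplacer escapes only &, < and >, leaving quotes untouched.
var textReplacer = strings.NewReplacer("&", "&amp;", "<", "&lt;", ">", "&gt;")

func escapeText(s string) string { return textReplacer.Replace(s) }

// find returns the byte offset of the escaped selection in body, or -1.
func find(body, selection string, escape func(string) string) int {
	return strings.Index(body, escape(selection))
}

func main() {
	// Assumed storage body: the apostrophe is literal, < and > are entities.
	body := "<p>it's &lt;here&gt;</p>"
	selection := "it's <here>"

	// html.EscapeString also rewrites ' as &#39;, so the search misses.
	fmt.Println(find(body, selection, html.EscapeString))
	// The narrower escaper leaves the apostrophe alone and finds the text.
	fmt.Println(find(body, selection, escapeText))
}
```

The first lookup fails because the fully escaped form contains `&#39;`, which is not present in the body; the second succeeds at the offset just past `<p>`.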
--- a/README.md
+++ b/README.md
@@ -880,6 +880,7 @@ GLOBAL OPTIONS:
   --mermaid-scale float                    defines the scaling factor for mermaid renderings. (default: 1) [$MARK_MERMAID_SCALE]
   --include-path string                    Path for shared includes, used as a fallback if the include doesn't exist in the current directory. [$MARK_INCLUDE_PATH]
   --changes-only                           Avoids re-uploading pages that haven't changed since the last run. [$MARK_CHANGES_ONLY]
+   --preserve-comments                      Fetch and preserve inline comments on existing Confluence pages. [$MARK_PRESERVE_COMMENTS]
   --d2-scale float                         defines the scaling factor for d2 renderings. (default: 1) [$MARK_D2_SCALE]
   --features string [ --features string ]  Enables optional features. Current features: d2, mermaid, mention, mkdocsadmonitions (default: "mermaid", "mention") [$MARK_FEATURES]
   --insecure-skip-tls-verify               skip TLS certificate verification (useful for self-signed certificates) [$MARK_INSECURE_SKIP_TLS_VERIFY]
@@ -903,6 +904,8 @@ image-align = "center"

 **NOTE**: Labels aren't supported when using `minor-edit`!

+**NOTE**: See [Preserving Inline Comments](#preserving-inline-comments) for a detailed description of the `--preserve-comments` flag.
+
 **NOTE**: The system specific locations are described in here:
 <https://pkg.go.dev/os#UserConfigDir>.
 Currently, these are:
@@ -973,6 +976,34 @@ mark -f "**/docs/*.md"

 We recommend to lint your markdown files with [markdownlint-cli2](https://github.com/DavidAnson/markdownlint-cli2) before publishing them to confluence to catch any conversion errors early.

+### Preserving Inline Comments
+
+When collaborators leave inline comments on a Confluence page, updating the page via `mark` will normally erase those comments because the stored body is fully replaced. The `--preserve-comments` flag re-attaches inline comment markers to the new page body before uploading, so existing review threads survive updates.
+
+```bash
+mark --preserve-comments -f docs/page.md
+```
+
+Or via environment variable:
+
+```bash
+MARK_PRESERVE_COMMENTS=true mark -f docs/page.md
+```
+
+**How it works:**
+
+1. Before uploading, `mark` fetches the current page body and all inline comment markers from the Confluence API.
+2. For each existing `<ac:inline-comment-marker>` tag it records the content wrapped by that marker plus a short context window immediately before the opening tag and immediately after the closing tag in the old body (not around the raw selection text, so the context is stable even when the marker wraps additional inline markup such as `<strong>`).
+3. It searches the new body for the same selected text and picks the occurrence whose surrounding context best matches the original (using Levenshtein distance), so the marker lands in the right place even if nearby text has shifted.
+4. The updated body—with all markers re-embedded—is then uploaded as normal.
+
+**Limitations:**
+
+* If the commented text was deleted from the document, the inline comment cannot be relocated and will be lost. `mark` logs a warning in this case.
+* Overlapping selections (two comments anchored to the same stretch of text) are detected; the earlier overlapping match is dropped with a warning, and the later one (higher byte offset) is kept, rather than producing malformed markup.
+* `--preserve-comments` is automatically skipped for newly created pages (there are no comments to preserve yet).
+* When combined with `--changes-only`, the comment-preservation API calls are skipped entirely on runs where the page content has not changed, avoiding unnecessary round-trips.
+
 ## Issues, Bugs & Contributions

 I've started the project to solve my own problem and open sourced the solution so anyone who has a problem like me can solve it too.
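
The relocation steps listed under "How it works" in the README hunk above can be sketched in a few lines. This is a simplified illustration, not the committed `mergeComments`: the names `relocate`, `ctxBefore`, and `ctxAfter` are hypothetical, input is assumed to be ASCII (the real code trims windows to rune boundaries), and the fast path, pruning, and candidate cap are omitted.

```go
package main

import (
	"fmt"
	"strings"
)

// levenshtein returns the edit distance between a and b, counted in runes,
// using two rolling rows.
func levenshtein(a, b string) int {
	r1, r2 := []rune(a), []rune(b)
	prev := make([]int, len(r2)+1)
	curr := make([]int, len(r2)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(r1); i++ {
		curr[0] = i
		for j := 1; j <= len(r2); j++ {
			cost := 1
			if r1[i-1] == r2[j-1] {
				cost = 0
			}
			curr[j] = min(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(r2)]
}

// ctxBefore and ctxAfter return up to n bytes on either side of a position.
func ctxBefore(s string, pos, n int) string { return s[max(0, pos-n):pos] }
func ctxAfter(s string, pos, n int) string  { return s[pos:min(len(s), pos+n)] }

// relocate returns the byte offset of the occurrence of selection in newBody
// whose surrounding context is closest to (wantBefore, wantAfter), or -1 if
// the selection does not occur at all.
func relocate(newBody, selection, wantBefore, wantAfter string, n int) int {
	best, bestDist := -1, 0
	pos := 0
	for {
		i := strings.Index(newBody[pos:], selection)
		if i < 0 {
			break
		}
		start := pos + i
		end := start + len(selection)
		d := levenshtein(wantBefore, ctxBefore(newBody, start, n)) +
			levenshtein(wantAfter, ctxAfter(newBody, end, n))
		if best < 0 || d < bestDist {
			best, bestDist = start, d
		}
		pos = start + 1
	}
	return best
}

func main() {
	// Context recorded around the marker in the old body (8-byte windows).
	wantBefore, wantAfter := "run the ", " step la"
	newBody := "Do the build step first. Then run the build step last."
	fmt.Println(relocate(newBody, "build", wantBefore, wantAfter, 8))
}
```

Here "build" occurs twice; the second occurrence has context identical to the recorded windows, so its offset is chosen over the first.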
--- a/confluence/api.go
+++ b/confluence/api.go
@@ -58,6 +58,12 @@ type PageInfo struct {
 		Title string `json:"title"`
 	} `json:"ancestors"`

+	Body struct {
+		Storage struct {
+			Value string `json:"value"`
+		} `json:"storage"`
+	} `json:"body"`
+
 	Links struct {
 		Full string `json:"webui"`
 		Base string `json:"-"` // Not from JSON; populated from response _links.base
@@ -85,6 +91,29 @@ type LabelInfo struct {
 	Labels []Label `json:"results"`
 	Size   int     `json:"number"`
 }
+
+type InlineCommentProperties struct {
+	OriginalSelection string `json:"originalSelection"`
+	MarkerRef         string `json:"markerRef"`
+}
+
+type InlineCommentExtensions struct {
+	Location         string                  `json:"location"`
+	InlineProperties InlineCommentProperties `json:"inlineProperties"`
+}
+
+type InlineCommentResult struct {
+	Extensions InlineCommentExtensions `json:"extensions"`
+}
+
+type InlineComments struct {
+	Links struct {
+		Context string `json:"context"`
+		Next    string `json:"next"`
+	} `json:"_links"`
+	Results []InlineCommentResult `json:"results"`
+}
+
 type form struct {
 	buffer io.Reader
 	writer *multipart.Writer
@@ -464,9 +493,13 @@ func (api *API) GetAttachments(pageID string) ([]AttachmentInfo, error) {
 }

 func (api *API) GetPageByID(pageID string) (*PageInfo, error) {
+	return api.GetPageByIDExpanded(pageID, "ancestors,version")
+}
+
+func (api *API) GetPageByIDExpanded(pageID string, expand string) (*PageInfo, error) {
 	request, err := api.rest.Res(
 		"content/"+pageID, &PageInfo{},
-	).Get(map[string]string{"expand": "ancestors,version"})
+	).Get(map[string]string{"expand": expand})
 	if err != nil {
 		return nil, err
 	}
@@ -478,6 +511,44 @@ func (api *API) GetPageByID(pageID string) (*PageInfo, error) {
 	return request.Response.(*PageInfo), nil
 }

+func (api *API) GetInlineComments(pageID string) (*InlineComments, error) {
+	const pageSize = 100
+	all := &InlineComments{}
+	start := 0
+
+	for {
+		result := &InlineComments{}
+		request, err := api.rest.Res(
+			"content/"+pageID+"/child/comment", result,
+		).Get(map[string]string{
+			"expand": "extensions.inlineProperties",
+			"limit":  fmt.Sprintf("%d", pageSize),
+			"start":  fmt.Sprintf("%d", start),
+		})
+		if err != nil {
+			return nil, err
+		}
+
+		if request.Raw.StatusCode != http.StatusOK {
+			return nil, newErrorStatusNotOK(request)
+		}
+
+		if all.Links.Context == "" {
+			all.Links = result.Links
+		}
+
+		all.Results = append(all.Results, result.Results...)
+
+		if len(result.Results) < pageSize || result.Links.Next == "" {
+			break
+		}
+
+		start += len(result.Results)
+	}
+
+	return all, nil
+}
+
 func (api *API) CreatePage(
 	space string,
 	pageType string,
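
The `limit`/`start` loop in `GetInlineComments` above can be exercised without a server. In this sketch the `page` type, `fetchAll`, and the in-memory `fakeServer` are all hypothetical stand-ins for the REST call; only the termination rule (stop on a short page or an empty next link) mirrors the diff.

```go
package main

import "fmt"

// page mimics one paginated response: a slice of results plus a "next" link
// that is empty on the last page. The shape is illustrative only.
type page struct {
	results []string
	next    string
}

// fetchAll requests pages of up to pageSize items, advancing start by the
// number of items received, and stops on a short page or an empty next link.
func fetchAll(fetch func(start, limit int) (page, error), pageSize int) ([]string, error) {
	var all []string
	start := 0
	for {
		p, err := fetch(start, pageSize)
		if err != nil {
			return nil, err
		}
		all = append(all, p.results...)
		if len(p.results) < pageSize || p.next == "" {
			break
		}
		start += len(p.results)
	}
	return all, nil
}

// fakeServer returns a fetch function that serves items from memory.
func fakeServer(items []string) func(start, limit int) (page, error) {
	return func(start, limit int) (page, error) {
		if start >= len(items) {
			return page{}, nil
		}
		end := min(start+limit, len(items))
		p := page{results: items[start:end]}
		if end < len(items) {
			p.next = fmt.Sprintf("?start=%d&limit=%d", end, limit)
		}
		return p, nil
	}
}

func main() {
	items := []string{"c1", "c2", "c3", "c4", "c5"}
	all, err := fetchAll(fakeServer(items), 2)
	fmt.Println(all, err)
}
```

Checking both conditions guards against a server that omits the next link as well as one that returns a full final page.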
--- a/mark.go
+++ b/mark.go
@@ -6,6 +6,7 @@ import (
 	"encoding/hex"
 	"errors"
 	"fmt"
+	stdhtml "html"
 	"io"
 	"os"
 	"path/filepath"
@@ -13,6 +14,7 @@ import (
 	"slices"
 	"strings"
 	"time"
+	"unicode/utf8"

 	"github.com/bmatcuk/doublestar/v4"
 	"github.com/kovetskiy/mark/v16/attachment"
@@ -28,6 +30,8 @@ import (
 	"github.com/rs/zerolog/log"
 )

+var markerRegex = regexp.MustCompile(`(?s)<ac:inline-comment-marker ac:ref="([^"]+)">(.*?)</ac:inline-comment-marker>`)
+
 // Config holds all configuration options for running Mark.
 type Config struct {
 	// Connection settings
@@ -55,10 +59,11 @@ type Config struct {
 	ContentAppearance        string

 	// Page updates
-	MinorEdit      bool
-	VersionMessage string
-	EditLock       bool
-	ChangesOnly    bool
+	MinorEdit        bool
+	VersionMessage   string
+	EditLock         bool
+	ChangesOnly      bool
+	PreserveComments bool

 	// Rendering
 	DropH1          bool
@@ -282,6 +287,7 @@ func ProcessFile(file string, api *confluence.API, config Config) (*confluence.P
 	}

 	var target *confluence.PageInfo
+	var pageCreated bool

 	if meta != nil {
 		parent, pg, err := page.ResolvePage(false, api, meta)
@@ -298,6 +304,7 @@ func ProcessFile(file string, api *confluence.API, config Config) (*confluence.P
 			// conflict that can occur when attempting to update a page just
 			// after it was created. See issues/139.
 			time.Sleep(1 * time.Second)
+			pageCreated = true
 		}

 		target = pg
@@ -415,6 +422,27 @@ func ProcessFile(file string, api *confluence.API, config Config) (*confluence.P
 		finalVersionMessage = config.VersionMessage
 	}

+	// Only fetch the old body and inline comments when we know the page will
+	// actually be updated. This avoids unnecessary API round-trips for no-op
+	// runs (e.g. when --changes-only determines the content is unchanged).
+	if shouldUpdatePage && config.PreserveComments && !pageCreated {
+		pg, err := api.GetPageByIDExpanded(target.ID, "ancestors,version,body.storage")
+		if err != nil {
+			return nil, fmt.Errorf("unable to retrieve page body for comments: %w", err)
+		}
+		target = pg
+
+		comments, err := api.GetInlineComments(target.ID)
+		if err != nil {
+			return nil, fmt.Errorf("unable to retrieve inline comments: %w", err)
+		}
+
+		html, err = mergeComments(html, target.Body.Storage.Value, comments)
+		if err != nil {
+			return nil, fmt.Errorf("unable to merge inline comments: %w", err)
+		}
+	}
+
 	if shouldUpdatePage {
 		err = api.UpdatePage(
 			target,
@@ -531,3 +559,327 @@ func sha1Hash(input string) string {
 	h.Write([]byte(input))
 	return hex.EncodeToString(h.Sum(nil))
 }
+
+var htmlTextReplacer = strings.NewReplacer("&", "&amp;", "<", "&lt;", ">", "&gt;")
+
+// htmlEscapeText escapes only the characters that Confluence storage HTML
+// always encodes in text nodes (&, <, >). Unlike html.EscapeString it does NOT
+// escape single-quotes or double-quotes, because those are frequently left
+// unescaped inside text nodes by the Confluence editor and by mark's own
+// renderer, so escaping them would prevent the selection-search from finding
+// a valid match.
+func htmlEscapeText(s string) string {
+	return htmlTextReplacer.Replace(s)
+}
+
+// truncateSelection returns a truncated preview of s for use in log messages,
+// capped at maxRunes runes, with an ellipsis appended when trimmed.
+func truncateSelection(s string, maxRunes int) string {
+	runes := []rune(s)
+	if len(runes) <= maxRunes {
+		return s
+	}
+	return string(runes[:maxRunes]) + "…"
+}
+
+// contextBefore returns up to maxBytes of s ending at byteEnd, trimmed
+// forward to the nearest valid UTF-8 rune start so the slice is never
+// split across a multi-byte sequence.
+func contextBefore(s string, byteEnd, maxBytes int) string {
+	start := byteEnd - maxBytes
+	if start < 0 {
+		start = 0
+	}
+	for start < byteEnd && !utf8.RuneStart(s[start]) {
+		start++
+	}
+	return s[start:byteEnd]
+}
+
+// contextAfter returns up to maxBytes of s starting at byteStart, trimmed
+// back to the nearest valid UTF-8 rune start so the slice is never split
+// across a multi-byte sequence.
+func contextAfter(s string, byteStart, maxBytes int) string {
+	end := byteStart + maxBytes
+	if end >= len(s) {
+		return s[byteStart:]
+	}
+	for end > byteStart && !utf8.RuneStart(s[end]) {
+		end--
+	}
+	return s[byteStart:end]
+}
+
+func levenshteinDistance(s1, s2 string) int {
+	r1 := []rune(s1)
+	r2 := []rune(s2)
+
+	if len(r1) == 0 {
+		return len(r2)
+	}
+	if len(r2) == 0 {
+		return len(r1)
+	}
+
+	// Use two rolling rows instead of a full matrix to reduce allocations
+	// from O(m×n) to O(n). Swap r1/r2 so r2 is the shorter string, keeping
+	// the row width (len(r2)+1) as small as possible.
+	if len(r1) < len(r2) {
+		r1, r2 = r2, r1
+	}
+
+	prev := make([]int, len(r2)+1)
+	curr := make([]int, len(r2)+1)
+
+	for j := range prev {
+		prev[j] = j
+	}
+
+	for i := 1; i <= len(r1); i++ {
+		curr[0] = i
+		for j := 1; j <= len(r2); j++ {
+			cost := 0
+			if r1[i-1] != r2[j-1] {
+				cost = 1
+			}
+			curr[j] = min(
+				prev[j]+1,      // deletion
+				curr[j-1]+1,    // insertion
+				prev[j-1]+cost, // substitution
+			)
+		}
+		prev, curr = curr, prev
+	}
+	return prev[len(r2)]
+}
+
+type commentContext struct {
+	before string
+	after  string
+}
+
+// maxCandidates caps how many occurrences of each selection mergeComments scores.
+const maxCandidates = 100
+
+// contextWindowBytes is the number of bytes of surrounding text captured as
+// context around each inline-comment marker. It is used both when extracting
+// context from oldBody and when scoring candidates in newBody.
+const contextWindowBytes = 100
+
+// mergeComments re-embeds inline comment markers from the Confluence API into
+// newBody (the updated storage HTML about to be uploaded). It extracts context
+// from each existing marker in oldBody and uses Levenshtein distance to
+// relocate each marker to the best-matching position in newBody, so comment
+// threads survive page edits even when the surrounding text has shifted.
+//
+// At most maxCandidates occurrences of each selection are scored; the rest are
+// ignored to bound CPU cost on pages where a selection is short or very common.
+func mergeComments(newBody string, oldBody string, comments *confluence.InlineComments) (string, error) {
+	if comments == nil {
+		return newBody, nil
+	}
+	// 1. Extract context for each comment from oldBody
+	contexts := make(map[string]commentContext)
+	matches := markerRegex.FindAllStringSubmatchIndex(oldBody, -1)
+	for _, match := range matches {
+		ref := oldBody[match[2]:match[3]]
+		// context around the tag
+		before := contextBefore(oldBody, match[0], contextWindowBytes)
+		after := contextAfter(oldBody, match[1], contextWindowBytes)
+		contexts[ref] = commentContext{
+			before: before,
+			after:  after,
+		}
+	}
+
+	type replacement struct {
+		start     int
+		end       int
+		ref       string
+		selection string
+	}
+	var replacements []replacement
+	seenRefs := make(map[string]bool)
+
+	for _, comment := range comments.Results {
+		if comment.Extensions.Location != "inline" {
+			log.Debug().
+				Str("location", comment.Extensions.Location).
+				Str("ref", comment.Extensions.InlineProperties.MarkerRef).
+				Msg("comment ignored during inline marker merge: not an inline comment")
+			continue
+		}
+
+		ref := comment.Extensions.InlineProperties.MarkerRef
+		selection := comment.Extensions.InlineProperties.OriginalSelection
+
+		if seenRefs[ref] {
+			// Multiple results share the same MarkerRef (e.g. threaded replies).
+			// The marker only needs to be inserted once; skip duplicates.
+			continue
+		}
+		// Mark ref as seen immediately so subsequent results for the same ref
+		// (threaded replies) are always deduplicated, even if this one is dropped.
+		seenRefs[ref] = true
+
+		if selection == "" {
+			log.Warn().
+				Str("ref", ref).
+				Msg("inline comment skipped: original selection is empty; comment will be lost")
+			continue
+		}
+
+		ctx, hasCtx := contexts[ref]
+
+		// Build the list of forms to search for in newBody. The escaped form
+		// is tried first (normal XML text nodes). The raw form is appended as a
+		// fallback for text inside CDATA-backed macro bodies (e.g. ac:code),
+		// where < and > are stored unescaped inside <![CDATA[...]]>.
+		escapedSelection := htmlEscapeText(selection)
+		searchForms := []string{escapedSelection}
+		if selection != escapedSelection {
+			searchForms = append(searchForms, selection)
+		}
+
+		var bestStart = -1
+		var bestEnd = -1
+		var minDistance = 1000000
+
+		// Iterate over search forms; stop as soon as we have a definitive best.
+		candidates := 0
+		stopSearch := false
+		for _, form := range searchForms {
+			if stopSearch {
+				break
+			}
+			currentPos := 0
+			for {
+				index := strings.Index(newBody[currentPos:], form)
+				if index == -1 {
+					break
+				}
+				start := currentPos + index
+				end := start + len(form)
+
+				// Skip candidates that start or end in the middle of a multi-byte
+				// UTF-8 rune; such a match would produce invalid UTF-8 output.
+				if !utf8.RuneStart(newBody[start]) || (end < len(newBody) && !utf8.RuneStart(newBody[end])) {
+					currentPos = start + 1
+					continue
+				}
+
+				candidates++
+				if candidates > maxCandidates {
+					stopSearch = true
+					break
+				}
+
+				if !hasCtx {
+					// No context available; use the first occurrence.
+					bestStart = start
+					bestEnd = end
+					stopSearch = true
+					break
+				}
+
+				newBefore := contextBefore(newBody, start, contextWindowBytes)
+				newAfter := contextAfter(newBody, end, contextWindowBytes)
+
+				// Fast path: exact context match is the best possible result.
+				if newBefore == ctx.before && newAfter == ctx.after {
+					bestStart = start
+					bestEnd = end
+					stopSearch = true
+					break
+				}
+
+				// Lower-bound pruning: Levenshtein distance is at least the
+				// absolute difference in rune counts. Use rune counts (not byte
+				// lengths) to match the unit levenshteinDistance operates on,
+				// avoiding false skips for multibyte UTF-8 content.
+				lbBefore := utf8.RuneCountInString(ctx.before) - utf8.RuneCountInString(newBefore)
+				if lbBefore < 0 {
+					lbBefore = -lbBefore
+				}
+				lbAfter := utf8.RuneCountInString(ctx.after) - utf8.RuneCountInString(newAfter)
+				if lbAfter < 0 {
+					lbAfter = -lbAfter
+				}
+				if lbBefore+lbAfter >= minDistance {
+					currentPos = start + 1
+					continue
+				}
+
+				distance := levenshteinDistance(ctx.before, newBefore) + levenshteinDistance(ctx.after, newAfter)
+
+				if distance < minDistance {
+					minDistance = distance
+					bestStart = start
+					bestEnd = end
+				}
+
+				currentPos = start + 1
+			}
+		}
+
+		if bestStart != -1 {
+			replacements = append(replacements, replacement{
+				start:     bestStart,
+				end:       bestEnd,
+				ref:       ref,
+				selection: selection,
+			})
+		} else {
+			log.Warn().
+				Str("ref", ref).
+				Str("selection_preview", truncateSelection(selection, 50)).
+				Msg("inline comment dropped: selected text not found in new body; comment will be lost")
+		}
+	}
+
+	// Sort replacements back to front so that inserting a marker never shifts
+	// the offsets of replacements still to be applied. The sort is stable and
+	// breaks ties on start offset by ref, keeping the ordering deterministic.
+	slices.SortStableFunc(replacements, func(a, b replacement) int {
+		if a.start != b.start {
+			return b.start - a.start
+		}
+		if a.ref < b.ref {
+			return -1
+		}
+		if a.ref > b.ref {
+			return 1
+		}
+		return 0
+	})
+
+	// Apply replacements back-to-front. Track the minimum start of any
+	// applied replacement so that overlapping candidates (whose end exceeds
+	// that boundary) are dropped rather than producing nested or malformed
+	// <ac:inline-comment-marker> tags.
+	minAppliedStart := len(newBody)
+	for _, r := range replacements {
+		if r.end > minAppliedStart {
+			// This replacement overlaps with an already-applied one.
+			// Drop it and warn so the user knows the comment was skipped.
+			log.Warn().
+				Str("ref", r.ref).
+				Str("selection_preview", truncateSelection(r.selection, 50)).
+				Int("start", r.start).
+				Int("end", r.end).
+				Int("conflicting_start", minAppliedStart).
+				Msg("inline comment marker dropped: selection overlaps an already-placed marker")
+			continue
+		}
+		minAppliedStart = r.start
+		selection := newBody[r.start:r.end]
+		withComment := fmt.Sprintf(
+			`<ac:inline-comment-marker ac:ref="%s">%s</ac:inline-comment-marker>`,
+			stdhtml.EscapeString(r.ref),
+			selection,
+		)
+		newBody = newBody[:r.start] + withComment + newBody[r.end:]
+	}
+
+	return newBody, nil
+}
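Note on the replacement loop above: the following is a minimal standalone sketch of back-to-front application with overlap dropping, not the code from this PR. It assumes a hypothetical `span` type and `wrapSpans` helper in place of the diff's `replacement` struct, and a bracketed `[ref:...]` tag in place of the `<ac:inline-comment-marker>` XML. It only illustrates why, of two overlapping selections, the one that starts later is kept.

```go
package main

import (
	"fmt"
	"slices"
)

// span is a hypothetical stand-in for the diff's replacement struct.
type span struct {
	start, end int
	ref        string
}

// wrapSpans wraps each non-overlapping span of body in a [ref:...] tag.
// It works from the end of the string towards the front so that the
// offsets of spans not yet applied stay valid after each insertion.
func wrapSpans(body string, spans []span) string {
	sorted := slices.Clone(spans)
	slices.SortStableFunc(sorted, func(a, b span) int {
		return b.start - a.start // descending by start offset
	})

	minApplied := len(body)
	for _, s := range sorted {
		if s.end > minApplied {
			continue // overlaps a span that was already wrapped; drop it
		}
		minApplied = s.start
		body = body[:s.start] + "[" + s.ref + ":" + body[s.start:s.end] + "]" + body[s.end:]
	}
	return body
}

func main() {
	body := "<p>foo bar baz</p>"
	fmt.Println(wrapSpans(body, []span{
		{start: 3, end: 10, ref: "A"}, // "foo bar"
		{start: 7, end: 14, ref: "B"}, // "bar baz"
	}))
	// prints <p>foo [B:bar baz]</p>
}
```

Span B is applied first because it starts later; span A then ends past B's start and is skipped, which mirrors the `r.end > minAppliedStart` check.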
--- a/mark_test.go
+++ b/mark_test.go
@@ -0,0 +1,369 @@
+package mark
+
+import (
+	"testing"
+
+	"github.com/kovetskiy/mark/v16/confluence"
+	"github.com/stretchr/testify/assert"
+)
+
+// ---------------------------------------------------------------------------
+// Helper function unit tests
+// ---------------------------------------------------------------------------
+
+func TestTruncateSelection(t *testing.T) {
+	assert.Equal(t, "hello", truncateSelection("hello", 10))
+	assert.Equal(t, "hello", truncateSelection("hello", 5))
+	assert.Equal(t, "hell…", truncateSelection("hello", 4))
+	assert.Equal(t, "", truncateSelection("", 5))
+	// Multibyte runes count as single units.
+	assert.Equal(t, "世界…", truncateSelection("世界 is the world", 2))
+}
+
+func TestLevenshteinDistance(t *testing.T) {
+	tests := []struct {
+		s1, s2 string
+		want   int
+	}{
+		{"", "", 0},
+		{"abc", "", 3},
+		{"", "abc", 3},
+		{"abc", "abc", 0},
+		{"abc", "axc", 1},   // one substitution
+		{"abc", "ab", 1},    // one deletion
+		{"ab", "abc", 1},    // one insertion
+		{"kitten", "sitting", 3},
+		// Multibyte: é is one rune, so distance from "héllo" to "hello" is 1.
+		{"héllo", "hello", 1},
+	}
+	for _, tt := range tests {
+		t.Run(tt.s1+"/"+tt.s2, func(t *testing.T) {
+			assert.Equal(t, tt.want, levenshteinDistance(tt.s1, tt.s2))
+		})
+	}
+}
+
+func TestContextBefore(t *testing.T) {
+	// Basic cases.
+	assert.Equal(t, "", contextBefore("hello", 0, 10))
+	assert.Equal(t, "hello", contextBefore("hello", 5, 10))
+	assert.Equal(t, "llo", contextBefore("hello", 5, 3))
+
+	// "héllo" is 6 bytes (h=1, é=2, l=1, l=1, o=1).
+	// maxBytes=4 → raw start=2, which lands mid-rune (é's continuation byte).
+	// Should advance to byte 3 (first 'l').
+	assert.Equal(t, "llo", contextBefore("héllo", 6, 4))
+}
+
+func TestContextAfter(t *testing.T) {
+	// Basic cases.
+	assert.Equal(t, "", contextAfter("hello", 5, 10))
+	assert.Equal(t, "hello", contextAfter("hello", 0, 10))
+	assert.Equal(t, "hel", contextAfter("hello", 0, 3))
+
+	// "héllo" is 6 bytes. contextAfter(s, 0, 2) → raw end=2 (é's continuation
+	// byte), which is not a rune start. Should back up to 1, returning just "h".
+	assert.Equal(t, "h", contextAfter("héllo", 0, 2))
+}
+
+// makeComments builds an InlineComments value from alternating
+// (selection, markerRef) pairs, all with location "inline".
+func makeComments(pairs ...string) *confluence.InlineComments {
+	c := &confluence.InlineComments{}
+	for i := 0; i+1 < len(pairs); i += 2 {
+		selection, ref := pairs[i], pairs[i+1]
+		c.Results = append(c.Results, confluence.InlineCommentResult{
+			Extensions: confluence.InlineCommentExtensions{
+				Location: "inline",
+				InlineProperties: confluence.InlineCommentProperties{
+					OriginalSelection: selection,
+					MarkerRef:         ref,
+				},
+			},
+		})
+	}
+	return c
+}
+
+func TestMergeComments(t *testing.T) {
+	body := "<p>Hello world</p>"
+	oldBody := `<p>Hello <ac:inline-comment-marker ac:ref="uuid-123">world</ac:inline-comment-marker></p>`
+	comments := makeComments("world", "uuid-123")
+
+	result, err := mergeComments(body, oldBody, comments)
+	assert.NoError(t, err)
+	assert.Equal(t, `<p>Hello <ac:inline-comment-marker ac:ref="uuid-123">world</ac:inline-comment-marker></p>`, result)
+}
+
+func TestMergeComments_Escaping(t *testing.T) {
+	body := "<p>Hello &amp; world</p>"
+	oldBody := `<p>Hello <ac:inline-comment-marker ac:ref="uuid-456">&amp;</ac:inline-comment-marker> world</p>`
+	comments := makeComments("&", "uuid-456")
+
+	result, err := mergeComments(body, oldBody, comments)
+	assert.NoError(t, err)
+	assert.Equal(t, `<p>Hello <ac:inline-comment-marker ac:ref="uuid-456">&amp;</ac:inline-comment-marker> world</p>`, result)
+}
+
+func TestMergeComments_Disambiguation(t *testing.T) {
+	body := "<p>Item one. Item two. Item one.</p>"
+	// Comment is on the second "Item one."
+	oldBody := `<p>Item one. Item two. <ac:inline-comment-marker ac:ref="uuid-1">Item one.</ac:inline-comment-marker></p>`
+	comments := makeComments("Item one.", "uuid-1")
+
+	result, err := mergeComments(body, oldBody, comments)
+	assert.NoError(t, err)
+	// The surrounding context should select the second occurrence.
+	assert.Equal(t, `<p>Item one. Item two. <ac:inline-comment-marker ac:ref="uuid-1">Item one.</ac:inline-comment-marker></p>`, result)
+}
+
+// TestMergeComments_SelectionMissing verifies that a comment whose selection
+// no longer appears in the new body is dropped without returning an error or panicking.
+// A warning is logged so the user knows the comment was not relocated.
+func TestMergeComments_SelectionMissing(t *testing.T) {
+	body := "<p>Completely different content</p>"
+	oldBody := `<p><ac:inline-comment-marker ac:ref="uuid-gone">old text</ac:inline-comment-marker></p>`
+	comments := makeComments("old text", "uuid-gone")
+
+	result, err := mergeComments(body, oldBody, comments)
+	assert.NoError(t, err)
+	// Comment is dropped; body is returned unchanged.
+	assert.Equal(t, body, result)
+}
+
+// TestMergeComments_OverlappingSelections verifies that when two comments
+// reference overlapping text regions the later one (by position) is kept and
+// the earlier overlapping one is dropped rather than corrupting the body.
+func TestMergeComments_OverlappingSelections(t *testing.T) {
+	body := "<p>foo bar baz</p>"
+	// Neither comment has a marker in oldBody, so no positional context is
+	// available; the algorithm falls back to a plain string search.
+	oldBody := "<p>foo bar baz</p>"
+	// "foo bar" starts at 3, ends at 10; "bar baz" starts at 7, ends at 14.
+	// They overlap on "bar". The later match (uuid-B at position 7) wins.
+	comments := makeComments("foo bar", "uuid-A", "bar baz", "uuid-B")
+
+	result, err := mergeComments(body, oldBody, comments)
+	assert.NoError(t, err)
+	assert.Equal(t, `<p>foo <ac:inline-comment-marker ac:ref="uuid-B">bar baz</ac:inline-comment-marker></p>`, result)
+}
+
+// TestMergeComments_NilComments verifies that a nil comments pointer is
+// handled gracefully and the new body is returned unchanged.
+func TestMergeComments_NilComments(t *testing.T) {
+	body := "<p>Hello world</p>"
+	result, err := mergeComments(body, "", nil)
+	assert.NoError(t, err)
+	assert.Equal(t, body, result)
+}
+
+// TestMergeComments_HTMLEntities verifies that selections containing HTML
+// entities (&lt;, &gt;) are matched correctly. The API returns raw (unescaped)
+// text for OriginalSelection; htmlEscapeText encodes &, < and > to their
+// entity forms before searching.
+func TestMergeComments_HTMLEntities(t *testing.T) {
+	body := `<p>Hello &lt;world&gt; it&#39;s me</p>`
+	oldBody := `<p>Hello <ac:inline-comment-marker ac:ref="uuid-ent">&lt;world&gt;</ac:inline-comment-marker> it&#39;s me</p>`
+	// The API returns the raw (unescaped) selection text.
+	comments := makeComments("<world>", "uuid-ent")
+
+	result, err := mergeComments(body, oldBody, comments)
+	assert.NoError(t, err)
+	assert.Equal(t, `<p>Hello <ac:inline-comment-marker ac:ref="uuid-ent">&lt;world&gt;</ac:inline-comment-marker> it&#39;s me</p>`, result)
+}
+
+// TestMergeComments_ApostropheEncoded verifies the known limitation: when a
+// selection includes an apostrophe that Confluence stores as the numeric
+// entity &#39; in the page body, mergeComments cannot locate the selection
+// (htmlEscapeText does not encode ' to &#39;) and the comment is dropped with
+// a warning rather than panicking or producing invalid output.
+func TestMergeComments_ApostropheEncoded(t *testing.T) {
+	// New body uses &#39; entity (as Confluence sometimes stores apostrophes).
+	body := `<p>Hello &lt;world&gt; it&#39;s me</p>`
+	// Old body has the comment marker around a selection that includes an apostrophe.
+	oldBody := `<p>Hello <ac:inline-comment-marker ac:ref="uuid-apos-enc">&lt;world&gt; it&#39;s</ac:inline-comment-marker> me</p>`
+	// The API returns the raw unescaped selection including a literal apostrophe.
+	comments := makeComments("<world> it's", "uuid-apos-enc")
+
+	result, err := mergeComments(body, oldBody, comments)
+	assert.NoError(t, err)
+	// The comment is dropped (body unchanged) because htmlEscapeText leaves the
+	// apostrophe as-is, so "it's" never matches "it&#39;s" in the new body.
+	assert.Equal(t, body, result)
+}
+
+// TestMergeComments_ApostropheSelection verifies that a selection containing a
+// literal apostrophe is found when the new body also contains a literal
+// apostrophe (as mark's renderer typically emits). This exercises the
+// htmlEscapeText path which intentionally does not encode ' or ".
+func TestMergeComments_ApostropheSelection(t *testing.T) {
+	body := `<p>Hello it's a test</p>`
+	oldBody := `<p>Hello <ac:inline-comment-marker ac:ref="uuid-apos">it's</ac:inline-comment-marker> a test</p>`
+	// The API returns the raw (unescaped) selection text with a literal apostrophe.
+	comments := makeComments("it's", "uuid-apos")
+
+	result, err := mergeComments(body, oldBody, comments)
+	assert.NoError(t, err)
+	assert.Equal(t, `<p>Hello <ac:inline-comment-marker ac:ref="uuid-apos">it's</ac:inline-comment-marker> a test</p>`, result)
+}
+
+
+// TestMergeComments_NestedTags verifies that a marker whose stored content
+// contains nested inline tags (e.g. <strong>) is still recognised by
+// markerRegex and the comment is correctly relocated into the new body.
+func TestMergeComments_NestedTags(t *testing.T) {
+	// The new body contains plain bold text (no marker yet).
+	body := "<p>Hello <strong>world</strong></p>"
+	// The old body already has the marker wrapping the bold tag.
+	oldBody := `<p>Hello <ac:inline-comment-marker ac:ref="uuid-nested"><strong>world</strong></ac:inline-comment-marker></p>`
+	// The API returns the raw selected text without markup.
+	comments := makeComments("world", "uuid-nested")
+
+	result, err := mergeComments(body, oldBody, comments)
+	assert.NoError(t, err)
+	assert.Equal(t, `<p>Hello <strong><ac:inline-comment-marker ac:ref="uuid-nested">world</ac:inline-comment-marker></strong></p>`, result)
+}
+
+// TestMergeComments_EmptySelection verifies that a comment with an empty
+// OriginalSelection is skipped without panicking and the body is returned
+// unchanged.
+func TestMergeComments_EmptySelection(t *testing.T) {
+	body := "<p>Hello world</p>"
+	comments := makeComments("", "uuid-empty")
+
+	result, err := mergeComments(body, body, comments)
+	assert.NoError(t, err)
+	assert.Equal(t, body, result)
+}
+
+// TestMergeComments_DuplicateMarkerRef verifies that multiple comment results
+// sharing the same MarkerRef (e.g. threaded replies) produce exactly one
+// <ac:inline-comment-marker> insertion rather than nested duplicates.
+func TestMergeComments_DuplicateMarkerRef(t *testing.T) {
+	body := "<p>Hello world</p>"
+	oldBody := `<p>Hello <ac:inline-comment-marker ac:ref="uuid-dup">world</ac:inline-comment-marker></p>`
+	// Two results with identical ref — simulates threaded replies.
+	comments := makeComments("world", "uuid-dup", "world", "uuid-dup")
+
+	result, err := mergeComments(body, oldBody, comments)
+	assert.NoError(t, err)
+	assert.Equal(t, `<p>Hello <ac:inline-comment-marker ac:ref="uuid-dup">world</ac:inline-comment-marker></p>`, result)
+}
+
+// ---------------------------------------------------------------------------
+// Additional mergeComments scenario tests
+// ---------------------------------------------------------------------------
+
+// TestMergeComments_MultipleComments verifies that two non-overlapping comments
+// are both correctly re-embedded via back-to-front replacement.
+func TestMergeComments_MultipleComments(t *testing.T) {
+	body := "<p>Hello world and foo bar</p>"
+	oldBody := `<p>Hello <ac:inline-comment-marker ac:ref="uuid-1">world</ac:inline-comment-marker> and foo <ac:inline-comment-marker ac:ref="uuid-2">bar</ac:inline-comment-marker></p>`
+	comments := makeComments("world", "uuid-1", "bar", "uuid-2")
+
+	result, err := mergeComments(body, oldBody, comments)
+	assert.NoError(t, err)
+	assert.Equal(t, `<p>Hello <ac:inline-comment-marker ac:ref="uuid-1">world</ac:inline-comment-marker> and foo <ac:inline-comment-marker ac:ref="uuid-2">bar</ac:inline-comment-marker></p>`, result)
+}
+
+// TestMergeComments_EmptyResults verifies that an InlineComments value with a
+// non-nil but empty Results slice is handled gracefully.
+func TestMergeComments_EmptyResults(t *testing.T) {
+	body := "<p>Hello world</p>"
+	result, err := mergeComments(body, body, &confluence.InlineComments{})
+	assert.NoError(t, err)
+	assert.Equal(t, body, result)
+}
+
+// TestMergeComments_NonInlineLocation verifies that page-level comments
+// (location != "inline") are skipped, with only a debug log, and the body is unchanged.
+func TestMergeComments_NonInlineLocation(t *testing.T) {
+	body := "<p>Hello world</p>"
+	comments := &confluence.InlineComments{
+		Results: []confluence.InlineCommentResult{
+			{
+				Extensions: confluence.InlineCommentExtensions{
+					Location: "page",
+					InlineProperties: confluence.InlineCommentProperties{
+						OriginalSelection: "Hello",
+						MarkerRef:         "uuid-page",
+					},
+				},
+			},
+		},
+	}
+	result, err := mergeComments(body, body, comments)
+	assert.NoError(t, err)
+	assert.Equal(t, body, result)
+}
+
+// TestMergeComments_NoContext verifies that when a comment's MarkerRef has no
+// corresponding marker in oldBody (no context available) the first occurrence
+// of the selection in the new body is used.
+func TestMergeComments_NoContext(t *testing.T) {
+	body := "<p>foo bar foo</p>"
+	oldBody := "<p>foo bar foo</p>" // no markers → no context
+	comments := makeComments("foo", "uuid-noctx")
+
+	result, err := mergeComments(body, oldBody, comments)
+	assert.NoError(t, err)
+	// First occurrence of "foo" is at position 3.
+	assert.Equal(t, `<p><ac:inline-comment-marker ac:ref="uuid-noctx">foo</ac:inline-comment-marker> bar foo</p>`, result)
+}
+
+// TestMergeComments_UTF8 verifies that selections and bodies containing
+// multibyte UTF-8 characters are handled correctly.
+func TestMergeComments_UTF8(t *testing.T) {
+	body := "<p>こんにちは世界</p>"
+	oldBody := `<p>こんにちは<ac:inline-comment-marker ac:ref="uuid-jp">世界</ac:inline-comment-marker></p>`
+	comments := makeComments("世界", "uuid-jp")
+
+	result, err := mergeComments(body, oldBody, comments)
+	assert.NoError(t, err)
+	assert.Equal(t, `<p>こんにちは<ac:inline-comment-marker ac:ref="uuid-jp">世界</ac:inline-comment-marker></p>`, result)
+}
+
+// TestMergeComments_SelectionWithQuotes verifies that a selection containing
+// apostrophes or double-quotes is found correctly in the new body even though
+// html.EscapeString would encode those characters. Only &, <, > should be
+// escaped when searching.
+func TestMergeComments_SelectionWithQuotes(t *testing.T) {
+	body := `<p>It's a "test" page</p>`
+	oldBody := `<p>It's a <ac:inline-comment-marker ac:ref="uuid-q">"test"</ac:inline-comment-marker> page</p>`
+	comments := makeComments(`"test"`, "uuid-q")
+
+	result, err := mergeComments(body, oldBody, comments)
+	assert.NoError(t, err)
+	assert.Equal(t, `<p>It's a <ac:inline-comment-marker ac:ref="uuid-q">"test"</ac:inline-comment-marker> page</p>`, result)
+}
+
+// TestMergeComments_DuplicateMarkerRefDropped verifies that when multiple
+// comment results share the same MarkerRef and the selection cannot be found,
+// the body is unchanged. The single-warning behaviour is not asserted here.
+func TestMergeComments_DuplicateMarkerRefDropped(t *testing.T) {
+	body := "<p>Hello world</p>"
+	// Duplicate refs, but selection "gone" is not present in body or oldBody.
+	comments := makeComments("gone", "uuid-dup2", "gone", "uuid-dup2")
+
+	result, err := mergeComments(body, body, comments)
+	assert.NoError(t, err)
+	assert.Equal(t, body, result) // body unchanged; log output is not checked
+}
+
+// TestMergeComments_CDATASelection verifies that a selection inside a
+// CDATA-backed macro body (e.g. ac:code) is matched even though < and > are
+// stored as raw characters rather than HTML entities. The raw form is tried as
+// a fallback when the escaped form is not found.
+func TestMergeComments_CDATASelection(t *testing.T) {
+	// New body contains a code macro with CDATA — raw < and > in the content.
+	body := `<ac:structured-macro ac:name="code"><ac:plain-text-body><![CDATA[func foo() { return <nil> }]]></ac:plain-text-body></ac:structured-macro>`
+	// Old body has the marker around the raw selection inside CDATA.
+	oldBody := `<ac:structured-macro ac:name="code"><ac:plain-text-body><![CDATA[func foo() { return <ac:inline-comment-marker ac:ref="uuid-cdata"><nil></ac:inline-comment-marker> }]]></ac:plain-text-body></ac:structured-macro>`
+	// The API returns the raw (unescaped) selection.
+	comments := makeComments("<nil>", "uuid-cdata")
+
+	result, err := mergeComments(body, oldBody, comments)
+	assert.NoError(t, err)
+	// The raw selection "<nil>" should be found and wrapped with a marker.
+	assert.Equal(t, `<ac:structured-macro ac:name="code"><ac:plain-text-body><![CDATA[func foo() { return <ac:inline-comment-marker ac:ref="uuid-cdata"><nil></ac:inline-comment-marker> }]]></ac:plain-text-body></ac:structured-macro>`, result)
+}
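Note on the disambiguation exercised by the tests above: the following is a minimal standalone sketch of picking the occurrence whose preceding context is closest to the old one, with the rune-count lower bound used to skip the edit-distance computation. `editDistance` and `pickOccurrence` are hypothetical names, since the diff's `levenshteinDistance` and `contextBefore` bodies are not shown here. The sketch compares only the text before the selection and slices the window by bytes, so it assumes ASCII input. The PR's helpers also handle the trailing context and rune boundaries.

```go
package main

import (
	"fmt"
	"math"
	"strings"
	"unicode/utf8"
)

// editDistance is an illustrative rune-based Levenshtein distance.
func editDistance(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		cur := make([]int, len(rb)+1)
		cur[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			cur[j] = min(prev[j]+1, cur[j-1]+1, prev[j-1]+cost)
		}
		prev = cur
	}
	return prev[len(rb)]
}

// pickOccurrence returns the byte offset of the occurrence of sel in body
// whose preceding text (up to window bytes) is closest to wantBefore,
// or -1 if sel does not occur. Ties keep the earliest occurrence.
func pickOccurrence(body, sel, wantBefore string, window int) int {
	if sel == "" {
		return -1
	}
	best, bestDist := -1, math.MaxInt
	wantLen := utf8.RuneCountInString(wantBefore)
	for pos := 0; ; {
		i := strings.Index(body[pos:], sel)
		if i == -1 {
			return best
		}
		start := pos + i
		pos = start + 1
		before := body[max(0, start-window):start] // ASCII assumed
		// Lower bound: the distance is at least the rune-count difference.
		lb := wantLen - utf8.RuneCountInString(before)
		if lb < 0 {
			lb = -lb
		}
		if lb >= bestDist {
			continue // cannot beat the current best; skip the full computation
		}
		if d := editDistance(wantBefore, before); d < bestDist {
			best, bestDist = start, d
		}
	}
}

func main() {
	body := "<p>Item one. Item two. Item one.</p>"
	fmt.Println(pickOccurrence(body, "Item one.", "<p>Item one. Item two. ", 64))
	// prints 23
}
```

The first occurrence at offset 3 has only `<p>` before it (distance 20 from the wanted context), while the second at offset 23 matches exactly, so the second is chosen.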
--- a/util/cli.go
+++ b/util/cli.go
@@ -7,9 +7,9 @@ import (
 	"path/filepath"
 	"strings"

+	mark "github.com/kovetskiy/mark/v16"
 	"github.com/rs/zerolog"
 	"github.com/rs/zerolog/log"
-	mark "github.com/kovetskiy/mark/v16"
 	"github.com/urfave/cli/v3"
 )

@@ -111,10 +111,11 @@ func RunMark(ctx context.Context, cmd *cli.Command) error {
 		TitleAppendGeneratedHash: cmd.Bool("title-append-generated-hash"),
 		ContentAppearance:        cmd.String("content-appearance"),

-		MinorEdit:      cmd.Bool("minor-edit"),
-		VersionMessage: cmd.String("version-message"),
-		EditLock:       cmd.Bool("edit-lock"),
-		ChangesOnly:    cmd.Bool("changes-only"),
+		MinorEdit:        cmd.Bool("minor-edit"),
+		VersionMessage:   cmd.String("version-message"),
+		EditLock:         cmd.Bool("edit-lock"),
+		ChangesOnly:      cmd.Bool("changes-only"),
+		PreserveComments: cmd.Bool("preserve-comments"),

 		DropH1:          cmd.Bool("drop-h1"),
 		StripLinebreaks: cmd.Bool("strip-linebreaks"),
--- a/util/flags.go
+++ b/util/flags.go
@@ -194,6 +194,12 @@ var Flags = []cli.Flag{
 		Usage:   "Avoids re-uploading pages that haven't changed since the last run.",
 		Sources: cli.NewValueSourceChain(cli.EnvVar("MARK_CHANGES_ONLY"), altsrctoml.TOML("changes-only", altsrc.NewStringPtrSourcer(&filename))),
 	},
+	&cli.BoolFlag{
+		Name:    "preserve-comments",
+		Value:   false,
+		Usage:   "Fetch and preserve inline comments on existing Confluence pages.",
+		Sources: cli.NewValueSourceChain(cli.EnvVar("MARK_PRESERVE_COMMENTS"), altsrctoml.TOML("preserve-comments", altsrc.NewStringPtrSourcer(&filename))),
+	},
 	&cli.FloatFlag{
 		Name:    "d2-scale",
 		Value:   1.0,