@diraneyya diraneyya commented Nov 8, 2025

Problem

The block cursor in Vim mode breaks the visual continuity of Arabic connected characters, making text editing confusing and difficult for Arabic users.

Issue 1: Character Breaking

[Screenshots: ❌ cursor breaking the letter connection vs. ✅ cursor not breaking the letter connection; images not included]

The opaque cursor background covers Arabic letters, disrupting their connected forms. Arabic letters change shape based on position (isolated/initial/medial/final), and these visual connections are essential for readability.

When the cursor is on a character like ن in the middle of a word, the character appears in its connected (medial) form in the actual text, but the cursor div contains the isolated form, creating visual disruption.

Issue 2: Incorrect Width

[Screenshots: cursor on each of ا, كـ, ـتـ, ـب, and a space (مسافة); images not included]

The cursor width was based on the isolated form of characters rather than their actual rendered width in connected text:

  • Letter ن when connected (narrow): cursor was too wide with extra space
  • Letter ك in different positions: cursor didn't match actual glyph width
  • End of line: cursor extended far beyond the character

Issue 3: Newline Selection

In normal mode, the cursor could be positioned on newline characters at the end of lines (unlike standard Vim behavior where $ positions on the last character).

Solution

1. Transparent Cursor with Outline

  • Changed from opaque background: #ff9696 to background: transparent with box-shadow: 0 0 0 1px #ff9696 outline
  • Arabic letters remain visually connected underneath the cursor
  • Letter inside cursor div is always transparent (partial: true) to avoid covering the actual text

2. DOM-Based Width Measurement

  • Use browser's Range.getBoundingClientRect() to measure actual rendered character width
  • Captures the true width after Arabic text shaping is applied by the browser
  • Works correctly for all contextual forms (isolated/initial/medial/final)
  • Saves original DOM position before traversal to ensure accurate measurement
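
A minimal sketch of the Range-based measurement described above, not the code from this PR. `measureRenderedCharWidth`, `RangeLike`, and the injected `createRange` factory are hypothetical names chosen for illustration; the only real DOM calls assumed are `Range.setStart()`, `Range.setEnd()`, and `Range.getBoundingClientRect()`.

```typescript
// Structural types so the sketch can also run outside a browser (e.g. in tests).
interface RectLike {
  width: number;
}

interface RangeLike<N> {
  setStart(node: N, offset: number): void;
  setEnd(node: N, offset: number): void;
  getBoundingClientRect(): RectLike;
}

/**
 * Measure the rendered width of the character that starts at `offset`
 * inside `textNode`. Because the range covers the character in place,
 * the browser reports the width of the shaped (contextual) glyph rather
 * than the width of the isolated form.
 *
 * `createRange` is an assumption of this sketch: in a browser it would be
 * `() => document.createRange()`.
 */
function measureRenderedCharWidth<N>(
  createRange: () => RangeLike<N>,
  textNode: N,
  offset: number,
  charLength: number = 1
): number {
  const range = createRange();
  range.setStart(textNode, offset);
  range.setEnd(textNode, offset + charLength);
  return range.getBoundingClientRect().width;
}
```

In a browser the call might look like `measureRenderedCharWidth<Node>(() => document.createRange(), textNode, offset)`, with the result applied to the cursor element through `elt.style.width`.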

3. Smart Newline Handling

  • Narrow cursor (15% of font size) for newline characters
  • In normal mode, auto-adjust cursor from end-of-line newline to last real character
  • Preserve cursor on empty lines (consecutive newlines)
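
The newline rules above can be sketched as two small pure functions. This is an illustration under assumptions, not the PR's implementation: `newlineCursorWidth` and `adjustNormalModeCursor` are hypothetical names, and the document is modeled as a plain string with `\n` line breaks.

```typescript
/** Width in px of the narrow cursor drawn on a newline: 15% of the font size. */
function newlineCursorWidth(fontSizePx: number): number {
  return fontSizePx * 0.15;
}

/**
 * Normal-mode adjustment: if the cursor sits on an end-of-line newline
 * (or past the last character of the document), move it back to the last
 * real character of that line. A cursor on an empty line stays where it is.
 */
function adjustNormalModeCursor(doc: string, pos: number): number {
  const onLineEnd = pos >= doc.length || doc.charAt(pos) === "\n";
  if (!onLineEnd) return pos;
  // The line is empty when nothing precedes the newline on the same line.
  const lineIsEmpty = pos === 0 || doc.charAt(pos - 1) === "\n";
  return lineIsEmpty ? pos : pos - 1;
}
```

For the text `"ab\n\ncd"`, a cursor on the first newline (position 2) moves back to `b` (position 1), while a cursor on the empty second line (position 3) is preserved.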

Technical Changes

  • Added width property to Piece class for explicit width control
  • Apply measured width via elt.style.width in adjust() method
  • Save original domAtPos before DOM traversal for accurate measurement
  • Use Range API to measure individual character width from text nodes
  • Force transparent letter rendering to avoid covering underlying text
  • Detect and handle newline positioning in normal mode

Testing

Tested with Arabic text in various scenarios:

  • Connected characters in middle of words (ـنـ, ـكـ, etc.)
  • Characters at beginning/end of words
  • Mixed Arabic and English text
  • End of line positioning
  • Empty lines
  • Mouse click positioning vs keyboard navigation ($, h, l)

All cursor positioning now matches standard Vim behavior while correctly handling Arabic character shaping.

Impact

This fixes a major usability issue for Arabic language users, making the Vim mode cursor work correctly with Arabic's connected writing system. The solution properly handles complex text shaping without breaking visual character connections.

This commit improves the block cursor behavior for Arabic text, where
connected characters were being broken by the cursor overlay.

## Problems Fixed

1. **Character Breaking in Arabic**: The block cursor used an opaque
   background that covered characters, breaking visual continuity of
   connected Arabic letters. In Arabic, letters change shape based on
   their position in a word (isolated/initial/medial/final forms), and
   the cursor was disrupting these connections.

2. **Incorrect Width Calculation**: The cursor width was based on the
   isolated form of characters placed inside the cursor div, not the
   actual rendered width in connected text. This caused misalignment
   where narrow connected forms appeared in wide cursor boxes.

3. **Newline Cursor Issues**:
   - Wide cursor boxes appeared at end of lines
   - In normal mode, cursor could be positioned on newline characters
     (inconsistent with Vim behavior where $ positions on last character)

## Solutions Implemented

1. **Transparent Cursor with Outline**: Changed from opaque background
   to transparent background with box-shadow outline, allowing underlying
   text to show through naturally without breaking character connections.

2. **DOM-Based Width Measurement**: Calculate actual character width by
   measuring the rendered glyph using Range.getBoundingClientRect(). This
   captures the true width of characters after browser text shaping,
   including Arabic contextual forms.

3. **Smart Newline Handling**:
   - Use narrow cursor (15% of font size) for newline characters
   - In normal mode, automatically adjust cursor position to last real
     character when on end-of-line newline (matching Vim $ behavior)
   - Preserve cursor on empty lines (consecutive newlines)

## Technical Details

- Added `width` property to Piece class for explicit width control
- Save original DOM position before traversal for accurate measurement
- Use Range API to measure individual character width from text nodes
- Force transparent letter rendering to avoid covering underlying text
- Distinguish between end-of-line newlines and empty line newlines

## Impact

This fixes a major usability issue for Arabic language users, making the
Vim mode cursor behavior work correctly with Arabic's connected writing
system while properly handling complex text shaping.

Fixes visual character breaking in Arabic text editing.
  style.fontFamily, style.fontSize, style.fontWeight, style.color,
  primary ? "cm-fat-cursor cm-cursor-primary" : "cm-fat-cursor cm-cursor-secondary",
- letter, hCoeff != 1)
+ letter, true) // Always use transparent letter to preserve RTL character connections
Author

The comment should say: to preserve intra-word "letter connection" in connected scripts like Arabic. This fix has nothing to do with RTL per se, so the code comment above is inaccurate.

@nightwing
Collaborator

  • If text inside cursor is transparent, why do we want to keep it?
  • with this change there is no difference between focused and unfocused states
  • why "Narrow cursor for newline characters" is needed? vim doesn't seem to do that.

Overall this seems to make experience with non-connected scripts worse, so maybe we can decide behavior based on the character under cursor?

@diraneyya
Author

I agree that there is no difference between the focused and unfocused states.

  • If text inside cursor is transparent, why do we want to keep it?

Keep what? Keep the text? Keep the cursor? I do not understand what you mean by this question.

  • with this change there is no difference between focused and unfocused states

You are absolutely right. It is also harder to spot the cursor when the fill is gone. I am contemplating how to improve this while maintaining function in connected scripts like Arabic.

  • why "Narrow cursor for newline characters" is needed? vim doesn't seem to do that.

It is not necessary. The actual problem, which occurred before some of the code introduced here, was that someone could use the mouse to place the cursor on the newline character at the end of a line. This was inconsistent, because pressing $ moves the cursor to the last character and cannot reach the newline character, let alone leave the cursor on it.

Overall this seems to make experience with non-connected scripts worse, so maybe we can decide behavior based on the character under cursor?

Genuinely good idea. Let me think about it. There might be a better way to do this that works with connected scripts like Arabic.

Implements Phase 1 of the dual-cursor system architecture.

This adds utilities to detect the script type (Latin, Arabic, connected
scripts) of characters based on Unicode ranges. The detection is used
to determine appropriate cursor rendering strategies:

- Latin text: standard opaque Vim cursor
- Arabic/connected scripts: dual-layer cursor (word block + char outline)

Features:
- detectScriptType(): Detects script from Unicode ranges
- isNeutralChar(): Identifies neutral characters (spaces, numbers, punctuation)
- detectScriptTypeWithContext(): Context-aware detection for neutral chars

Supported scripts:
- Arabic (U+0600–U+06FF and related ranges)
- Syriac (U+0700–U+074F) - connected RTL
- N'Ko (U+07C0–U+07FF) - connected RTL
- Hebrew (U+0590–U+05FF) - RTL but not connected, uses standard cursor

Performance: O(1) Unicode range checks, suitable for per-keystroke execution.

Related to replit#248
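
A rough sketch of range-based script detection along the lines of this commit message. It reuses the function names listed above, but the bodies, the `ScriptType` union, the `isConnectedScript` helper, the ASCII-only definition of "neutral", the Latin ranges, and the extra Arabic blocks (Arabic Supplement, Arabic Extended-A) standing in for "related ranges" are all assumptions of this example, not the PR's code.

```typescript
type ScriptType = "latin" | "arabic" | "syriac" | "nko" | "hebrew" | "neutral" | "other";

// [first code point, last code point, script]. All ranges are in the BMP,
// so charCodeAt() is sufficient here.
const SCRIPT_RANGES: ReadonlyArray<[number, number, ScriptType]> = [
  [0x0041, 0x005a, "latin"],  // A-Z
  [0x0061, 0x007a, "latin"],  // a-z
  [0x00c0, 0x024f, "latin"],  // rough Latin-1/Extended letters (assumed)
  [0x0590, 0x05ff, "hebrew"],
  [0x0600, 0x06ff, "arabic"],
  [0x0700, 0x074f, "syriac"],
  [0x0750, 0x077f, "arabic"], // Arabic Supplement (assumed "related range")
  [0x07c0, 0x07ff, "nko"],
  [0x08a0, 0x08ff, "arabic"], // Arabic Extended-A (assumed "related range")
];

/** Neutral characters: whitespace, ASCII digits, ASCII punctuation (simplified). */
function isNeutralChar(ch: string): boolean {
  const cp = ch.charCodeAt(0);
  const isDigit = cp >= 0x30 && cp <= 0x39;
  const isAsciiPunct =
    (cp >= 0x21 && cp <= 0x2f) || (cp >= 0x3a && cp <= 0x40) ||
    (cp >= 0x5b && cp <= 0x60) || (cp >= 0x7b && cp <= 0x7e);
  return /\s/.test(ch.charAt(0)) || isDigit || isAsciiPunct;
}

/** Classify a single character with a fixed number of range checks. */
function detectScriptType(ch: string): ScriptType {
  if (isNeutralChar(ch)) return "neutral";
  const cp = ch.charCodeAt(0);
  for (const [lo, hi, script] of SCRIPT_RANGES) {
    if (cp >= lo && cp <= hi) return script;
  }
  return "other";
}

/** Hebrew is RTL but not connected, so it keeps the standard cursor. */
function isConnectedScript(script: ScriptType): boolean {
  return script === "arabic" || script === "syriac" || script === "nko";
}

/** For a neutral character, inherit the script of the nearest non-neutral neighbor. */
function detectScriptTypeWithContext(text: string, index: number): ScriptType {
  const own = detectScriptType(text.charAt(index));
  if (own !== "neutral") return own;
  for (let d = 1; d < text.length; d++) {
    for (const i of [index - d, index + d]) {
      if (i >= 0 && i < text.length) {
        const t = detectScriptType(text.charAt(i));
        if (t !== "neutral") return t;
      }
    }
  }
  return "neutral";
}
```

Note that the neighbor scan in `detectScriptTypeWithContext` is linear in the text it is given, so only the single-character check is O(1) in this sketch.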
Implements Phase 2 of the dual-cursor system architecture.

This adds utilities to find word boundaries in connected scripts like
Arabic. Word boundaries are defined by transitions:
- FROM: non-Arabic TO: Arabic (word starts)
- FROM: Arabic TO: non-Arabic (word ends)

For example, in "TOOمودا", the word "مودا" has clear boundaries at
the transition points.

Features:
- findArabicWordBoundaries(): Finds start/end positions of connected word
- Expands from cursor position until non-Arabic characters
- Performance optimized with MAX_WORD_SEARCH_RANGE = ±50 characters

Algorithm:
1. Start from cursor position
2. Expand leftward while on Arabic/connected characters
3. Expand rightward while on Arabic/connected characters
4. Return {start, end, text} with absolute document positions

Used for rendering word-block layer of dual cursor in Arabic text.

Performance: O(n) where n ≤ 100 characters, suitable for real-time rendering.

Related to replit#248
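
The four algorithm steps above can be sketched as follows. This is an illustration, not the PR's code: the body of `findArabicWordBoundaries`, the `WordBoundaries` shape, the exclusive `end` index, and the simplified `isConnectedChar` helper (which treats every code point in the listed Arabic, Syriac, and N'Ko blocks as part of a word, punctuation included) are assumptions.

```typescript
const MAX_WORD_SEARCH_RANGE = 50; // search at most 50 characters in each direction

interface WordBoundaries {
  start: number; // absolute position of the first character of the word
  end: number;   // absolute position just past the last character (exclusive)
  text: string;
}

/** Simplified membership test for connected scripts (BMP ranges only). */
function isConnectedChar(ch: string): boolean {
  const cp = ch.charCodeAt(0);
  return (
    (cp >= 0x0600 && cp <= 0x06ff) || // Arabic
    (cp >= 0x0700 && cp <= 0x074f) || // Syriac
    (cp >= 0x07c0 && cp <= 0x07ff)    // N'Ko
  );
}

/**
 * Expand left and right from `pos` while the neighboring characters belong
 * to a connected script. Returns null when `pos` is not on such a character.
 */
function findArabicWordBoundaries(doc: string, pos: number): WordBoundaries | null {
  if (pos < 0 || pos >= doc.length || !isConnectedChar(doc.charAt(pos))) {
    return null;
  }
  const minStart = Math.max(0, pos - MAX_WORD_SEARCH_RANGE);
  const maxEnd = Math.min(doc.length, pos + 1 + MAX_WORD_SEARCH_RANGE);

  let start = pos;
  while (start > minStart && isConnectedChar(doc.charAt(start - 1))) start--;

  let end = pos + 1;
  while (end < maxEnd && isConnectedChar(doc.charAt(end))) end++;

  return { start, end, text: doc.slice(start, end) };
}
```

For the example from the commit message, `findArabicWordBoundaries("TOOمودا", 4)` yields `start: 3`, `end: 7`, and `text: "مودا"`, while a position on one of the Latin letters returns `null`.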
Implements Phase 3 of the dual-cursor system architecture.

This modifies the cursor rendering to detect script type and apply
appropriate visual treatment:

- Latin/non-connected scripts (focused): Opaque text with solid background
  (restores standard Vim block cursor behavior)
- Arabic/connected scripts (focused): Transparent text with solid background
  (preserves visual character connections in RTL)
- Any script (unfocused): Transparent text with outline only

Changes:
- Import detectScriptTypeWithContext() from script-detection module
- Add script detection and focus state checking in measureCursor()
- Set partial parameter based on script type and focus state

This addresses maintainer feedback on replit#248 about
restoring standard Vim cursor behavior for Latin text while maintaining
special handling for Arabic connected characters.

Performance: Adds single O(1) script detection per cursor render.

Tested: ✅ Latin letters show white text in cursor (opaque)
Tested: ✅ Arabic letters are invisible in cursor (transparent)
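
The Phase 3 decision can be written as a small lookup. This is a sketch with hypothetical names (`phase3CursorRendering`, `Phase3Rendering`), assuming the script has already been reduced to a connected/non-connected flag; the real change sets the `partial` parameter inside `measureCursor()`.

```typescript
interface Phase3Rendering {
  transparentText: boolean;          // corresponds to the `partial` flag
  background: "solid" | "outline";
}

/** Map script connectivity and focus state to the Phase 3 visual treatment. */
function phase3CursorRendering(connectedScript: boolean, focused: boolean): Phase3Rendering {
  if (!focused) {
    // Any script, unfocused: transparent text with outline only.
    return { transparentText: true, background: "outline" };
  }
  // Focused: solid background; text is transparent only for connected scripts.
  return { transparentText: connectedScript, background: "solid" };
}
```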
diraneyya added a commit to diraneyya/codemirror-vim that referenced this pull request Nov 10, 2025
Implements Phase 4 of the dual-cursor system architecture.

This adds hierarchical dual-cursor rendering for Arabic text:
- Word-level block: Semi-transparent pink background covering entire connected word
- Character-level outline: White 1px outline on specific letter under cursor

Changes:
- Add CursorLayerType enum for different cursor rendering strategies
- Extend Piece class with layerType parameter
- Modify measureCursor() to return Piece[] for multi-layer rendering
- Add measureArabicDualCursor() function for dual-layer measurement
- Update CSS theme with Arabic-specific cursor styles
- Refine script detection to exclude only punctuation (not diacritics)
- Ensure spaces/whitespace always treated as word boundaries
- Fix neutral character detection: inherit script type but not special cursor
- Only show dual-cursor for connected words (2+ Arabic characters)
- Fix character positioning using coordsForChar for accurate RTL placement

Visual design:
- Focused Arabic (connected word): Semi-transparent pink word block + white char outline
- Focused Arabic (isolated char): Standard transparent cursor
- Focused Latin: Solid pink block with white text (opaque)
- Focused neutral (punctuation, numbers): Standard transparent cursor
- Unfocused: Pink outline for all (character outline hidden for Arabic)

Performance: Word boundary detection O(n) where n ≤ 100 characters

Tested: ✅ Dual-cursor renders correctly on Arabic connected words
Tested: ✅ Word boundaries respect punctuation and spaces
Tested: ✅ Navigation (hjkl) tracks correctly through Arabic words
Tested: ✅ Single isolated Arabic characters use standard cursor
Tested: ✅ Neutral characters (# punctuation) use standard cursor
Tested: ✅ Character outline positioned correctly within word block

Related to replit#248
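
The "Visual design" list above amounts to a decision table, sketched here for illustration. `chooseCursorStyle`, `CursorStyle`, and `CursorContext` are hypothetical names and do not reflect the `CursorLayerType` enum or `measureArabicDualCursor()` added by the commit.

```typescript
type CursorStyle =
  | "word-block-with-char-outline" // dual cursor: pink word block + white char outline
  | "opaque-block"                 // solid pink block with white text
  | "transparent"                  // standard transparent cursor
  | "outline";                     // pink outline only

interface CursorContext {
  focused: boolean;
  script: "latin" | "connected" | "neutral";
  connectedWordLength: number; // number of connected-script characters in the word
}

/** Pick the cursor style for the character under the cursor. */
function chooseCursorStyle(ctx: CursorContext): CursorStyle {
  if (!ctx.focused) return "outline";
  if (ctx.script === "latin") return "opaque-block";
  // The dual cursor is only shown for connected words of 2+ characters.
  if (ctx.script === "connected" && ctx.connectedWordLength >= 2) {
    return "word-block-with-char-outline";
  }
  // Isolated Arabic characters and neutral characters (punctuation, numbers).
  return "transparent";
}
```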
@diraneyya diraneyya force-pushed the fix/cursor-arabic-connected-characters branch from ae1310a to c9dd524 Compare November 10, 2025 16:02
@diraneyya
Author

diraneyya commented Nov 10, 2025

Alternative: Dual-cursor concept for Arabic

[Screenshot table: cursor positions 1 to 6, each shown in the Focused and Unfocused states; images not included]

What do you think? @nightwing

@diraneyya
Author

for more context: Zettlr/Zettlr#6004

@mumendiraneyya

Just to clarify: this PR is a working prototype to demonstrate functionality, not a merge request. Once we agree on an approach that works well for Arabic users and fits the project, I'm happy to do a proper implementation following your architectural and style guidelines.
