Implement utf8 support and solve #8 issue#47
Implement utf8 support and solve #8 issue#47Melnytskyi wants to merge 11 commits intolionkor:masterfrom
Conversation
lionkor
left a comment
There was a problem hiding this comment.
Thank you very much! Please see my comments. I will code review in detail and test on all supported platforms soon, as well.
src/backends/InteractiveBackend.cpp
Outdated
There was a problem hiding this comment.
would it be reasonable to use a library to handle UTF-8?
There was a problem hiding this comment.
Yes, there's a header-only library that might suit this repo. At first, I thought this library was too big for only three functions. Mb just add this lib as a Git submodule? Or some other lib?
There was a problem hiding this comment.
I tried this library and changed my mind, because it checks every character and throws exceptions on non-valid input and that needs to be handled. I have no idea how to handle such things in a separate thread for console reading and whether they make sense in the console. That's why I decided to improve the current functions so that they just return values that will definitely not lead to an out of range.
If it is not suitable, I can redo it.
There was a problem hiding this comment.
Okay, thanks for looking into it! I'll read up on UTF-8 so I can be sure that this is conformant, but it looks good. Thank you!
There was a problem hiding this comment.
I genuinely have not had the time or motivation to understand the unicode utf 8 spec to a point where i can merge this code without concerns :/
- Fix UTF8 handling for `current_view_cursor_pos` and `current_view` functions
- Move UTF8 handling to a separate header file
- Rename and document UTF8 functions
- Fix `get_terminal_width` function, not correctly using CONSOLE_SCREEN_BUFFER_INFO structure
There was a problem hiding this comment.
Pull Request Overview
This PR implements UTF-8 character support for the interactive terminal backend and fixes issue #8 related to terminal width calculation. It also removes redundant newline/carriage return characters at the end of commands.
- Adds UTF-8 character handling with new functions for reading multi-byte characters
- Separates cursor position tracking into byte position and visual character position
- Fixes terminal width calculation bug (was returning height instead of width)
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/windows_impl.cpp | Adds UTF-8 console setup, new UTF-8 character reading function, fixes terminal width bug |
| src/linux_impl.cpp | Implements UTF-8 character reading for Linux platform |
| src/impls.h | Adds declaration for new UTF-8 character reading function |
| src/backends/utf8.hpp | New utility functions for UTF-8 string manipulation and length calculation |
| src/backends/InteractiveBackend.h | Adds new cursor position tracking member and multi-byte character method |
| src/backends/InteractiveBackend.cpp | Integrates UTF-8 support throughout interactive input handling |
| CMakeLists.txt | Updates CMake and C++ standard versions |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
|
I requested an AI review here to speed things up. I'll still do a manual review. |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Make the UTF-8 character check consistent with other UTF-8 functions Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- and also apply copilot's recommendation
|
Should I increase the CMake version to 3.5? Or higher? For 3.5 there's also a warning: [cmake] Compatibility with CMake < 3.10 will be removed from a future version of
[cmake] CMake. |
|
I don't see why we wouldn't pull it up to 3.10 or so. That's from 2018. I think we can safely use that ;) |
|
Why are we moving to C++23? From C++11, that's quite the jump. EDIT: Nevermind, already changed back. My bad. |
|
Other than that, this looks pretty good, nice clean code in the utf8 file now. Good job! I'll do a thorough code-review and some testing still. |
This PR implements UTF-8 handling and, by the way, resolves #8 issue.
Also solves issue with redundant
\nor\rcharacters at the end of commands.Linux implementation not checked.