OpenXRay/True Stalker does not support all Polish letters with Windows 1250 #44

Closed
genjonakasone opened this issue Jan 11, 2024 · 18 comments

Comments

@genjonakasone
Collaborator

genjonakasone commented Jan 11, 2024

I am currently working on a Polish translation for the True Stalker mod, and it looks like, contrary to what the readme says, Windows 1250 with the "font_prefix = _cent" entry in the localization.ltx file does not make the Polish letters show up. For example, the word "Wyjdź" in the main menu is shown as "Wyjd ".

@genjonakasone
Collaborator Author

genjonakasone commented Jan 11, 2024

I should correct myself - Windows 1250 does support Polish symbols, just not in True Stalker/OpenXRay.

The "encoding=" entry at the beginning of the XML file does not matter: symbols are displayed the same way whether it says windows-1250, windows-1251, ISO 8859-2, etc.

Saving the file with Windows 1250 encoding in Visual Studio Code keeps the Polish-specific letters, for example in "ROZDZIELCZOŚĆ", as it should, but in game the word turns into "ROZDZIELCZOĆ"; the same happens to the letters "Ł" and "Ż". I've seen this thread in the OpenXRay repo about UTF-8 and other encodings, but it does not reach any consensus.

This makes me wonder: is it possible that True Stalker is simply missing a font that has all the Polish letters in it? Any recommendations on how to find the original font, check whether it's okay, and eventually replace it? It's my first time working with OpenXRay.

I'm open to any advice.
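
For reference, here is a small sketch that lists which byte each Polish diacritic maps to in Windows-1250 and in ISO 8859-2, using Python's built-in codecs. It only shows the byte layout of the two code pages; which layout the game's font atlas actually follows is not established at this point in the thread, and the helper names (byte_table, differing) are made up for illustration.

```python
# Compare how each Polish diacritic is encoded in the two single-byte code pages.
POLISH = "ĄĆĘŁŃÓŚŹŻąćęłńóśźż"


def byte_table(text=POLISH):
    """Return (character, cp1250 byte, iso-8859-2 byte) for every character."""
    rows = []
    for ch in text:
        cp1250 = ch.encode("cp1250")[0]
        latin2 = ch.encode("iso8859_2")[0]
        rows.append((ch, cp1250, latin2))
    return rows


def differing(text=POLISH):
    """Return only the characters whose byte value differs between the two."""
    return "".join(ch for ch, a, b in byte_table(text) if a != b)


if __name__ == "__main__":
    for ch, a, b in byte_table():
        flag = "" if a == b else "  <- differs"
        print(f"{ch}  cp1250=0x{a:02X}  iso-8859-2=0x{b:02X}{flag}")
```

Only Ą, Ś, Ź and their lowercase forms sit at different byte positions. Ł and Ż use the same byte in both code pages, so if they also go missing in game, the encoding difference alone cannot explain it.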

@genjonakasone changed the title from "Windows 1250 does not support Polish letters" to "OpenXRay/True Stalker does not support all Polish letters with Windows 1250" on Jan 11, 2024
@genjonakasone
Collaborator Author

Ok, I figured it out: the default font does not support some of the Polish diacritics. As a band-aid solution, I dropped the Stalker Gamma fonts into True Stalker's gamedata and edited fonts.ltx, changing the Roboto fonts to ones from Gamma (in my case the Letter font, ui_font_letter_XX_XXXX.cent.dds). I will keep translating the files and make a PR with just the XMLs; as for the fonts, I will leave that up to you.

@lehrax
Contributor

lehrax commented Jan 12, 2024

Wow, that is great news! If there's something missing in the default fonts, let's add the font from Gamma for now (if it helps).

In the meantime I've been exploring the options for organising the repository and came to the conclusion that storing source files in Unicode is the cleanest possible way (which additionally benefits contributors who prefer the Web IDE, since it tends to mangle non-UTF-8 encodings...)

Since OpenXRay does not yet support it, however, I've designed a super basic converter that prepares the files in the needed encoding and places them into "releases".

Feel free to give it a try: just put your sources into gamedata_UTF-8 and in a few moments you can collect the 7z archive of the Polish language pack with the correct encoding and all the prefixes preconfigured.

Also pardon me for organising all that a little chaotically. There are lots of ideas on how to improve things, and not everything was ready from the start. Some essential bits, such as validating XML files before packing them, still need to be implemented.
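
To make the idea concrete, here is a minimal sketch of what such a conversion step could look like. It is not the repository's actual converter; the function name convert_tree, the output directory and the default target encoding are assumptions for illustration, and only the gamedata_UTF-8 folder name comes from this thread.

```python
from pathlib import Path


def convert_tree(src_root, dst_root, target_encoding="iso8859_2"):
    """Re-encode every UTF-8 .xml file under src_root into dst_root.

    Returns a list of (path, offending_text) for files that contain
    characters the target code page cannot represent.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    failures = []
    for src in sorted(src_root.rglob("*.xml")):
        # Decode from bytes so line endings stay untouched;
        # "utf-8-sig" also drops a leading BOM if one is present.
        text = src.read_bytes().decode("utf-8-sig")
        try:
            data = text.encode(target_encoding)
        except UnicodeEncodeError as exc:
            failures.append((src, exc.object[exc.start:exc.end]))
            continue
        dst = dst_root / src.relative_to(src_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        dst.write_bytes(data)
    return failures
```

Something like convert_tree("gamedata_UTF-8", "build/gamedata") would mirror the folder layout in the target encoding. Note that this sketch leaves the XML declaration inside each file unchanged.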
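
As a rough idea of what the validation step could look like, here is a sketch that checks every source file for well-formedness with Python's standard XML parser. The function name find_malformed is hypothetical, and the sketch assumes the sources are plain well-formed XML; if the game's files rely on constructs a strict parser rejects, they would be reported here too.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def find_malformed(root_dir):
    """Return (path, error message) for every .xml file that fails to parse."""
    problems = []
    for path in sorted(Path(root_dir).rglob("*.xml")):
        try:
            ET.parse(path)
        except ET.ParseError as exc:
            problems.append((path, str(exc)))
    return problems


if __name__ == "__main__":
    for path, message in find_malformed("gamedata_UTF-8"):
        print(f"{path}: {message}")
```

Running this before packing would let the pipeline stop early instead of shipping a file the game cannot read.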

@genjonakasone
Collaborator Author

After a few more hours of playing with the Gamma fonts, it still looks like something is wrong. I'm not sure if it's the wrong size of the DDS file or if True Stalker reads the symbols in a different manner. I've heard the OGSR community made a font generator to support other languages; unfortunately I do not grasp how to use it properly. Here's the link: https://github.com/OGSR/Fonts_generator

I also found True Stalker's exported XML files on the C-Consciousness Discord server. They seem to be encoded in UTF-8 by default, with the encoding="UTF-8" parameter in the XML, and they work, so now I am not sure whether they really have to be transcoded into windows-125X. When I did it manually the game would not start; with these it just works. The archive includes the game's fonts too, in case it is possible to convert them to support more symbols than they do now.

TS_unpack.zip

@lehrax
Contributor

lehrax commented Jan 12, 2024

Thanks. Will take a look at those later.

@genjonakasone
Collaborator Author

I received fixed font files from the True Stalker dev and attached them below. They will be added to the mod in the next patch, but its release date is unknown. I had to change both the files' encoding and the encoding in the XML headers to ISO 8859-2; now all Polish diacritics are shown as they should be. Windows-125X does not display them at all, and neither does UTF-8, which puzzles me, since these encodings are supposed to support the Polish language.
TS_font_update.zip
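
For anyone who wants to repeat the manual step (re-encode the file and change the header to match), here is a minimal sketch. The helper name transcode_xml and the regular expression are my own illustration, not part of any tool mentioned here, and it assumes the declaration, if present, is on the first tag of the file.

```python
import re

# Matches the encoding attribute inside an XML declaration.
DECL_RE = re.compile(r'(<\?xml[^>]*?encoding=)(["\'])([^"\']*)\2')


def transcode_xml(data, source_encoding="utf-8-sig", target_label="iso-8859-2"):
    """Decode data, point the XML declaration at target_label, re-encode."""
    text = data.decode(source_encoding)

    def swap(match):
        quote = match.group(2)
        return f"{match.group(1)}{quote}{target_label}{quote}"

    text, count = DECL_RE.subn(swap, text, count=1)
    if count == 0:
        # No declaration found: add one so the header matches the bytes.
        text = f'<?xml version="1.0" encoding="{target_label}"?>\n' + text
    return text.encode(target_label)
```

Keeping the header and the actual bytes in sync in one step avoids the situation where an editor trusts the header and shows the wrong characters.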

@lehrax
Contributor

lehrax commented Jan 13, 2024

Interesting... So it appears that whatever was in the XML files' declaration string was wrong initially. Prefixes and all this mess. I hope the OpenXRay devs are able to figure out the Unicode way of doing things soon.

@lehrax
Contributor

lehrax commented Jan 14, 2024

@genjonakasone, I used some default strings from Call of Pripyat and compiled a Polish language pack demo with the fonts you provided:
image

Sources in testing branch: https://github.com/true-community/true-localisation/tree/testing/gamedata_UTF-8/configs/text/pol (you will see a lot of placeholders in game, since only default CoP strings are translated)
Final archive of gamedata: https://github.com/true-community/true-artifacts/tree/releases/localisation/testing

Does it look ok?

@genjonakasone
Collaborator Author

genjonakasone commented Jan 14, 2024

The second option from the top, "Ustawienia jakoci", should be "Ustawienia jakości"; it's missing the letter "ś". Not sure if that's the font's fault or the encoding's. Mine with ISO-8859-2 does not seem to escape or lose any Polish diacritics, although I haven't tried it with the CoP font.

The first option in the selector should say "Pełne ośw. dynamiczne", so it's not displaying the characters either. When I open the ui.st.mm.xml file with Notepad++, it defaults to the header encoding (so windows-1251) and it looks like this:
image
So I'd say that, at least for Polish, the files should be converted to ISO-8859-2 (because then the diacritics don't turn into "shrubs") if there's no UTF-8 support.
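
The "shrubs" effect is easy to reproduce outside the game. The sketch below encodes a string in one code page and reads the same bytes back through another, which is roughly what an editor does when it trusts a header that does not match the file. The function name reinterpret is made up for illustration, and this says nothing about how the engine itself maps bytes to glyphs.

```python
def reinterpret(text, written_as, read_as):
    """Encode text with one code page and decode the same bytes with another."""
    return text.encode(written_as).decode(read_as, errors="replace")


if __name__ == "__main__":
    sample = "Pełne ośw. dynamiczne"
    for written_as in ("cp1250", "iso8859_2"):
        shown = reinterpret(sample, written_as, "cp1251")
        print(f"written as {written_as}, read as cp1251: {shown}")
```

Plain ASCII letters survive because all three code pages agree on them; only the bytes above 0x7F change meaning.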

@lehrax
Contributor

lehrax commented Jan 14, 2024

Aaah, okay, I did not change the encoding in https://github.com/true-community/true-localisation/blob/staging/language_selector.json#L26, so the output was in 1250 again.
Let's try changing it this time and see if the result is any different.

@lehrax
Contributor

lehrax commented Jan 14, 2024

Nope, with ISO-8859-2 the game does not even launch

@genjonakasone
Collaborator Author

What if you change the line endings from Windows CRLF to Unix LF? Does your game run if you replace the ui_st_mm.xml file with the one I have? Maybe not all files can be displayed in ISO-8859-2 because of some engine quirk. For me, some files open as UTF-8 and some as UTF-8-BOM by default.
ui_st_mm.zip
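
To compare two copies of a file without relying on what an editor guesses, here is a small sketch that reports the BOM, the line-ending style and whether the bytes are valid UTF-8, plus a helper that strips the BOM and unifies line endings. The names describe_file and normalize are hypothetical, and whether either property matters to the engine is untested.

```python
import codecs


def describe_file(data):
    """Report BOM presence, line-ending counts and UTF-8 validity for raw bytes."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    body = data[len(codecs.BOM_UTF8):] if has_bom else data
    crlf = body.count(b"\r\n")
    lf_only = body.count(b"\n") - crlf
    try:
        body.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    return {"utf8_bom": has_bom, "crlf": crlf, "lf_only": lf_only, "valid_utf8": valid_utf8}


def normalize(data, newline=b"\n"):
    """Drop a UTF-8 BOM and convert every line ending to the given style."""
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data.replace(b"\r\n", b"\n").replace(b"\n", newline)
```

Calling describe_file on the raw bytes of both versions of ui_st_mm.xml would show whether they differ in anything besides the encoding of the diacritics.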

@lehrax
Contributor

lehrax commented Jan 14, 2024

What if you change the file format from Windows CRLF to Unix LF

I highly doubt that would change much, but we could try it of course.

... or better yet, you can try it yourself :)
I've given you access to the repository, so go ahead and make the necessary changes in the testing branch.

change the encoding in https://github.com/true-community/true-localisation/blob/staging/language_selector.json#L26

This file configures the encoding the sources are converted into, and it also holds the prefixes, as you probably noticed.

I also suggest using VS Code for heavier editing, as Notepad++ may be somewhat basic as an IDE (not sure about now, but it was the last time I checked).

Or, if you do not want to wait for the pipeline to compile everything, you could just edit the final gamedata and leave a note here if it works.

I need to call it a day and get some rest before the workday.

@lehrax
Contributor

lehrax commented Jan 14, 2024

image

Also, the original files from CoP are already present in True Stalker's resources and, weirdly, those have the west prefix O_o

@lehrax
Contributor

lehrax commented Jan 14, 2024

So my only assumption here is that the font is to blame. I didn't have a chance to try the font compiler you suggested; maybe that would solve all the issues.

But, off the record, my brutally honest thought is that some unbroken things should remain unfixed. The game wouldn't have turned out any worse with the original fonts. I would never have noticed they were replaced at all, but here we are looking for a fix for the fix 😅

@genjonakasone
Collaborator Author

genjonakasone commented Jan 14, 2024

I tried multiple encodings on the test files from the polski.7z archive you uploaded. For comparison with what True Stalker displays, I have written the sentences as they should be displayed in red font. Here are the results:

  • windows-1251
    image

  • windows-1250
    image

  • iso-8859-2
    image

So the last one is the only one that gets the symbols right. I'm not sure why the game wouldn't start with ISO on your PC, unless the headers were set wrong, or maybe the compatibility was added in one of the four patches released so far. I've seen a Czech translation posted on ModDB; it's also encoded in iso-8859-2 and it also worked for me, although they used the eng folder instead of a custom "ces" or however it would be called. The Chinese translation is encoded in UTF-8, but they used the compiler, and apparently the engine "will switch to unicode mode when texts are saved in UTF-8 and there are multi-byte fonts available".

And it is indeed a font issue. I do not know how the DDS letter-finding system works, and despite trying I couldn't get the compiler to work.

@lehrax
Contributor

lehrax commented Jan 15, 2024

the engine "will switch to unicode mode when texts are saved in UTF-8 and there are multi-byte fonts available"

If that is so, getting the correct font is a solution to all the problems and prefixes would not even be needed anymore!

@lehrax
Contributor

lehrax commented Jan 19, 2024

@genjonakasone, the UTF-8 > ISO-8859-2 build now works correctly; I had to edit the encoding converter command a bit.
You were right about the choice of encoding, and the fonts did help too, I think.

image

Anyway, to get the language pack ready we should add a bunch of strings :)
I am thinking of automating the filling of missing strings via DeepL or an alternative, but I need to investigate the subject first.
Closing the issue 🥳

@lehrax closed this as completed on Jan 19, 2024