Questions tagged [character-encoding]
A character encoding system consists of a code that pairs each character from a given repertoire with something else — such as a bit pattern, sequence of natural numbers, octets, or electrical pulses — in order to facilitate the transmission of data (generally numbers or text) through telecommunication networks or for data storage.
308 questions
23 votes
2 answers
3k views
Running a BAT file and accidentally finding obscure Chinese poem
Accidentally found obscure Chinese poem (BAT file) Something really weird just happened. I ran this in a batch file: wmic timezone get caption>>tmp_ist.bak time/date>>tmp_ist.bak This ...
0 votes
0 answers
15 views
How to grep on accented vowels? [duplicate]
I have a logfile, written in French, containing the following information: Nouvelles données [Status : 32 I was looking for all possible values of the mentioned status, but I didn't find anything: ...
0 votes
1 answer
725 views
UTF-8 Decoders fail to decode the encoded strings
I have some encoded values which I believe are UTF-8. Now I don't really know whether they are UTF-8 or not, because other online tools and steps to decode UTF-8 are not working, BUT an open source tool ...
0 votes
1 answer
1k views
Notepad is displaying txt file contents as weird symbols
I am writing integer values into a file from Kotlin (Kotlin Int type) using something like this var1BufferedWriter?.write(String.format("%d\n", intvar ) ) var2BufferedWriter?.write(String....
2 votes
1 answer
375 views
VIM uses wrong encoding - but only in status messages
I ran into a strange issue with my ArchLinux setup. Vim uses the correct encoding for reading/displaying files but these status messages (which display the current mode or report back when the buffer is ...
5 votes
1 answer
2k views
How to grep search for text in an ISO-8859-1 encoded file?
I'm attempting to use grep to search for text patterns from an ISO-8859-1 encoded file: https://github.com/jfoclpf/words-pt When I execute a search, all of the matches are returned, but the accented ...
4 votes
1 answer
5k views
How to identify a file encoding?
I'm trying to figure out the encoding of a text file. I did try a lot of the common ones (with Notepad++), but I've failed so far. A few hints: The file was originally an Eudora mbx file, with mostly ...
3 votes
0 answers
474 views
ffmpeg printing unknown glyph between characters on utf-8 subtitle
FFmpeg is printing unknown glyphs between some pairs of characters, but the weird thing is that it's not replacing a character that doesn't exist in the font, it's just printing a new unknown glyph and I can't ...
0 votes
0 answers
3k views
How to change Excel character set?
I have an automatically-generated CSV file which contains accented characters. These appear fine when the file is opened with Notepad++. However, accented characters appear mangled in Excel (e.g. é ...
0 votes
1 answer
3k views
Wrong character encoding in ssh session – but not for all connections
I have an odd issue when connecting to my (Ubuntu) server via SSH. If I connect from my Gentoo box, all is fine. All Umlauts etc. work, I can type "ÄÖÜ" and so on. If I do the same from my ...
0 votes
1 answer
1k views
How to read Linux text files in Windows system?
For example, I run the top command and store its output in a file in Linux; when I then open that file in Windows, it contains some gibberish. Here is the file viewed in Notepad++: The option to convert to ...
1 vote
1 answer
557 views
Does a batch program support any ASCII characters, or is there a way to create an encoder and decoder program with other tools?
I wrote a batch encoder and decoder based on one from adrianvdh and customized some of the text string input, but the decoder isn't working because I put special symbols in there. Here's the string of the ...
0 votes
1 answer
332 views
AWK: "invalid regexp: Invalid collation character" -- how do I make it valid?
I have an awk script that must process millions of records, but I need to remove any containing a multibyte character. In one environment where I work, the following simplified shell sequence ...
1 vote
0 answers
632 views
How to use ISO8859-9 encoding in terminal?
I made a file containing "ırmak" with a text editor using the ISO8859-9 encoding. Then I tried to print the content with the "cat" command in the terminal, but I could not. I use the ...
1 vote
1 answer
1k views
Convert Korean files that are showing up incorrectly to utf-8 - character shows Çѱ¹Ÿî
I was just about to ask this after a long time of searching, so I decided to answer my own question... I downloaded Korean subtitles in an .smi file that was in a zip archive. When I extracted it, the ...
0 votes
2 answers
1k views
Restoring corrupted UTF-8 files
After my PC broke down I managed to make a backup of the relevant files before reinstalling Windows. Now that I'm restoring those files and setting the system up I noticed that some of the files got ...
3 votes
1 answer
4k views
How can I set my system's default encoding to UTF-16?
My daily activity involves usage of English, French, Spanish, and when I save personal copies of web pages or other documents, the full character range of those languages finds its way into filenames ...
4 votes
2 answers
18k views
My text file is riddled with question marks. How can I make it readable?
When I open one of my text files in Visual Studio Code, the text contains a lot of question marks where I had expected to see Swedish letters, such as å, ä, ö. Down to the right (...
1 vote
0 answers
121 views
Mounting .iso image in Linux Mint with incorrect filename encoding
I am trying to mount some ISO files but have a filename encoding issue. The .iso file in question is here: https://archive.org/download/cpcfan-200510b/200510B.iso Mount command and incorrect filename: $ ...
2 votes
1 answer
815 views
How to restore Arabic text of very old emails
I have an old Yahoo mail account, and I have very old emails since 2006 written in Arabic, but the encoding of these emails looks very weird, something like: Óæì Ø *Ý æÚæÏß ßáÇã Ý * ßáÇã. ÇÍÈß ÊÕæÑ ...
1 vote
1 answer
857 views
How does Git for Windows' cat.exe deal with charset encoding?
I'm testing the behaviour of Windows terminal (cmd.exe) in relation to charset encodings. I have some test files in several encodings (Win1252, CP437, UTF-8, etc) with the Spanish text: "qué tal&...
0 votes
1 answer
3k views
Using iconv and file - how do I change an incorrect character encoding setting?
I have a folder full of ASCII text files that have the file information set so text editors on the Mac believe they are Turkish. The original notes in the folder claim it is Windows Latin 1 (Turkish ...
0 votes
1 answer
1k views
Printf in gawk with the correct encoding?
I'm wondering: can gawk printf in any format besides ASCII? Currently, I'm using gawk match() to search through some UTF-8 text. When I go ahead and print out the matches gawk finds, it ends up like ...
2 votes
2 answers
2k views
How to decompress an encrypted zip file with ANSI encoded password?
I need to decompress zip files generated in Windows with Japanese language. I'm using unzip. If I use unzip files.zip I will get bad file names. So, according to this question, I used unzip with -O ...
2 votes
1 answer
2k views
Change default text encoding in Excel 2019?
How can I change the default text encoding in Excel 2019? I already tried this hack/workaround using the registry editor https://superuser.com/a/1179248/1468612 ( based on this: http://www.lukemiller....
13 votes
7 answers
16k views
Why does UTF-16 have big or little endian but UTF-8 doesn't?
UTF-16 uses 2 bytes for one character, so it has big or little endian difference. For example, the character 哈 is 54 C8 in hex. Its UTF-8 representation therefore is: 11100101 10010011 10001000 UTF-8 ...
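A minimal sketch of the byte-level difference behind the question above, assuming Python 3.8 or later (the variable names are illustrative and not taken from the question). It encodes 哈 (U+54C8) three ways: UTF-8 yields a single fixed sequence of bytes, while UTF-16 yields one 16-bit code unit whose two bytes can be stored in either order.

```python
# Encode one character in UTF-8 and in both UTF-16 byte orders.
ch = "哈"  # code point U+54C8

utf8 = ch.encode("utf-8")          # byte-oriented: only one possible order
utf16_be = ch.encode("utf-16-be")  # 16-bit unit, most significant byte first
utf16_le = ch.encode("utf-16-le")  # same 16-bit unit, least significant byte first

print(f"code point: U+{ord(ch):04X}")
print("UTF-8 bits:", " ".join(f"{b:08b}" for b in utf8))
print("UTF-16 BE :", utf16_be.hex(" "))
print("UTF-16 LE :", utf16_le.hex(" "))
```

The UTF-8 bit pattern printed here matches the one quoted in the excerpt. The two UTF-16 results contain the same two bytes in opposite order, which is the ambiguity a byte order mark is meant to resolve.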
1 vote
0 answers
124 views
How to fix accentuation encoding with cmd.exe running inside bash?
I installed https://www.msys2.org/ and set up an SSH server for it. With this I can connect to my machine and work remotely. The problem is that some applications, such as Visual Studio tools or Windows ...
1 vote
2 answers
3k views
Does clipboard change character encoding?
I am having a UTF8-related issue in a piece of software - it complains some text containing special characters isn't valid UTF8. But whenever I copy-paste it into an online validation tool or into an ...
0 votes
1 answer
935 views
Why does copying text between Notepad++ files create files with different bytes?
I've created a simple pdf [hi.pdf] with the word hi and when I open it in Notepad++, its encoding is ANSI, which I assume is Notepad++'s best guess, with it opening successfully when I Save as ...
0 votes
1 answer
488 views
convert character set (German)
I have a text file that uses various characters in the 128+ range in currently non-standard ways. The file command just says Non-ISO extended-ASCII. From the context I can recognise these: Octal 201: ...
1 vote
2 answers
1k views
Incorrect display of Chr() function return
I have got a new work PC (it has a higher build of Windows 10, the same highest version of O365), configured it pretty much the same as my previous one (exactly like that in terms of region and ...
0 votes
0 answers
997 views
What kind of file name encoding is this and how do I fix it?
On my Linux machine I found old files (at least from 2004 if not older), so possibly Win9x days. Maybe they came over some old FAT drive on my disk or some old Samba share. Umlauts are very weirdly ...
1 vote
1 answer
923 views
Unicode compatible Devanagari font
The pages on the following URL used to render correctly last time I checked. http://www.cfilt.iitb.ac.in/marathi_Corpus/aesthetics/literatureBio_and_autobio/ahe_manohar_tari/BA00B005-1112_utf.txt Now ...
1 vote
1 answer
209 views
How to prevent Safari from implicitly converting character in XHR request?
I picked this character 〉 as a separator for my combo-key-field for my DynamoDb database. That character surfaces in the browser as part of a next-page-query token. (in an endless scroll list view) ...
0 votes
1 answer
359 views
WinSCP Displays Weird Characters in script output file
I'm viewing the outputs of a bash script that logs into Juniper devices and runs commands. When I view the output of the script via the Unix server using cat <filename> it appears just fine. ...
4 votes
3 answers
27k views
How to display special non-ASCII Unicode characters in Notepad++? (regardless if file saved and reopened)
When bookmarking a URL like: https://addons.mozilla.org/firefox/addon/youcare-search/ You get this bookmark title: YouCare - The charitable search engine – Get this Extension for 🦊 Firefox Then, if ...
0 votes
2 answers
2k views
Finding the encoding of a text file containing weird characters
I recently received a file, of Turkish origin, which has some English words that I can easily read and some weird characters. I wonder if this file is encoded, encrypted or something else. I ...
1 vote
0 answers
78 views
Word 365 character encoding annoying
When I open some text files with Microsoft Word 365 (v2001 build 12430.20184 - 16.0.12430.20172) it asks me to choose the character encoding. My question is: How to clean the original file to let ...
0 votes
0 answers
70 views
Is URL encoding guaranteed to output printable characters only?
I refer to the URL encoding used in Burp and in browsers. Is URL encoding guaranteed to output printable characters only?
1 vote
2 answers
7k views
Windows 10 Missing Left Double Quotation Mark Glyph in Character Map
This is so weird. I am simply trying to type an open double quotes symbol in Windows 10 that should look like this: Actually, I'm trying to replace wrongly interpolated quotes symbols in an ANSI ...
1 vote
0 answers
443 views
Wireshark - Don't mask non-printable characters (Windows)
Is there any way to have Wireshark running under Windows not mask non-printable characters in the packet view on the bottom? Currently, any characters < 0x20 and >= 0x7f are shown as . This ...
0 votes
1 answer
248 views
BER decode SubjectAltName and CHOICE?
I'm having trouble working out the syntax when decoding a SubjectAltName in a TLS self-signed certificate. I believe the certificate is well formed. The trouble is, I don't understand how to decode ...
5 votes
1 answer
442 views
Can I tinker with the encoding when using pdftotext to convert PDF to text?
Sometimes when I do pdftotext it results in perfect text. I assume this is because the actual Unicode text data is embedded directly in the PDF itself, and simply read out. But other times (around ...
0 votes
0 answers
651 views
Readable text when copy-pasted from PDF becomes completely unreadable
Why is it that when I select completely readable Latin text (no language-specific characters, actually a company name and street address) in a PDF document and then copy and paste it to any ...
1 vote
4 answers
297 views
Which ANSI standard is Joel referring to?
I was rereading Joel Spolsky's classic blog post The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) and noticed this passage: ...
2 votes
1 answer
3k views
In cmd.exe with codepage 437, why are characters whose "number" is greater than 127 prepended with a ┬ when using type?
I have a file that contains some characters whose "number" is greater than 127. If I use type file.txt to display the content of the file in a cmd.exe console whose code page (chcp) is set to 437, ...
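The excerpt is truncated, so the following is only one plausible reading and not the asker's confirmed situation: if the file is saved as UTF-8, every character from U+0080 to U+00BF is stored as the byte 0xC2 followed by a second byte, and code page 437 renders 0xC2 as ┬. A minimal Python sketch of that assumption, where the sample character ° (U+00B0) and the variable names are hypothetical:

```python
# Assumption: the file on disk is UTF-8, but the console decodes it as CP437.
sample = "°"                        # hypothetical character above 127 (U+00B0)
raw = sample.encode("utf-8")        # two bytes; the first one is 0xC2
shown = raw.decode("cp437")         # what a CP437 console would display

print(raw.hex(" "))
print(shown)
```

Under this assumption the extra ┬ is not added by type; it is the first byte of a two-byte UTF-8 sequence being displayed as a character of its own.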
8 votes
1 answer
2k views
How to make Chrome URL display spaces instead of %20
Browsers like Firefox display URLs containing spaces (including nbsp's) with an actual space (); Chrome always displays spaces as %20 (and nbsp's as %C2%A0) in the address bar. (i.e., Firefox displays ...
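A short sketch of where the two escapes mentioned above come from, using Python's standard `urllib.parse` module (the variable names are illustrative). Percent-encoding works on bytes: a plain space is the single byte 0x20, while a no-break space (U+00A0) is first encoded as UTF-8, giving the two bytes 0xC2 0xA0.

```python
from urllib.parse import quote, unquote

space = " "          # U+0020, a single byte in UTF-8
nbsp = "\u00a0"      # U+00A0 NO-BREAK SPACE, two bytes in UTF-8

encoded_space = quote(space)   # percent-encodes the byte 0x20
encoded_nbsp = quote(nbsp)     # percent-encodes the bytes 0xC2 0xA0

print(encoded_space, encoded_nbsp)
print(unquote(encoded_nbsp) == nbsp)  # decoding restores the original character
```

This only illustrates the encoding itself; whether a browser shows the escaped or the decoded form in its address bar is a display choice made by that browser.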
0 votes
1 answer
323 views
NetTerm shows characters instead of lines
My issue is almost the same as Why does YaST now show lines as lqqqqqqqqqqqqqqq? , but instead of PuTTY and YaST, I'm using NetTerm and MLS (= WMS). My lines are also shown in characters as you can ...
1 vote
1 answer
3k views
How to fix the character encoding problems after using FileZilla?
I have the following problem: I used the FileZilla application to download the files of a site over FTP, to back them up in case it needs to be restored later. If I were to restore the site with the ...
2 votes
1 answer
433 views
How to send an e‑mail to an address with Latin9/ISO‑8859‑15 characters inside the username part of the address?
As part of finding a job, I need to send an e‑mail to an address which contains Latin letters with accents inside the username. I know this is not standard, but they did it and there are fewer than 1000 ...