Python Forum
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 34: character
#1
Hello. I don't know what to do with this error:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 34: character maps to <undefined>
This is the Python Code:

import fileinput
import glob
import os
import re

with open('c:\\Folder6\\merged.txt', 'w', encoding='UTF-8') as f:
    for line in fileinput.input(sorted(glob.glob('c:\\Folder6\\*.txt'))):
        f.write(line)
    fileinput.close()
print(f)
And this is the ERROR:

Traceback (most recent call last):
  File "E:\Carte\BB\17 - Site Leadership\alte\Ionel Balauta\Aryeht\Task 1 - Traduce tot site-ul\Doar Google Web\Andreea\Meditatii\Sedinta 31 august 2022\merge txt - versiune 3 .py", line 8, in <module>
    for line in fileinput.input(sorted(glob.glob('c:\\Folder6\\*.txt'))):
  File "C:\Program Files\Python39\lib\fileinput.py", line 256, in __next__
    line = self._readline()
  File "C:\Program Files\Python39\lib\fileinput.py", line 389, in _readline
    return self._readline()
  File "C:\Program Files\Python39\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 34: character maps to <undefined>
This is a screenshot:

[Image: McUwGS.jpg]

What can I do so that this error does not appear again? Can anyone help me?
Reply
#2
You should set the encoding when you read the files (with fileinput), not only when you write the merged file. Windows must think the input is something other than UTF-8.
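To illustrate the point, here is a minimal sketch of how the error arises and how naming the encoding on the read side avoids it. It assumes the input files are really UTF-8, which the thread does not confirm; the temporary file and the sample text are made up for the demonstration. The traceback shows Python decoding with cp1252, and byte 0x81 has no mapping in that code page.

```python
import os
import tempfile

# Hypothetical sample file: "Á" is chosen only because its UTF-8 form is b'\xc3\x81'.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "wb") as f:
    f.write("Á la carte\n".encode("utf-8"))

# Reading as cp1252 (the codec named in the traceback) fails on byte 0x81.
try:
    with open(path, encoding="cp1252") as f:
        f.read()
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True

# Naming the actual encoding when reading avoids the error.
with open(path, encoding="utf-8") as f:
    text = f.read()

print(cp1252_failed, ascii(text))
```

If the files turn out not to be UTF-8, the second read raises its own UnicodeDecodeError, so the encoding has to match how the files were actually saved.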
Reply
#3
Hello, sir. Thank you for the answer.

Can you modify my script so that it works with your solution? I don't know Python very well... I am a beginner.

I don't know if what I have now is any good. It doesn't do anything, but it gives no error either.

import fileinput
import glob
import os
import re

def read_text_from_file(file_path):
    with open(file_path, encoding='utf8') as f:
        text = f.read()
    return text

def write_to_file(text, file_path):
    with open(file_path, 'wb') as f:
        f.write(text.encode('utf8', 'ignore'))

with open('c:\\Folder6\\translated\\merged.txt', 'w', encoding='UTF-8') as f:
    for file_name in sorted(glob.glob('c:\\Folder6\\translated\\*.txt')):
        contents = read_text_from_file(file_name)
        f.write(line)
    fileinput.close()
print(f)
OR, SECOND VERSION:

import fileinput
import glob
import os
import re

with open('c:\\Folder6\\translated\\merged.txt', 'w', encoding='UTF-8') as f:
    current_content = f.read()
    modified = new_content != current_content
    if modified and args.diff:
        for line in fileinput.input(sorted(glob.glob('c:\\Folder6\\translated\\*.txt'))):
            f.write(line)
    fileinput.close()
print(f)
OR, THIRD VERSION:

import fileinput
import glob
import os
import re

read_files = sorted(glob.glob("c:\\Folder6\\translated\\merged.txt\\*.txt"))

with open("c:\\Folder6\\translated\\merged.txt", "wb") as outfile:
    for f in read_files:
        with open(f, "rb") as infile:
            outfile.write(infile.read())
    fileinput.close()
print(f)

None of them work. Each one creates the file but writes nothing to it.
Reply
#4
I found a solution:

import fileinput
import glob
import os
import re

def read_files(file_path):
    with open(file_path, encoding='utf8') as f:
        text = f.read()
    return text

def read_files(text, file_path):
    with open(file_path, 'rb') as f:
        f.write(text.encode('utf8', 'ignore'))

read_files = sorted(glob.glob("c:\\Folder6\\translated\\*.txt"))

with open("c:\\Folder6\\translated\\merged.txt", "wb") as outfile:
    for f in read_files:
        with open(f, "rb") as infile:
            outfile.write(infile.read())
            outfile.write(b"\n\n")
    fileinput.close()
print(f)
Reply
#5
In your first code it would be like this.
import fileinput
import glob
import os

with open('c:\\Folder6\\merged.txt', 'w', encoding='UTF-8') as f:
    for line in fileinput.input(sorted(glob.glob('c:\\Folder6\\*.txt')), encoding="utf-8"):
        print(line)
        f.write(line)
    fileinput.close()
This needs Python 3.10 to work, as noted in the fileinput documentation:
Quote: Changed in version 3.10: The keyword-only parameter encoding and errors are added.
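The traceback in the first post points at a Python 3.9 install, where the encoding keyword is not yet available. On such versions, one option is the openhook parameter together with fileinput.hook_encoded, which opens each input file with the given encoding. The following is only a sketch: the helper name copy_lines and its parameters are invented for illustration, and it again assumes the inputs are UTF-8.

```python
import fileinput


def copy_lines(sources, out_path, encoding="utf-8"):
    """Copy every line of the files in `sources` into `out_path`.

    Returns the number of lines written.
    """
    # fileinput falls back to sys.stdin when given an empty list, so stop early.
    if not sources:
        return 0
    count = 0
    with open(out_path, "w", encoding=encoding) as out:
        # hook_encoded makes fileinput open each file with the given encoding
        # instead of the platform default (cp1252 in the traceback).
        hook = fileinput.hook_encoded(encoding)
        with fileinput.input(files=sources, openhook=hook) as lines:
            for line in lines:
                out.write(line)
                count += 1
    return count
```

It could be called with the sorted glob result from the original script, for example copy_lines(sorted(glob.glob('c:\\Folder6\\*.txt')), out_path), with out_path pointing outside the input folder.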
Reply
#6
(Sep-26-2022, 09:09 AM)snippsat Wrote: In your first code it would be like this.

OK, thanks. But if I want to add an f.write("\n\n") in order to have a dividing line between the files, where should I put it?
Reply
#7
(Sep-26-2022, 09:25 AM)Melcu54 Wrote: But if I want to put an f.write("\n\n") in order to have a dividing line between the files
Change line 8:
f.write(f'{line}\n\n')
Reply
#8
(Sep-26-2022, 09:38 AM)snippsat Wrote: Change line 8:
f.write(f'{line}\n\n')

I tried this as well. But in this case it doubles all the lines from all the text files in the merged file.

See the duplicate lines after using your code (it is better with f.write('\n'), except that this puts a new empty line between all paragraphs):

[Image: zCgDSZ.jpg]
Reply
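For the separator question in the last posts, one possible approach is to write the blank lines once per file rather than once per line. The sketch below is not taken from the thread: the function name merge_with_separator and its parameters are made up for illustration, and it assumes every input file is UTF-8. It also skips the output file itself, because merged.txt matches the same *.txt pattern when it sits in the input folder.

```python
import glob
import os


def merge_with_separator(pattern, out_path, separator="\n\n", encoding="utf-8"):
    """Concatenate the files matching `pattern` into `out_path`.

    `separator` is written between files only, never after individual lines.
    Returns the list of input paths that were merged.
    """
    out_key = os.path.normcase(os.path.abspath(out_path))
    # Build the input list up front and leave out the output file.
    sources = [
        path for path in sorted(glob.glob(pattern))
        if os.path.normcase(os.path.abspath(path)) != out_key
    ]
    with open(out_path, "w", encoding=encoding) as out:
        for index, path in enumerate(sources):
            if index:  # no separator before the first file
                out.write(separator)
            with open(path, encoding=encoding) as src:
                out.write(src.read())
    return sources
```

With the paths from the first post, the call would look like merge_with_separator('c:\\Folder6\\*.txt', 'c:\\Folder6\\merged.txt'). Reading each file whole keeps the code short; for very large files, copying line by line inside the inner with block would use less memory.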

