Posted on Jun 28 • Edited on Jul 2

String in Python (3)

#python #string #str #function

*Memos:

My post explains a string.

encode() can encode a string (str) into bytes, and decode() can decode those bytes back into a string, as shown below:

*Memos:

The 1st argument is encoding(Optional-Default:'utf-8'): *Memos:
- 'utf-8', 'utf-7', 'utf-16', 'big5', 'ascii', etc can be set to it.
- You can see Standard Encodings for more possible values.
The 2nd argument is errors(Optional-Default:'strict'): *Memos:
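- It controls how encoding or decoding errors are handled, using error handlers such as 'strict', 'ignore', 'replace', 'xmlcharrefreplace', 'backslashreplace', etc.
- 'strict' raises UnicodeEncodeError for encode() or UnicodeDecodeError for decode() (both are subclasses of UnicodeError) if a character or byte which cannot be encoded or decoded exists.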
- 'ignore' ignores the character which cannot be encoded or decoded.
- 'replace' replaces the character, which cannot be encoded or decoded, with ? for encode() or � for decode().
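- 'xmlcharrefreplace' replaces the character, which cannot be encoded, with an XML character reference e.g. &#1105;, &#966;, etc. It only works for encode(), not for decode().
- 'backslashreplace' replaces the character, which cannot be encoded, with a backslash escape sequence such as \uxxxx for encode() e.g. \u0451 (shown as \\u0451 in the printed bytes), or replaces the byte, which cannot be decoded, with \xhh for decode() e.g. \xd1.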
- You can see more error handlers.
- You can create your own error handler with codecs.register_error().
encode() returns a bytes object, which has decode() but does not have encode().
decode() returns a string (str), which has encode() but does not have decode().
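The last two memos can be checked directly. This is a minimal sketch using only the built-in str and bytes types, with the same sample string as the examples in this post:

```python
v = 'Hёllφ!'
ev = v.encode()  # str -> bytes
dv = ev.decode() # bytes -> str

# bytes has decode() but no encode().
print(type(ev).__name__, hasattr(ev, 'decode'), hasattr(ev, 'encode'))
# bytes True False

# str has encode() but no decode().
print(type(dv).__name__, hasattr(dv, 'encode'), hasattr(dv, 'decode'))
# str True False

print(dv == v)
# True
```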

v = 'Hёllφ!'
ev = v.encode()
ev = v.encode(encoding='utf-8', errors='strict')
print(ev)
# b'H\xd1\x91ll\xcf\x86!'

dv = ev.decode()
dv = ev.decode(encoding='utf-8', errors='strict')
print(dv)
# Hёllφ!

v = 'Hёllφ!'
ev = v.encode(encoding='utf-7')
print(ev)
# b'H+BFE-ll+A8Y!'

dv = ev.decode(encoding='utf-7')
print(dv)
# Hёllφ!

v = 'Hёllφ!'
ev = v.encode(encoding='utf-16')
print(ev)
# b'\xff\xfeH\x00Q\x04l\x00l\x00\xc6\x03!\x00'

dv = ev.decode(encoding='utf-16')
print(dv)
# Hёllφ!

v = 'Hёllφ!'
ev = v.encode(encoding='big5')
print(ev)
# b'H\xc7\xcell\xa3p!'

dv = ev.decode(encoding='big5')
print(dv)
# Hёllφ!

import codecs

def hashreplace_handler(s):
    return ((s.end - s.start) * '#', s.end)

codecs.register_error('hashreplace', hashreplace_handler)

v = 'Hёllφ!'

print(v.encode(encoding='ascii', errors='ignore'))
# b'Hll!'

print(v.encode(encoding='ascii', errors='replace'))
# b'H?ll?!'

print(v.encode(encoding='ascii', errors='xmlcharrefreplace'))
# b'H&#1105;ll&#966;!'

print(v.encode(encoding='ascii', errors='backslashreplace'))
# b'H\\u0451ll\\u03c6!'

print(v.encode(encoding='ascii', errors='hashreplace'))
# b'H#ll#!'

print(v.encode(encoding='ascii'))
print(v.encode(encoding='ascii', errors='strict'))
# UnicodeEncodeError: 'ascii' codec can't encode character '\u0451'
# in position 1: ordinal not in range(128)
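As one example of the additional error handlers mentioned in the memos, here is a small sketch of 'namereplace', a built-in handler (Python 3.5 or later) that works only for encode() and replaces each unencodable character with a \N{...} escape sequence. The variable names are just for illustration:

```python
v = 'Hёllφ!'

# Each character that ASCII cannot represent becomes \N{UNICODE NAME}.
ev = v.encode(encoding='ascii', errors='namereplace')
print(ev)
# b'H\\N{CYRILLIC SMALL LETTER IO}ll\\N{GREEK SMALL LETTER PHI}!'
```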

import codecs

def hashreplace_handler(s):
    return ((s.end - s.start) * '#', s.end)

codecs.register_error('hashreplace', hashreplace_handler)

v = 'Hёllφ!'
ev = v.encode()
print(ev)
# b'H\xd1\x91ll\xcf\x86!'

print(ev.decode(encoding='ascii', errors='ignore'))
# Hll!

print(ev.decode(encoding='ascii', errors='replace'))
# H��ll��!

print(ev.decode(encoding='ascii', errors='hashreplace'))
# H##ll##!

print(ev.decode(encoding='ascii', errors='strict'))
# UnicodeDecodeError: 'ascii' codec can't decode byte 0xd1
# in position 1: ordinal not in range(128)

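v = 'Hёllφ!'
ev = v.encode(encoding='ascii', errors='xmlcharrefreplace')
print(ev)
# b'H&#1105;ll&#966;!'

# The encoded bytes are pure ASCII, so decoding raises no error
# and the error handler is never actually called here.
print(ev.decode(encoding='ascii', errors='xmlcharrefreplace'))
# H&#1105;ll&#966;!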

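v = 'Hёllφ!'
ev = v.encode(encoding='ascii', errors='backslashreplace')
print(ev)
# b'H\\u0451ll\\u03c6!'

# The encoded bytes are pure ASCII, so decoding raises no error
# and the escape sequences stay as literal text.
print(ev.decode(encoding='ascii', errors='backslashreplace'))
# H\u0451ll\u03c6!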
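In the two examples just above, the bytes being decoded are already valid ASCII, so the error handler never runs. The following sketch instead decodes UTF-8 bytes as ASCII so that the handler is actually triggered. It shows that 'backslashreplace' turns each undecodable byte into \xhh, and that 'xmlcharrefreplace' cannot handle decoding errors (in CPython it raises TypeError):

```python
v = 'Hёllφ!'
ev = v.encode()  # UTF-8 bytes: b'H\xd1\x91ll\xcf\x86!'

# Every non-ASCII byte is undecodable as ASCII and becomes \xhh.
dv = ev.decode(encoding='ascii', errors='backslashreplace')
print(dv)
# H\xd1\x91ll\xcf\x86!

# 'xmlcharrefreplace' only supports encoding errors.
raised = False
try:
    ev.decode(encoding='ascii', errors='xmlcharrefreplace')
except TypeError as e:
    raised = True
    print(type(e).__name__)
# TypeError
```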
