Reading the UTF-8 BOM marker in Java

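In Java, you can detect the UTF-8 Byte Order Mark (BOM), the three-byte sequence 0xEF, 0xBB, 0xBF that may appear at the beginning of a UTF-8 encoded file, by reading the file's first three raw bytes and comparing them with that sequence. The check has to happen on bytes, before decoding: Java's UTF-8 charset does not strip the BOM, so a decoding reader returns it as the character U+FEFF at the start of the first line. Here's how you can do it: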

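```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

public class UTF8BOMDetector {
    public static void main(String[] args) {
        String filePath = "path/to/utf8file.txt"; // Replace with the path to your UTF-8 encoded file

        // PushbackInputStream lets us put the peeked bytes back if they are not a BOM
        try (PushbackInputStream inputStream =
                     new PushbackInputStream(new FileInputStream(filePath), 3)) {
            byte[] bomBytes = new byte[3];
            int bytesRead = inputStream.readNBytes(bomBytes, 0, 3); // Java 9+; returns 0 at end of file
            if (bytesRead == 3
                    && bomBytes[0] == (byte) 0xEF
                    && bomBytes[1] == (byte) 0xBB
                    && bomBytes[2] == (byte) 0xBF) {
                System.out.println("UTF-8 BOM detected.");
                // The BOM bytes stay consumed, so decoding starts at the first real character
            } else {
                System.out.println("UTF-8 BOM not detected.");
                // Return the peeked bytes to the stream so the start of the file is not lost
                if (bytesRead > 0) {
                    inputStream.unread(bomBytes, 0, bytesRead);
                }
            }

            // Read the rest of the file as UTF-8
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(inputStream, StandardCharsets.UTF_8));
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```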

In this code:

  1. Open the file using an InputStream to read the first three bytes (the potential BOM marker).
  2. Check if the first three bytes match the UTF-8 BOM (0xEF, 0xBB, 0xBF).
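  3. Decode the rest of the stream as UTF-8. If the BOM was detected, its bytes have already been consumed, so it will not appear as a stray U+FEFF character; if not, the bytes that were read must be pushed back (or the stream reopened) before decoding, otherwise the first three bytes of the file are lost.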
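The same steps can also be packaged as a reusable method that accepts any InputStream, which makes the logic easy to check against in-memory bytes. This is a minimal sketch, assuming Java 9+ (for InputStream.readNBytes); the class Utf8BomReader and the method openUtf8Reader are hypothetical names, not part of any library. Bytes that turn out not to be a BOM are returned to the stream through PushbackInputStream.unread.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class; names are illustrative only
public class Utf8BomReader {
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /**
     * Wraps the stream in a UTF-8 reader, skipping a leading BOM if present.
     * Bytes that are not a BOM are pushed back so no content is lost.
     */
    public static BufferedReader openUtf8Reader(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];
        int n = pushback.readNBytes(head, 0, head.length); // Java 9+; 0 at end of stream
        boolean isBom = n == UTF8_BOM.length && Arrays.equals(head, UTF8_BOM);
        if (!isBom && n > 0) {
            pushback.unread(head, 0, n); // restore the peeked bytes
        }
        return new BufferedReader(new InputStreamReader(pushback, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        byte[] withoutBom = {'h', 'i'};
        try (BufferedReader r = openUtf8Reader(new ByteArrayInputStream(withBom))) {
            System.out.println(r.readLine()); // prints "hi" with no leading U+FEFF
        }
        try (BufferedReader r = openUtf8Reader(new ByteArrayInputStream(withoutBom))) {
            System.out.println(r.readLine()); // prints "hi" with nothing lost
        }
    }
}
```

The pushback buffer only needs to hold three bytes, the length of the BOM, because at most that many bytes are ever unread.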

This code allows you to detect the presence of the UTF-8 BOM marker and then read the file correctly as UTF-8, whether the BOM marker is present or not.
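As an alternative, because Java's UTF-8 decoder does not discard a BOM but decodes it as the character U+FEFF, you can decode first and strip that character from the start of the first line. This sketch assumes the file fits comfortably in memory (it uses Files.readAllLines) and that a leading U+FEFF should always be treated as a BOM; StripBomAfterDecoding and stripBom are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical class; decodes first, then removes the decoded BOM character
public class StripBomAfterDecoding {
    // A UTF-8 BOM (0xEF 0xBB 0xBF) decodes to this single character
    private static final char BOM_CHAR = '\uFEFF';

    /** Returns the text without a leading U+FEFF, or unchanged if there is none. */
    public static String stripBom(String text) {
        if (!text.isEmpty() && text.charAt(0) == BOM_CHAR) {
            return text.substring(1);
        }
        return text;
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get("path/to/utf8file.txt"); // Replace with the path to your file
        // The UTF-8 decoder keeps the BOM, so it appears at the start of the first line
        List<String> lines = Files.readAllLines(path, StandardCharsets.UTF_8);
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (i == 0) {
                if (!line.isEmpty() && line.charAt(0) == BOM_CHAR) {
                    System.out.println("UTF-8 BOM detected.");
                }
                line = stripBom(line);
            }
            System.out.println(line);
        }
    }
}
```

Reading every line into memory keeps this short but only suits modestly sized files; for large inputs, the byte-level check on the stream avoids loading the whole file.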

