 
Reading UTF-8 data from a file using Java
In general, data is stored in a computer in the form of bits (1 or 0). Character encoding schemes specify which sequence of bytes represents each character.
Unicode (UTF): UTF stands for Unicode Transformation Format. Unicode is developed by The Unicode Consortium. If you want to create documents that use characters from multiple character sets, you can do so using a single Unicode encoding. It provides 3 types of encodings.
- UTF-8: It uses 8-bit units (bytes). A character in UTF-8 can be from 1 to 4 bytes long, making UTF-8 variable width.
- UTF-16: It uses 16-bit units. A character can be 1 or 2 units long, making UTF-16 variable width.
- UTF-32: It uses 32-bit units. It is a fixed-width format in which every character is exactly one unit long.
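The difference in width can be checked directly. The following is a minimal sketch; the class name EncodingWidths and its helper methods are illustrative and not part of the Java API. It uses UTF_16BE so that no byte-order mark is counted, and it derives the UTF-32 size from the number of code points, since StandardCharsets does not define a UTF-32 constant.

```java
import java.nio.charset.StandardCharsets;

class EncodingWidths {
    // Number of bytes the text occupies when encoded as UTF-8
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes in UTF-16 (big-endian variant, which adds no byte-order mark)
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // UTF-32 is fixed width: four bytes per code point
    static int utf32Bytes(String s) {
        return s.codePointCount(0, s.length()) * 4;
    }

    public static void main(String[] args) {
        // Latin A, e with acute accent, euro sign, and an emoji outside the 16-bit range
        String[] samples = {"A", "\u00E9", "\u20AC", "\uD83D\uDE00"};
        for (String s : samples) {
            System.out.println("U+" + Integer.toHexString(s.codePointAt(0)).toUpperCase()
                    + ": UTF-8=" + utf8Bytes(s)
                    + ", UTF-16=" + utf16Bytes(s)
                    + ", UTF-32=" + utf32Bytes(s));
        }
    }
}
```

For these four samples the UTF-8 sizes are 1, 2, 3 and 4 bytes, the UTF-16 sizes are 2, 2, 2 and 4 bytes, and the UTF-32 size is always 4 bytes.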
Reading UTF data using readUTF()
The readUTF() method of the java.io.DataInputStream class reads data that is in modified UTF-8 encoding into a String and returns it. It expects the format written by the writeUTF() method of java.io.DataOutputStream, in which each string is preceded by a two-byte length, so it is not meant for arbitrary UTF-8 text files. Therefore, to read such data from a file:
- Instantiate the FileInputStream class by passing a String value representing the path of the required file as a parameter.
- Instantiate the DataInputStream class by passing the FileInputStream object created above as a parameter.
- Read the UTF data from the DataInputStream object using the readUTF() method.
Example
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;

public class UTF8Example {
   public static void main(String args[]) {
      StringBuilder buffer = new StringBuilder();
      //Instantiating the FileInputStream and DataInputStream classes;
      //try-with-resources closes both streams when the block exits.
      //The backslash in the path is escaped so that it is not read as a tab.
      try (FileInputStream fileIn = new FileInputStream("D:\\test.txt");
           DataInputStream inputStream = new DataInputStream(fileIn)) {
         //Reading UTF data from the DataInputStream
         while(inputStream.available()>0) {
            buffer.append(inputStream.readUTF());
         }
      }
      catch(EOFException ex) {
         System.out.println(ex.toString());
      }
      catch(IOException ex) {
         System.out.println(ex.toString());
      }
      System.out.println("Contents of the file: "+buffer.toString());
   }
}
Output
Contents of the file: ??????????? ??????? ?? ????????
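Because readUTF() only understands data written by writeUTF(), it can help to see both halves together. The following is a minimal sketch that writes two strings to a temporary file and reads them back; the class name ReadUtfRoundTrip, its helper methods, and the sample strings are illustrative assumptions, not part of the original example.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

class ReadUtfRoundTrip {
    // Writes each string with writeUTF(): a two-byte length followed by modified UTF-8 bytes
    static void writeStrings(File file, String... values) throws IOException {
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(file))) {
            for (String value : values) {
                out.writeUTF(value);
            }
        }
    }

    // Reads the strings back with readUTF() until no bytes remain, concatenating them
    static String readStrings(File file) throws IOException {
        StringBuilder buffer = new StringBuilder();
        try (DataInputStream in = new DataInputStream(new FileInputStream(file))) {
            while (in.available() > 0) {
                buffer.append(in.readUTF());
            }
        }
        return buffer.toString();
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for the path used in the example above
        File file = File.createTempFile("utf-sample", ".bin");
        file.deleteOnExit();
        writeStrings(file, "h\u00E9llo ", "w\u00F6rld");
        System.out.println("Bytes on disk: " + file.length());
        System.out.println("Contents of the file: " + readStrings(file));
    }
}
```

The file is larger than the encoded text alone because every writeUTF() call adds its two-byte length prefix. A plain text file saved as UTF-8 by an editor has no such prefix, which is why readUTF() is the wrong tool for it.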
The newBufferedReader() method of the java.nio.file.Files class accepts a Path object representing the file and a Charset object specifying the encoding of the characters to be read, and returns a BufferedReader object that decodes the file using that charset.
The value for the Charset could be StandardCharsets.UTF_8, StandardCharsets.UTF_16LE, StandardCharsets.UTF_16BE, StandardCharsets.UTF_16, StandardCharsets.US_ASCII, or StandardCharsets.ISO_8859_1.
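The charset passed in has to match the encoding the data was written with. As a small sketch of what happens otherwise, the code below decodes the same two bytes once as UTF-8 and once as ISO-8859-1; the class name CharsetMismatch and its decode() helper are illustrative, not part of the Java API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetMismatch {
    // Decodes raw bytes into a String using the given charset
    static String decode(byte[] data, Charset charset) {
        return new String(data, charset);
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) is two bytes in UTF-8: 0xC3 0xA9
        byte[] data = "\u00E9".getBytes(StandardCharsets.UTF_8);

        String right = decode(data, StandardCharsets.UTF_8);
        String wrong = decode(data, StandardCharsets.ISO_8859_1);

        // Lengths are printed instead of the characters to avoid console encoding issues
        System.out.println("Decoded as UTF-8: " + right.length() + " character");
        System.out.println("Decoded as ISO-8859-1: " + wrong.length() + " characters");
    }
}
```

Decoded as UTF-8, the two bytes yield the single original character. Decoded as ISO-8859-1, each byte becomes its own character (U+00C3 and U+00A9), which is the familiar garbled text seen when a file is read with the wrong charset.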
Therefore, to read UTF-8 data from a file:
- Create/get an object of the Path class representing the required path using the get() method of the java.nio.file.Paths class.
- Create/get a BufferedReader object that can read UTF-8 data by passing the Path object created above and StandardCharsets.UTF_8 as parameters to the newBufferedReader() method.
- Using the readLine() method of the BufferedReader object, read the contents of the file.
Example
import java.io.BufferedReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class UTF8Example {
   public static void main(String args[]) throws Exception{
      //Getting the Path object (the backslash must be escaped in a Java string)
      String filePath = "D:\\samplefile.txt";
      Path path = Paths.get(filePath);
      StringBuilder buffer = new StringBuilder();
      //Creating a BufferedReader object; try-with-resources closes it
      try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
         //Reading the UTF-8 data from the file line by line
         String line;
         while((line = reader.readLine())!=null) {
            buffer.append(line).append(System.lineSeparator());
         }
      }
      System.out.println("Contents of the file: "+buffer.toString());
   }
}
Output
Contents of the file: ??????????? ??????? ?? ????????
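The question marks in the output above most likely come from a console that cannot display the characters, not from the read itself; that is an assumption about the environment in which the example was run. The sketch below writes a small UTF-8 file with Files.newBufferedWriter(), reads it back with Files.newBufferedReader(), and prints through a PrintStream that encodes its output as UTF-8. The class name Utf8TextRoundTrip, its helper methods, and the sample text are illustrative.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class Utf8TextRoundTrip {
    // Writes the text to the file encoded as UTF-8
    static void writeText(Path path, String text) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(path, StandardCharsets.UTF_8)) {
            writer.write(text);
        }
    }

    // Reads the file line by line as UTF-8, ending each line with '\n'
    static String readText(Path path) throws IOException {
        StringBuilder buffer = new StringBuilder();
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                buffer.append(line).append('\n');
            }
        }
        return buffer.toString();
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for the path used in the example above
        Path path = Files.createTempFile("utf8-sample", ".txt");
        path.toFile().deleteOnExit();
        writeText(path, "na\u00EFve caf\u00E9\n\u20AC 10\n");

        // Encode console output as UTF-8; the terminal must also be set to UTF-8
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.print("Contents of the file:\n" + readText(path));
    }
}
```

Whether the characters then appear correctly still depends on the terminal's own encoding and font, which Java cannot control.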
