Java - How to remove duplicate words (not necessarily adjacent) from a file using regex?

To remove duplicate words from a file in Java using regular expressions, you can read the file line by line, extract the words from each line using a regular expression, and then filter out the duplicate words. Here's how you can do it:

import java.io.*;
import java.nio.file.*;
import java.util.*;
import java.util.regex.*;

public class Main {
    public static void main(String[] args) {
        // Specify the file path
        String filePath = "input.txt";
        // Read the file and remove duplicate words
        removeDuplicateWords(filePath);
    }

    public static void removeDuplicateWords(String filePath) {
        // Compile a regex pattern to match words
        Pattern pattern = Pattern.compile("\\b(\\w+)\\b");
        // Set to store unique (lowercased) words seen so far
        Set<String> uniqueWords = new HashSet<>();

        try {
            // try-with-resources closes both files even if an exception is thrown
            try (BufferedReader reader = new BufferedReader(new FileReader(filePath));
                 PrintWriter writer = new PrintWriter("output.txt")) {
                String line;
                // Process each line in the file
                while ((line = reader.readLine()) != null) {
                    // Find all words in the line using the regex pattern
                    Matcher matcher = pattern.matcher(line);
                    while (matcher.find()) {
                        String word = matcher.group(1).toLowerCase(); // Convert to lowercase
                        // add() returns false when the word is already in the set
                        if (uniqueWords.add(word)) {
                            writer.print(word + " ");
                        }
                    }
                    writer.println(); // Add a new line after each processed line
                }
            }

            // Replace the original file with the temporary file.
            // File.renameTo can fail silently when the target already exists.
            Files.move(Paths.get("output.txt"), Paths.get(filePath),
                       StandardCopyOption.REPLACE_EXISTING);
            System.out.println("Duplicate words removed successfully.");
        } catch (IOException e) {
            System.out.println("Error reading or writing file: " + e.getMessage());
        }
    }
}

This code:

  • Reads the input file line by line.
  • Uses a regular expression (\b(\w+)\b) to match words in each line. \b matches word boundaries, and \w+ matches one or more word characters (letters, digits, or underscores).
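  • Converts each word to lowercase so that duplicates are detected case-insensitively.
  • Maintains a set of unique words encountered so far to filter out duplicates, including duplicates that are not adjacent or that appear on different lines.
  • Writes the non-duplicate words to a temporary output file. Only the matched words are written, lowercased and separated by single spaces, so punctuation and the original spacing are not preserved.
  • Finally, renames the temporary output file over the original input file, replacing it.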

Make sure to replace "input.txt" with the actual path to your input file.
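If the punctuation and original casing of the text should survive, one possible variation is to delete repeated words in place rather than rebuilding each line from the matched words. The sketch below assumes that "keep the first occurrence, drop later ones" is the desired behavior; the class name DedupKeepFirst and the method dedup are hypothetical names used only for illustration.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DedupKeepFirst {
    // Same notion of a "word" as above: a run of letters, digits or underscores.
    private static final Pattern WORD = Pattern.compile("\\b\\w+\\b");

    // Removes every repeated word from text, keeping its first occurrence.
    // The caller passes the same set for every line, so duplicates are
    // tracked across the whole file (hypothetical helper, not a library API).
    static String dedup(String text, Set<String> seen) {
        Matcher m = WORD.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Compare in lowercase, but emit the word as originally written.
            boolean firstTime = seen.add(m.group().toLowerCase());
            m.appendReplacement(out, firstTime ? Matcher.quoteReplacement(m.group()) : "");
        }
        m.appendTail(out);
        // Collapse the gaps left behind by removed words.
        return out.toString().replaceAll("\\s{2,}", " ").trim();
    }

    public static void main(String[] args) {
        Set<String> seen = new HashSet<>();
        // prints: The cat saw dog, and ran.
        System.out.println(dedup("The cat saw the dog, and the dog ran.", seen));
        // prints: A bird
        System.out.println(dedup("A cat and a bird", seen));
    }
}
```

In a file-processing loop, dedup would be called once per line with a single shared set. Punctuation that belonged to a removed word stays in the text, so some inputs may need extra cleanup.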

Examples

  1. Java remove duplicate words from a file using regex

    • Description: Collapses runs of the same word that directly follow each other (for example "the the the") into a single copy using a backreference. Duplicates that are not adjacent are left in place; for those, use the Set-based approach from the main example.
    • Code:
      import java.io.*;

      public class RemoveDuplicateWords {
          public static void main(String[] args) throws IOException {
              try (BufferedReader reader = new BufferedReader(new FileReader("input.txt"));
                   BufferedWriter writer = new BufferedWriter(new FileWriter("output.txt"))) {
                  String line;
                  while ((line = reader.readLine()) != null) {
                      // \1 refers back to the word captured by (\w+)
                      line = line.replaceAll("\\b(\\w+)(\\s+\\1\\b)+", "$1");
                      writer.write(line);
                      writer.newLine();
                  }
              }
          }
      }
  2. Java remove duplicate words from a file preserving order using regex

    • Description: Removes repeated words anywhere in the file, not only adjacent ones, while preserving the order in which words first appear. The regex \s+ is used only to split each line into words; a Set tracks the words already written. Comparison is case-sensitive, and punctuation attached to a word counts as part of it.
    • Code:
      import java.io.*;
      import java.util.*;

      public class RemoveDuplicateWordsPreserveOrder {
          public static void main(String[] args) throws IOException {
              // Shared across lines, so duplicates are removed file-wide
              Set<String> uniqueWords = new HashSet<>();
              try (BufferedReader reader = new BufferedReader(new FileReader("input.txt"));
                   BufferedWriter writer = new BufferedWriter(new FileWriter("output.txt"))) {
                  String line;
                  while ((line = reader.readLine()) != null) {
                      // trim() avoids an empty first token on indented lines
                      for (String word : line.trim().split("\\s+")) {
                          // add() returns true only the first time a word is seen
                          if (!word.isEmpty() && uniqueWords.add(word)) {
                              writer.write(word + " ");
                          }
                      }
                      writer.newLine();
                  }
              }
          }
      }
  3. Java remove duplicate words from a file case-insensitive using regex

    • Description: Collapses consecutive repeated words in a case-insensitive manner. The (?i) flag also applies to the backreference, so "The the THE" becomes "The"; the casing of the first occurrence is kept. Non-adjacent duplicates are not affected.
    • Code:
      import java.io.*;

      public class RemoveDuplicateWordsCaseInsensitive {
          public static void main(String[] args) throws IOException {
              try (BufferedReader reader = new BufferedReader(new FileReader("input.txt"));
                   BufferedWriter writer = new BufferedWriter(new FileWriter("output.txt"))) {
                  String line;
                  while ((line = reader.readLine()) != null) {
                      // (?i) makes both the word and the backreference ignore case
                      line = line.replaceAll("(?i)\\b(\\w+)(\\s+\\1\\b)+", "$1");
                      writer.write(line);
                      writer.newLine();
                  }
              }
          }
      }
  4. Java remove duplicate words from a file with custom delimiter using regex

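    • Description: Collapses consecutive repeated words that are separated by a custom delimiter instead of whitespace. The delimiter used here (";") is only an assumed example; change it to match your data. Non-adjacent duplicates are not affected.
    • Code:
      import java.io.*;
      import java.util.regex.Pattern;

      public class RemoveDuplicateWordsCustomDelimiter {
          public static void main(String[] args) throws IOException {
              // Assumed example delimiter; quote() makes it safe to embed in a regex
              String d = Pattern.quote(";");
              // A word at line start or after a delimiter, followed by delimited copies of itself
              String regex = "(?<=^|" + d + ")(\\w+)(" + d + "\\1(?=" + d + "|$))+";
              try (BufferedReader reader = new BufferedReader(new FileReader("input.txt"));
                   BufferedWriter writer = new BufferedWriter(new FileWriter("output.txt"))) {
                  String line;
                  while ((line = reader.readLine()) != null) {
                      // "a;a;b" becomes "a;b"
                      line = line.replaceAll(regex, "$1");
                      writer.write(line);
                      writer.newLine();
                  }
              }
          }
      }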
  5. Java remove duplicate words from a file with multiple spaces using regex

    • Description: Collapses consecutive repeated words that are separated by multiple spaces. No special handling is needed: \s+ already matches one or more whitespace characters, so the pattern is the same as in example 1. Non-adjacent duplicates are not affected.
    • Code:
      import java.io.*;

      public class RemoveDuplicateWordsMultipleSpaces {
          public static void main(String[] args) throws IOException {
              try (BufferedReader reader = new BufferedReader(new FileReader("input.txt"));
                   BufferedWriter writer = new BufferedWriter(new FileWriter("output.txt"))) {
                  String line;
                  while ((line = reader.readLine()) != null) {
                      // \s+ covers any run of spaces or tabs between the copies
                      line = line.replaceAll("\\b(\\w+)(\\s+\\1\\b)+", "$1");
                      writer.write(line);
                      writer.newLine();
                  }
              }
          }
      }
  6. Java remove duplicate words from a file ignoring punctuation using regex

    • Description: Collapses consecutive repeated words even when punctuation sits between them, so "hello, hello!" becomes "hello!". The punctuation between the copies is removed together with the repeats. Non-adjacent duplicates are not affected.
    • Code:
      import java.io.*;

      public class RemoveDuplicateWordsIgnorePunctuation {
          public static void main(String[] args) throws IOException {
              try (BufferedReader reader = new BufferedReader(new FileReader("input.txt"));
                   BufferedWriter writer = new BufferedWriter(new FileWriter("output.txt"))) {
                  String line;
                  while ((line = reader.readLine()) != null) {
                      // Allow punctuation and/or whitespace between the copies
                      line = line.replaceAll("\\b(\\w+)([\\p{Punct}\\s]+\\1\\b)+", "$1");
                      writer.write(line);
                      writer.newLine();
                  }
              }
          }
      }
  7. Java remove duplicate words from a file with hyphenated words using regex

    • Description: Collapses consecutive repeated words, treating a hyphenated word such as "well-known" as a single word. The lookarounds (?<![\w-]) and (?![\w-]) keep the match from starting or ending in the middle of a hyphenated word, so "pre-test test" is left unchanged. Non-adjacent duplicates are not affected.
    • Code:
      import java.io.*;

      public class RemoveDuplicateWordsHyphenated {
          public static void main(String[] args) throws IOException {
              try (BufferedReader reader = new BufferedReader(new FileReader("input.txt"));
                   BufferedWriter writer = new BufferedWriter(new FileWriter("output.txt"))) {
                  String line;
                  while ((line = reader.readLine()) != null) {
                      // A word is \w+ optionally followed by any number of -\w+ parts
                      line = line.replaceAll(
                              "(?<![\\w-])(\\w+(?:-\\w+)*)(\\s+\\1(?![\\w-]))+", "$1");
                      writer.write(line);
                      writer.newLine();
                  }
              }
          }
      }
  8. Java remove duplicate words from a file with non-word characters using regex

    • Description: Collapses consecutive repeated tokens that may contain non-word characters, such as "C++ C++". A token is any run of non-whitespace characters. Whitespace lookarounds are used instead of \b, because \b does not match after a token that ends in a non-word character. Non-adjacent duplicates are not affected.
    • Code:
      import java.io.*;

      public class RemoveDuplicateWordsNonWord {
          public static void main(String[] args) throws IOException {
              try (BufferedReader reader = new BufferedReader(new FileReader("input.txt"));
                   BufferedWriter writer = new BufferedWriter(new FileWriter("output.txt"))) {
                  String line;
                  while ((line = reader.readLine()) != null) {
                      // (?<!\S) and (?!\S) require whitespace or line edge around each token
                      line = line.replaceAll("(?<!\\S)(\\S+)(\\s+\\1(?!\\S))+", "$1");
                      writer.write(line);
                      writer.newLine();
                  }
              }
          }
      }
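The replaceAll patterns above rely on a backreference that must follow the word immediately, so they only catch repeats that sit next to each other. For repeats that are not in a row, one regex-only option is a lookahead that checks whether the same word occurs again later in the line. The sketch below makes two assumptions: duplicates are handled per line, and it is acceptable to keep the last occurrence of each word rather than the first. The class name DedupKeepLast is hypothetical.

```java
import java.util.regex.Pattern;

class DedupKeepLast {
    // (?i)            compare case-insensitively
    // \b(\w+)\b\s*    a whole word plus any whitespace after it
    // (?=.*\b\1\b)    only if the same word appears again later in the line
    private static final Pattern EARLIER_COPY =
            Pattern.compile("(?i)\\b(\\w+)\\b\\s*(?=.*\\b\\1\\b)");

    // Deletes every copy of a word except the last one in the line.
    static String dedup(String line) {
        return EARLIER_COPY.matcher(line).replaceAll("");
    }

    public static void main(String[] args) {
        // prints: Red blue green
        System.out.println(dedup("red green Red blue green"));
    }
}
```

Because the lookahead rescans the rest of the line for every word, the cost grows roughly quadratically with line length, and "." does not cross line breaks, so duplicates on different lines are not detected. For whole-file deduplication, the Set-based approach from the main example is likely the better fit.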
