[Part 2] Finding Second Highest Frequent Characters using Java

With Java, finding character frequency in a text is a popular problem. When multiple characters got same frequency, it becomes interesting.

Initial part of finding second highest frequency characters in a text was discussed previously and covered the scenarios where a single character is the most frequent. Since this is the continuation post of that, I strongly recommend you to read the first part before this. The intention of this second post is to cover the scenarios at which the previous program failed, specifically the scenarios where multiple characters are of the same frequencies.

Part 1: The requirement is to return the character that is the second most frequent. Before writing any code, it is important to break down this requirement into a simple set of test cases.

The requirement of Part 2 is to return the characters those are the second most frequent. Before writing any code, it is important to break down this requirement into a simple set of test cases.

Previous production codes can return only one character as the second most frequent. If there are multiple characters matching the criteria, the return type of the code most be modified to a character array. Along with that change, all the previous test cases must be modified to support the returned character array. When there are no matches, it is better to expect the code to return an empty array rather than null, so the test cases has to be modified to reflect that.

Similar to the previous post, new test cases are identified prior to writing any implementation code. While supporting these multiple character scenarios, both map based and sorting based programs are modified to satisfy the test cases. You can find those below in this post.

New Test Cases

1. shouldReturnCorrectWhenManyMostFrequentsAndOneSecondMostFrequent()
> result should be this second most frequent character
2. shouldReturnCorrectWhenManyMostFrequentsAndManySecondMostFrequents()
> result should be these second most frequent characters

Above two test cases along with the modified previous test cases are written as follows.

package com.digizol.puzzle;

import static java.lang.Character.valueOf;
import static org.junit.Assert.assertEquals;
import java.util.*;
import org.junit.*;

public class SecondFrequentCharactersTest {

    private MapBasedFrequentCharacters sut;

    @Before
    public void setup() {
        sut = new MapBasedFrequentCharacters();
        // sut = new SortBasedFrequentCharacters();
    }

    @Test
    public void shouldReturnCorrectWhenManyMostFrequentsAndOneSecondMostFrequent() {
        // both i & c are the most frequent, y is the second most frequent
        assertEquals(valueOf('y'), 
                     valueOf(sut.getSecondMostFrequent(
                                        "iiixiiyycbcccc")[0])
                    );
    }
    
    @Test
    public void shouldReturnCorrectWhenManyMostFrequentsAndManySecondMostFrequents() {
        // both i & c are the most frequent, 
        // x & y are the second most frequent
        char[] secondMostFrequents = 
                             sut.getSecondMostFrequent("iiixxiiyycbcccc");
        assertEquals(2, secondMostFrequents.length);
        
        List<Character> expected = new ArrayList<Character>();
        expected.add('x');
        expected.add('y');
        for (int i = 0; i < secondMostFrequents.length; i++) {
            expected.remove(Character.valueOf(secondMostFrequents[i]));
        }
        assertEquals(0, expected.size());
    }
    
    // previous test cases are modified as below
    
    @Test
    public void shouldReturnNullForEmptyText() {
        assertEquals(0, sut.getSecondMostFrequent("").length);
    }

    @Test
    public void shouldReturnNullForTextOfSameCharacter() {
        assertEquals(0, sut.getSecondMostFrequent("dddddddd").length);
    }

    @Test
    public void shouldReturnCorrectCharWhenTextHasTwoCharsOneBeingMostFrequent() {
        assertEquals(valueOf('y'), 
                     valueOf(sut.getSecondMostFrequent("iiiiiyiiiii")[0])
                    );
    }

    @Test
    public void shouldReturnCorrectOneForATextWithOneBeingSecondMostFrequent() {
        // most frequent is 'i', second most is 'd'
        assertEquals(valueOf('d'), 
                     valueOf(sut.getSecondMostFrequent(
                                                   "iaibicidieidif")[0])
                    );
    }
}

Production Code

Similar to previous post, map based solution is modified first followed by the sorting based one to satisfy all above test cases.

Map based solution

Previous map based solution had the character frequencies recorded in the map. At the 'second most frequent character' finding logic, two 'char' variables were used. With the new scenarios there are be multiple characters for both most frequents and second most frequents, so the char variables can be replaced with two lists as shown below.

package com.digizol.puzzle;

import java.util.*;
import java.util.Map.Entry;

public class MapBasedFrequentCharacters {

    public char[] getSecondMostFrequent(String text) {

        char[] charArray = text.toCharArray();

        // calculate char frequencies
        Map<Character, Integer> charFrequenciesMap 
                                      = new HashMap<Character, Integer>();

        // loop1
        for (char c : charArray) {
            int frequency = 1;
            if (charFrequenciesMap.get(c) != null) {
                frequency = charFrequenciesMap.get(c) + 1;
            }
            charFrequenciesMap.put(c, frequency);
        }

        int currentMostFrequency = 0;
        int currentSecondMostFrequency = 0;
        List<Character> mostFrequentChars = new ArrayList<Character>();
        List<Character> secondMostChars = new ArrayList<Character>();

        // find second most frequent char
        Iterator<Entry<Character, Integer>> charFrequencies 
                               = charFrequenciesMap.entrySet().iterator();

        // loop2
        while (charFrequencies.hasNext()) {
            Entry<Character, Integer> entry = charFrequencies.next();

            char currentChar = entry.getKey();
            int frequency = entry.getValue();

            if (frequency > currentMostFrequency) {
                secondMostChars.clear();
                secondMostChars.addAll(mostFrequentChars);
                mostFrequentChars.clear();
                mostFrequentChars.add(currentChar);
                currentSecondMostFrequency = currentMostFrequency;
                currentMostFrequency = frequency;
            } else if (frequency == currentMostFrequency) {
                mostFrequentChars.add(currentChar);
            } else if (frequency > currentSecondMostFrequency) {
                secondMostChars.clear();
                secondMostChars.add(currentChar);
                currentSecondMostFrequency = frequency;
            } else if (frequency == currentSecondMostFrequency) {
                secondMostChars.add(currentChar);
            }
        }

        char[] result = new char[secondMostChars.size()];
        for (int i = 0; i < secondMostChars.size(); i++) {
            result[i] = secondMostChars.get(i);
        }

        return result;
    }
}

Sorting based solution

Previous sorting based solution had sorted the list of 'Frequency' in the descending order of frequencies; and returned the second element in the list for the second most frequent. However with the new scenarios, there can be multiple most frequent characters so the second element may contain another most frequent character. Similarly, there can be multiple 'second most frequent' elements in the list. Java Comparator is used to sort the frequency data in this program. It is required to traverse through the list from the start and extract an array of elements with the second most frequency. Following is the modified program.

package com.digizol.puzzle;

import java.util.*;

public class SortBasedFrequentCharacters {

    public char[] getSecondMostFrequent(String text) {

        char[] charArray = text.toCharArray();
        Arrays.sort(charArray);

        List<Frequency> list = new ArrayList<Frequency>();
        char previous = '\u0000';
        Frequency f = null;

        for (int i = 0; i < charArray.length; i++) {
            char c = charArray[i];

            if (i == 0 || previous != c) {
                f = new Frequency(1, c);
                list.add(f);
                previous = c;
            } else {
                f.count += 1;
            }
        }

        Collections.sort(
                        list, 
                        new Comparator<Frequency>() {
                            public int compare(Frequency fr0, 
                                               Frequency fr1) {
                                //sort in descending order
                                return fr1.count - fr0.count;
                            }
                        }
                    );

        // supporting multiple characters being most frequent

        int currentFrequency = 0;
        boolean secondMostFound = false;
        int start = -1;
        int end = -1;
        for (int i = 0; i < list.size(); i++) {
            Frequency frequency = list.get(i);
            if (i == 0) {
                currentFrequency = frequency.count;
            } else if (currentFrequency != frequency.count) {
                if (secondMostFound) {
                    end = i;
                    break;
                } else {
                    secondMostFound = true;
                    start = i;
                    end = i;
                }
            }
        }

        char values[] = null;
        if (secondMostFound) {
            List<Frequency> secondMostFrequencies = list
                    .subList(start, end + 1);
            values = new char[secondMostFrequencies.size()];
            for (int i = 0; i < secondMostFrequencies.size(); i++) {
                values[i] = secondMostFrequencies.get(i).character;
            }
        } else {
            values = new char[0];
        }
        return values;
    }

    private class Frequency {
        int count;
        char character;

        public Frequency(int count, char character) {
            this.count = count;
            this.character = character;
        }
    }
}

In summary, the puzzle can be solved in many ways as shown in this series of articles. One important point to note is the use of test cases to identify the scenarios before any production code. As you may have already noticed, the same set of test cases were used to test two different implementations. If you have noticed any scenarios that are not covered here, please let me know via comments section so that I can address those.

Digizol

Header$type=social_icons

[Part 2] Finding Second Highest Frequent Characters using Java

New Test Cases

Production Code

Map based solution

Sorting based solution

Labels:

COMMENTS

Read More...

Follow us$type=social_counter

/fa-clock-o/ THIS WEEK'S HOT$type=list$hide=page

LATESTS$type=blogging$m=0$cate=0$sn=0$rm=0$c=4$va=0$hide=page

RESPONSES$type=list-tab$com=0$c=4$va=0$src=recent-comments$hide=page

$hide=page

/fa-fire/ MOST POPULAR$type=one$hide=page

$hide=page

Footer Social$type=social_icons

[Part 2] Finding Second Highest Frequent Characters using Java

New Test Cases

Production Code

Map based solution

Sorting based solution

Labels:

SHARE:

COMMENTS

Read More...

Follow us$type=social_counter

/fa-clock-o/ THIS WEEK'S HOT$type=list$hide=page

LATESTS$type=blogging$m=0$cate=0$sn=0$rm=0$c=4$va=0$hide=page

RESPONSES$type=list-tab$com=0$c=4$va=0$src=recent-comments$hide=page

$hide=page

/fa-fire/ MOST POPULAR$type=one$hide=page

$hide=page

Footer Social$type=social_icons