In Java interviews, finding character frequencies is a popular question. Here we are discussing and implementing a program to find the second most frequent.
Finding second highest frequency character in a text is last weeks Thursdays Dzone Code Puzzle. Intention of this post is to try and work out this interview oriented problem in Java with you by explaining the way I am approaching this task. In return, both you and myself can learn through a discussion.
The requirement is to return the character that is the second most frequent. Before writing any code, it is important to break down this requirement into a simple set of test cases. As a TDD (Test Driven Development) oriented developer, I always try to approach a problem in that manner. The most correct way would be to start with one test case and proceed with the implementation code, but for the easiness in explanation, I will show four initial test cases and production code to support that requirement.
> result should be null character for empty string
2. shouldReturnNullForTextOfSameCharacter()
> result should be null character for a text with the same character since there has to be at least two different characters to have a second most frequency
3. shouldReturnCorrectCharWhenTextHasTwoCharsOneBeingMostFrequent()
> result should be the character that is not the most frequent character
4. shouldReturnCorrectOneForATextWithOneBeingSecondMostFrequent()
> result should be the second most frequent character
Above test cases are implemented in the following manner with JUnit4. All test cases check the equality of expected values against actual.
Above solution uses two loops to complete the operation, and number of iterations in second loop is related to size of character set. This might be considerable if the character set is larger in size e.g: UNICODE. If that is a concern, number of iterations can be reduced by combining two loops; using the frequency recording loop itself to record the second most frequent character as well.
To check this, I did a simple test only to compare the time taken by each variation of the above program against a text with over 5 million ASCII characters (5,264,279 to be exact). As per the results, two loops based program performs slightly quicker than the other. This may be due to the small size of ASCII character set. Anyway one loop variation is not shown here since it is simple enough for you to work out.
Both of the above production codes are not re-factored and purposefully kept it as an exercise for you.
As this article is growing in length, Part 2 covers these extra scenarios.
The requirement is to return the character that is the second most frequent. Before writing any code, it is important to break down this requirement into a simple set of test cases. As a TDD (Test Driven Development) oriented developer, I always try to approach a problem in that manner. The most correct way would be to start with one test case and proceed with the implementation code, but for the easiness in explanation, I will show four initial test cases and production code to support that requirement.
Part 2: will cover additional scenarios which this program does not support.
Initial Test Cases
1. shouldReturnNullForEmptyText()> result should be null character for empty string
2. shouldReturnNullForTextOfSameCharacter()
> result should be null character for a text with the same character since there has to be at least two different characters to have a second most frequency
3. shouldReturnCorrectCharWhenTextHasTwoCharsOneBeingMostFrequent()
> result should be the character that is not the most frequent character
4. shouldReturnCorrectOneForATextWithOneBeingSecondMostFrequent()
> result should be the second most frequent character
Above test cases are implemented in the following manner with JUnit4. All test cases check the equality of expected values against actual.
package com.digizol.puzzle;
import static java.lang.Character.valueOf;
import static org.junit.Assert.assertEquals;
import org.junit.Before;
import org.junit.Test;
public class SecondFrequentCharacterInitialTest {
private MapBasedFrequentCharacter sut;
@Before
public void setup() {
sut = new MapBasedFrequentCharacter();
// sut = new SortBasedFrequentCharacter();
}
@Test
public void shouldReturnNullForEmptyText() {
assertEquals(valueOf('\u0000'),
valueOf(sut.getSecondMostFrequent("")));
}
@Test
public void shouldReturnNullForTextOfSameCharacter() {
assertEquals(valueOf('\u0000'),
valueOf(sut.getSecondMostFrequent("dddddddd")));
}
@Test
public void shouldReturnCorrectCharWhenTextHasTwoCharsOneBeingMostFrequent() {
assertEquals(
valueOf('y'),
valueOf(sut.getSecondMostFrequent("iiiiiyiiiii")));
}
@Test
public void shouldReturnCorrectOneForATextWithOneBeingSecondMostFrequent() {
// most frequent is 'i', second most is 'd'
assertEquals(
valueOf('d'),
valueOf(sut.getSecondMostFrequent("iaibicidieidif")));
}
}
Production Code
As per the test cases, production code should have a method to take the text as an input and to return a character. One approach to solve this is to use a map to record the frequencies of each character and extract the second most frequent. Another approach is to record the character frequencies in a list, then sort by the frequency and extract the second most frequent. There is another approach to record the frequencies in an array where the array index is the integer representation of the character, but size of the character set must be known prior. In this post, both map based and sort based approaches are implemented, not the array based one.Map based solution
In the following implementation, the characters in the text is added to a map and the frequency of those are incremented. Completed production code looks as follows.package com.digizol.puzzle;
import java.util.*;
import java.util.Map.Entry;
public class MapBasedFrequentCharacter {
public char getSecondMostFrequent(String text) {
char[] charArray = text.toCharArray();
// calculate char frequencies
Map<Character, Integer> charFrequenciesMap
= new HashMap<Character, Integer>();
// loop1
for (char c : charArray) {
int frequency = 1;
if (charFrequenciesMap.get(c) != null) {
frequency = charFrequenciesMap.get(c) + 1;
}
charFrequenciesMap.put(c, frequency);
}
int currentMostFrequency = 0;
int currentSecondMostFrequency = 0;
char mostFrequentChar = '\u0000';
char secondMostChar = '\u0000';
// find second most frequent char
Iterator<Entry<Character, Integer>> charFrequencies
= charFrequenciesMap.entrySet().iterator();
// loop2
while (charFrequencies.hasNext()) {
Entry<Character, Integer> entry = charFrequencies.next();
char currentChar = entry.getKey();
int frequency = entry.getValue();
if (frequency > currentMostFrequency) {
secondMostChar = mostFrequentChar;
currentSecondMostFrequency = currentMostFrequency;
currentMostFrequency = frequency;
mostFrequentChar = currentChar;
} else if (frequency > currentSecondMostFrequency) {
currentSecondMostFrequency = frequency;
secondMostChar = currentChar;
}
}
return secondMostChar;
}
}
Above solution uses two loops to complete the operation, and number of iterations in second loop is related to size of character set. This might be considerable if the character set is larger in size e.g: UNICODE. If that is a concern, number of iterations can be reduced by combining two loops; using the frequency recording loop itself to record the second most frequent character as well.
To check this, I did a simple test only to compare the time taken by each variation of the above program against a text with over 5 million ASCII characters (5,264,279 to be exact). As per the results, two loops based program performs slightly quicker than the other. This may be due to the small size of ASCII character set. Anyway one loop variation is not shown here since it is simple enough for you to work out.
Sorting based solution
In this approach, characters in text are chronologically ordered before starting to record the frequencies. A new type is introduced named 'Frequency' to hold the character and it's frequency. After recording the frequencies, a Comparator is used to sort the list of 'Frequency' in descending order of character frequencies to facilitate the extraction of second most frequent.package com.digizol.puzzle;
import java.util.*;
public class SortBasedFrequentCharacter {
public char getSecondMostFrequent(String text) {
char[] charArray = text.toCharArray();
Arrays.sort(charArray);
List<Frequency> list = new ArrayList<Frequency>();
char previous = '\u0000';
Frequency f = null;
for (int i = 0; i < charArray.length; i++) {
char c = charArray[i];
if (i == 0 || previous != c) {
f = new Frequency(1, c);
list.add(f);
previous = c;
} else {
f.count = f.count + 1;
}
}
Collections.sort(
list,
new Comparator<Frequency>() {
public int compare(Frequency fr0,
Frequency fr1) {
// sort in descending order
return fr1.count - fr0.count;
}
}
);
char value = '\u0000';
if (list.size() > 1) {
value = list.get(1).character;
}
return value;
}
private class Frequency {
int count;
char character;
public Frequency(int count, char character) {
this.count = count;
this.character = character;
}
}
}
More Test Cases
After this initial implementation, we need to consider other test cases that belongs to the requirement. As you may have already noted, above implementation does not support the scenarios where multiple characters being of the same frequencies. As an example, following test case fails against both of the above programs.@Test
public void shouldReturnCorrectWhenManyMostFrequentsAndOneSecondMostFrequent() {
// both i & c are the most frequent, y is the second most frequent
assertEquals(valueOf('y'),
valueOf(sut.getSecondMostFrequent("iiiiiyccccc")));
}
COMMENTS