With Java, finding character frequency in a text is a popular problem. When multiple characters got same frequency, it becomes interesting.
Initial part of finding second highest frequency characters in a text was discussed previously and covered the scenarios where a single character is the most frequent. Since this is the continuation post of that, I strongly recommend you to read the first part before this. The intention of this second post is to cover the scenarios at which the previous program failed, specifically the scenarios where multiple characters are of the same frequencies.
Previous production codes can return only one character as the second most frequent. If there are multiple characters matching the criteria, the return type of the code most be modified to a character array. Along with that change, all the previous test cases must be modified to support the returned character array. When there are no matches, it is better to expect the code to return an empty array rather than null, so the test cases has to be modified to reflect that.
Similar to the previous post, new test cases are identified prior to writing any implementation code. While supporting these multiple character scenarios, both map based and sorting based programs are modified to satisfy the test cases. You can find those below in this post.
> result should be this second most frequent character
2. shouldReturnCorrectWhenManyMostFrequentsAndManySecondMostFrequents()
> result should be these second most frequent characters
Above two test cases along with the modified previous test cases are written as follows.
In summary, the puzzle can be solved in many ways as shown in this series of articles. One important point to note is the use of test cases to identify the scenarios before any production code. As you may have already noticed, the same set of test cases were used to test two different implementations. If you have noticed any scenarios that are not covered here, please let me know via comments section so that I can address those.
Part 1: The requirement is to return the character that is the second most frequent. Before writing any code, it is important to break down this requirement into a simple set of test cases.The requirement of Part 2 is to return the characters those are the second most frequent. Before writing any code, it is important to break down this requirement into a simple set of test cases.
Previous production codes can return only one character as the second most frequent. If there are multiple characters matching the criteria, the return type of the code most be modified to a character array. Along with that change, all the previous test cases must be modified to support the returned character array. When there are no matches, it is better to expect the code to return an empty array rather than null, so the test cases has to be modified to reflect that.
Similar to the previous post, new test cases are identified prior to writing any implementation code. While supporting these multiple character scenarios, both map based and sorting based programs are modified to satisfy the test cases. You can find those below in this post.
New Test Cases
1. shouldReturnCorrectWhenManyMostFrequentsAndOneSecondMostFrequent()> result should be this second most frequent character
2. shouldReturnCorrectWhenManyMostFrequentsAndManySecondMostFrequents()
> result should be these second most frequent characters
Above two test cases along with the modified previous test cases are written as follows.
package com.digizol.puzzle;
import static java.lang.Character.valueOf;
import static org.junit.Assert.assertEquals;
import java.util.*;
import org.junit.*;
public class SecondFrequentCharactersTest {
private MapBasedFrequentCharacters sut;
@Before
public void setup() {
sut = new MapBasedFrequentCharacters();
// sut = new SortBasedFrequentCharacters();
}
@Test
public void shouldReturnCorrectWhenManyMostFrequentsAndOneSecondMostFrequent() {
// both i & c are the most frequent, y is the second most frequent
assertEquals(valueOf('y'),
valueOf(sut.getSecondMostFrequent(
"iiixiiyycbcccc")[0])
);
}
@Test
public void shouldReturnCorrectWhenManyMostFrequentsAndManySecondMostFrequents() {
// both i & c are the most frequent,
// x & y are the second most frequent
char[] secondMostFrequents =
sut.getSecondMostFrequent("iiixxiiyycbcccc");
assertEquals(2, secondMostFrequents.length);
List<Character> expected = new ArrayList<Character>();
expected.add('x');
expected.add('y');
for (int i = 0; i < secondMostFrequents.length; i++) {
expected.remove(Character.valueOf(secondMostFrequents[i]));
}
assertEquals(0, expected.size());
}
// previous test cases are modified as below
@Test
public void shouldReturnNullForEmptyText() {
assertEquals(0, sut.getSecondMostFrequent("").length);
}
@Test
public void shouldReturnNullForTextOfSameCharacter() {
assertEquals(0, sut.getSecondMostFrequent("dddddddd").length);
}
@Test
public void shouldReturnCorrectCharWhenTextHasTwoCharsOneBeingMostFrequent() {
assertEquals(valueOf('y'),
valueOf(sut.getSecondMostFrequent("iiiiiyiiiii")[0])
);
}
@Test
public void shouldReturnCorrectOneForATextWithOneBeingSecondMostFrequent() {
// most frequent is 'i', second most is 'd'
assertEquals(valueOf('d'),
valueOf(sut.getSecondMostFrequent(
"iaibicidieidif")[0])
);
}
}
Production Code
Similar to previous post, map based solution is modified first followed by the sorting based one to satisfy all above test cases.Map based solution
Previous map based solution had the character frequencies recorded in the map. At the 'second most frequent character' finding logic, two 'char' variables were used. With the new scenarios there are be multiple characters for both most frequents and second most frequents, so the char variables can be replaced with two lists as shown below.package com.digizol.puzzle;
import java.util.*;
import java.util.Map.Entry;
public class MapBasedFrequentCharacters {
public char[] getSecondMostFrequent(String text) {
char[] charArray = text.toCharArray();
// calculate char frequencies
Map<Character, Integer> charFrequenciesMap
= new HashMap<Character, Integer>();
// loop1
for (char c : charArray) {
int frequency = 1;
if (charFrequenciesMap.get(c) != null) {
frequency = charFrequenciesMap.get(c) + 1;
}
charFrequenciesMap.put(c, frequency);
}
int currentMostFrequency = 0;
int currentSecondMostFrequency = 0;
List<Character> mostFrequentChars = new ArrayList<Character>();
List<Character> secondMostChars = new ArrayList<Character>();
// find second most frequent char
Iterator<Entry<Character, Integer>> charFrequencies
= charFrequenciesMap.entrySet().iterator();
// loop2
while (charFrequencies.hasNext()) {
Entry<Character, Integer> entry = charFrequencies.next();
char currentChar = entry.getKey();
int frequency = entry.getValue();
if (frequency > currentMostFrequency) {
secondMostChars.clear();
secondMostChars.addAll(mostFrequentChars);
mostFrequentChars.clear();
mostFrequentChars.add(currentChar);
currentSecondMostFrequency = currentMostFrequency;
currentMostFrequency = frequency;
} else if (frequency == currentMostFrequency) {
mostFrequentChars.add(currentChar);
} else if (frequency > currentSecondMostFrequency) {
secondMostChars.clear();
secondMostChars.add(currentChar);
currentSecondMostFrequency = frequency;
} else if (frequency == currentSecondMostFrequency) {
secondMostChars.add(currentChar);
}
}
char[] result = new char[secondMostChars.size()];
for (int i = 0; i < secondMostChars.size(); i++) {
result[i] = secondMostChars.get(i);
}
return result;
}
}
Sorting based solution
Previous sorting based solution had sorted the list of 'Frequency' in the descending order of frequencies; and returned the second element in the list for the second most frequent. However with the new scenarios, there can be multiple most frequent characters so the second element may contain another most frequent character. Similarly, there can be multiple 'second most frequent' elements in the list. Java Comparator is used to sort the frequency data in this program. It is required to traverse through the list from the start and extract an array of elements with the second most frequency. Following is the modified program.package com.digizol.puzzle;
import java.util.*;
public class SortBasedFrequentCharacters {
public char[] getSecondMostFrequent(String text) {
char[] charArray = text.toCharArray();
Arrays.sort(charArray);
List<Frequency> list = new ArrayList<Frequency>();
char previous = '\u0000';
Frequency f = null;
for (int i = 0; i < charArray.length; i++) {
char c = charArray[i];
if (i == 0 || previous != c) {
f = new Frequency(1, c);
list.add(f);
previous = c;
} else {
f.count += 1;
}
}
Collections.sort(
list,
new Comparator<Frequency>() {
public int compare(Frequency fr0,
Frequency fr1) {
//sort in descending order
return fr1.count - fr0.count;
}
}
);
// supporting multiple characters being most frequent
int currentFrequency = 0;
boolean secondMostFound = false;
int start = -1;
int end = -1;
for (int i = 0; i < list.size(); i++) {
Frequency frequency = list.get(i);
if (i == 0) {
currentFrequency = frequency.count;
} else if (currentFrequency != frequency.count) {
if (secondMostFound) {
end = i;
break;
} else {
secondMostFound = true;
start = i;
end = i;
}
}
}
char values[] = null;
if (secondMostFound) {
List<Frequency> secondMostFrequencies = list
.subList(start, end + 1);
values = new char[secondMostFrequencies.size()];
for (int i = 0; i < secondMostFrequencies.size(); i++) {
values[i] = secondMostFrequencies.get(i).character;
}
} else {
values = new char[0];
}
return values;
}
private class Frequency {
int count;
char character;
public Frequency(int count, char character) {
this.count = count;
this.character = character;
}
}
}
In summary, the puzzle can be solved in many ways as shown in this series of articles. One important point to note is the use of test cases to identify the scenarios before any production code. As you may have already noticed, the same set of test cases were used to test two different implementations. If you have noticed any scenarios that are not covered here, please let me know via comments section so that I can address those.
COMMENTS