Finding the top ( k ) most frequent elements in an array is a common question in coding interviews and a useful task in various applications like data analysis and natural language processing. This guide will walk you through three effective methods to solve this problem: using a HashMap with sorting, Min-Heap, and Bucket Sort.
Problem Statement
Given an array of integers and an integer ( k ), find the top ( k ) most frequent elements.
Method 1: HashMap with Sorting
Description: The simplest approach is to count the frequency of each element using a HashMap, then sort the elements based on their frequencies.
Steps:
- Use a HashMap to count the frequency of each element.
- Sort the elements by their frequencies in descending order.
- Select the top ( k ) elements.
Time Complexity: ( O(n \log n) ) due to sorting.
Python Implementation:
from collections import Counter
def top_k_frequent_elements_sort(arr, k):
count = Counter(arr)
return [item for item, freq in count.most_common(k)]
# Example usage:
array = [1, 1, 1, 2, 2, 3]
k = 2
print(top_k_frequent_elements_sort(array, k)) # Output: [1, 2]
Method 2: Min-Heap
Description: Use a Min-Heap to keep track of the top ( k ) most frequent elements.
Steps:
- Use a HashMap to count the frequency of each element.
- Use a Min-Heap of size ( k ) to store the elements based on their frequencies.
- If the heap size exceeds ( k ), remove the smallest frequency element.
- The heap will contain the top ( k ) most frequent elements.
Time Complexity: ( O(n \log k) ).
Python Implementation:
import heapq
from collections import Counter
def top_k_frequent_elements_heap(arr, k):
count = Counter(arr)
min_heap = []
for num, freq in count.items():
heapq.heappush(min_heap, (freq, num))
if len(min_heap) > k:
heapq.heappop(min_heap)
return [num for freq, num in min_heap]
# Example usage:
array = [1, 1, 1, 2, 2, 3]
k = 2
print(top_k_frequent_elements_heap(array, k)) # Output: [2, 1]
Method 3: Bucket Sort
Description: Use Bucket Sort to group elements by their frequencies, allowing for direct access to the most frequent elements.
Steps:
- Use a HashMap to count the frequency of each element.
- Create a list of buckets where the index represents the frequency, and each bucket contains elements with that frequency.
- Iterate through the buckets in reverse order to collect the top ( k ) most frequent elements.
Time Complexity: ( O(n) ).
Python Implementation:
from collections import Counter, defaultdict
def top_k_frequent_elements_bucket(arr, k):
count = Counter(arr)
freq_map = defaultdict(list)
for num, freq in count.items():
freq_map[freq].append(num)
result = []
for freq in range(len(arr), 0, -1):
if freq in freq_map:
result.extend(freq_map[freq])
if len(result) >= k:
return result[:k]
# Example usage:
array = [1, 1, 1, 2, 2, 3]
k = 2
print(top_k_frequent_elements_bucket(array, k)) # Output: [1, 2]
Conclusion
Each method has its strengths and trade-offs. Here’s a quick summary to help you decide which method to use:
- HashMap with Sorting: Simple and straightforward, but less efficient for large datasets due to sorting.
- Min-Heap: Efficient for large datasets and when ( k ) is much smaller than ( n ).
- Bucket Sort: Most efficient in terms of time complexity, though a bit more complex to implement.
Example Usage of Bucket Sort Approach
Here’s a practical example to demonstrate how to use the Bucket Sort approach:
array = [1, 1, 1, 2, 2, 3]
k = 2
print(top_k_frequent_elements_bucket(array, k)) # Output: [1, 2]
In this example, the array [1, 1, 1, 2, 2, 3] is processed to find the top 2 most frequent elements. The Bucket Sort method efficiently finds [1, 2] as the top 2 most frequent elements.
Final Thoughts
Understanding and mastering these methods will prepare you well for tackling similar problems in coding interviews and practical applications. Each method has its place, and knowing when to use each one is key to efficient problem-solving. Happy coding!