My code:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
public class Solution {
public List<String> findRepeatedDnaSequences(String s) {
ArrayList<String> ret = new ArrayList<String>();
if (s == null || s.length() <= 10)
return ret;
HashMap<Integer, Integer> map = new HashMap<Integer, Integer>();
for (int i = 0; i <= s.length() - 10; i++) {
String tmp = s.substring(i, i + 10);
int key = toNum(tmp);
if (!map.containsKey(key)) {
map.put(key, 1);
}
else {
int val = map.get(key);
val++;
map.put(key, val);
if (val == 2)
ret.add(tmp);
}
}
return ret;
}
private int toNum(String tmp) {
int ret = 0;
for (int i = 0; i < tmp.length(); i++) {
char temp = tmp.charAt(i);
int b = 0;
switch (temp) {
case 'A':
b |= 0;
break;
case 'C':
b |= 1;
break;
case 'G':
b |= 2;
break;
case 'T':
b |= 3;
break;
default:
break;
}
ret |= b;
ret = (ret << 2);
}
return ret;
}
}
这道题木我也没做出来。状态不是很好。
然后看了解法,感觉实在是好暴力。。。
做法就是,
把一个字符串,每个10位都存放到hashmap中。
也就是不停地移动一个字母,然后把接下来的10位字符串存入hashmap并且记录次数。
然后如果key是string,内存会爆掉。
于是将string编码。
A: 1
C: 2
G: 3
T: 4
然后一个10位的字符串可以由一个20比特的数字表示,也就是一个int(32位)来表示。
然后就可以做了。
我参考了这个博客。
http://betterpoetrythancode.blogspot.com/2015/02/repeated-dna-sequences-leetcode-bit.html
**
总结:
hashmap, 编码 <- bit manipulation
**
Anyway, Good luck, Richardo!
My code:
public class Solution {
public List<String> findRepeatedDnaSequences(String s) {
List<String> ret = new ArrayList<String>();
if (s == null || s.length() == 0) {
return ret;
}
int end = 10;
HashMap<Integer, Integer> map = new HashMap<Integer, Integer>();
HashSet<String> repeat = new HashSet<String>();
while (end <= s.length()) {
String sub = s.substring(end - 10, end);
int hashcode = hashCode(sub);
if (!map.containsKey(hashcode)) {
map.put(hashcode, 1);
}
else {
map.put(hashcode, map.get(hashcode) + 1);
if (!repeat.contains(sub)) {
repeat.add(sub);
ret.add(sub);
}
}
end++;
}
return ret;
}
private int hashCode(String s) {
int ret = 0;
for (int i = 0; i < s.length(); i++) {
char curr = s.charAt(i);
int b = 0;
switch (curr) {
case 'A':
b |= 0;
break;
case 'C':
b |= 1;
break;
case 'G':
b |= 2;
break;
case 'T':
b |= 3;
break;
default:
break;
}
ret |= b;
ret = (ret << 2);
}
return ret;
}
}
reference:
https://discuss.leetcode.com/topic/8894/clean-java-solution-hashmap-bits-manipulation
一开始自己写出来了纯 HashMap的解法。很简单。
然后以为 Bit manipulation 是加速的,其实不是,是用来节省内存空间的,类似于一种 hashcode
因为 HashMap key is string, memory cost is big
each string has 10 characters.
each character in Java is 16 bit
one string should domain 160 bits
If we use Integer rather than String as key, the cost is 32 bits per key
So how to use Integer to represent a 10-character string?
We try to design some hash functions.
A = 0X00
B = 0X01
C = 0X10
D = 0X11
根据每个character,然后不停地异或,不停地移位,向左移两位,
得出这个integer。
其他的差不多了。为了避免重复,还需要多加一个set
以前的做法,没加,看来testcase有问题。
Anyway, Good luck, Richardo! -- 09/22/2016