- 中文名布隆过滤,适用于排除某个值不在一个集合内,本文不讨论布隆过滤的缺陷
- 首先给出一组字符串集合,然后判断某个字符串是否在这个集合中
char *httphead[] = {
"Uri=",
"Host=",
"Referer=",
"User-Agent=",
};
- 初始化筛选器,通过计算多个hash算法,得出多个key值,然后在bit位中key值的地方置1,得出一个bloom筛选器
- 判断字符串是否在筛选器中,对字符串进行多次hash计算,同样得出多个key值,然后获取bit位中相应key值的值,如果不是都为1, 那么字符串肯定不在集合中。但是反过来,值都为1,只能说字符串可能再集合中,这就涉及到一个hash冲突,误识别率的问题,暂时不讨论
- 字符串hash算法
- 具体实现
#define MAX_BIT_COUNT 1000
#define MAX_HTTPHEAD_NUM 32
#define MAX_HASHFUNC_NUM 8
unsigned char bits[MAX_BIT_COUNT] = {0};
typedef unsigned int (* hashfunc)(char *);
hashfunc func[] = {
SDBMHash,
RSHash,
JSHash,
PJWHash,
ELFHash,
BKDRHash,
DJBHash,
APHash
};
void add_to_bloomfilter(char *str)
{
int hashvalue = 0, i = 0;
for(i = 0 ; i < MAX_HASHFUNC_NUM; i++)
{
hashvalue = func[i](str) % (MAX_BIT_COUNT * 8);
bits[hashvalue/8] |= (0x01 << (hashvalue % 8));\
}
}
unsigned char judge_bloomfilter(char *str)
{
int hashvalue = 0, i = 0;
for (i = 0 ; i < MAX_HASHFUNC_NUM; i++)
{
hashvalue = func[i](str) % (MAX_BIT_COUNT * 8);
if ((bits[hashvalue/8] & (0x01 << (hashvalue % 8))) == 0)
{
break;
}
}
if (i == MAX_HASHFUNC_NUM)
{
return 1;
}
else
{
return 0;
}
}
int main(int argc, char **argv)
{
int i = 0;
for (i = 0; i < MAX_HTTPHEAD_NUM; i++)
{
add_to_bloomfilter(httphead[i]);
}
for (i = 0; i < MAX_HTTPHEAD_NUM; i++)
{
if (judge_bloomfilter(httphead[i]) > 0)
{
printf("%s hash been find\n", httphead[i]);
}
}
if (judge_bloomfilter("axjhfdsaew") > 0)
{
printf("%s hash been find\n", "aaaaaaa");
}
return 0;
}