Walkthrough: CS 130A, PrintHashTable, CS/Python, Java/Python

CS 130A
Assignment II - The Hottest of Them All

Assigned: October 18th, 2018
Due On Demo Day: November 2nd, 2018

PLEASE NOTE:
• Solutions have to be your own. No collaboration or cooperation among students is permitted.
• 5% of the points will be deducted for each day the assignment is late, with a maximum of 4 days.
• Requests for a regrade must be submitted within seven days from the day when we return the assignment.

1 Introduction

The heavy hitter or top-k problem is a famous problem in computer science. It typically arises in settings where a stream of data (sensor data, click stream, advertisement requests, etc.) is being analyzed to detect the most popular items. An easy solution is to keep a counter for each item in the stream, and increment the counter every time the corresponding item is received. Unfortunately, the domain of items is often too large to maintain a counter for each element, e.g., the domain might be all IP addresses. In this case, we typically maintain a smaller finite set of counters. For each new item, we create a counter which corresponds to that item. The counter is incremented every time this item is encountered. This works fine until the set of counters is full. At that point, we need to eliminate one counter in order to add a new one. Usually, the element with the smallest count is deleted. Different strategies have been proposed regarding the initialization of the new counter. One very successful strategy is to initialize the new counter with the value of the deleted (smallest count) counter. Of course, the resulting top-k elements are an approximation of the real top-k, but it has been shown to be quite accurate for realistic distributions. In this assignment, we will use this strategy.

2 Basic Data Structures

In this assignment, you will implement a simple version of the heavy hitter problem, where we would like you to find the most popular words in a document. You will be given a .txt file, which is an article containing words.
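Reading words from the .txt file could look roughly like the sketch below. The handout does not prescribe how to split the text, so the helper names `words_in` and `words_in_file` are hypothetical, and so is the assumption that a word is a maximal run of ASCII letters (which splits "don't" into "don" and "t"). Lowercasing anticipates the case-insensitivity requirement stated in the notes of this handout.

```python
import re

# Assumption: a "word" is a maximal run of ASCII letters.
WORD_RE = re.compile(r"[a-z]+")


def words_in(text):
    """Yield lowercase words, skipping blanks, digits and punctuation."""
    for match in WORD_RE.finditer(text.lower()):
        yield match.group()


def words_in_file(path):
    """Stream the words of a text file one line at a time."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            yield from words_in(line)
```

Streaming line by line keeps memory use independent of the file size, which matches the stream setting described in the introduction.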
You will read the words one by one from the file and at the end identify an approximation of the 15 most popular, i.e., most frequent, words. Given that we often need to retrieve (and delete) the word with the smallest frequency, a min heap is a natural choice. However, finding an item in a min heap is not easy. Therefore, you need an additional data structure to easily retrieve each element in the heap. For this, use a hash table.

A binary min heap is a complete binary tree which satisfies the following min heap ordering invariant.

The min heap invariant: the value of each node is greater than or equal to the value of its parent, with the minimum-value element at the root.

A min heap can be uniquely represented by storing its level order traversal in an array as shown in Figure 1. Consider the kth element of the array: its left child is located at index 2*k, its right child is located at index 2*k+1, and its parent is located at index k/2 (integer division).

Figure 1: min heap

In this assignment, you will implement a min heap using an array object where each item in the heap contains the frequency of a word.

In order to keep track of the words corresponding to the frequencies in the min heap, you will implement a hash table. Each entry in the hash table contains the word hashed to that location as well as a pointer to the corresponding entry in the min heap (since the min heap is implemented using an array, this is simply an index in the array). We leave it to you to decide whether to use a chaining or a probing hash table, as well as the hash function and the details of collision handling. However, since the minimum word in the min heap will be deleted and replaced, you need to support deletion in the hash table (as well as insertion).

Every time a word is read, it is looked up in the hash table. There are three cases to consider:

1. If the string already exists, increment its frequency by one in the min heap and percolate it down to the correct place in the min heap while updating all the corresponding pointers in the hash table.
2. If the word does not exist in the hash table and the min heap is not full, the new word is inserted into the min heap and is initialized to a frequency of one. Furthermore, the word is inserted into the hash table with a pointer to the corresponding frequency in the min heap (the root in this case).
3. If the word does not exist in the hash table and the min heap is full, retrieve the existing word with the minimum frequency, delete it from the hash table and replace it with the new word while keeping the frequency as before. The new word is also inserted in the hash table with a pointer to the corresponding frequency in the min heap.

NOTE
• Your algorithm should be case insensitive, meaning that, for example, "he" and "He" should be treated as the same string.
• We only care about strings that are words in the file, meaning that blank spaces, commas, etc. should not be considered as strings.
• The algorithm will give an approximation of the most frequent 15 strings.
• The size of the heap (array) should always be 16 because we keep index 0 empty.
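The three cases above can be sketched in Python as follows. This is a minimal sketch, not the required solution: the class name `TopK` and its method names are hypothetical, and a built-in `dict` stands in for the hash table that the assignment asks you to write yourself. A new word is appended at the end of the array and percolated up, which is one valid placement for a frequency of one.

```python
class TopK:
    """Fixed-capacity min heap of [count, word] plus a word -> heap index map."""

    def __init__(self, capacity=15):
        self.capacity = capacity
        self.heap = [None]   # index 0 stays empty, so the array has capacity + 1 slots
        self.index = {}      # assumption: dict stands in for the hand-written hash table

    def _swap(self, i, j):
        self.heap[i], self.heap[j] = self.heap[j], self.heap[i]
        self.index[self.heap[i][1]] = i   # keep the "pointers" in sync
        self.index[self.heap[j][1]] = j

    def _percolate_down(self, k):
        size = len(self.heap) - 1
        while 2 * k <= size:
            child = 2 * k
            if child + 1 <= size and self.heap[child + 1][0] < self.heap[child][0]:
                child += 1                # pick the smaller child
            if self.heap[k][0] <= self.heap[child][0]:
                break
            self._swap(k, child)
            k = child

    def _percolate_up(self, k):
        while k > 1 and self.heap[k][0] < self.heap[k // 2][0]:
            self._swap(k, k // 2)
            k //= 2

    def insert(self, word):
        if word in self.index:                       # case 1: known word
            k = self.index[word]
            self.heap[k][0] += 1
            self._percolate_down(k)
        elif len(self.heap) - 1 < self.capacity:     # case 2: heap not full
            self.heap.append([1, word])
            k = len(self.heap) - 1
            self.index[word] = k
            self._percolate_up(k)
        else:                                        # case 3: heap full
            self.replace_min(word)

    def replace_min(self, word):
        del self.index[self.heap[1][1]]   # drop the old minimum word
        self.heap[1][1] = word            # the count is kept, so the root stays minimal
        self.index[word] = 1

    def items(self):
        """Return (count, word) pairs, most frequent first."""
        return sorted(((c, w) for c, w in self.heap[1:]), reverse=True)
```

Because case 3 keeps the old count, the root is still a minimum after the replacement and no percolation is needed there.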
Figure 2 shows a high level sketch of the two data structures used in this assignment and how the hash table needs to have pointers to the corresponding entries in the heap.

3 Implementation details

As a part of this homework, you will implement 4 functions as explained below:

• Insert: This function will take a string as an input.
  – If the newly inserted string already exists in the hashtable, then first locate the position of the string in the min heap using the hashtable, then update the frequency of the string in the min heap, percolate the element to its correct place in the array, and lastly update the hashtable to point to the updated position.
  – If the newly inserted string does not exist in the hashtable, check if the minheap is full or not. If it is full, then simply replace the root entry of the heap with the newly inserted word and keep the frequency, and then update the hashtable (delete the old word and insert the new word in the correct place using the hash function). This is essentially achieved by implementing the ReplaceMin function as explained below. If the minheap is not full, then insert the string into the heap, and then update the hashtable.
• ReplaceMin: This function will be called when the newly inserted string does not exist in the hashtable and the minheap is full. Replace the first element of the array (index one of the array), which has the lowest frequency, with the newly inserted string and update the hashtable (i.e., you have to locate the old string in the hashtable, delete its entry, and use the hash function to place the newly inserted word in the correct place).
• PrintHeap: This function will print out the most frequent 15 words together with their corresponding frequencies.
• PrintHashTable: This function will print out the current hash table.

3.1 Program Flow

NOTE: We do not want any front end UI for this project. Your project will be run on the terminal and the input/output for the demo will use stdio.
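For the hash table operations that Insert and ReplaceMin rely on (lookup, insert, delete), a chaining table might be sketched as below. The handout leaves the table type, hash function and collision handling open, so everything here is an assumption: the class name `ChainedHashTable`, the bucket count of 31, and the polynomial string hash are illustrative choices only.

```python
class ChainedHashTable:
    """Separate-chaining table mapping a word to its index in the heap array."""

    def __init__(self, buckets=31):
        self.buckets = [[] for _ in range(buckets)]

    def _bucket(self, word):
        # Assumption: a simple polynomial hash over the characters.
        h = 0
        for ch in word:
            h = (h * 31 + ord(ch)) % len(self.buckets)
        return self.buckets[h]

    def lookup(self, word):
        """Return the stored heap index, or None if the word is absent."""
        for entry in self._bucket(word):
            if entry[0] == word:
                return entry[1]
        return None

    def insert(self, word, heap_index):
        """Add the word, or update its heap index if it is already stored."""
        bucket = self._bucket(word)
        for entry in bucket:
            if entry[0] == word:
                entry[1] = heap_index
                return
        bucket.append([word, heap_index])

    def delete(self, word):
        """Remove the word; return True if it was present."""
        bucket = self._bucket(word)
        for i, entry in enumerate(bucket):
            if entry[0] == word:
                del bucket[i]
                return True
        return False

    def print_table(self):
        """Print every slot with its chain, in the spirit of PrintHashTable."""
        for slot, bucket in enumerate(self.buckets):
            print(slot, bucket)
```

With chaining, deletion only removes one entry from one chain, so no tombstones are needed; a probing table would have to mark deleted slots instead.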
The file name will be provided as an input to your program. After running your program, we will ask you to call the PrintHeap function, which will print out the 15 most frequent strings with their corresponding frequencies, and the PrintHashTable function, which will print out the whole hash table. Then we will interact with your program (i.e., we will let you call insert(tree)), and you will be asked to call PrintHeap and PrintHashTable again. This process might be repeated multiple times during the demo.

3.2 Extra Credit and Sanity Check

Given that there are about 250,000 words in English, it is not so unreasonable to maintain an array of size 250,000, where each entry maintains the frequency of the corresponding word in the file you are analyzing. As extra credit, you can try to figure out a way to maintain the frequency of all words in the given file and then retrieve the most frequent 15. Compare them to what your approximate fixed-size min heap solution gives.

4 Demo

We will have a short demo for each project. It will be on November 2nd, 2018 in CSIL. Time details will be announced later. Please be ready with the working program at the time of your demo.

Figure 2: example

Reposted from: http://ass.3daixie.com/2018110427688141.html
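For the sanity check in Section 3.2, exact counts could be computed and compared against the approximate result along these lines. This is only a sketch: `collections.Counter` stands in for the array of size 250,000 mentioned in the handout, and the function names `exact_top` and `overlap` are hypothetical.

```python
from collections import Counter


def exact_top(words, k=15):
    """Return the k most frequent words with their exact counts."""
    return Counter(words).most_common(k)


def overlap(approx_words, exact_pairs):
    """Fraction of the exact top-k words that the approximation also reported."""
    exact_words = {word for word, _ in exact_pairs}
    if not exact_words:
        return 1.0
    return len(exact_words & set(approx_words)) / len(exact_words)
```

An overlap close to 1.0 would suggest that the fixed-size heap found nearly the same words as exact counting, although the reported frequencies can still differ.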
