Re: [問題] Google Interview Question (4)

看板Prob_Solve (計算數學 Problem Solving)作者 (qqaa)時間11年前 (2013/03/08 16:51), 編輯推噓1(101)
留言2則, 1人參與, 最新討論串9/13 (看更多)
剛剛和 Leon 討論了一下,發現我們對題目的了解有一點不同 原題目: Given a document and a query of K words, how do u find the smallest window that covers all the words at least once in that document? (given you know the inverted lists of all K words, that is, for each word, you have a list of all its occurrrences). This one is really hard. Could someone propose an algorithm in O(n)? 我們可以確定: 1. document is given 2. 知道我們關心的是哪 K 個字 3. 這 K 個字的 occurrence list 也是給定的 問題是,n 指的是什麼? 1. n = document size (in number of words) 在這種情況下, O(n) 是可能的,方法基本上就是 Leon 提出來的。 補充一個 O(document size) 的做法 // all indices starts from 1 Given occ (list of list of occurrences) array = [-1, ..., -1] (size = document size) i = 0 for list in occ i = i + 1 for index in list array[index] = i count = [0, 0, ..., 0] (size = K) nKinds = 0 start = 1 end = 1 minWindowSize = INF while end <= document size while end <= document size and nKinds < K word = array[end] end = end + 1 if word == -1 // we are not interested continue count[word] = count[word] + 1 if count[word] == 1 nKinds = nKinds + 1 while nKinds == K word = array[start] if word == -1 // we are not interested start = start + 1 continue count[word] = count[word] - 1 if count[word] == 0 // check this window nKinds = nKinds - 1 if end - start < minWindowSize minWindowSize = end - start start = start + 1 return minWindowSizejk 2. n = 這 K 個字出現的總次數 Ex: K = 3, 要找出包含 {a, b, c} 的最小 window occurrences: a = { 1, 1000, 2500} b = {400, 1001, 2000} c = {500, 1002, 1500} document size = 2500 甚至更高 (其他的位置是其他的字) n = 3 + 3 + 3 = 9 而不是 2500 在這種情況下,已知可以做到 O(n logK) (RockLee 寫的方法) 不過,在這種狀況下,給定原本的 document 似乎就沒什麼用了 而且在讀入 input 時, 讀入原本的 document 會花掉 O(document size) 的時間 所以在和 Leon 討論時,他對這種解釋方法有一點懷疑 -- ※ 發信站: 批踢踢實業坊(ptt.cc) ◆ From: 140.112.175.115 ※ 編輯: stimim 來自: 140.112.175.115 (03/08 17:09)

03/09 03:21, , 1F
我 (ledia 應該也是) 自動採用 2. 的解釋, 因為 inverse
03/09 03:21, 1F

03/09 03:22, , 2F
list 可以 offline 建好, 之後這個演算法可以重複多次
03/09 03:22, 2F
文章代碼(AID): #1HEQQPbG (Prob_Solve)
討論串 (同標題文章)
文章代碼(AID): #1HEQQPbG (Prob_Solve)