algorithm - 亚马逊访谈：时间戳排序：找出三页子集序列重复最大次数

Amazon的面试问题是：

"给定一个包含( 。User_Id，URL，时间戳) 用户的日志文件，用户可以从一个到另一个浏览页面。 查找三个页面子集序列重复最大次数。 记录按时间戳排序。"

( 尽管在面试之后，我还没有被告知，这个文件是按时间戳排序的。 我所要求的第一件事是，如果日志是排序的，我的面试员说不是。

``````
public Tuple<string,string,string> GetMostFrequentTriplet(
IEnumerable<LogEntry> entries,
TimeSpan? maxSeparation = null)
{
//Assuming 'entries' is already ordered by timestamp

//Store the last two URLs for each user
var lastTwoUrls = new Dictionary<int,Tuple<string,string,DateTime>>();
//Count the number of occurences of each triplet of URLs
var counters = new Dictionary<Tuple<string,string,string>,int>();

foreach (var entry in entries)
{
Tuple<string,string,DateTime> lastTwo;
if (!lastTwoUrls.TryGetValue(entry.UserId, out lastTwo))
{
//No previous URLs
lastTwoUrls[entry.UserId] = Tuple.Create((string) null, entry.Url, entry.Timestamp);
}
//(comparison with null => false)
else if (entry.Timestamp - lastTwo.Item3> maxSeparation) {
//Treat a longer separation than maxSeparation as two different sessions.
lastTwoUrls[entry.UserId] = Tuple.Create((string) null, entry.Url, entry.Timestamp);
}
else
{
//One or two previous URLs
if (lastTwo.Item1!= null)
{
//Two previous URLs; Three with this one.

//Sort the three URLs, so that their internal order won't matter
var urls = new List<string> { lastTwo.Item1, lastTwo.Item2, entry.Url };
urls.Sort();
var key = Tuple.Create(urls[0], urls[1], urls[2]);

//Increment count
int count;
counters[key] = count + 1;
}

//Shift in the new value, discarding the oldest one.
lastTwoUrls[entry.UserId] = Tuple.Create(lastTwo.Item2, entry.Url, entry.Timestamp);
}
}

Tuple<string,string,string> maxKey = null;
int maxCount = 0;

//Find the key with the maximum count
foreach (var pair in counters)
{
if (maxKey == null || pair.Value> maxCount)
{
maxKey = pair.Key;
maxCount = pair.Value;
}
}

return maxKey;
}

``````

For，method，users，users，method，method，method，method，method，method，method，unique，unique，unique，unique，unique，unique，store，store，store，store，store，store，store等。 3 + ) ) ) ) 。

Edit: 按请求之间的持续时间来推断会话。 如果它们的差异超过 `maxSeparation`，则新请求将被视为该用户的第一个。