青青草原综合久久大伊人导航_色综合久久天天综合_日日噜噜夜夜狠狠久久丁香五月_热久久这里只有精品

qiezi的學習園地

AS/C/C++/D/Java/JS/Python/Ruby

  C++博客 :: 首頁 :: 新隨筆 ::  ::  :: 管理 ::
周末抽空做了點小測試,根據http://blog.vckbase.com/jzhang/archive/2006/03/28/18807.html中m網友修改的算法,python版本中讀取所有行以后就做一個排序,再去除重復項。這個算法在我的機器上執行時間是1735ms左右,屬于python版本中最快的一個。

D版本暫還沒想到有更優化的做法,D在處理以char[]作key的關聯數組時,判斷方法是先判斷hash,如果hash相等,則繼續做字符串判斷。它執行時間是1120ms左右。

以D版本為基礎,自己寫了一個C++的Email類:

class?Email
{
private:
????????
string?mail;
????????size_t?hash;
????????friend?
bool?operator?<?(const?Email&?lhs,?const?Email&?rhs);
public:
????????Email?(
const?char*?mail_)
????????????????:?mail(mail_),?hash(my_hash(mail_))
????????{
????????}
};

bool?operator?<?(const?Email&?lhs,?const?Email&?rhs)
{
????????
if?(lhs.hash?==?rhs.hash)
????????????????
return?lhs.mail?<?rhs.mail;
????????
return?lhs.hash?<?rhs.hash;
}

把它插入set并判斷是否有重復。

這個程序由于string的大量拷貝,以及大量內存分配,執行時間相當長,在我的機器上是5s左右。D和python版本由于對象拷貝成本較低,加上都有內存分配策略,自然有一些優勢。

退而求其次,既然hash沖突的幾率較低,試試只保存hash:

class?Email
{
private:
????????size_t?hash;
????????friend?
bool?operator?<?(const?Email&?lhs,?const?Email&?rhs);
public:
????????Email?(
const?char*?mail_)
????????????????:?hash(my_hash(mail_))
????????{
????????}
};

bool?operator?<?(const?Email&?lhs,?const?Email&?rhs)
{
????????
return?lhs.hash?<?rhs.hash;
}

這次測試就比較快了,耗時僅1020ms左右,比D版本還要快,當然它不是完善的版本。

考慮到構造成本,于是改為只用一個set<int>來保存hash值再測試,這次耗時是930ms。

實際上可以做一個改進的C++版本,一次性讀入文件的全部內容到一個大緩沖區,把所有的\n字符修改為\0,用一個動態數組保存緩沖區的所有字符串指針,hash值也須計算并保存到數組。再用D的索引方式,先hash比較,再字符串比較,效率應該也不低。

實現了一個:
#include?<iostream>
#include?
<string>
#include?
<set>
#include?
<fstream>
#include?
<iterator>
#include?
<sys/time.h>
using?namespace?std;


size_t?my_hash?(
const?char*?str)
{
????????size_t?ret?
=?0;
????????
while?(*str)
????????????????ret?
=?11?*?ret?+?*str++;
????????
return?ret;
}

class?Email
{
private:
????????size_t?hash;
????????
const?char*?mail;
????????friend?
bool?operator?<?(const?Email&?lhs,?const?Email&?rhs);
public:
????????Email?(
const?char*?mail_)
????????????????:?hash(my_hash(mail_)),?mail(mail_)
????????{
????????}
};

bool?operator?<?(const?Email&?lhs,?const?Email&?rhs)
{
????????
if?(lhs.hash?==?rhs.hash)
????????????????
return?strcmp(lhs.mail,?rhs.mail)?<?0;
????????
return?lhs.hash?<?rhs.hash;
}

int?main(int?argc,?char**?argv)
{
????????
if?(argc?<?3)
????????{
????????????????cout?
<<?"Wrong?arguments"?<<?endl;
????????????????
return?1;
????????}

????????FILE
*?fin?=?fopen(argv[1],?"r");
????????
if?(!fin)
????????{
????????????????cout?
<<?"Invalid?input?file"?<<?endl;
????????????????
return?2;
????????}
????????FILE
*?fout?=?fopen(argv[2],?"w");
????????
if?(!fout)
????????{
????????????????fclose(fin);
????????????????cout?
<<?"Invalid?output?file"?<<?endl;
????????????????
return?3;
????????}

????????timeval?start,?end;

????????
const?int?BUF_SIZE?=?20?*?1024?*?1024;
????????
char*?buffer?=?new?char[BUF_SIZE];
????????memset(buffer,?
0,?BUF_SIZE);

????????gettimeofday(
&start,?0);
????????
set<Email>?emails;

????????size_t?read?
=?fread?(buffer,?BUF_SIZE,?1,?fin);
????????
char*?begin?=?buffer;
????????
char*?current?=?buffer;

????????
while?(*current?!=?'\0')
????????{
????????????????
if?(*current?==?'\n')
????????????????{
????????????????????????
*current?=?'\0';
????????????????????????
if?(emails.insert(begin).second){
????????????????????????????????fputs(begin,?fout);
????????????????????????????????fwrite(
"\n",?1,?1,?fout);
????????????????????????}
????????????????????????begin?
=?current?+?1;
????????????????}
????????????????
++?current;
????????}

??????? fclose(fout);
??????? fclose(fin);

????????gettimeofday(&end,?0);

????????printf(
"Time?used:?%d?ms\n",?((end.tv_sec?-?start.tv_sec)?*?1000?+?(end.tv_usec?-?start.tv_usec)?/?1000));

??????? delete[] buffer;
????????
return?0;
}

memset不是必須的,不過我不知道如何獲取讀入的大小。fread讀取后,如果讀到EOF,則返回值為0。所以我這里用memset先初始化內存,但不把它計入耗時。new操作也沒計入,因為其它語言比如python、D在啟動時都由運行時做了類似工作。

這個程序在我的機器上耗時為1350ms左右。我想可慢在set上,對象拷貝?內存分配?

做了幾個優化版本,沒多大提高。


重新測試了下:
A、python(m網友版本):
lijie t # python test.py
1689.0411377
lijie t # python test.py
1711.40599251
lijie t # python test.py
1699.63312149
lijie t # python test.py
1712.00013161
lijie t # python test.py
1713.8838768

B、D版本:
lijie t # ./testd email.txt email-new.txt
1091
lijie t # ./testd email.txt email-new.txt
1070
lijie t # ./testd email.txt email-new.txt
1062
lijie t # ./testd email.txt email-new.txt
1062
lijie t # ./testd email.txt email-new.txt
1096

C、C++只比較hash,set<Email>版本:
lijie t # ./test3 email.txt email-new.txt
Time used: 981 ms
lijie t # ./test3 email.txt email-new.txt
Time used: 1000 ms
lijie t # ./test3 email.txt email-new.txt
Time used: 980 ms
lijie t # ./test3 email.txt email-new.txt
Time used: 986 ms
lijie t # ./test3 email.txt email-new.txt
Time used: 987 ms

D、C++只比較hash,set<int>版本:
lijie t # ./test4 email.txt email-new.txt
Time used: 951 ms
lijie t # ./test4 email.txt email-new.txt
Time used: 953 ms
lijie t # ./test4 email.txt email-new.txt
Time used: 947 ms
lijie t # ./test4 email.txt email-new.txt
Time used: 950 ms
lijie t # ./test4 email.txt email-new.txt
Time used: 962 ms

E、C++大緩沖區,比較hash和字符串,set<Email>版本:
lijie t # ./test5 email.txt email-new.txt
Time used: 1375 ms
lijie t # ./test5 email.txt email-new.txt
Time used: 1359 ms
lijie t # ./test5 email.txt email-new.txt
Time used: 1369 ms
lijie t # ./test5 email.txt email-new.txt
Time used: 1378 ms
lijie t # ./test5 email.txt email-new.txt
Time used: 1396 ms

F、C++大緩沖區,比較字符串版本:
lijie t # ./test6 email.txt email-new.txt
Time used: 1168 ms
lijie t # ./test6 email.txt email-new.txt
Time used: 1169 ms
lijie t # ./test6 email.txt email-new.txt
Time used: 1171 ms
lijie t # ./test6 email.txt email-new.txt
Time used: 1179 ms
lijie t # ./test6 email.txt email-new.txt
Time used: 1169 ms

從C、E和F來看,對象拷貝成本是比較高的,E版本僅僅比C版本多了個const char*成員變量,hash值比較散,很少會真的執行到strcmp。保持E版本對象結構不變,把operator <里面的實現改為C版本,測試結果如下:
lijie t # ./test5 email.txt email-new.txt
Time used: 1355 ms
lijie t # ./test5 email.txt email-new.txt
Time used: 1360 ms
lijie t # ./test5 email.txt email-new.txt
Time used: 1348 ms
lijie t # ./test5 email.txt email-new.txt
Time used: 1353 ms
lijie t # ./test5 email.txt email-new.txt
Time used: 1379 ms

效率只提高了一點點,這個版本僅僅比C版本多了個成員變量拷貝,竟然慢了這么多。說明Email對象的2個成員變量拷貝成本的確很高。

F版本相比之下反而效率很不錯,主要原因是email數據不夠復雜,僅通過前幾位就可以比較出結果。如果每行數據比較長,而且很多行要到后幾個字符才能比較出來,肯定就不那么快了。

hash值的計算雖然執行了一系列乘法,不過還是相當迅速。

D語言版本執行了hash值和字符串比較,是比較完善的,效率很不錯。C++相應版本看來要提高set的效率才能達到。


jzhang的第一個python版本在我的機器上執行如下:
lijie t # python test2.py
3122.9569912 ms
lijie t # python test2.py
3209.42997932 ms
lijie t # python test2.py
3141.47305489 ms
lijie t # python test2.py
3129.57286835 ms
lijie t # python test2.py
3196.03514671 ms

我做了點修改,執行速度提高了一些:
#remove?duplicated?email?address?from?file
import?datetime
from?time?import?time
if?__name__?==?"__main__":
?start?
=?time()
?hashtable?
=?{}
?f?
=?file("email.txt","r")
?f2?
=?file("email_new.txt","w")
?
for?line?in?f.xreadlines():
??
if?not?hashtable.has_key(line):
???hashtable[line]?
=?1
???f2.write(line)
?f.close()
?f2.close()
?print?(time()?
-?start)?*?1000,?"ms"
在我的機器上執行結果如下:
lijie t # python test1.py
2239.22801018 ms
lijie t # python test1.py
2301.00703239 ms
lijie t # python test1.py
2282.06086159 ms
lijie t # python test1.py
2296.57006264 ms
lijie t # python test1.py
2281.25810623 ms

不過還是沒有m網友的效率高。


在F版本的基礎上,借鑒m網友的做法,實現一個G版本:

G、排序并去除重復元素,比較hash和字符串版本:
#include?<iostream>
#include?
<string>
#include?
<fstream>
#include?
<iterator>
#include?
<sys/time.h>
#include?
<vector>
using?namespace?std;


size_t?my_hash?(
const?char*?str)
{
????????size_t?ret?
=?0;
????????
while?(*str)
????????????????ret?
=?11?*?ret?+?*str++;
????????
return?ret;
}

class?Email
{
private:
????????size_t?hash;
????????
const?char*?mail;
????????friend?
bool?operator?<?(const?Email&?lhs,?const?Email&?rhs);
public:
????????Email?(
const?char*?mail_)
????????????????:?hash(my_hash(mail_)),?mail(mail_)
????????{
????????}

????????
bool?operator?==?(const?Email&?rhs)
????????{
????????????????
if?(hash?==?rhs.hash)
????????????????????????
return?strcmp(mail,?rhs.mail)?==?0;
????????????????
return?false;
????????}

????????
const?char*?getEmail()const
????????{
????????????????
return?mail;
????????}
};

bool?operator?<?(const?Email&?lhs,?const?Email&?rhs)
{
????????
if?(lhs.hash?==?rhs.hash)
????????????????
return?strcmp(lhs.mail,?rhs.mail)?<?0;
????????
return?lhs.hash?<?rhs.hash;
}

int?main(int?argc,?char**?argv)
{
????????
if?(argc?<?3)
????????{
????????????????cout?
<<?"Wrong?arguments"?<<?endl;
????????????????
return?1;
????????}

????????FILE
*?fin?=?fopen(argv[1],?"r");
????????
if?(!fin)
????????{
????????????????cout?
<<?"Invalid?input?file"?<<?endl;
????????????????
return?2;
????????}
????????FILE
*?fout?=?fopen(argv[2],?"w");
????????
if?(!fout)
????????{
????????????????fclose(fin);
????????????????cout?
<<?"Invalid?output?file"?<<?endl;
????????????????
return?3;
????????}

????????timeval?start,?end;

????????
const?int?BUF_SIZE?=?20?*?1024?*?1024;
????????
char*?buffer?=?new?char[BUF_SIZE];
????????memset(buffer,?
0,?BUF_SIZE);

????????gettimeofday(
&start,?0);
????????vector
<Email>?emails;

????????size_t?read?
=?fread?(buffer,?BUF_SIZE,?1,?fin);
????????
char*?begin?=?buffer;
????????
char*?current?=?buffer;

????????
while?(*current?!=?'\0')
????????{
????????????????
if?(*current?==?'\n')
????????????????{
????????????????????????
*current?=?'\0';
????????????????????????emails.push_back(begin);
????????????????????????begin?
=?current?+?1;
????????????????}
????????????????
++?current;
????????}
????????fclose(fin);
????????sort(emails.begin(),?emails.end());
????????emails.erase?(unique(?emails.begin(),?emails.end()?),?emails.end());

????????
for?(vector<Email>::const_iterator?iter?=?emails.begin();
?????????????iter?
!=?emails.end();
?????????????iter?
++)
????????{
????????????????fputs((
*iter).getEmail(),?fout);
????????????????fwrite(
"\n",?1,?1,?fout);
????????}

????????fclose(fout);

????????gettimeofday(
&end,?0);

????????printf(
"Time?used:?%d?ms\n",?((end.tv_sec?-?start.tv_sec)?*?1000?+?(end.tv_usec?-?start.tv_usec)?/?1000));

????????delete[]?buffer;

????????
return?0;
}

在我的機器上執行如下:
lijie t # ./test7 email.txt email-new.txt
Time used: 676 ms
lijie t # ./test7 email.txt email-new.txt
Time used: 675 ms
lijie t # ./test7 email.txt email-new.txt
Time used: 671 ms
lijie t # ./test7 email.txt email-new.txt
Time used: 669 ms
lijie t # ./test7 email.txt email-new.txt
Time used: 673 ms

比F版本快了2倍,也快過了其它所有版本。不過由于數據是vector保存的,在數據大量重復的情況下,性能可能會有較大的降低。

把operator<和operator==的實現改為strcmp比較,執行結果如下:
lijie t # ./test8 email.txt email-new.txt
Time used: 1275 ms
lijie t # ./test8 email.txt email-new.txt
Time used: 1267 ms
lijie t # ./test8 email.txt email-new.txt
Time used: 1297 ms
lijie t # ./test8 email.txt email-new.txt
Time used: 1296 ms
lijie t # ./test8 email.txt email-new.txt
Time used: 1271 ms


修改了下,增加了計時,修正了fread使用錯誤。

#include?<iostream>
#include?
<string>
#include?
<fstream>
#include?
<iterator>
#include?
<vector>
using?namespace?std;

#ifdef?_WIN32
#?include?
<windows.h>
#else?//?_WIN32
#?include?
<sys/time.h>
#endif?//?_WIN32


size_t?my_hash?(
const?char*?str)
{
????????size_t?ret?
=?0;
????????
while?(*str)
????????????????ret?
=?11?*?ret?+?*str++;
????????
return?ret;
}

class?Email
{
private:
????????size_t?hash;
????????
const?char*?mail;
????????friend?
bool?operator?<?(const?Email&?lhs,?const?Email&?rhs);
public:
????????Email?(
const?char*?mail_)
????????????????:?hash(my_hash(mail_)),?mail(mail_)
????????{
????????}

????????
bool?operator?==?(const?Email&?rhs)
????????{
????????????????
if?(hash?==?rhs.hash)
????????????????????????
return?strcmp(mail,?rhs.mail)?==?0;
????????????????
return?false;
????????}

????????
const?char*?getEmail()const
????????{
????????????????
return?mail;
????????}
};

bool?operator?<?(const?Email&?lhs,?const?Email&?rhs)
{
????????
if?(lhs.hash?==?rhs.hash)
????????????????
return?strcmp(lhs.mail,?rhs.mail)?<?0;
????????
return?lhs.hash?<?rhs.hash;
}

#ifndef?_WIN32
class?Timer
{
????????timeval?begin,?end;
public:
????????
void?start?()?{gettimeofday(&begin,?0);}
????????
void?stop?()?{gettimeofday(&end,?0);}
????????size_t?milliseconds?()?
const?{
????????????????
return?(end.tv_sec?-?begin.tv_sec)?*?1000?+?(end.tv_usec?-?begin.tv_usec)?/?1000;
????????}
};
#else?//?_WIN32
class?Timer
{
????????DWORD?begin,?end;
public:
????????
void?start?()?{begin?=?GetTickCount();}
????????
void?stop?()?{end?=?GetTickCount();}
????????size_t?milliseconds?()?
const?{
????????????????
return?end?-?begin;
????????}
};
#endif?//?_WIN32


int?main(int?argc,?char**?argv)
{
????????
if?(argc?<?3)
????????{
????????????????cout?
<<?"Wrong?arguments"?<<?endl;
????????????????
return?1;
????????}

????????
for?(int?i=0;?i<10;?++i)?{

???????FILE
*?fin?=?fopen(argv[1],?"r");
????????
if?(!fin)
????????{
????????????????cout?
<<?"Invalid?input?file"?<<?endl;
????????????????
return?2;
????????}
????????FILE
*?fout?=?fopen(argv[2],?"w");
????????
if?(!fout)
????????{
????????????????fclose(fin);
????????????????cout?
<<?"Invalid?output?file"?<<?endl;
????????????????
return?3;
????????}

????????Timer?total,?part;
????????total.start();
????????part.start();

????????
const?int?BUF_SIZE?=?20?*?1024?*?1024;

????????
char*?buffer?=?new?char[BUF_SIZE];
????????
if?(!buffer){
????????????????
//?cout?<<?"Alloc?buffer?failed"?<<?endl;
????????????????return?4;
????????}

????????part.stop();
????????
//?cout?<<?"Alloc?buffer,?"?<<?part.milliseconds()?<<?"?ms?used."?<<?endl;
????????part.start();

????????size_t?read?
=?fread?(buffer,?1,?BUF_SIZE,?fin);
????????fclose(fin);
????????buffer[read]?
=?'\0';
????????part.stop();
????????
//?cout?<<?"Read?file,?"?<<?part.milliseconds()?<<?"?ms?used."?<<?endl;
????????part.start();

????????vector
<Email>?emails;

????????
char*?begin?=?buffer;
????????
char*?current?=?buffer;

????????
while?(*current?!=?'\0')
????????{
????????????????
if?(*current?==?'\n')
????????????????{
????????????????????????
*current?=?'\0';
????????????????????????emails.push_back(begin);
????????????????????????begin?
=?current?+?1;
????????????????}
????????????????
++?current;
????????}

????????part.stop();
????????
//?cout?<<?"Put?emails?into?vector,?"?<<?part.milliseconds()?<<?"?ms?used."?<<?endl;
????????part.start();

????????sort(emails.begin(),?emails.end());
????????part.stop();
????????
//?cout?<<?"Sort?emails,?"?<<?part.milliseconds()?<<?"?ms?used."?<<?endl;
????????part.start();

????????emails.erase?(unique(?emails.begin(),?emails.end()?),?emails.end());
????????part.stop();
????????
//?cout?<<?"Unique?emails,?"?<<?part.milliseconds()?<<?"?ms?used."?<<?endl;
????????part.start();

????????
for?(vector<Email>::const_iterator?iter?=?emails.begin();
?????????????iter?
!=?emails.end();
?????????????iter?
++)
????????{
????????????????fputs((
*iter).getEmail(),?fout);
????????????????fwrite(
"\n",?1,?1,?fout);
????????}

????????fclose(fout);

????????part.stop();
????????
//?cout?<<?"Write?emails?into?new?file,?"?<<?part.milliseconds()?<<?"?ms?used."?<<?endl;

????????total.stop();

????????cout?
<<?"Total?used:?"?<<?total.milliseconds()?<<?"?ms."?<<?endl;

????????delete[]?buffer;
????????}

????????
return?0;
}

使用“-O3 -fomit-frame-pointer -funroll-loops -mtune=pentium4”選項編譯,耗時從680ms減少到620ms


優化文件讀寫版:
#include?<iostream>
#include?
<string>
#include?
<fstream>
#include?
<iterator>
#include?
<vector>
using?namespace?std;

//?config
#define?USE_CACHE
//#define?PROFILE
//?end?config


#ifdef?_WIN32
#?include?
<windows.h>
#else?//?_WIN32
#?include?
<sys/time.h>
#endif?//?_WIN32

#ifndef?_WIN32
class?Timer
{
????timeval?begin,?end;
public:
????
void?start?()?{gettimeofday(&begin,?0);}
????
void?stop?()?{gettimeofday(&end,?0);}
????size_t?milliseconds?()?
const?{
????????
return?(end.tv_sec?-?begin.tv_sec)?*?1000?+?(end.tv_usec?-?begin.tv_usec)?/?1000;
????}
};
#else?//?_WIN32
class?Timer
{
????DWORD?begin,?end;
public:
????
void?start?()?{begin?=?GetTickCount();}
????
void?stop?()?{end?=?GetTickCount();}
????size_t?milliseconds?()?
const?{
????????
return?end?-?begin;
????}
};
#endif?//?_WIN32


#ifdef?PROFILE
#?define?PROFILE_OUTPUT(timer,info)?\
????timer.stop();?\
????cout?
<<?info?<<?":?"?<<?timer.milliseconds()?<<?"?ms?used."?<<?endl;?\
????timer.start()
#else?//?PROFILE
#?define?PROFILE_OUTPUT(timer,info)
#endif?//?PROFILE



size_t?my_hash?(
const?char*?str)
{
????size_t?ret?
=?0;
????
while?(*str)
????????????ret?
=?11?*?ret?+?*str++;
????
return?ret;
}

class?Email
{
private:
????size_t?hash;
????
const?char*?mail;
????friend?
bool?operator?<?(const?Email&?lhs,?const?Email&?rhs);
public:
????Email?(
const?char*?mail_)
????????:?hash(my_hash(mail_)),?mail(mail_)
????{
????}

????
bool?operator?==?(const?Email&?rhs)
????{
????????
if?(hash?==?rhs.hash)
????????????
return?strcmp(mail,?rhs.mail)?==?0;
????????
return?false;
????}

????
const?char*?getEmail()?const
????{
????????
return?mail;
????}

????size_t?getLength()?
const
????{
????????
return?strlen(mail);
????}
};

bool?operator?<?(const?Email&?lhs,?const?Email&?rhs)
{
????
if?(lhs.hash?==?rhs.hash)
????????
return?strcmp(lhs.mail,?rhs.mail)?<?0;
????
return?lhs.hash?<?rhs.hash;
}



#ifdef?USE_CACHE
class?OfstreamBuffer
{
????ofstream
&?ofs;
????size_t?buf_size;
????size_t?offset;
????
char*?buffer;

public:
????OfstreamBuffer?(ofstream
&?ofs_,?size_t?buf_size_)
????????:?ofs(ofs_),?buf_size(buf_size_),?offset(
0)
????{
????????buffer?
=?new?char[buf_size_];
????}

????
~OfstreamBuffer?()
????{
????????flush();
????????delete[]?buffer;
????}

????
void?write?(const?char*?ptr,?size_t?size)
????{
????????
while?(size?>?0)
????????{
????????????size_t?copy_size?
=?buf_size?-?offset;
????????????
if?(copy_size?>?size)
????????????????copy_size?
=?size;
????????????memcpy?(buffer?
+?offset,?ptr,?copy_size);
????????????offset?
+=?copy_size;
????????????ptr?
+=?copy_size;
????????????size?
-=?copy_size;

????????????
if?(offset?==?buf_size)
????????????????flush?();
????????}
????}

????
void?flush?()
????{
????????
if?(offset?>?0)
????????{
????????????ofs.write(buffer,?offset);
????????????offset?
=?0;
????????}
????????ofs.flush();
????}
};
#else//?USE_CACHE
class?OfstreamBuffer
{
????ofstream
&?ofs;

public:
????OfstreamBuffer?(ofstream
&?ofs_,?size_t?buf_size_)
????????:?ofs(ofs_)
????{
????}

????
void?write?(const?char*?ptr,?size_t?size)
????{
????????ofs.write(ptr,?size);
????}

????
void?flush?()
????{
????????ofs.flush();
????}
};
#endif?//?USE_CACHE




int?main(int?argc,?char**?argv)
{
????
if?(argc?<?3)
????{
????????cout?
<<?"Wrong?arguments"?<<?endl;
????????
return?1;
????}

????
for?(int?i=0;?i<1;?++i)?{

????????ifstream?ifs(argv[
1],?ios::binary);
????????
if?(!ifs)
????????{
????????????cout?
<<?"Invalid?input?file"?<<?endl;
????????????
return?2;
????????}

????????ofstream?ofs(argv[
2],?ios::binary);
????????
if?(!ofs)
????????{
????????????cout?
<<?"Invalid?output?file"?<<?endl;
????????????
return?3;
????????}

????????Timer?total,?part;
????????total.start();
????????part.start();

????????ifs.seekg(
0,?ios_base::end);
????????size_t?file_size?
=?(size_t)ifs.tellg()?+?1;
????????cout?
<<?"file?size:?"?<<?file_size?<<?endl;
????????ifs.seekg(
0,?ios_base::beg);

????????
char*?buffer?=?new?char[file_size];
????????
if?(!buffer){
????????????cout?
<<?"Alloc?buffer?failed"?<<?endl;
????????????
return?4;
????????}

????????PROFILE_OUTPUT(part,?
"Alloc?buffer");

????????ifs.read(buffer,?file_size);
????????buffer[file_size]?
=?'\0';

????????PROFILE_OUTPUT(part,?
"Read?file");

????????vector
<Email>?emails;
????????emails.reserve(
1000000);

????????
char*?begin?=?buffer;
????????
char*?current?=?buffer;

????????
while?(*current?!=?'\0')
????????{
????????????
if?(*current?==?'\n')
????????????{
????????????????
*current?=?'\0';
????????????????emails.push_back(begin);
????????????????begin?
=?current?+?1;
????????????}
????????????
++?current;
????????}

????????PROFILE_OUTPUT(part,?
"Put?emails?into?vector");

????????sort(emails.begin(),?emails.end());

????????PROFILE_OUTPUT(part,?
"Sort?emails");

????????emails.erase?(unique(?emails.begin(),?emails.end()?),?emails.end());

????????PROFILE_OUTPUT(part,?
"Unique?emails");

????????OfstreamBuffer?ofsBuffer?(ofs,?
4?*?1024);

????????
for?(vector<Email>::const_iterator?iter?=?emails.begin();
?????????????iter?
!=?emails.end();
?????????????iter?
++?)
????????{
????????????ofsBuffer.write(iter
->getEmail(),?iter->getLength());
????????????ofsBuffer.write(
"\n",?1);
????????}
????????ofsBuffer.flush();

????????PROFILE_OUTPUT(part,?
"Write?emails?into?new?file");

????????total.stop();
????????cout?
<<?"Total?used:?"?<<?total.milliseconds()?<<?"?ms."?<<?endl;


????????delete[]?buffer;
????}

????
return?0;
}
這個版本在windows上用dev-cpp所帶的gcc 3.4.2來編譯,最好成績是609ms。在cygwin里可以達到650ms。
posted on 2006-04-03 11:00 qiezi 閱讀(2808) 評論(27)  編輯 收藏 引用 所屬分類: C++雜談
青青草原综合久久大伊人导航_色综合久久天天综合_日日噜噜夜夜狠狠久久丁香五月_热久久这里只有精品
  • <ins id="pjuwb"></ins>
    <blockquote id="pjuwb"><pre id="pjuwb"></pre></blockquote>
    <noscript id="pjuwb"></noscript>
          <sup id="pjuwb"><pre id="pjuwb"></pre></sup>
            <dd id="pjuwb"></dd>
            <abbr id="pjuwb"></abbr>
            久久久久久久久久码影片| 欧美顶级艳妇交换群宴| 美女成人午夜| 欧美一区二区三区男人的天堂 | 欧美激情亚洲国产| 卡通动漫国产精品| 毛片av中文字幕一区二区| 久久午夜色播影院免费高清| 久久国产99| 久久综合免费视频影院| 欧美jizz19性欧美| 91久久在线观看| 欧美国产先锋| 亚洲乱码国产乱码精品精98午夜| 亚洲第一精品电影| 亚洲美女黄色片| 亚洲一区二区三区四区在线观看 | 999亚洲国产精| 一本久久综合亚洲鲁鲁| 亚洲一区日韩在线| 久久久一区二区| 亚洲福利视频专区| 亚洲网站在线| 久久综合色综合88| 欧美日本精品| 国产亚洲欧洲| 亚洲免费久久| 欧美一区影院| 亚洲国产三级| 亚洲欧美综合另类中字| 免费一级欧美在线大片| 国产精品日韩久久久| 亚洲福利免费| 亚洲砖区区免费| 欧美电影专区| 亚洲欧美bt| 免费的成人av| 国产拍揄自揄精品视频麻豆| 亚洲国产毛片完整版| 亚洲女女女同性video| 欧美激情精品久久久久久免费印度| 亚洲精品中文字幕女同| 久久久久国产精品人| 欧美调教视频| 亚洲区一区二区三区| 欧美在线免费播放| 亚洲九九精品| 欧美激情第五页| 狠狠色综合网| 久久大综合网| 亚洲免费精彩视频| 噜噜噜噜噜久久久久久91 | aⅴ色国产欧美| 久久裸体艺术| 亚洲调教视频在线观看| 免费久久99精品国产| 国产专区精品视频| 欧美一区二区三区的| aaa亚洲精品一二三区| 欧美激情在线观看| 亚洲国产精品99久久久久久久久| 久久丁香综合五月国产三级网站| 亚洲一区二区动漫| 国产精品乱码一区二三区小蝌蚪| 亚洲人精品午夜| 欧美岛国激情| 美女视频黄免费的久久| 曰韩精品一区二区| 久久综合色8888| 久久精品国产综合精品| 黑人操亚洲美女惩罚| 欧美综合国产| 久久精品一区二区国产| 国产最新精品精品你懂的| 久久久夜色精品亚洲| 久久久精品性| 最新中文字幕一区二区三区| 欧美激情在线狂野欧美精品| 蜜臀av一级做a爰片久久| 91久久久久久久久| 亚洲日韩成人| 国产精品国产三级国产普通话蜜臀| 一区二区三区四区蜜桃| 正在播放欧美视频| 国产丝袜一区二区三区| 久久久蜜桃精品| 麻豆乱码国产一区二区三区| 91久久中文| 日韩一区二区高清| 国产精品自在线| 久久午夜精品| 欧美黄色免费网站| 亚洲淫片在线视频| 欧美一级久久| 91久久中文字幕| 一区二区三区免费观看| 国产婷婷97碰碰久久人人蜜臀| 麻豆国产精品一区二区三区 | 欧美色视频日本高清在线观看| 亚洲深夜福利在线| 香蕉久久国产| 亚洲美女免费视频| 午夜精品久久久久久久蜜桃app | 一区二区视频在线观看| 日韩亚洲精品视频| 伊人春色精品| 亚洲一区二区三区在线| 最新中文字幕一区二区三区| 亚洲欧美日韩国产中文| 99re6热在线精品视频播放速度| 亚洲欧美国产不卡| 欧美一区二区在线视频| 国产精品久久久一区二区三区| 久久精品国产亚洲a| 欧美激情亚洲一区| 老司机成人网| 国产精品日韩久久久久| 91久久精品国产| 伊人狠狠色j香婷婷综合| 亚洲色图在线视频| 99re热这里只有精品免费视频| 性欧美大战久久久久久久久| 日韩视频一区二区在线观看 | 亚洲欧美中文在线视频| 久久综合国产精品| 欧美在线高清| 亚洲欧美日韩国产成人| 国产亚洲亚洲| 99riav久久精品riav| 欧美色道久久88综合亚洲精品| 你懂的亚洲视频| 国产精品成人一区二区三区夜夜夜| 久久成人国产精品| 欧美激情视频一区二区三区在线播放| 国产有码一区二区| 日韩天堂在线观看| 在线精品一区二区| 久久视频这里只有精品| 午夜精品久久久久久99热软件| 老司机67194精品线观看| 欧美一区二区视频在线观看| 欧美高清视频一二三区| 久久久一二三| 亚洲高清精品中出| 欧美一激情一区二区三区| 亚洲小说区图片区| 男男成人高潮片免费网站| 99综合视频| 欧美色网一区二区| 亚洲精品久久久久久一区二区 | 欧美激情亚洲视频| 狠狠色综合色区| 亚洲丝袜av一区| 黑人一区二区三区四区五区| 久久久久五月天| 久久久久国内| 国产精品久久77777| 亚洲国产激情| 国产一区91精品张津瑜| 久久久欧美一区二区| 久久久久国产一区二区三区四区| 国产精品va在线播放| 亚洲伦理在线观看| 亚洲伦理在线| 一本色道久久综合一区| 国产一区二区三区免费观看 | 中国成人亚色综合网站| 你懂的成人av| 亚洲第一区中文99精品| 欧美另类人妖| 欧美黑人国产人伦爽爽爽| 一区精品在线播放| 久久精品视频在线观看| 性色一区二区三区| 狠狠色综合色区| 久久看片网站| 亚洲第一天堂无码专区| 亚洲级视频在线观看免费1级| 久久婷婷综合激情| 亚洲国产成人在线| 亚洲永久精品大片| 国产欧美三级| 久久久亚洲综合| 亚洲国产天堂久久国产91| 一本色道久久88综合日韩精品| 欧美日韩国产二区| 欧美国产乱视频| 亚洲欧美在线看| 国产日韩在线看| 麻豆精品网站| 欧美一区精品| 国产精品久久久久久亚洲调教 | 国产精品高清网站| 午夜精品久久| 亚洲一区二区免费看| 1769国内精品视频在线播放| 欧美顶级少妇做爰| 亚洲一级二级| 欧美高清你懂得| 久久久91精品|