Writing Your Own Scraper
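There are plenty of ready-made scrapers online, but sometimes you find a good site and want your own tool to collect some information from it, and then you have to write the program yourself. Such a scraper is actually not hard to write; most of the work is analyzing the page structure of the source site.
First, download an XMLHTTP class file: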
<%
Class xhttp
private cset,sUrl,sError
Private Sub Class_Initialize()
'cset="UTF-8"
cset="GB2312"
sError=""
end sub
Private Sub Class_Terminate()
End Sub
Public Property LET URL(theurl)
sUrl=theurl
end property
public property GET BasePath()
BasePath=mid(sUrl,1,InStrRev(sUrl,"/")-1)
end property
public property GET FileName()
FileName=mid(sUrl,InStrRev(sUrl,"/")+1)
end property
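' Illustration of the two properties above, using a hypothetical URL
' (www.example.com is an assumption, not a site from this article):
'   URL      = "http://www.example.com/news/list.html"
'   BasePath returns "http://www.example.com/news"  (everything before the last "/")
'   FileName returns "list.html"                    (everything after the last "/")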
public property GET Html()
Html=BytesToBstr(getBody(sUrl))
end property
public property GET xhttpError()
xhttpError=sError
end property
private Function BytesToBstr(body)
on error resume next
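'Cset: GB2312 or UTF-8
dim objstream
set objstream = Server.CreateObject("adodb.stream")
with objstream
.Type = 1 ' adTypeBinary: treat the data as raw bytes
.Mode = 3 ' adModeReadWrite
.Open
.Write body ' load the response bytes into the stream
.Position = 0 ' rewind to the start of the stream
.Type = 2 ' adTypeText: switch the stream to text mode
.Charset = Cset ' decode with the configured charset
BytesToBstr = .ReadText ' read the whole stream back as a string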
.Close
end with
set objstream = nothing
End Function
private function getBody(surl)
on error resume next
dim xmlHttp
'Set xmlHttp=server.createobject("Msxml2.XMLHTTP.4.0")
'set xmlHttp=server.createobject("Microsoft.XMLHTTP")
set xmlHttp=server.createobject("MSXML2.ServerXMLHTTP")