The content of the HTML file is as follows:
被尊称为“高贵的”
Method 1:
Import["C:\\Users\\HyperGroups\\Desktop\\test.html", "XML"]
XML`Parser`XMLGet::nfprserr: Entity 'ldquo' was not found at Line: 2 Character: 12 in C:\Users\HyperGroups\Desktop\test.html. >>
XML`Parser`XMLGet::nfprserr: Entity 'rdquo' was not found at Line: 2 Character: 22 in C:\Users\HyperGroups\Desktop\test.html. >>
(*
XMLObject[Document][{},XMLElement[p,{},{被尊称为\[EntityStart]ldquo\[EntityEnd]高贵的\[EntityStart]rdquo\[EntityEnd]}],{}]
*)
Method 2:
Import["C:\\Users\\HyperGroups\\Desktop\\test.html", "XMLObject"]
(*
XMLObject[Document][{XMLObject[Declaration][Version->1.0,Standalone->yes]},XMLElement[html,{version->-//W3C//DTD HTML 4.01 Transitional//EN,{http://www.w3.org/2000/xmlns/,xmlns}->http://www.w3.org/1999/xhtml},{XMLElement[body,{},{XMLElement[p,{},{被尊称为"高贵的"}]}]}],{}]
*)
Why does Method 1 generate these messages, and how can I avoid them and get a result like the one from Method 2?