
import - How to manipulate web pages on Mathematica?


I want to enter some data on this page from Mathematica, then press the "buscar" button, also from Mathematica code, and get the results back in Mathematica. The most similar post I've found was this one.


I'm trying to import through the Import[] function:



Import["http://www.fundamentus.com.br/buscaavancada.php", "Data"]

When I do that, it shows this:


{{{{"Página inicial", "Investimento consciente", 
"Entre em contato"}, {"Detalhes", {"Balanço patrimonial",
"Demonstrativos de resultados",
"Indicadores fundamentalistas"}, {"Balanços em Excel",
"Proventos"},
"Histórico de cotações"}}, {{"Preencha apenas os campos que serão \
usados na pesquisa", "Use . (ponto) como separador decimal",

"Utilize o formato decimal para representar porcentagens. Ex: \
Para 15% digite 0.15"}, {{"? P/L (Preço sobre lucro)",
"min max"}, {"? P/VP (Preço sobre Valor Patrimonial)",
"min max"}, {"? PSR (Preço sobre Receita Líquida)",
"min max"} [...]

And from here I'm stuck. Any tips?



Answer



The first thing we need to do is to determine how the initial page assembles the parameters and transmits the request to the server. One way to do this would be to open the initial page using the developer tools in the web browser. But since this is a Mathematica forum, let's try to use the tools it makes available to us.


We could load the page text and then try to extract the information we need using string manipulation functions. However, this can get tricky as we must account for line breaks in inconvenient locations, decode HTML entities, and so on. Instead, we will examine the page's Document Object Model (DOM). In Mathematica, the DOM is accessed by importing the page using "XMLObject" format:



$initialUrl = "http://www.fundamentus.com.br/buscaavancada.php";
$dom = Import[$initialUrl, "XMLObject"];

Fewer and fewer pages these days are using simple HTML forms to send requests to the server -- let's see if this page contains any FORM elements:


$forms = Cases[$dom, XMLElement["form", ___], Infinity];
Length @ $forms


2




We are in luck. Let's look at the attributes of the forms:


Cases[$forms, XMLElement[_, attrs_, _] :> attrs]


{
 {enctype->application/x-www-form-urlencoded,method->get,
    class->busca,action->detalhes.php},
 {enctype->application/x-www-form-urlencoded,method->post,
    class->avancada,name->formbusca,action->resultado.php}
}




The first form ("detalhes") uses HTTP GET to get its results. The second ("resultado") uses POST. Resultado sounds promising. Let's extract the input elements for that form:


Cases[$forms[[2]], XMLElement["input", ___], Infinity] // Column


XMLElement[input,{type->text,name->pl_min},{}]
XMLElement[input,{type->text,name->pl_max},{}]
XMLElement[input,{type->text,name->pvp_min},{}]
... lines omitted ...
XMLElement[input,{type->text,name->roe_min},{}]

XMLElement[input,{type->text,name->roe_max},{}]
XMLElement[input,{type->text,name->liq_min},{}]
XMLElement[input,{type->text,name->liq_max},{}]
... lines omitted ...
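To see only the field names instead of the full XMLElement listing, the same Cases approach can map each input element to its name attribute. The sketch below runs against a small hand-written form, so it can be tried without network access. The sampleForm string and the variable names are assumptions standing in for the real page, not something taken from it.

```mathematica
(* Hypothetical stand-in for the real page: a tiny, well-formed form. *)
sampleForm = "<form name='formbusca' method='post' action='resultado.php'>
  <input type='text' name='pl_min'/>
  <input type='text' name='pl_max'/>
  <input type='text' name='roe_min'/>
</form>";

(* Parsing as XML yields nested XMLElement expressions, as with "XMLObject". *)
sampleDom = ImportString[sampleForm, "XML"];

(* Map every input element to the value of its "name" attribute. *)
fieldNames = Cases[sampleDom,
  XMLElement["input", attrs_, _] :> ("name" /. attrs), Infinity]
```

For this sample, fieldNames should evaluate to {"pl_min", "pl_max", "roe_min"}. Replacing sampleDom with $forms[[2]] would apply the same extraction to the real form.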



Yes, this looks like the form that we are interested in. Let's assemble the components of a request:


$resultUrl = StringReplace[$initialUrl, "buscaavancada.php" -> "resultado.php"]


http://www.fundamentus.com.br/resultado.php




$parameters = {
"roe_min" -> "0.1"
, "liq_min" -> "500000"
, "liq_max" -> "800000"
};

... and transmit the request using HTTP POST:


$results = Import[
  $resultUrl
, "Data"
, "RequestMethod" -> "POST"
, "RequestParameters" -> $parameters
]


{{{{Página inicial,Investimento consciente,Entre em contato},{Detalhes,{Balanço patrimonial,Demonstrativos de resultados,Indicadores fundamentalistas},{Balanços em Excel,Proventos},Histórico de cotações}},{{Papel,Cotação,P/L,P/VP,PSR,Div.Yield,P/Ativo,P/Cap.Giro,P/EBIT,P/Ativ Circ.Liq,EV/EBIT,Mrg Ebit,Mrg. Líq.,Liq. Corr.,ROIC,ROE,Liq.2meses,Patrim. Líq,Dív.Brut/ Patrim.,Cresc. Rec.5a},{{PRTX3,2,72,-38,34,-255,71,905,562,0,00%,1,977,-5,52,-63,14,-2,42,-72,49,-1.434,22%,-2.361,99%,0,35,-3,96%,666,96%,537.768,00,-10.557.000,00,-59,73,0,00%}
... and more ...



This time we have imported using the "Data" format, which lets Mathematica do all the hard work of extracting the HTML TABLE elements out of the web page.
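On more recent versions of Mathematica, the same POST can probably also be expressed with HTTPRequest and URLRead. The following is only a sketch under assumptions: it reuses the URL and field names from above, it assumes that a list of rules given as "Body" is sent as URL-encoded form data (which is how I read the HTTPRequest documentation), and fetchTables is a hypothetical helper name. It has not been checked against the live site.

```mathematica
(* Assumption: same result URL and field names as in the request above. *)
resultUrl = "http://www.fundamentus.com.br/resultado.php";
parameters = {"roe_min" -> "0.1", "liq_min" -> "500000", "liq_max" -> "800000"};

(* Build the request object; nothing is sent at this point. *)
request = HTTPRequest[resultUrl,
   <|"Method" -> "POST", "Body" -> parameters|>];

(* Hypothetical helper: send the request and extract the HTML tables. *)
fetchTables[] := Module[{response = URLRead[request]},
  If[response["StatusCode"] === 200,
   ImportString[response["Body"], {"HTML", "Data"}],
   Failure["HTTPError", <|"StatusCode" -> response["StatusCode"]|>]]]
```

Calling fetchTables[] performs the network request. If the site answers as it did for Import, the result should have the same nested-list shape as $results.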



At this point, we have successfully imported all of the data into Mathematica. We can now use the usual Mathematica tools to extract and reformat the parts that interest us. After a bit of experimentation, we can see that the interesting data is the second element of the first row:


$interesting = $results[[1, 2]];
$interesting // TableForm

[Image: raw data table]


We can extract the property names:


$propertyNames = $interesting[[1, 2;;]]


{Cotação,P/L,P/VP,PSR,Div.Yield,P/Ativo,P/Cap.Giro,P/EBIT,P/Ativ Circ.Liq,EV/EBIT,Mrg Ebit,Mrg. Líq.,Liq. Corr.,ROIC,ROE,Liq.2meses,Patrim. Líq,Dív.Brut/ Patrim.,Cresc. Rec.5a}




... and the ticker symbols:


$symbols = $interesting[[2, All, 1]]


{PRTX3,BRTO3,FHER3,PINE4}



... and the data itself:


$data = $interesting[[2, All, 2;;]]



{{2,72,-38,34,-255,71,905,562,0,00%,1,977,-5,52,-63,14,-2,42,-72,49,-1.434,22%,-2.361,99%,0,35,-3,96%,666,96%,537.768,00,-10.557.000,00,-59,73,0,00%},{12,15,3,87,0,68,0,771,2,46%,0,256,4,35,2,62,-0,85,2,97,29,44%,19,90%,1,22,12,50%,17,68%,750.626,00,10.699.600.000,00,0,53,-3,53%},{12,25,4,48,1,38,0,135,0,00%,0,201,-12,78,1,64,-2,14,3,72,8,25%,3,02%,0,98,22,47%,30,87%,686.507,00,429.309.000,00,2,52,7,74%},{12,39,7,58,1,21,0,000,7,39%,0,000,0,00,0,00,0,00,0,00,0,00%,0,00%,0,00,0,00%,15,91%,509.960,00,1.015.080.000,00,0,00,-10,02%}}



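Since the numbers and percentages use "." as the thousands separator and "," as the decimal separator, Mathematica does not recognize them and imports them as strings. We need to convert those strings into Mathematica syntax so that we can parse them:


(* Percentages: drop the trailing "%" and divide by 100 *)
parse[s_String] /; StringMatchQ[s, __ ~~ "%"] :=
  parse[StringDrop[s, -1]] / 100

(* Numbers: remove "." thousands separators, turn the decimal "," into "." *)
parse[s_String] /; StringMatchQ[s, (DigitCharacter | "-" | "," | ".") ..] :=
  ToExpression[StringReplace[s, {"," -> ".", "." -> ""}]]

(* Anything else is left unchanged *)
parse[s_] := s

$data2 = $data /. s_String :> parse[s]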


{{2.72,-38.34,-255.71,905.562,0.00,1.977,-5.52,-63.14,-2.42,-72.49,-1434.22,-2361.99,0.35,-3.96,666.96,537768.00,-10557000.00,-59.73,0.00},{12.15,3.87,0.68,0.771,2.46,0.256,4.35,2.62,-0.85,2.97,29.44,19.90,1.22,12.50,17.68,750626.00,10699600000.00,0.53,-3.53},{12.25,4.48,1.38,0.135,0.00,0.201,-12.78,1.64,-2.14,3.72,8.25,3.02,0.98,22.47,30.87,686507.00,429309000.00,2.52,7.74},{12.39,7.58,1.21,0.000,7.39,0.000,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,15.91,509960.00,1015080000.00,0.00,-10.02}}



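Note that parse converts percentages into fractions by dividing by 100, so a string such as "2,46%" evaluates to 0.0246. The listing above prints the percentage columns without that division.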
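A quick way to check the conversion rules is to apply them to a few strings taken from the raw data. The sketch below repeats the logic under the hypothetical name parseBR so that it runs on its own. The sample strings come from the output above; the names parseBR, samples and parsed are assumptions.

```mathematica
(* parseBR is a hypothetical name; the rules mirror the parse definitions. *)
parseBR[s_String] /; StringMatchQ[s, __ ~~ "%"] :=
  parseBR[StringDrop[s, -1]]/100
parseBR[s_String] /; StringMatchQ[s, (DigitCharacter | "-" | "," | ".") ..] :=
  ToExpression[StringReplace[s, {"," -> ".", "." -> ""}]]
parseBR[s_] := s

(* A thousands-separated number, a percentage, a large negative, a ticker. *)
samples = {"1.434,22", "2,46%", "-10.557.000,00", "PRTX3"};
parsed = parseBR /@ samples
```

The first three entries should come back as the numbers 1434.22, 0.0246 and -10557000, while "PRTX3" stays a string. The swap works because StringReplace applies both rules in a single pass, so the "." produced from a "," is not removed again.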


We can now display the data or manipulate it as we see fit, for example:


TableForm[Transpose @ $data2, TableHeadings -> {$propertyNames, $symbols}]


[Image: formatted data table]
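If looking values up by name is more convenient than positional indexing, the three lists can be combined into nested associations. This is a minimal sketch, assuming a version of Mathematica with Association support. The sample values are copied from the first two rows and columns of the output above, and byTicker is a hypothetical name.

```mathematica
(* Sample values copied from the first two rows and columns shown above. *)
symbols = {"PRTX3", "BRTO3"};
propertyNames = {"Cotação", "P/L"};
data = {{2.72, -38.34}, {12.15, 3.87}};

(* One association per ticker, keyed by property name. *)
byTicker = AssociationThread[
   symbols -> (AssociationThread[propertyNames -> #] & /@ data)];

(* Look up a single value by ticker symbol and property name. *)
byTicker["BRTO3"]["P/L"]
```

The last line should return 3.87. With the full lists, the same construction would use $symbols, $propertyNames and $data2.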

