How to get the data from a site that uses only <br> instead of classes


I would like to know how to extract some elements from the site linked below. They are inside a single class, but they are not split into separate sibling elements; they are separated only by line breaks (<br>). How do I get them one at a time instead of all together? The div that contains them has the class "inside2" on this site: link
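One way to get the <br>-separated items one at a time is to walk the div's child nodes yourself instead of calling text(): text nodes accumulate into the current item and each <br> closes it. This is a minimal sketch, assuming jsoup is on the classpath; the class name BrSplitter, the method splitOnBr and the sample HTML (modelled on the log output further down) are hypothetical, and the real page's markup may differ. It only looks at direct children of the div, and it treats block-level children such as an <h2> as items of their own.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class BrSplitter {

    // Returns one string per <br>-separated item among the direct children of
    // `container`. Block-level children (e.g. <h2>) become items of their own.
    static List<String> splitOnBr(Element container) {
        List<String> items = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (Node node : container.childNodes()) {
            if (node instanceof TextNode) {
                current.append(((TextNode) node).text());
            } else if (node instanceof Element) {
                Element el = (Element) node;
                if (el.tagName().equals("br")) {
                    flush(current, items);          // <br> ends the current item
                } else if (el.isBlock()) {
                    flush(current, items);          // close whatever came before
                    current.append(el.text());
                    flush(current, items);          // the block is one item
                } else {
                    current.append(el.text());      // inline element such as <a>
                }
            }
        }
        flush(current, items);
        return items;
    }

    // Adds the trimmed buffer to the list when it is not blank, then clears it.
    private static void flush(StringBuilder current, List<String> items) {
        String text = current.toString().trim();
        if (!text.isEmpty()) {
            items.add(text);
        }
        current.setLength(0);
    }

    public static void main(String[] args) {
        // Hypothetical markup modelled on the log output; the real page may differ.
        String html = "<div class=\"inside2\"><h2>Address and Contacts</h2>"
                + "Tel: 213617320<br>Fax: 213623833<br><a href=\"#\">Map</a></div>";
        Element inside2 = Jsoup.parse(html).select(".inside2").first();
        for (String item : splitOnBr(inside2)) {
            System.out.println(item);
        }
    }
}
```

For the live page, the same method can be applied to each element returned by Jsoup.connect(...).get().select(".inside2") instead of the parsed sample string.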

I want to get all the information, but properly divided, with titles and so on. So far I have only managed to get all of it as one block, not organized. Here is the code and its result:

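    Document document1 = Jsoup.connect("http://www.dges.gov.pt/guias/detcursopi.asp?codc=9771&code=4002").get();

    // every element with the class "inside2"
    Elements inside = document1.select(".inside2");

    for (Element inside2 : inside) {
        // text() returns the text of the element and all its descendants as one string
        Log.d(Tag, inside2.text());
    }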

Log output:

  

Application Guide for 2017 - Course Detail Address and Contacts of the Institution Travessa da Galé, 36-3º Standard Electrical Building 1349-028 LISBOA Map Tel: 213617320 Fax: 213623833 link [email protected] Characteristics of the pair Institution / CourseCode: 4002/9771 Degree: Licenciatura - 1º ciclo Duration: 6 Semesters ECTS: 180 Type of education: Private Higher Education Polytechnic Contests: Institutional 2017-2018 vacancies: 40 PrerequisitesType: Selection + Serial Group R - Musical Aptitude Entrance Examinations One of the following tests: 05 Spanish 12 Hist. of Culture and Arts 13 English 15 Portuguese Literature 16 Mathematics 18 Portuguese Minimum ClassificationsNote of application: 98 points Entrance tests: 95 points Calculation formulaMedia of secondary: 50% Entrance tests: 35% Pre-requisite: 15% Previous 2014 2015 2016 Vacancies 40 40 40 Other Information Statistical information about the course (It will be directed to the Directorate General of Education and Science Statistics) Information about the evaluation and accreditation of this course (It will be directed to the Agency of Evaluation and Accreditation To view PDF documents you need the Adobe Acrobat Reader. You can download it by clicking the following button:

I wanted the output to come out organized, but it comes out like this, I think because the page is not well structured.
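To keep each title together with its content, the walk over the div's child nodes can start a new group at every heading. This is a minimal sketch, assuming jsoup is available and that the titles are <h2> elements that are direct children of the .inside2 div (not confirmed for the real page). SectionGrouper, groupByHeading and the sample HTML are hypothetical; the labels in the sample are borrowed from the log output above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class SectionGrouper {

    // Groups the <br>-separated lines of `container` under the most recent
    // <h2> title. Lines that appear before the first <h2> go under "".
    static Map<String, List<String>> groupByHeading(Element container) {
        Map<String, List<String>> sections = new LinkedHashMap<>();
        String title = "";
        StringBuilder line = new StringBuilder();
        for (Node node : container.childNodes()) {
            if (node instanceof TextNode) {
                line.append(((TextNode) node).text());
            } else if (node instanceof Element) {
                Element el = (Element) node;
                if (el.tagName().equals("h2")) {
                    addLine(sections, title, line);   // finish the previous section
                    title = el.text().trim();
                    sections.putIfAbsent(title, new ArrayList<>());
                } else if (el.tagName().equals("br")) {
                    addLine(sections, title, line);   // <br> ends the current line
                } else {
                    line.append(el.text());           // inline element such as <a>
                }
            }
        }
        addLine(sections, title, line);
        return sections;
    }

    // Moves the buffered line (if not blank) into the current section and clears the buffer.
    private static void addLine(Map<String, List<String>> sections, String title, StringBuilder line) {
        String text = line.toString().trim();
        if (!text.isEmpty()) {
            sections.computeIfAbsent(title, k -> new ArrayList<>()).add(text);
        }
        line.setLength(0);
    }

    public static void main(String[] args) {
        // Hypothetical markup; labels borrowed from the log output in the question.
        String html = "<div class=\"inside2\">"
                + "<h2>Minimum Classifications</h2>Note of application: 98 points<br>Entrance tests: 95 points"
                + "<h2>Calculation formula</h2>Media of secondary: 50%<br>Entrance tests: 35%"
                + "</div>";
        Element inside2 = Jsoup.parse(html).select(".inside2").first();
        groupByHeading(inside2).forEach((heading, lines) -> {
            System.out.println("Title: " + heading);
            lines.forEach(l -> System.out.println("  " + l));
        });
    }
}
```

A LinkedHashMap is used so the sections come out in the same order as on the page.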

    
asked by anonymous 31.08.2017 / 14:17

1 answer


You were missing the .select("*"):

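    import java.io.IOException;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class Main {
        public static void main(String[] args) throws IOException {
            Document document1 = Jsoup.connect("http://www.dges.gov.pt/guias/detcursopi.asp?codc=9771&code=4002").get();
            // select("*") returns the .inside2 div itself plus every element nested inside it
            Elements inside = document1.select(".inside2").select("*");

            for (Element inside2 : inside) {
                switch (inside2.tagName()) {
                    case "h2":
                        // a heading starts a new block
                        System.out.println();
                        System.out.println("Title:" + inside2.ownText());
                        break;

                    case "a":
                        // links: visible text plus the target URL
                        System.out.println("Link:" + inside2.ownText() + "{" + inside2.attr("href") + "}");
                        break;

                    case "br":
                        // line breaks have no text of their own
                        break;

                    default:
                        // ownText() skips text that belongs to child elements
                        System.out.println("[" + inside2.tagName() + "]" + inside2.ownText());
                }
            }
        }
    }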

This prints the content element by element, one per line; now you just need to format it the way you want.
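Once the content is split into lines, many of them have the form "Label: value" (for example "Duration: 6 Semesters" in the log output), so one possible next step is to split each line at its first colon. This is a small sketch using only the standard library; the class LabelParser and the method parseLabels are illustrative names, not part of jsoup. Splitting at the first colon keeps values such as URLs intact, and lines without a colon are kept with an empty value.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LabelParser {

    // Splits each "Label: value" line at the first colon. Lines without a
    // colon are kept with an empty value so no information is lost.
    static Map<String, String> parseLabels(List<String> lines) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : lines) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                fields.put(line.trim(), "");
            } else {
                fields.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Sample lines taken from the log output in the question.
        List<String> lines = Arrays.asList("Code: 4002/9771", "Duration: 6 Semesters", "ECTS: 180");
        parseLabels(lines).forEach((label, value) -> System.out.println(label + " -> " + value));
    }
}
```

Some labels repeat across sections ("Entrance tests" appears twice in the output), and in a single map a later value overwrites an earlier one, so it is safer to parse the lines of each section separately.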

    
05.10.2017 / 13:22