FtpWebResponse.GetResponseStream returns an HTML page

2

I'm connecting to an FTP server with FtpWebResponse. So far so good: I list the directories as shown in this answer.

When I simulate an FTP server locally with the FileZilla Server bundled with XAMPP, the directory listing comes back with one entry per line of the response stream, as in this example:

config/
app/
public/
file.xml

But today I tested against two remote servers, and a huge HTML page came back instead:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd
">
<!-- HTML listing generated by Squid 2.6.STABLE21 -->
<!-- Wed, 27 May 2015 17:42:13 GMT -->
<HTML><HEAD><TITLE>
FTP Directory: ftp://[email protected]/
</TITLE>
<STYLE type="text/css"><!--BODY{background-color:#ffffff;font-family:verdana,sans-serif}--></STYLE>
</HEAD><BODY>
<PRE>
--------- Welcome to Pure-FTPd [privsep] [TLS] ----------
You are user number 2 of 50 allowed.
Local time is now 18:43. Server port: 21.
This is a private system - No anonymous login
IPv6 connections are also welcome on this server.
</PRE>
<HR noshade size="1px">
<H2>
FTP Directory: <A HREF="/">ftp://[email protected]</A>/</H2>
<PRE>
<A HREF="etc/"><IMG border="0" SRC="http://proxy.domain.local:8080/squid-internal-static/ic
ons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="etc/">etc</A>. . . . . . . . . . . . . . . Jan 13 20
:39
<A HREF="logs/"><IMG border="0" SRC="http://proxy.domain.local:8080/squid-internal-static/i
cons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="logs/">logs</A> . . . . . . . . . . . . . . May 14
19:06
<A HREF="mail/"><IMG border="0" SRC="http://proxy.domain.local:8080/squid-internal-static/i
cons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mail/">mail</A> . . . . . . . . . . . . . . Dec 16
20:53
<A HREF="public_ftp/"><IMG border="0" SRC="http://proxy.domain.local:8080/squid-internal-st
atic/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="public_ftp/">public_ftp</A> . . . . . . . . .
 . . Aug  4  2014
<A HREF="public_html/"><IMG border="0" SRC="http://proxy.domain.local:8080/squid-internal-s
tatic/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="public_html/">public_html</A>. . . . . . . .
 . . . May 25 17:21
<A HREF="ssl/"><IMG border="0" SRC="http://proxy.domain.local:8080/squid-internal-static/ic
ons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="ssl/">ssl</A>. . . . . . . . . . . . . . . Aug  5  2
014
<A HREF="tmp/"><IMG border="0" SRC="http://proxy.domain.local:8080/squid-internal-static/ic
ons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="tmp/">tmp</A>. . . . . . . . . . . . . . . May  5 12
:57
<A HREF="www"><IMG border="0" SRC="http://proxy.domain.local:8080/squid-internal-static/ico
ns/anthony-link.gif" ALT="[LINK]"></A> <A HREF="www">www</A>. . . . . . . . . . . . . . . Sep 30  20
14         <A HREF="www;type=a"><IMG border="0" SRC="http://proxy.domain.local:8080/squid-i
nternal-static/icons/anthony-text.gif" ALT="[VIEW]"></A> <A HREF="www;type=i"><IMG border="0" SRC="h
ttp://proxy.domain.local:8080/squid-internal-static/icons/anthony-box.gif" ALT="[DOWNLOAD]"
></A> -> <A HREF="public_html">public_html</A>
</PRE>
<HR noshade size="1px">
<ADDRESS>
Generated Wed, 27 May 2015 17:42:13 GMT by proxy.domain.local (squid/2.6.STABLE21)
</ADDRESS></BODY></HTML>

I removed some parts to keep it short, along with some sensitive information.

How can I force the response to contain just the directories and files, one per line, or at least XML?

Edit

I inspected the request with the Fiddler Web Debugger, and the Inspector > Raw view contains the following:

GET ftp://user:[email protected]/ HTTP/1.1
Host: domain.com
Proxy-Connection: Keep-Alive

I don't know why, but Fiddler interrupts my request and my application never completes it: it hangs until the timeout expires and only finishes when I close Fiddler.

Edit 2

I tested the program at home and it worked normally, returning only the list of directories. As discussed in the comments, the problem is probably the company's proxy.

    
asked by anonymous 27.05.2015 / 19:37

1 answer

1

This happens because the request goes to the proxy over HTTP, not FTP. The proxy then performs the required FTP commands and returns the result to you inside an HTTP response.

HTTP proxies usually return the result as an HTML page, so that a user can click through to the relevant files.
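
If the network allows direct FTP connections, one thing to try is bypassing the proxy for this one request. The following is a minimal sketch, assuming that setting Proxy to null on the request is enough in your environment. The helper name CreateDirectListRequest is hypothetical, and a firewall that blocks outbound FTP would still make the request fail.

```csharp
using System.Net;

static class FtpDirect
{
    // Hypothetical helper: builds a directory-listing request that ignores
    // the system-wide proxy configuration.
    public static FtpWebRequest CreateDirectListRequest(string uri, string user, string password)
    {
        var request = (FtpWebRequest)WebRequest.Create(uri);
        request.Credentials = new NetworkCredential(user, password);
        request.Method = WebRequestMethods.Ftp.ListDirectory;

        // Assumption: no proxy should be used for this request.
        request.Proxy = null;
        return request;
    }
}
```

Calling GetResponse() on the returned request should then talk to the FTP server directly. If it times out or is refused, the proxy is probably the only way out of the network.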

Since you cannot change the proxy settings, an alternative is to parse the HTML and extract the relevant information.

One way to do this in C# is to use the HTML Agility Pack. Below is the code mentioned in the question, adapted accordingly:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
....
...

static List<string> retornarDiretoriosFTP(string URI, string usuario, string senha) {
    FtpWebRequest ftpRequest = (FtpWebRequest)WebRequest.Create(URI);
    List<string> diretorios = new List<string>();
    string resposta;

    ftpRequest.Credentials = new NetworkCredential(usuario, senha);
    ftpRequest.Method = WebRequestMethods.Ftp.ListDirectory;

    // Dispose the response and the reader once the listing has been read
    using (FtpWebResponse resultado = (FtpWebResponse)ftpRequest.GetResponse())
    using (StreamReader streamReader = new StreamReader(resultado.GetResponseStream())) {
        resposta = streamReader.ReadToEnd();
    }

    var documento = new HtmlAgilityPack.HtmlDocument();
    documento.LoadHtml(resposta);

    // href values of every <a> element; anchors without an href are skipped
    List<string> links = documento.DocumentNode.Descendants("a")
        .Select(x => x.GetAttributeValue("href", string.Empty))
        .Where(href => href.Length > 0)
        .ToList();

    // If the response contains links, it is the HTML page generated by the proxy
    if (links.Count > 0) {
        diretorios.AddRange(links);
    }
    // Otherwise it is probably the plain directory listing, one entry per line
    else {
        diretorios.AddRange(resposta.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries));
    }
    return diretorios;
}

To use it, do:

static void Main(string[] args) {
    // Placeholder arguments: FTP URI, user name, password
    List<string> diretorios = retornarDiretoriosFTP("ftp://domain.com/", "user", "password");
    foreach (var diretorio in diretorios) {
        Console.WriteLine(diretorio);
    }
}

Note: You must reference the HTML Agility Pack in the project.
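
The proxy page in the question contains more links than directory entries: each entry is linked twice (icon and name), the heading links to /, and the symbolic link gets extra ;type=a and ;type=i links for viewing and downloading. The following is a minimal sketch of a filter for the extracted href values, assuming the page follows the Squid format shown in the question. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SquidListing
{
    // Hypothetical helper: reduces the raw href values of the proxy page
    // to one entry per file or directory.
    public static List<string> CleanEntries(IEnumerable<string> hrefs)
    {
        return hrefs
            .Where(h => !string.IsNullOrEmpty(h))
            // The page heading links back to the root directory
            .Where(h => h != "/")
            // Extra view/download links carry a ";type=" suffix
            .Where(h => h.IndexOf(";type=", StringComparison.OrdinalIgnoreCase) < 0)
            // Each entry appears twice: once on the icon, once on the name
            .Distinct()
            .ToList();
    }
}
```

For the hrefs "/", "etc/", "etc/", "www", "www", "www;type=a", "www;type=i" and "public_html", this yields "etc/", "www" and "public_html". Note that the target of a symbolic link (public_html here) is still listed as its own entry, since its href looks like any other file name.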

    
28.05.2015 / 15:12