Search engine behaves the same with multiple threads as with a single thread


Good afternoon. I am developing a routine to search PDF files. My idea is to distribute the processing of each file across different threads to improve response time. The SearchPDFText method below finds and returns the files correctly; however, when I run one test processing the files on separate threads and another processing them one by one on the main application thread, the average response time is the same.

    int ctFilesToSearch;
    int ctFilesSearching;

    /// <summary>
    /// Files where the search text was found.
    /// </summary>
    List<FileInfo> lFilesGood;
    Queue<FileInfo> qFilesToSearch;

    public SearchEngine()
    {
        lFilesGood = new List<FileInfo>();
    }

    public IEnumerable<FileInfo> SearchPDFText(IEnumerable<FileInfo> lFiles, string searchText)
    {
        try
        {
            qFilesToSearch = new Queue<FileInfo>(lFiles);
            int totalFiles = lFiles.Count();
            ctFilesToSearch = totalFiles;
            ctFilesSearching = 0;

            while (ctFilesSearching < totalFiles)
            {
                ctFilesSearching++;
                Thread tr = new Thread(() => Search(qFilesToSearch.Dequeue(), searchText));
        tr.Start(); // Multi-threaded.
                //Search(qFilesToSearch.Dequeue(), searchText); // One-by-one processing.
            }

            while (ctFilesToSearch > 0) ; // Wait for all files to be processed.
            return lFilesGood;
        }
        catch { throw; }
    }

    private void Search(FileInfo file, string searchText)
    {
        if (SearchPdfFile(file.FullName, searchText))
            lFilesGood.Add(file);
        ctFilesToSearch--;
    }

    private bool SearchPdfFile(string fileName, string searchText)
    {
        bool textFound = false;
        if (File.Exists(fileName))
        {
            using (PdfReader pdfReader = new PdfReader(fileName))
            {
                for (int page = 1; page <= pdfReader.NumberOfPages; page++)
                {
                    ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();

                    string currentPageText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
                    if (currentPageText.Contains(searchText))
                    {
                        textFound = true;
                        break;
                    }
                }
            }
        }
        return textFound;
    }

Note: I am using the following using directives:

using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading;

My test processed 200 .pdf files with 16 pages each. The response time in both scenarios averaged 1m40s. I expected the parallel version to be much faster. Is my approach the correct way to achieve my goal (parallelism)?
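For reference, here is a minimal sketch of the fan-out/aggregation pattern I am trying to achieve, using `Parallel.ForEach` and a thread-safe result collection. The in-memory strings and the `ContainsText` predicate are hypothetical stand-ins for the actual PDF text extraction, just to illustrate the shape of the parallel loop:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class ParallelSearchSketch
{
    // Hypothetical stand-in for SearchPdfFile: checks a string instead of a PDF.
    static bool ContainsText(string content, string searchText) =>
        content.Contains(searchText);

    static void Main()
    {
        var files = new[] { "alpha report", "beta summary", "alpha index" };

        // ConcurrentBag makes the result collection safe to add to from
        // multiple threads; Parallel.ForEach partitions the work across
        // the thread pool and blocks until all items are processed.
        var matches = new ConcurrentBag<string>();
        Parallel.ForEach(files, f =>
        {
            if (ContainsText(f, "alpha"))
                matches.Add(f);
        });

        Console.WriteLine(matches.Count); // 2
    }
}
```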

asked by anonymous 20.12.2017 / 19:42

0 answers