Out Of Memory When Running a GetFiles in a Very Large Directory

5

I currently have a directory with multiple subdirectories that together hold more than 3 million files. I only need to map the directories where these files are located, and the mapping should be separated by file type.

The main problem appears when I run .GetFiles("*.WAV", SearchOption.AllDirectories). The call takes 40 to 50 minutes on average and then throws: Exception of type 'System.OutOfMemoryException' was thrown.

Yet the server currently has 24 GB of RAM.

DETAILS

  • I'm developing this as a C# console application.
  • .NET Framework 4.5
  • The server runs Windows Server 2012

Can anyone suggest an alternative to solve this problem?

    
asked by anonymous 18.07.2016 / 15:39

3 answers

2

If the file system of interest is NTFS, there is another way to list these files much faster: by reading the MFT (Master File Table).

There is already an answer to this question on the English-language Stack Overflow, on which I will base this answer.

Overview of how to read the Master File Table (MFT)

The MFT is read in three steps:

  • verify that the user has sufficient privileges
  • get a handle to the volume to be read (the logical drive, e.g. C:, D:, etc.)
  • call the Windows API DeviceIoControl in a loop to enumerate the files

    Each of these steps deserves more than an entire answer, so I will only describe very superficially what happens in each one.

    In the program below, comments mark where each step occurs.

    The ProcessEntry method receives each entry to be processed.

    Program.cs

    internal static class Program
    {
        private static void Main(string[] args)
        {
            // step 1: verify privileges
            if (!Privileges.HasBackupAndRestorePrivileges)
                Console.WriteLine("Could not assert privileges");
    
            // step 2: open the volume (device path form: \\.\C:)
            using (var volume = WinFiles.GetVolumeHandle(@"\\.\C:"))
                try
                {
                    // step 3: enumerate the MFT entries
                    WinFiles.ReadMft(volume, ProcessEntry);
                }
                catch (Exception ex)
                {
                    Console.WriteLine(ex.Message);
                }
        }
    
        private static void ProcessEntry(NativeMethods.UsnRecord usnRecord)
        {
            Console.WriteLine("FRN:" + usnRecord.FileReferenceNumber);
            Console.WriteLine("Parent FRN:" + usnRecord.ParentFileReferenceNumber);
            Console.WriteLine("File name:" + usnRecord.FileName);
            Console.WriteLine("Attributes: "
                + (NativeMethods.EFileAttributes)usnRecord.FileAttributes);
            Console.WriteLine("Timestamp:" + usnRecord.TimeStamp);
        }
    }
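
Each record only carries the file's own name and the reference number (FRN) of its parent directory, so the original goal (listing the directories that contain files of each type) needs the parent chain to be resolved. The following is a minimal sketch of one way to do that, not part of the original answer: the MftIndex class and its members are hypothetical names, and it assumes the parent chain ends either at an FRN that was not enumerated or at an entry that is its own parent (the volume root).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not from the original answer): keeps only directories
// and the parent FRNs of files, so memory use stays far below one entry per file.
internal sealed class MftIndex
{
    private const uint DirectoryAttribute = 0x10; // FILE_ATTRIBUTE_DIRECTORY

    private sealed class Node
    {
        public string Name;
        public ulong ParentFrn;
    }

    // Directory entries keyed by file reference number.
    private readonly Dictionary<ulong, Node> directories = new Dictionary<ulong, Node>();

    // Parent FRNs of the files seen so far, keyed by upper-cased extension.
    private readonly Dictionary<string, HashSet<ulong>> parentsByExtension =
        new Dictionary<string, HashSet<ulong>>();

    // Intended to be called once per record, e.g. from ProcessEntry.
    public void Add(ulong frn, ulong parentFrn, string fileName, uint attributes)
    {
        if ((attributes & DirectoryAttribute) != 0)
        {
            directories[frn] = new Node { Name = fileName, ParentFrn = parentFrn };
            return;
        }

        var dot = fileName.LastIndexOf('.');
        var extension = dot < 0 ? string.Empty : fileName.Substring(dot).ToUpperInvariant();

        HashSet<ulong> parents;
        if (!parentsByExtension.TryGetValue(extension, out parents))
            parentsByExtension[extension] = parents = new HashSet<ulong>();
        parents.Add(parentFrn);
    }

    // Walks up the parent chain and joins the names into a path under volumeRoot.
    public string ResolveDirectory(ulong frn, string volumeRoot)
    {
        var parts = new Stack<string>();
        Node node;
        while (directories.TryGetValue(frn, out node) && node.ParentFrn != frn)
        {
            parts.Push(node.Name);
            frn = node.ParentFrn;
        }
        // A Stack enumerates from the last pushed item, i.e. the top-most ancestor first.
        return volumeRoot.TrimEnd('\\') + "\\" + string.Join("\\", parts);
    }

    // Lists the directories that directly contain files with the given extension (e.g. ".wav").
    public IEnumerable<string> DirectoriesFor(string extension, string volumeRoot)
    {
        HashSet<ulong> parents;
        if (!parentsByExtension.TryGetValue(extension.ToUpperInvariant(), out parents))
            yield break;
        foreach (var parentFrn in parents)
            yield return ResolveDirectory(parentFrn, volumeRoot);
    }
}
```

With this sketch, ProcessEntry would call index.Add(usnRecord.FileReferenceNumber, usnRecord.ParentFileReferenceNumber, usnRecord.FileName, usnRecord.FileAttributes) and, after ReadMft returns, index.DirectoriesFor(".wav", @"C:\") would list the directories. Since the enumeration covers the whole volume, the result would still have to be filtered down to the directory of interest.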
    

    Verify privileges

    Privileges are verified by trying to enable each privilege in the access token of the process, using AdjustTokenPrivileges. If this call does not fail, the user holds the privilege in question, and from that moment on the process can use it.

    The required privileges are "SeBackupPrivilege" and "SeRestorePrivilege" . Administrators and backup operators have these privileges, for example.

    Privileges.cs

    public static class Privileges
    {
        private static int asserted;
        private static bool hasBackupPrivileges;
    
        public static bool HasBackupAndRestorePrivileges => AssertPriveleges();
    
        /// <remarks>
        /// First time this method is called, it attempts to set
        /// backup privileges for the current process.
        /// Subsequently, it returns the results of that first call.
        /// </remarks>
        private static bool AssertPriveleges()
        {
            var wasAsserted = Interlocked.CompareExchange(ref asserted, 1, 0);
            if (wasAsserted == 0)  // first time here?  come on in!
            {
                var success = AssertPrivelege(NativeMethods.SE_BACKUP_NAME)
                               && AssertPrivelege(NativeMethods.SE_RESTORE_NAME);
    
                hasBackupPrivileges = success;
            }
            return hasBackupPrivileges;
        }
    
    
        private static bool AssertPrivelege(string privelege)
        {
            IntPtr token;
            if (!NativeMethods.OpenProcessToken(
                    NativeMethods.GetCurrentProcess(),
                    NativeMethods.TOKEN_ADJUST_PRIVILEGES,
                    out token))
                return false;
    
            try
            {
                var tokenPrivileges = new NativeMethods.TOKEN_PRIVILEGES
                {
                    Privileges = new NativeMethods.LUID_AND_ATTRIBUTES[1]
                };
    
                if (!NativeMethods.LookupPrivilegeValue(
                        null,
                        privelege,
                        out tokenPrivileges.Privileges[0].Luid))
                    return false;
    
                tokenPrivileges.PrivilegeCount = 1;
                tokenPrivileges.Privileges[0].Attributes
                    = NativeMethods.SE_PRIVILEGE_ENABLED;
                if (!NativeMethods.AdjustTokenPrivileges(
                        token,
                        false,
                        ref tokenPrivileges,
                        Marshal.SizeOf(tokenPrivileges),
                        IntPtr.Zero,
                        IntPtr.Zero))
                    return false;
    
                if (Marshal.GetLastWin32Error() != 0)
                    return false;
            }
            finally
            {
                NativeMethods.CloseHandle(token);
            }
    
            return true;
        }
    }
    

    Get volume handle

    The handle is obtained through the CreateFile function. We pass the root of the volume we want to list in device-path form, for example \\.\C: for drive C:. In addition, we open it for reading (GenericRead) in backup mode (BackupSemantics).

    The method that obtains the handle is in the WinFiles.cs file below.

    Read the MFT

    Finally, we use the DeviceIoControl function to get the MFT records. This is a multi-purpose function that accepts a huge range of inputs and returns an equally huge range of outputs, depending on the control code passed.

    In fact we will use the USN Journal, which lists all the changes made to the volume, and enumerate the MFT entries from that list, as each registered change points to the file in the MFT.

    Entries in the USN Journal can point to files that no longer exist and can be duplicated. We do not have to worry about that, because we will use the control code FSCTL_ENUM_USN_DATA, which returns only MFT entries.

    WinFiles.cs

    internal static class WinFiles
    {
        public static SafeFileHandle GetVolumeHandle(
            string pathToVolume)
        {
            var handle = NativeMethods.CreateFile(
                pathToVolume,
                NativeMethods.EFileAccess.GenericRead,
                FileShare.Read | FileShare.Write | FileShare.Delete,
                IntPtr.Zero,
                (uint)NativeMethods.ECreationDisposition.OpenExisting,
                (uint)NativeMethods.EFileAttributes.BackupSemantics,
                IntPtr.Zero);
    
            if (handle.IsInvalid)
                throw new IOException("Bad path");
    
            return handle;
        }
    
        public static unsafe void ReadMft(
            SafeHandle volume,
            Action<NativeMethods.UsnRecord> processEntry)
        {
            var input = new NativeMethods.MFTEnumDataV0
            {
                StartFileReferenceNumber = 0,
                LowUsn = 0,
                HighUsn = long.MaxValue
            };
            // UsnRecord is the record struct declared in NativeMethods.cs
            var usnRecord = new NativeMethods.UsnRecord();
            var outputBuffer = new byte[1024 * 1024];
    
            using (var stream = new MemoryStream(outputBuffer, true))
                while (true)
                    fixed (byte* pOutput = outputBuffer)
                    {
                        uint bytesRead;
                        var okay = NativeMethods.DeviceIoControl
                            (
                                volume.DangerousGetHandle(),
                                NativeMethods.DeviceIOControlCode.FsctlEnumUsnData,
                                (byte*)&input,
                                (uint)Marshal.SizeOf(input),
                                pOutput,
                                (uint)outputBuffer.Length,
                                out bytesRead,
                                IntPtr.Zero
                            );
    
                        if (!okay)
                        {
                            var error = Marshal.GetLastWin32Error();
                            if (error != NativeMethods.ERROR_HANDLE_EOF)
                                throw new Win32Exception(error);
    
                            break;
                        }
    
                        using (var reader = new BinaryReader(stream, Encoding.Unicode, true))
                            input.StartFileReferenceNumber = reader.ReadUInt64();
                        while (stream.Position < bytesRead)
                        {
                            usnRecord.Read(stream);
                            processEntry(usnRecord);
                        }
                        stream.Seek(0, SeekOrigin.Begin);
                    }
        }
    }
    

    Dependencies

    NativeMethods.cs

    internal class NativeMethods
    {
        internal const int ERROR_HANDLE_EOF = 38;
    
        //--> Privilege constants....
        internal const UInt32 SE_PRIVILEGE_ENABLED = 0x00000002;
        internal const string SE_BACKUP_NAME = "SeBackupPrivilege";
        internal const string SE_RESTORE_NAME = "SeRestorePrivilege";
        internal const string SE_SECURITY_NAME = "SeSecurityPrivilege";
        internal const string SE_CHANGE_NOTIFY_NAME = "SeChangeNotifyPrivilege";
        internal const string SE_CREATE_SYMBOLIC_LINK_NAME = "SeCreateSymbolicLinkPrivilege";
        internal const string SE_CREATE_PERMANENT_NAME = "SeCreatePermanentPrivilege";
        internal const string SE_SYSTEM_ENVIRONMENT_NAME = "SeSystemEnvironmentPrivilege";
        internal const string SE_SYSTEMTIME_NAME = "SeSystemtimePrivilege";
        internal const string SE_TIME_ZONE_NAME = "SeTimeZonePrivilege";
        internal const string SE_TCB_NAME = "SeTcbPrivilege";
        internal const string SE_MANAGE_VOLUME_NAME = "SeManageVolumePrivilege";
        internal const string SE_TAKE_OWNERSHIP_NAME = "SeTakeOwnershipPrivilege";
    
        //--> For starting a process in session 1 from session 0...
        internal const int TOKEN_DUPLICATE = 0x0002;
        internal const uint MAXIMUM_ALLOWED = 0x2000000;
        internal const int CREATE_NEW_CONSOLE = 0x00000010;
        internal const uint TOKEN_ADJUST_PRIVILEGES = 0x0020;
        internal const int TOKEN_QUERY = 0x00000008;
    
    
        [DllImport("advapi32.dll", SetLastError = true)]
        [return: MarshalAs(UnmanagedType.Bool)]
        internal static extern bool OpenProcessToken(IntPtr ProcessHandle, UInt32 DesiredAccess, out IntPtr TokenHandle);
    
        [DllImport("kernel32.dll")]
        internal static extern IntPtr GetCurrentProcess();
    
        [DllImport("advapi32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
        [return: MarshalAs(UnmanagedType.Bool)]
        internal static extern bool LookupPrivilegeValue(string lpSystemName, string lpName, out LUID lpLuid);
    
        [DllImport("advapi32.dll", SetLastError = true)]
        [return: MarshalAs(UnmanagedType.Bool)]
        internal static extern bool AdjustTokenPrivileges(IntPtr TokenHandle, [MarshalAs(UnmanagedType.Bool)]bool DisableAllPrivileges, ref TOKEN_PRIVILEGES NewState, Int32 BufferLength, IntPtr PreviousState, IntPtr ReturnLength);
    
        [DllImport("kernel32.dll", ExactSpelling = true, SetLastError = true, CharSet = CharSet.Unicode)]
        [return: MarshalAs(UnmanagedType.Bool)]
        internal static unsafe extern bool DeviceIoControl(IntPtr hDevice, DeviceIOControlCode controlCode, byte* lpInBuffer, uint nInBufferSize, byte* lpOutBuffer, uint nOutBufferSize, out uint lpBytesReturned, IntPtr lpOverlapped);
    
        [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
        internal static extern SafeFileHandle CreateFile(string lpFileName, EFileAccess dwDesiredAccess, FileShare dwShareMode, IntPtr lpSecurityAttributes, uint dwCreationDisposition, uint dwFlagsAndAttributes, IntPtr hTemplateFile);
    
        [DllImport("kernel32.dll", SetLastError = true)]
        [return: MarshalAs(UnmanagedType.Bool)]
        internal static extern bool CloseHandle(IntPtr hObject);
    
    
        [Flags]
        internal enum EMethod : uint
        {
            Buffered = 0,
            InDirect = 1,
            OutDirect = 2,
            Neither = 3
        }
    
        [Flags]
        internal enum EFileAccess : uint
        {
            GenericRead = 0x80000000,
            GenericWrite = 0x40000000,
            GenericExecute = 0x20000000,
            GenericAll = 0x10000000,
    
            Delete = 0x10000,
            ReadControl = 0x20000,
            WriteDAC = 0x40000,
            WriteOwner = 0x80000,
            Synchronize = 0x100000,
    
            StandardRightsRequired = 0xF0000,
            StandardRightsRead = ReadControl,
            StandardRightsWrite = ReadControl,
            StandardRightsExecute = ReadControl,
            StandardRightsAll = 0x1F0000,
            SpecificRightsAll = 0xFFFF,
    
            AccessSystemSecurity = 0x1000000,
            MaximumAllowed = 0x2000000
        }
    
    
        [Flags]
        internal enum EFileDevice : uint
        {
            Beep = 0x00000001,
            CDRom = 0x00000002,
            CDRomFileSytem = 0x00000003,
            Controller = 0x00000004,
            Datalink = 0x00000005,
            Dfs = 0x00000006,
            Disk = 0x00000007,
            DiskFileSystem = 0x00000008,
            FileSystem = 0x00000009,
            InPortPort = 0x0000000a,
            Keyboard = 0x0000000b,
            Mailslot = 0x0000000c,
            MidiIn = 0x0000000d,
            MidiOut = 0x0000000e,
            Mouse = 0x0000000f,
            MultiUncProvider = 0x00000010,
            NamedPipe = 0x00000011,
            Network = 0x00000012,
            NetworkBrowser = 0x00000013,
            NetworkFileSystem = 0x00000014,
            Null = 0x00000015,
            ParallelPort = 0x00000016,
            PhysicalNetcard = 0x00000017,
            Printer = 0x00000018,
            Scanner = 0x00000019,
            SerialMousePort = 0x0000001a,
            SerialPort = 0x0000001b,
            Screen = 0x0000001c,
            Sound = 0x0000001d,
            Streams = 0x0000001e,
            Tape = 0x0000001f,
            TapeFileSystem = 0x00000020,
            Transport = 0x00000021,
            Unknown = 0x00000022,
            Video = 0x00000023,
            VirtualDisk = 0x00000024,
            WaveIn = 0x00000025,
            WaveOut = 0x00000026,
            Port8042 = 0x00000027,
            NetworkRedirector = 0x00000028,
            Battery = 0x00000029,
            BusExtender = 0x0000002a,
            Modem = 0x0000002b,
            Vdm = 0x0000002c,
            MassStorage = 0x0000002d,
            Smb = 0x0000002e,
            Ks = 0x0000002f,
            Changer = 0x00000030,
            Smartcard = 0x00000031,
            Acpi = 0x00000032,
            Dvd = 0x00000033,
            FullscreenVideo = 0x00000034,
            DfsFileSystem = 0x00000035,
            DfsVolume = 0x00000036,
            Serenum = 0x00000037,
            Termsrv = 0x00000038,
            Ksec = 0x00000039,
            // From Windows Driver Kit 7
            Fips = 0x0000003A,
            Infiniband = 0x0000003B,
            Vmbus = 0x0000003E,
            CryptProvider = 0x0000003F,
            Wpd = 0x00000040,
            Bluetooth = 0x00000041,
            MtComposite = 0x00000042,
            MtTransport = 0x00000043,
            Biometric = 0x00000044,
            Pmi = 0x00000045
        }
    
        internal enum EFileIOCtlAccess : uint
        {
            Any = 0,
            Special = Any,
            Read = 1,
            Write = 2
        }
    
        internal enum DeviceIOControlCode : uint
        {
            // Enum operands are cast to uint because C# defines no shift operator for enum types.
            FsctlEnumUsnData = ((uint)EFileDevice.FileSystem << 16) | (44u << 2) | (uint)EMethod.Neither | ((uint)EFileIOCtlAccess.Any << 14),
            FsctlReadUsnJournal = ((uint)EFileDevice.FileSystem << 16) | (46u << 2) | (uint)EMethod.Neither | ((uint)EFileIOCtlAccess.Any << 14),
            FsctlReadFileUsnData = ((uint)EFileDevice.FileSystem << 16) | (58u << 2) | (uint)EMethod.Neither | ((uint)EFileIOCtlAccess.Any << 14),
            FsctlQueryUsnJournal = ((uint)EFileDevice.FileSystem << 16) | (61u << 2) | (uint)EMethod.Buffered | ((uint)EFileIOCtlAccess.Any << 14),
            FsctlCreateUsnJournal = ((uint)EFileDevice.FileSystem << 16) | (57u << 2) | (uint)EMethod.Neither | ((uint)EFileIOCtlAccess.Any << 14)
        }
    
        /// <summary>Control structure used to interrogate MFT data using DeviceIOControl from the user volume</summary>
        [StructLayout(LayoutKind.Sequential)]
        internal struct MFTEnumDataV0
        {
            public ulong StartFileReferenceNumber;
            public long LowUsn;
            public long HighUsn;
        }
    
    
        /// <summary>A structure returned from USN queries</summary>
        /// <remarks>
        /// FileName is synthetic...composed during a read of the structure and is not technically
        /// part of the Win32 API's definition...although the actual FileName is contained
        /// "somewhere" in the structure's trailing bytes, according to FileNameLength and FileNameOffset.
        /// 
        /// Alignment boundaries are enforced, and so, the RecordLength
        /// may be somewhat larger than the accumulated lengths of the members plus the FileNameLength.
        /// </remarks>
        [StructLayout(LayoutKind.Sequential)]
        internal struct UsnRecord
        {
            public uint RecordLength;
            public ushort MajorVersion;
            public ushort MinorVersion;
            public ulong FileReferenceNumber;
            public ulong ParentFileReferenceNumber;
            public long Usn;
            public long TimeStamp;
            public UsnReason Reason;
            public uint SourceInfo;
            public uint SecurityId;
            public uint FileAttributes;
            public ushort FileNameLength;
            public ushort FileNameOffset;
            public string FileName;
    
            /// <remarks>Note how the read advances to the FileNameOffset and reads only FileNameLength bytes</remarks>
            public void Read(Stream stream)
            {
                var startOfRecord = stream.Position;
    
                using (var reader = new BinaryReader(stream, Encoding.Unicode, true))
                {
                    this.RecordLength = reader.ReadUInt32();
                    this.MajorVersion = reader.ReadUInt16();
                    this.MinorVersion = reader.ReadUInt16();
                    this.FileReferenceNumber = reader.ReadUInt64();
                    this.ParentFileReferenceNumber = reader.ReadUInt64();
                    this.Usn = reader.ReadInt64();
                    this.TimeStamp = reader.ReadInt64();
                    this.Reason = (UsnReason)reader.ReadUInt32();
                    this.SourceInfo = reader.ReadUInt32();
                    this.SecurityId = reader.ReadUInt32();
                    this.FileAttributes = reader.ReadUInt32();
                    this.FileNameLength = reader.ReadUInt16();
                    this.FileNameOffset = reader.ReadUInt16();
    
                    stream.Position = startOfRecord + this.FileNameOffset;
                    this.FileName = Encoding.Unicode.GetString(reader.ReadBytes(this.FileNameLength));
    
                    stream.Position = startOfRecord + this.RecordLength;
                }
            }
        }
    
        [StructLayout(LayoutKind.Sequential)]
        internal struct LUID
        {
            public UInt32 LowPart;
            public Int32 HighPart;
        }
    
    
        [StructLayout(LayoutKind.Sequential)]
        internal struct LUID_AND_ATTRIBUTES
        {
            public LUID Luid;
            public UInt32 Attributes;
        }
    
    
        internal struct TOKEN_PRIVILEGES
        {
            public UInt32 PrivilegeCount;
            [MarshalAs(UnmanagedType.ByValArray, SizeConst = 1)]      // !! think we only need one
            public LUID_AND_ATTRIBUTES[] Privileges;
        }
    
        [Flags]
        internal enum EFileAttributes : uint
        {
            /// <summary/>
            None = 0,
    
            //-->  these are consistent w/ .Net FileAttributes...
            Readonly = 0x00000001,
            Hidden = 0x00000002,
            System = 0x00000004,
            Directory = 0x00000010,
            Archive = 0x00000020,
            Device = 0x00000040,
            Normal = 0x00000080,
            Temporary = 0x00000100,
            SparseFile = 0x00000200,
            ReparsePoint = 0x00000400,
            Compressed = 0x00000800,
            Offline = 0x00001000,
            NotContentIndexed = 0x00002000,
            Encrypted = 0x00004000,
    
            //--> additional CreateFile call attributes...
            Write_Through = 0x80000000,
            Overlapped = 0x40000000,
            NoBuffering = 0x20000000,
            RandomAccess = 0x10000000,
            SequentialScan = 0x08000000,
            DeleteOnClose = 0x04000000,
            BackupSemantics = 0x02000000,
            PosixSemantics = 0x01000000,
            OpenReparsePoint = 0x00200000,
            OpenNoRecall = 0x00100000,
            FirstPipeInstance = 0x00080000
        }
    
        /// <summary>Reasons the file changed (from USN journal)</summary>
        [Flags]
        public enum UsnReason : uint
        {
            BASIC_INFO_CHANGE = 0x00008000,
            CLOSE = 0x80000000,
            COMPRESSION_CHANGE = 0x00020000,
            DATA_EXTEND = 0x00000002,
            DATA_OVERWRITE = 0x00000001,
            DATA_TRUNCATION = 0x00000004,
            EA_CHANGE = 0x00000400,
            ENCRYPTION_CHANGE = 0x00040000,
            FILE_CREATE = 0x00000100,
            FILE_DELETE = 0x00000200,
            HARD_LINK_CHANGE = 0x00010000,
            INDEXABLE_CHANGE = 0x00004000,
            NAMED_DATA_EXTEND = 0x00000020,
            NAMED_DATA_OVERWRITE = 0x00000010,
            NAMED_DATA_TRUNCATION = 0x00000040,
            OBJECT_ID_CHANGE = 0x00080000,
            RENAME_NEW_NAME = 0x00002000,
            RENAME_OLD_NAME = 0x00001000,
            REPARSE_POINT_CHANGE = 0x00100000,
            SECURITY_CHANGE = 0x00000800,
            STREAM_CHANGE = 0x00200000,
    
            None = 0x00000000
        }
    
        internal enum ECreationDisposition : uint
        {
            New = 1,
            CreateAlways = 2,
            OpenExisting = 3,
            OpenAlways = 4,
            TruncateExisting = 5
        }
    
    }
    
        
    24.07.2016 / 04:46
    1

    Try switching to EnumerateFiles(). It handles this more efficiently because it returns the files as they are found instead of building the whole array first.

    But first you need to confirm that this is really the problem. Have you measured it to make sure? Is the call perhaps being executed multiple times? Without knowing exactly where the problem is, it is hard to fix.
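
As an illustration of the streaming approach, here is a minimal sketch that walks the tree once and keeps only the distinct directories per extension, which is what the question asks for. The DirectoryMapper class and its method name are assumptions made for this example. Note that Directory.EnumerateFiles with SearchOption.AllDirectories throws if it reaches a subdirectory the process cannot read, so a real run may need extra handling for that.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not from the original answers).
internal static class DirectoryMapper
{
    // Returns, for each extension (case-insensitive), the set of directories
    // that contain at least one file with that extension.
    public static Dictionary<string, HashSet<string>> MapDirectoriesByExtension(string root)
    {
        var map = new Dictionary<string, HashSet<string>>(StringComparer.OrdinalIgnoreCase);

        // Paths are produced lazily; no array of all file names is ever built.
        foreach (var path in Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories))
        {
            var extension = Path.GetExtension(path);

            HashSet<string> directories;
            if (!map.TryGetValue(extension, out directories))
            {
                directories = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
                map[extension] = directories;
            }

            directories.Add(Path.GetDirectoryName(path));
        }

        return map;
    }
}
```

Memory use here grows with the number of distinct directories, not with the number of files, so 3 million files spread over a few thousand directories should stay small.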

        
    18.07.2016 / 15:45
    1

    I use DirectoryInfo.EnumerateFiles

      

    The EnumerateFiles and GetFiles methods differ as follows: when you use EnumerateFiles, you can start enumerating the collection of names before the whole collection is returned; when you use GetFiles, you must wait for the whole array of names to be returned before you can access the array. Therefore, when you are working with many files and directories, EnumerateFiles can be more efficient.

    You can also use Take to limit how many files are returned:

        DirectoryInfo di = new DirectoryInfo(directoryPath);
        IEnumerable<FileInfo> files;
    
        files = from file in di.EnumerateFiles("*.WAV", SearchOption.AllDirectories)
                select file;
    
        return files.Take(100);
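
Take(100) on its own returns only the first 100 files, and combining it with Skip for later pages would restart the enumeration from the beginning each time. If every file has to be visited page by page, one option is to batch a single pass over the enumeration. The sketch below assumes a hypothetical Batching.InBatches helper, which is not part of the framework.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not a framework API): splits one pass over a lazy
// sequence into consecutive lists of at most batchSize items.
internal static class Batching
{
    public static IEnumerable<List<T>> InBatches<T>(IEnumerable<T> source, int batchSize)
    {
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException("batchSize");

        var batch = new List<T>(batchSize);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize); // start a fresh page
            }
        }

        // Emit the final, possibly shorter, page.
        if (batch.Count > 0)
            yield return batch;
    }
}
```

It could be used as foreach (var page in Batching.InBatches(di.EnumerateFiles("*.WAV", SearchOption.AllDirectories), 100)), so that only one page of FileInfo objects is held at a time.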
    

    Source: link

        
    18.07.2016 / 15:50