Copy-Robust: A script for copying large files from endpoints on the moon
Imagine, for a minute, that you're an incident response analyst and you've just finished a memory dump on a machine you suspect was infected with malware. This is a laptop that's assigned to someone who works remotely from (you have to assume) the goddamn moon, because their network connection is both incredibly slow and even more unreliable. You've tried to copy the memory dump off using your EDR solution. It keeps failing because of the awful network connection.
That's the situation I found myself in when I wrote this script.
Design
The Copy-Robust script was designed very specifically for incident responders
to use during incidents. Its design uses concepts I stole unashamedly from both
forensic tooling and, uh, bittorrent. To summarize the script, it lets you copy
the contents of one folder to another folder (over the network, usually) in a
robust manner by compressing it into a split 7-zip archive, creating a "resume"
file that tracks the status and CRC32 of each chunk, and validating each chunk
was successfully copied through secure SHA256 hashes (one on the source, one on
the target). It will automatically perform up to 3 retries on a failed copy (so
4 attempts total, if necessary) and gracefully exit on a failure. It supports
resuming a copy at an arbitrary point by processing the resume file.
When you're doing incident response work, you need a good balance between speed and forensic soundness. This script aims to strike that balance: it gives you a guarantee that your copies are correct, while working a little faster and in situations where you may not have luxuries like "unlimited hard drive space". This is why it uses CRC32 in the resume file and SHA256 to validate each chunk: CRC32 is much faster, and the resume file is not meant to be forensically sound on its own.
It's designed to run in Powershell on Windows hosts, with minimal requirements. It's also designed to run in, hopefully, just about whatever Powershell version is actually on your hosts: its only external dependency is 7-zip, which is used for compression (to avoid a dependency on newer versions of Powershell) and for calculation of CRC32 hashes (for much the same reason).
It's meant to be deployed through EDR, so it's fully non-interactive.
It uses Powershell's transcript functionality to provide a log of its output and operations, and if run with any VerbosePreference other than "SilentlyContinue", it will output the SHA256 it uses internally for validating every copy operation.
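To make that concrete, a typical non-interactive run might look something like the sketch below. -Target is my placeholder name for the destination parameter; -Source, -Passphrase, -TapeSize, and -WorkDir are the parameters described later in this post:

```powershell
# Hypothetical invocation -- -Target is my shorthand for the destination
# parameter; check the script's actual parameter list before relying on it.
$VerbosePreference = 'Continue'   # emit the per-chunk SHA256 details

.\Copy-Robust.ps1 `
    -Source 'C:\Evidence\memdump' `
    -Target '\\collection-server\ir$\case-1234' `
    -Passphrase 'correct-horse-battery-staple' `
    -TapeSize '500m' `
    -WorkDir 'C:\Temp\RobustCopy'
```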
Because the resume file is trusted unconditionally and keeps track of whether each chunk has been copied and verified, it can resume at any point, and if you run out of space on your target server you can move the existing chunks off and pick right back up by executing the script again with the same settings.
Overall, I tried to design this so that it could be fairly robust for as many collection scenarios as possible, giving reasonable confidence in its results without sacrificing too much in the way of speed. It's hard to strike the right balance between speed and forensic soundness, but I think this does it pretty well, personally :)
Technical details
The resume file
The script has two built-in classes: one for the resume file itself, and one for each entry within it. The entries are pretty simple: the base name of each chunk, the calculated CRC32, and flags for whether each chunk has been copied and verified yet.
The resume file itself is a standard Powershell CliXml file serialized from the
ResumeData class, which itself contains an array of ResumeEntry objects, as
well as a little additional metadata: a random GUID used to identify the local
working directory (where the chunks are stored) and a count of how many chunks
are expected.
As a CliXml file, this is technically not safe: deserializing an untrusted file can result in attacker-controlled code execution. Given this script is designed to be run by your security analysts, ideally you'll be able to trust the resume file, but this is something to keep in mind when evaluating this script.[1]
Starting a copy
When the script is invoked, it first determines the actual target directory for the copy, and does two things:
- Makes sure 7-zip is available in the target directory, and
- Checks for an existing resume file in the actual target directory.
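A minimal sketch of those startup checks, with my guesses at the file names involved:

```powershell
# Sketch of the startup checks. 'resume.xml' and the expectation that
# 7z.exe sits in the target directory are my guesses at the specifics.
$targetDir  = '\\collection-server\ir$\case-1234'
$sevenZip   = Join-Path $targetDir '7z.exe'
$resumePath = Join-Path $targetDir 'resume.xml'

if (-not (Test-Path $sevenZip)) {
    throw "7-zip not found in target directory: $sevenZip"
}

# An existing resume file means we're picking up a previous copy.
$resuming = Test-Path $resumePath
```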
If there's no existing resume file, it'll start the copy process from the beginning:
- Compress the source folder (-Source) with 7-zip, into a split archive encrypted with the provided passphrase (-Passphrase) and in volumes of no more than the provided size (-TapeSize). These temporary volumes will be stored in the local working folder (defaults to C:\Temp\RobustCopy, but can be specified with -WorkDir; this will always be in a GUID-named folder within that folder).
- Loop over all files in the working directory, calculating a CRC32 hash of each file and adding them all to the initial resume file.
- Write out the initial resume file, with every chunk flagged as not copied or verified.
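Using the variables from the sketches above, the compression and hashing steps look roughly like this. The switches shown are standard 7-zip options, but the script's exact invocation (and output parsing) may differ:

```powershell
# Create an encrypted, split archive in the GUID-named working folder.
$workDir = Join-Path 'C:\Temp\RobustCopy' ([guid]::NewGuid())
New-Item -ItemType Directory -Path $workDir | Out-Null

# 'a' = add to archive, -p = passphrase, -v = volume (chunk) size,
# -mhe=on = also encrypt the archive headers (my assumption).
& $sevenZip a "-p$Passphrase" "-v$TapeSize" -mhe=on `
    (Join-Path $workDir 'archive.7z') $Source

# CRC32 for each chunk via 7-zip's hash command; the parsing here is
# illustrative and depends on your 7-zip version's output format.
foreach ($chunk in Get-ChildItem -Path $workDir -File) {
    $out   = & $sevenZip h -scrcCRC32 $chunk.FullName
    $match = $out | Select-String 'for data:\s*([0-9A-Fa-f]+)' |
                    Select-Object -First 1
    $crc   = $match.Matches[0].Groups[1].Value
    # ...create a ResumeEntry for $chunk.Name with $crc here...
}
```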
Resuming a copy
If, instead, there is an existing resume file, it validates the resume file against the local working folder before resuming the copy:
- Deserialize the resume file.
- Validate the correct number of files are in the local working folder, based on the resume file.
- Loop through all the files in the resume file, calculating the local CRC32 and comparing it to the one stored in the resume file.
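Conceptually, that validation pass looks like this sketch; Get-ChunkCrc32 is a hypothetical helper standing in for the 7-zip CRC32 parsing shown earlier:

```powershell
# Sketch of validating an existing resume file before resuming.
$resume  = Import-Clixml -Path $resumePath
$workDir = Join-Path 'C:\Temp\RobustCopy' $resume.WorkingDirId

$chunks = Get-ChildItem -Path $workDir -File
if ($chunks.Count -ne $resume.ChunkCount) {
    throw "Expected $($resume.ChunkCount) chunks but found $($chunks.Count)"
}

foreach ($entry in $resume.Entries) {
    # Get-ChunkCrc32 is my placeholder for the 7z-based CRC32 calculation.
    $crc = Get-ChunkCrc32 -Path (Join-Path $workDir $entry.Name)
    if ($crc -ne $entry.Crc32) {
        throw "CRC32 mismatch on $($entry.Name); resume file doesn't match disk"
    }
}
```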
Performing the copy
After the resume file is created or validated, the copy itself starts, looping through each chunk in the resume file once more:
- Calculate the local SHA256.
- Copy the chunk to the target folder.
- Set the "copied" flag for the chunk and atomically update the resume file.
- Calculate the SHA256 after the copy.
- Retry the copy-and-verify steps (up to 4 attempts total) if the SHA validation fails.
- Set the "verified" flag and atomically update the resume file.
- Based on VerbosePreference, optionally write out verbose data for each chunk (local path, target path, local SHA, target SHA).
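Putting those steps together, the per-chunk loop looks roughly like the sketch below. Save-ResumeAtomic is my guess at how the atomic resume update works (write to a temp file, then rename over the old one); the script may do it differently:

```powershell
# Temp-file-plus-rename is one common way to get an atomic update;
# I'm assuming the script does something equivalent.
function Save-ResumeAtomic {
    param($Resume, $Path)
    $tmp = "$Path.tmp"
    $Resume | Export-Clixml -Path $tmp
    Move-Item -Path $tmp -Destination $Path -Force
}

foreach ($entry in $resume.Entries | Where-Object { -not $_.Verified }) {
    $localPath  = Join-Path $workDir   $entry.Name
    $targetPath = Join-Path $targetDir $entry.Name
    $localSha   = (Get-FileHash -Path $localPath -Algorithm SHA256).Hash

    $verified = $false
    foreach ($attempt in 1..4) {  # initial attempt plus up to 3 retries
        Copy-Item -Path $localPath -Destination $targetPath -Force
        $entry.Copied = $true
        Save-ResumeAtomic -Resume $resume -Path $resumePath

        $targetSha = (Get-FileHash -Path $targetPath -Algorithm SHA256).Hash
        if ($targetSha -eq $localSha) { $verified = $true; break }
    }
    if (-not $verified) {
        throw "Chunk $($entry.Name) failed SHA256 verification after 4 attempts"
    }

    $entry.Verified = $true
    Save-ResumeAtomic -Resume $resume -Path $resumePath

    Write-Verbose "Copied $localPath -> $targetPath"
    Write-Verbose "  local SHA256:  $localSha"
    Write-Verbose "  target SHA256: $targetSha"
}
```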
And once the copy is done, unless you specify -NoClean, it will automatically
clean up the contents of the local working folder.
Caveats
There are a couple caveats to this script that mean it won't work in every situation:
- It requires a potentially significant amount of space on the source device, because it needs to create the chunks locally before copying them to your target drive. This makes it impractical for space-constrained environments.
- It's Windows-only. It relies on both Powershell and 7-zip in ways that are not cross-platform. The logic could be replicated for other platforms, but as-is this script is not and cannot be cross-platform. This is a deliberate decision, to ensure this script will actually work in as many Windows environments as possible, but it's an annoying restriction nonetheless.
- It requires, practically, mapping of a network share on the machine you're investigating. This can be a problem if you want network containment on that machine as well, though it's a problem that can be worked around through dedicated infrastructure and allow-listing in your containment solution.
- It requires running an external executable, which can be an issue if your environment has application control. 7-zip was chosen here as the single external dependency because it's a very common application and should be easy enough to allow through app control policies, but this is another potentially annoying restriction.
- As mentioned above, the use of PSSerializer and CliXml for the resume file is a potential security risk, if you run the script on an untrusted resume file.
This isn't necessarily the solution for every situation. There might be better or easier options (rsync is often a good choice!), but this has a niche, and overall I think it's a pretty robust solution for a very specific problem. It certainly has some room for improvement, and I'm 100% open to suggestions, but I'm actually pretty proud of the overall design.
Hopefully you'll get some use out of this, and if you've got any suggestions, let me know :)
[1] If you want to run through the threat model here, consider: the resume file will be generated by the script and used only for the lifetime of the copy. In general, it's unlikely you'll be running this on any untrusted inputs. Additionally, in most situations this script will be run on the potentially compromised machine, so if a malicious actor is able to poison the resume file, the impact will be contained to a machine they already had access to, and that they already had easier ways of controlling than by abusing resume files. This may still be a concern in some environments or use cases; please consider your own needs and threat model here. Nevertheless, at some point I may update this to use a custom serializer to prevent this concern entirely.