Uploading Files

Now that you have access to Thorium, you may want to upload some files and run analysis tools on them. You can do that in either the Web UI or through Thorctl. The Web UI is usually preferable for a small number of files, while Thorctl is better suited to uploading many files or to environments where a browser is not available.

Several options are available when uploading files; they are described in the table below. Groups is the only required field. If you are not yet a member of any group, follow the steps in the Adding/Editing Groups section and then return here.

Field | Description | Format/Accepted Values | Required
Groups | Limits who can see this file | One or more group names | yes
Description | A short text explanation of the sample and/or its source | Any valid UTF-8 formatted text | no
Tags | Key/value pairs to help locate and categorize files | Any key/value pair; both key and value are required | no
Origins | Specifies where a file came from | Downloaded, Transformed, Unpacked, Wire, Incident, or Memory Dump | no

It is recommended that you provide origin information for any file(s) you upload whenever possible. A key feature of Thorium is its ability to store origin information in a structured format and automatically translate that information into metadata tags. Tags allow you to filter the files that you browse through when looking for a file. As a result, if you don't provide any origin information, it may be difficult to locate your files at a later date.

File Origins

File Origins are the single most important piece of information in describing, locating, and understanding relationships between files. Described below are all the options for file origins and their respective subfields.

Downloaded

The "Downloaded" Origin specifies that the file was downloaded from a specific URL.

Subfield | Description | Format/Accepted Values | Required
URL | The URL the file was downloaded from | A valid URL | yes
Site Name | The name of the website the file was downloaded from | Any UTF-8 formatted text | no

Transformed

The "Transformed" Origin specifies that the file is a result of transforming another file, whether by a tool or some other means.

Subfield | Description | Format/Accepted Values | Required
Parent | The SHA256 of the original file that was transformed to produce this file | A valid SHA256 of an existing file in Thorium [1] | yes
Tool | The tool that was used to produce this transformed file | Any UTF-8 formatted text | no
Flags | The tool command-line flags that were used to transform this sample | One or more hyphenated alphanumeric flags [2] | no
  1. Your account must have access to the parent file in order to specify it in a file's origin
  2. Example: --flag1, --flag2, --flag3, -f
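
The Parent subfield expects the SHA256 of a file that already exists in Thorium. If you only have a local copy of the parent file, you can compute its SHA256 yourself. The following is a minimal sketch using only the Python standard library; the helper name sha256_of_file is hypothetical and is not part of Thorium or Thorctl.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the lowercase hex SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large samples are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling sha256_of_file(Path("parent.bin")) returns a 64-character hex string. The parent file must still be uploaded to Thorium, and your account must have access to it, before that value can be used in an origin.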

Unpacked

The "Unpacked" Origin specifies that the file was unpacked from some other file, whether by a tool or some other means.

Subfield | Description | Format/Accepted Values | Required
Parent | The SHA256 of the original file that this file was unpacked from | A valid SHA256 of an existing file in Thorium [1] | yes
Tool | The tool that was used to unpack this file | Any UTF-8 formatted text | no
Flags | The tool command-line flags that were used to unpack this sample | One or more hyphenated alphanumeric flags [2] | no
  1. Your account must have access to the parent file in order to specify it in a file's origin
  2. Example: --flag1, --flag2, --flag3, -f

Wire

The "Wire" Origin specifies that a file was captured/sniffed "on the wire" en route to a destination.

Subfield | Description | Format/Accepted Values | Required
Sniffer | The sniffer [1] used to capture this file | Any UTF-8 formatted text | yes
Source | The source IP/hostname this file came from when it was sniffed | Any UTF-8 formatted text | no
Destination | The destination IP/hostname where this file was headed to when it was sniffed | Any UTF-8 formatted text | no
  1. Example: wireshark

Incident

The "Incident" Origin specifies that the file originated from a specific security incident.

Subfield | Description | Format/Accepted Values | Required
Incident ID | The name or ID identifying the incident from which the file originated | Any UTF-8 formatted text | yes
Cover Term | An optional term for the organization where an incident occurred | Any UTF-8 formatted text | no
Mission Team | The name of the mission team that handled the incident | Any UTF-8 formatted text | no
Network | The name of the network where the incident occurred | Any UTF-8 formatted text | no
Machine | The IP or hostname of the machine where the incident occurred | Any UTF-8 formatted text | no
Location | The physical/geographical location where the incident occurred | Any UTF-8 formatted text | no

Memory Dump

The "Memory Dump" Origin specifies that the file originated from a memory dump.

Subfield | Description | Format/Accepted Values | Required
Memory Type | The type of memory dump this file originated from | Any UTF-8 formatted text | yes
Parent | The SHA256 of the memory dump file in Thorium from which this file originates | A valid SHA256 of an existing file in Thorium [1] | no
Reconstructed | The characteristics that were reconstructed in this memory dump | One or more UTF-8 formatted strings | no
Base Address | The virtual address where the memory dump starts | An alphanumeric memory address | no
  1. Your account must have access to the parent file in order to specify it in a file's origin

Carved

The "Carved" Origin specifies that a file was "carved out" of another file (e.g. archive, memory dump, packet capture, etc.). Unlike "Unpacked," "Carved" describes a sample that is a simple, discrete piece of another file. Its extraction can be easily replicated without any dynamic unpacking process.

Subfield | Description | Format/Accepted Values | Required
Parent | The SHA256 of the original file that was carved to produce this file | A valid SHA256 of an existing file in Thorium [1] | yes
Tool | The tool that was used to carve this file | Any UTF-8 formatted text | no
Carved Origin | The type of file this sample was carved from (and other related metadata) | See the Carved origin subtypes below | no
  1. Your account must have access to the parent file in order to specify it in a file's origin

Carved origins may also have an optional subtype defining what type of file the sample was originally carved from. The Carved subtypes are described below:

PCAP

The "Carved PCAP" Origin specifies that a file was "carved out" of a network/packet capture.

Subfield | Description | Format/Accepted Values | Required
Source IP | The source IP address this file came from | Any valid IPv4/IPv6 | no
Destination IP | The destination IP address this file was going to | Any valid IPv4/IPv6 | no
Source Port | The source port this file was sent from | Any valid port (16-bit unsigned integer) | no
Destination Port | The destination port this file was going to | Any valid port (16-bit unsigned integer) | no
Protocol | The protocol by which this file was sent | "UDP"/"Udp"/"udp" or "TCP"/"Tcp"/"tcp" | no
URL | The URL this file was sent from or to if it was sent using HTTP | Any UTF-8 formatted text | no
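
If you collect Carved PCAP metadata in a script, it can help to check the values against the accepted formats above before uploading. The following is a minimal sketch using the Python standard library; the function name check_pcap_origin and its parameter names are hypothetical, and Thorium's own validation may be stricter or report errors differently.

```python
import ipaddress


def check_pcap_origin(source_ip=None, destination_ip=None,
                      source_port=None, destination_port=None,
                      protocol=None):
    """Return a list of problems; an empty list means every value looks valid."""
    # Protocol spellings taken from the table above.
    accepted_protocols = {"UDP", "Udp", "udp", "TCP", "Tcp", "tcp"}
    problems = []
    for name, value in (("Source IP", source_ip),
                        ("Destination IP", destination_ip)):
        if value is None:
            continue  # every PCAP subfield is optional
        try:
            ipaddress.ip_address(value)  # accepts both IPv4 and IPv6
        except ValueError:
            problems.append(f"{name}: {value!r} is not a valid IPv4/IPv6 address")
    for name, value in (("Source Port", source_port),
                        ("Destination Port", destination_port)):
        if value is None:
            continue
        # A port is a 16-bit unsigned integer: 0 through 65535.
        if not isinstance(value, int) or not 0 <= value <= 65535:
            problems.append(f"{name}: {value!r} is not a valid port")
    if protocol is not None and protocol not in accepted_protocols:
        problems.append(f"Protocol: {protocol!r} is not an accepted spelling")
    return problems
```

For example, check_pcap_origin(source_ip="10.0.0.1", source_port=443, protocol="TCP") returns an empty list, while a port of 70000 or a protocol of "icmp" each add one entry to the result.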

Unknown

The "Carved Unknown" Origin specifies that a file was "carved out" of an unknown or unspecified file type.

This origin has no subfields other than the ones from its parent "Carved" origin.

Web UI


You can upload files in the Web UI by following the steps shown in this video:

[Video: uploading files in the Web UI]

Run Pipelines

You can choose to immediately run one or more pipelines on your uploaded file by selecting them in the Run Pipelines submenu. You can also run pipelines on the file later from the file's page in the Web UI or using Thorctl (see Spawning Reactions for more info on running pipelines on files).

Thorctl


It is best to use Thorctl when you have a large number of files to upload. By default, Thorctl uploads multiple files in parallel, and specifying a directory recursively uploads every file within the directory tree. To upload a file or a folder of files, use the following command (using --file-groups/-G to specify the groups to upload to):

thorctl files upload --file-groups <group> <files/or/folders>

If you have multiple files or folders to upload (e.g. ./hello.txt, /bin/ls, and ~/Documents), you can upload them all in one command like so:

thorctl files upload -G example-group ./hello.txt /bin/ls ~/Documents

Uploading to Multiple Groups

You can upload to more than one group by placing commas between each group:

thorctl files upload -G <group1>,<group2>,<group3> <file/or/folder>

Or by adding multiple -G or --file-groups flags:

thorctl files upload -G <group1> -G <group2> -G <group3> <file/or/folder>

Uploading with Tags

You can also upload a file with specific tags with the --file-tags or -T flag:

thorctl files upload --file-groups <group> --file-tags Dataset=Examples --file-tags Corn=good <file/or/folder>

Because tags can contain any symbol (including commas), you must specify each tag with its own --file-tags or -T flag rather than delimiting them with commas.
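
When tags come from a script, one way to follow this rule is to build the argument list programmatically, emitting one --file-tags flag per tag. The following is a minimal sketch; the helper name build_upload_command is hypothetical, and it only assembles the flags described in this section (it does not run Thorctl).

```python
import shlex


def build_upload_command(groups, tags, targets):
    """Assemble a `thorctl files upload` argument list.

    `tags` is a list of (key, value) pairs; both parts are required.
    """
    argv = ["thorctl", "files", "upload", "--file-groups", ",".join(groups)]
    for key, value in tags:
        if not key or not value:
            raise ValueError("both a tag key and a tag value are required")
        # One flag per tag, so commas inside a value are never treated
        # as separators.
        argv += ["--file-tags", f"{key}={value}"]
    argv += list(targets)
    return argv


argv = build_upload_command(
    groups=["example-group"],
    tags=[("Dataset", "Examples"), ("Source", "lab, bench 2")],
    targets=["./my-folder"],
)
# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(argv))
```

Passing the list to subprocess.run(argv) instead of a joined string avoids shell quoting problems entirely, since each tag travels as its own argument.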

Filtering Which Files to Upload

There may be cases where you want to upload only certain files within a folder. Thorctl provides the ability to either inclusively or exclusively filter with regular expressions using the --filter and --skip flags, respectively. For example, to upload only files with the .exe extension within a folder, you could run the following command:

thorctl files upload --file-groups example-group --filter '.*\.exe' ./my-folder

Or to upload everything within a folder except for files starting with temp-, you could run this command:

thorctl files upload --file-groups example-group --skip 'temp-.*' ./my-folder

Supply multiple filters by specifying filter flags multiple times:

thorctl files upload --file-groups example-group --filter '.*\.exe' --filter '.*evil.*' --skip 'temp-.*' ./my-folder

Quote each pattern, as in the commands above, so that the shell passes it to Thorctl unchanged.

The filter and skip regular expressions must adhere to the format used by the Rust regex crate. Fortunately, this format is very similar to most other popular regex types and should be relatively familiar. A helpful site to build and test your regular expressions can be found here: https://rustexp.lpil.uk
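
Regular expressions passed on the command line are processed by the shell before Thorctl sees them, so an unquoted backslash can be lost. The following sketch illustrates the effect using Python's shlex module, which mimics POSIX shell word splitting, and Python's re module as a stand-in for the Rust regex crate (the two agree on the simple patterns used here, but they are not identical in general). The helper name filter_argument is hypothetical.

```python
import re
import shlex


def filter_argument(command_line):
    """Return the value that follows --filter after shell-style splitting."""
    argv = shlex.split(command_line)
    return argv[argv.index("--filter") + 1]


unquoted = filter_argument(
    r"thorctl files upload -G example-group --filter .*\.exe ./my-folder"
)
quoted = filter_argument(
    r"thorctl files upload -G example-group --filter '.*\.exe' ./my-folder"
)

# Without quotes the backslash is consumed, so the dot matches any character.
print(repr(unquoted))
print(repr(quoted))

for name in ("tool.exe", "notes_exe"):
    print(name, bool(re.search(unquoted, name)), bool(re.search(quoted, name)))
```

A real shell may additionally expand an unquoted pattern as a glob if matching file names exist in the current directory, which shlex does not model. Single-quoting the pattern avoids both problems.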

Hidden Directories

Additionally, if you want to include hidden sub-directories/files in a target directory, use the --include-hidden flag:

thorctl files upload -G example-group ./files --include-hidden

Folder Tags

Thorctl can also use the subdirectories that files sit in as tag values, with tag keys that you choose using the --folder-tags option. For example, say you're uploading a directory cool_binaries with the following structure:

cool_binaries
├── file1
└── dumped
    ├── file2
    ├── file3
    ├── pe
    │   └── file4
    └── elf
        └── file5

The cool_binaries directory contains five total files spread across three subdirectories. Each key provided with --folder-tags corresponds to one directory level, from top to bottom (including the root cool_binaries directory). So for example, if you run:

thorctl files upload -G example-group ./cool_binaries --folder-tags alpha --folder-tags beta --folder-tags gamma

The key alpha would correspond to the cool_binaries directory, beta to dumped, and gamma to pe and elf. All files in the cool_binaries directory, including files in subdirectories, would get the tag alpha=cool_binaries; all files in the dumped directory would get the tag beta=dumped; and so on. Below is a summary of the files and the tags they would have after running the above command:

File | Tags
file1 | alpha=cool_binaries
file2 | alpha=cool_binaries, beta=dumped
file3 | alpha=cool_binaries, beta=dumped
file4 | alpha=cool_binaries, beta=dumped, gamma=pe
file5 | alpha=cool_binaries, beta=dumped, gamma=elf

A few things to note:

  • Tags correspond to subdirectory levels, not individual subdirectories, meaning files in subdirectories on the same level will get the same tag key (like pe and elf above).
  • You don't have to provide the same number of tags as subdirectory levels. Any files in subdirectories deeper than the number of folder tags will receive all of their parents' tags until the provided tags are exhausted (e.g. a file in a child directory of elf called x86 would get tags for cool_binaries, dumped and elf but not for x86).
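
The level-by-level pairing described above can be sketched in a few lines of Python. This is only an illustration of the rule as documented here, not Thorctl's implementation; the function name folder_tags and the deeper file6 path are hypothetical.

```python
from pathlib import PurePosixPath


def folder_tags(file_path, keys):
    """Pair each folder-tag key with the directory at the same depth.

    `file_path` starts at the uploaded root directory, for example
    "cool_binaries/dumped/pe/file4".
    """
    directories = PurePosixPath(file_path).parent.parts
    # zip() stops at the shorter input, so surplus keys or deeper
    # directories are simply left unpaired.
    return list(zip(keys, directories))


keys = ["alpha", "beta", "gamma"]
paths = [
    "cool_binaries/file1",
    "cool_binaries/dumped/file2",
    "cool_binaries/dumped/pe/file4",
    "cool_binaries/dumped/elf/x86/file6",  # hypothetical deeper file
]
for path in paths:
    tags = ", ".join(f"{key}={value}" for key, value in folder_tags(path, keys))
    print(f"{PurePosixPath(path).name}: {tags}")
```

The last path shows the second note in action: with three keys, the file under x86 receives tags for cool_binaries, dumped, and elf, and no tag for x86.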

Adjust Number of Parallel Uploads

By default, Thorctl can perform a maximum of 10 actions in parallel at any given time. In the case of file uploads, that means a maximum of 10 files can be uploaded concurrently. You can adjust the number of parallel actions Thorctl will attempt to conduct using the -w flag:

thorctl -w 20 files upload --file-groups <group> <files/or/folders>
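
To illustrate what a cap on parallel actions means in practice, the following sketch runs a stand-in "upload" over 50 items with at most 10 running at once and records the peak concurrency. It is a conceptual model only: Thorctl's internals are not described here, and the names run_bounded and UploadProbe are hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_bounded(items, action, workers=10):
    """Apply `action` to every item with at most `workers` running at once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(action, items))


class UploadProbe:
    """Stand-in for an upload that records how many calls overlap."""

    def __init__(self):
        self.lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def upload(self, name):
        with self.lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        time.sleep(0.01)  # simulate time spent on the network
        with self.lock:
            self.active -= 1
        return name


probe = UploadProbe()
results = run_bounded([f"file{i}" for i in range(50)], probe.upload, workers=10)
print(f"processed {len(results)} files, peak concurrency {probe.peak}")
```

Raising the worker count lets more transfers overlap, which tends to help when individual uploads are limited by latency rather than by total bandwidth.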