Automating File Transfers to Microsoft Fabric with PowerShell and AzCopy
Last year, I shared my experience with managing files in OneLake using the Python SDK, where I explored how to programmatically interact with Microsoft Fabric’s data lake capabilities. Recently, I revisited a similar challenge — but this time, I approached it from a different angle. Instead of Python, I used PowerShell in combination with AzCopy to automate the process of uploading files from an Azure Virtual Machine directly into Microsoft Fabric.
- 🎯 Objective
- 🔄 End-to-End Flow
- 🧰 From Manual to Scripted Transfers
- ⚠️ Challenges at Scale
- 📘 Helpful Docs on Concurrency Control Within Each Process
- 📦 Why the Portable Version of AzCopy?
- 💻 PowerShell Function: Transfer-File
- 🔍 Key Highlights
- 🚦 Process-Level Concurrency Control is Crucial
- 🧾 Final Thoughts
🎯 Objective
The goal: automatically transfer files from an Azure virtual machine into Microsoft Fabric.
🔄 End-to-End Flow
Here's the high-level workflow:
- Files are collected on a virtual machine.
- A script triggers for each new file.
- The file is uploaded to Fabric storage.
- Once a complete dataset is in place, a Fabric pipeline kicks off.
In this post, we’ll zoom in on step 3: the file transfer.
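For context on step 2, a simple file watcher can trigger the upload script whenever a new file lands. The snippet below is only a rough sketch, assuming files arrive in C:\DataExports\ and that the handler calls the transfer logic described in the rest of this post:

```powershell
# Rough sketch of a file watcher for step 2 (folder and filter are assumptions).
$watcher = New-Object System.IO.FileSystemWatcher "C:\DataExports", "*.csv"
$watcher.EnableRaisingEvents = $true

Register-ObjectEvent -InputObject $watcher -EventName Created -Action {
    # Hand the new file off to the transfer logic covered below.
    Write-Output "New file detected: $($Event.SourceEventArgs.FullPath)"
} | Out-Null
```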
🧰 From Manual to Scripted Transfers
Initially, I explored Azure Storage Explorer, which allows manual uploads to the Files section of a Lakehouse.
Behind the scenes, it uses AzCopy, a command-line tool optimized for Azure storage transfers.
So, why not use this for automation?
This led me to test AzCopy directly in PowerShell to fully automate the transfers.
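In its simplest form, that is just a direct call to the portable binary. The snippet below is a minimal sketch of such an unmanaged call, assuming azcopy.exe sits under C:\Tools\azcopy\ and using the same OneLake destination format shown later in this post:

```powershell
# Minimal, unmanaged AzCopy call from PowerShell (paths and names are assumptions).
$azcopy = "C:\Tools\azcopy\azcopy.exe"

# Authenticate with the VM's managed identity, then upload a single file to OneLake.
& $azcopy login --identity
& $azcopy copy "C:\DataExports\sales_data_2025_06_23.csv" `
    "https://onelake.blob.fabric.microsoft.com/WS_Sales/LH_Sales.Lakehouse/Files/Sales/" `
    --trusted-microsoft-suffixes=onelake.blob.fabric.microsoft.com
```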
⚠️ Challenges at Scale
It worked well—until files started arriving frequently and simultaneously.
That’s when the real problems surfaced:
- ❌ Token file conflicts: Multiple AzCopy processes tried to use the same access token cache → file locks.
- ⚙️ Resource contention: Too many parallel AzCopy jobs competing for CPU and memory.
- 📉 Performance bottlenecks: High concurrency can cause I/O slowdowns and potential API throttling.
- ♻️ Cleanup required: Residual job files and logs needed regular pruning.
It’s important to understand the two levels of concurrency here:
- Multiple processes can run in parallel.
- Each process internally handles multiple requests concurrently.
Both layers need proper control.
I started by managing concurrency within each process using AzCopy environment variables and command-line parameters.
📘 Helpful Docs on Concurrency Control Within Each Process
Two resources proved essential:
📄 AzCopy Command Reference
| Parameter | Description |
|---|---|
| cap-mbps | Throttles bandwidth usage. Helpful when multiple jobs run in parallel. |
| output-level | Quiet mode reduces console output; details remain in log files. |
🧬 AzCopy Environment Variables
| Variable | Purpose |
|---|---|
| AZCOPY_LOG_LOCATION | Log file location. Centralized logs are cleaned after x days via cron. |
| AZCOPY_JOB_PLAN_LOCATION | Path for job plans and token files. Use separate directories per process. |
| AZCOPY_CONCURRENCY_VALUE | Limits concurrent operations within each process. |
| AZCOPY_BUFFER_GB | Allocates memory for upload buffers. Also affects max chunk size. |
| AZCOPY_DISABLE_SYSLOG | Disables system logging. Reduces unnecessary output on Windows/Linux. |
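In PowerShell, these are ordinary process-scoped environment variables that you set right before invoking AzCopy. The values below are only illustrative (the Transfer-File function later in this post turns them into parameters):

```powershell
# Illustrative values only; the Transfer-File function below makes these configurable.
$env:AZCOPY_LOG_LOCATION      = "C:\Tools\azcopy\Logs"        # central log folder, pruned via cron
$env:AZCOPY_JOB_PLAN_LOCATION = "C:\Tools\azcopy\TokenCache"  # job plans and token files, per process
$env:AZCOPY_CONCURRENCY_VALUE = "8"                           # concurrent operations within the process
$env:AZCOPY_BUFFER_GB         = "1.3"                         # memory reserved for upload buffers
$env:AZCOPY_DISABLE_SYSLOG    = "true"                        # no syslog / Windows Event Log entries
```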
📦 Why the Portable Version of AzCopy?
I used the portable binary of AzCopy for flexibility and ease of deployment:
👉 Download it here
💻 PowerShell Function: Transfer-File
Here’s the core function that handles a single transfer:
```powershell
function Transfer-File {
    param (
        [string]$source_path,
        [string]$destination,
        [string]$azcopy_path,
        [string]$azcopy_log_dir,
        [string]$azcopy_token_dir,
        [decimal]$buffer_gb = 1.3,
        [int]$concurrency = 8,
        [int]$cap_mbps = 100,
        [switch]$silent
    )

    # Create a unique token cache directory per process.
    # Note: ${PID} is wrapped in braces so the trailing underscore is not parsed as part of the variable name.
    $tokenloc_guid = [guid]::NewGuid().ToString()
    $token_cache_dir = Join-Path $azcopy_token_dir "AzCopyTokenCache_${PID}_$tokenloc_guid"
    if (-not (Test-Path $token_cache_dir)) {
        New-Item -ItemType Directory -Path $token_cache_dir -Force | Out-Null
    }
    if (-not $silent) { Write-Output "AzCopy token will be stored here: $token_cache_dir" }

    try {
        if (-not $silent) { Write-Output "`nStarting file transfer..." }

        # Set environment variables and authenticate using managed identity
        $env:AZCOPY_LOG_LOCATION      = $azcopy_log_dir
        $env:AZCOPY_JOB_PLAN_LOCATION = $token_cache_dir
        $env:AZCOPY_CONCURRENCY_VALUE = "$concurrency"
        $env:AZCOPY_BUFFER_GB         = "$buffer_gb"
        $env:AZCOPY_DISABLE_SYSLOG    = "true"

        & $azcopy_path login --identity
        if ($LASTEXITCODE -ne 0) {
            throw "AzCopy login failed with exit code $LASTEXITCODE"
        }

        # Run AzCopy to transfer the file
        if (-not $silent) { Write-Output "Uploading file to: $destination (concurrency: $concurrency, buffer: $buffer_gb GB, cap-mbps: $cap_mbps)" }
        & $azcopy_path copy `
            "$source_path" `
            "$destination" `
            --overwrite=true `
            --from-to=LocalBlob `
            --blob-type BlockBlob `
            --follow-symlinks `
            --check-length=true `
            --put-md5 `
            --disable-auto-decoding=false `
            --trusted-microsoft-suffixes=onelake.blob.fabric.microsoft.com `
            --log-level=INFO `
            --output-level=quiet `
            --cap-mbps=$cap_mbps
        if ($LASTEXITCODE -ne 0) {
            throw "AzCopy copy operation failed with exit code $LASTEXITCODE"
        }

        if (-not $silent) { Write-Output "AzCopy completed successfully." }
    }
    finally {
        # Delete the token cache directory
        try {
            if (Test-Path $token_cache_dir) {
                Remove-Item -Path $token_cache_dir -Recurse -Force -ErrorAction SilentlyContinue
                if (-not $silent) { Write-Output "Cleaned up token cache directory: $token_cache_dir" }
            }
        }
        catch {
            if (-not $silent) { Write-Output "Failed to clean up token cache directory $token_cache_dir" }
        }
    }
}
```
Assume that the portable AzCopy binary is placed in C:\Tools\azcopy\ and that these two subfolders exist:
- Logs for storing log files
- TokenCache for temporary token storage
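If those subfolders do not exist yet, a few lines of PowerShell can create them; a small sketch, assuming the layout above:

```powershell
# Create the Logs and TokenCache subfolders next to the portable AzCopy binary.
$azcopyRoot = "C:\Tools\azcopy"
foreach ($sub in "Logs", "TokenCache") {
    New-Item -ItemType Directory -Path (Join-Path $azcopyRoot $sub) -Force | Out-Null
}
```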
To upload a file from C:\DataExports\ to a Fabric Lakehouse (Workspace: WS_Sales, Lakehouse: LH_Sales, Folder: Sales), you can use the following PowerShell script:
```powershell
$azcopyExe = "C:\Tools\azcopy\azcopy.exe"
$logDirectory = "C:\Tools\azcopy\Logs\"
$tokenCacheDirectory = "C:\Tools\azcopy\TokenCache\"

$workspace = "WS_Sales"
$lakehouse = "LH_Sales"
$folder = "Sales"
$fileName = "sales_data_2025_06_23.csv"

$sourcePath = "C:\DataExports\$fileName"
$destination = "https://onelake.blob.fabric.microsoft.com/$workspace/$lakehouse.Lakehouse/Files/$folder/"

Transfer-File `
    -source_path $sourcePath `
    -destination $destination `
    -azcopy_path $azcopyExe `
    -azcopy_log_dir $logDirectory `
    -azcopy_token_dir $tokenCacheDirectory
```
🔍 Key Highlights
- ✅ Isolation per process: Each run uses its own token cache directory (PID + GUID).
- ⚙️ Resource tuning: Adjustable buffer_gb, concurrency, and cap_mbps.
- 🔐 Managed Identity Auth: Secure, no secrets required.
- 🧹 Automatic cleanup: Temporary files removed after each run; logs cleaned via cron.
- 🔇 Silent mode: Reduces console noise when needed.
🚦 Process-Level Concurrency Control is Crucial
As mentioned earlier, concurrency must be controlled at two levels:
- Within each process: via AzCopy options.
- Across processes: by limiting how many run simultaneously.
If there's a risk of transfer spikes, a queue can help control process concurrency:
- Use a queue-based system (e.g., file-based or database-backed job queue).
- Run a background worker that dequeues and processes files with limited parallelism (e.g., 1–3 threads); a minimal sketch follows after the list below.
Benefits:
- Prevents resource exhaustion
- Reduces API throttling risks
- Improves performance predictability
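As a starting point, such a queue can be as simple as a drop folder processed by a single worker loop. The sketch below is one possible, deliberately minimal implementation that reuses the Transfer-File function from above; the queue folder, destination, and retry behavior are assumptions, not part of the original setup:

```powershell
# Minimal file-based queue worker: one transfer at a time, oldest file first.
$queueDir     = "C:\DataExports\queue"       # assumed drop folder for incoming files
$processedDir = "C:\DataExports\processed"   # assumed archive folder for completed files

while ($true) {
    # Pick the oldest pending file; sleep if the queue is empty.
    $next = Get-ChildItem -Path $queueDir -File | Sort-Object LastWriteTime | Select-Object -First 1
    if ($null -eq $next) { Start-Sleep -Seconds 30; continue }

    try {
        Transfer-File `
            -source_path $next.FullName `
            -destination "https://onelake.blob.fabric.microsoft.com/WS_Sales/LH_Sales.Lakehouse/Files/Sales/" `
            -azcopy_path "C:\Tools\azcopy\azcopy.exe" `
            -azcopy_log_dir "C:\Tools\azcopy\Logs\" `
            -azcopy_token_dir "C:\Tools\azcopy\TokenCache\" `
            -silent

        # Archive the file only after a successful upload.
        Move-Item -Path $next.FullName -Destination $processedDir -Force
    }
    catch {
        # Leave the file in the queue so the next iteration can retry it.
        Write-Output "Transfer failed for $($next.Name): $_"
        Start-Sleep -Seconds 60
    }
}
```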
🧾 Final Thoughts
This script is still at an early stage, but it looks promising.
By combining PowerShell and AzCopy, you can build a lightweight, scalable file ingestion system for Microsoft Fabric.
With solid concurrency management and error handling, this setup can run reliably and unattended—ideal for 24/7 operations.