Automating File Transfers to Microsoft Fabric with PowerShell and AzCopy

Last year, I shared my experience with managing files in OneLake using the Python SDK, where I explored how to programmatically interact with Microsoft Fabric’s data lake capabilities. Recently, I revisited a similar challenge — but this time, I approached it from a different angle. Instead of Python, I used PowerShell in combination with AzCopy to automate the process of uploading files from an Azure Virtual Machine directly into Microsoft Fabric.


🎯 Objective

The goal: automatically transfer files from an Azure virtual machine into Microsoft Fabric.


🔄 End-to-End Flow

Here's the high-level workflow:

  1. Files are collected on a virtual machine.
  2. A script is triggered for each new file.
  3. The file is uploaded to Fabric storage.
  4. Once a complete dataset is in place, a Fabric pipeline kicks off.

In this post, we’ll zoom in on step 3: the file transfer.


🧰 From Manual to Scripted Transfers

Initially, I explored Azure Storage Explorer, which allows manual uploads to the Files section of a Lakehouse.
Behind the scenes, it uses AzCopy, a command-line tool optimized for Azure storage transfers.

So, why not use this for automation?

This led me to test AzCopy directly in PowerShell to fully automate the transfers.


⚠️ Challenges at Scale

It worked well—until files started arriving frequently and simultaneously.

That’s when the real problems surfaced:

  • Token file conflicts: Multiple AzCopy processes tried to use the same access token cache → file locks.
  • ⚙️ Resource contention: Too many parallel AzCopy jobs competing for CPU and memory.
  • 📉 Performance bottlenecks: High concurrency can cause I/O slowdowns and potential API throttling.
  • ♻️ Cleanup required: Residual job files and logs needed regular pruning.

It’s important to understand the two levels of concurrency here:

  • Multiple processes can run in parallel.
  • Each process internally handles multiple requests concurrently.

Both layers need proper control.

I started by managing concurrency within each process using AzCopy environment variables and command-line parameters.


📘 Helpful Docs on Concurrency Control Within Each Process

Two resources proved essential:

📄 AzCopy Command Reference

  • cap-mbps: Throttles bandwidth usage. Helpful when multiple jobs run in parallel.
  • output-level: Quiet mode reduces console output; details remain in the log files.

🧬 AzCopy Environment Variables

  • AZCOPY_LOG_LOCATION: Log file location. Centralized logs are cleaned after x days via cron.
  • AZCOPY_JOB_PLAN_LOCATION: Path for job plans and token files. Use separate directories per process.
  • AZCOPY_CONCURRENCY_VALUE: Limits concurrent operations within each process.
  • AZCOPY_BUFFER_GB: Allocates memory for upload buffers. Also affects max chunk size.
  • AZCOPY_DISABLE_SYSLOG: Disables system logging. Reduces unnecessary output on Windows/Linux.
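
To make the variables concrete, here is a minimal, standalone sketch of setting them in a PowerShell session before invoking AzCopy. The values are illustrative; the Transfer-File function later in this post sets them per call:

$env:AZCOPY_LOG_LOCATION      = "C:\Tools\azcopy\Logs\"        # central log folder
$env:AZCOPY_JOB_PLAN_LOCATION = "C:\Tools\azcopy\TokenCache\"  # job plan / token files for this process
$env:AZCOPY_CONCURRENCY_VALUE = "8"                            # concurrent operations inside the process
$env:AZCOPY_BUFFER_GB         = "1.3"                          # memory reserved for upload buffers
$env:AZCOPY_DISABLE_SYSLOG    = "true"                         # no syslog / Windows Event Log entries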

📦 Why the Portable Version of AzCopy?

I used the portable binary of AzCopy for flexibility and ease of deployment:
👉 Download it here


💻 PowerShell Function: Transfer-File

Here’s the core function that handles a single transfer:

function Transfer-File {
    param (
        [string]$source_path,
        [string]$destination,
        [string]$azcopy_path,
        [string]$azcopy_log_dir,
        [string]$azcopy_token_dir,
        [decimal]$buffer_gb = 1.3,
        [int]$concurrency = 8,
        [int]$cap_mbps = 100,
        [switch]$silent
    )

    # Create a unique token cache directory per process (PID + GUID)
    $tokenloc_guid = [guid]::NewGuid().ToString()
    $token_cache_dir = Join-Path $azcopy_token_dir "AzCopyTokenCache_${PID}_$tokenloc_guid"
    if (-not (Test-Path $token_cache_dir)) {
        New-Item -ItemType Directory -Path $token_cache_dir -Force | Out-Null
    }
    if (-not $silent) { Write-Output "AzCopy Token will be stored here: $token_cache_dir" }

    try {
        if (-not $silent) { Write-Output "`nStarting file transfer..." }
        # Set environment variables and authenticate using managed identity
        $env:AZCOPY_LOG_LOCATION = $azcopy_log_dir
        $env:AZCOPY_JOB_PLAN_LOCATION = $token_cache_dir
        $env:AZCOPY_CONCURRENCY_VALUE = "$concurrency"
        $env:AZCOPY_BUFFER_GB = "$buffer_gb"
        $env:AZCOPY_DISABLE_SYSLOG = "true"
        & $azcopy_path login --identity
        if ($LASTEXITCODE -ne 0) {
            throw "AzCopy login failed with exit code $LASTEXITCODE"
        }

        # Run AzCopy to transfer the file
        if (-not $silent) { Write-Output "Uploading file to: $destination (concurrency: $concurrency, buffer: $buffer_gb GB, cap-mbps: $cap_mbps)" }
        & $azcopy_path copy `
            "$source_path" `
            "$destination" `
            --overwrite=true `
            --from-to=LocalBlob `
            --blob-type BlockBlob `
            --follow-symlinks `
            --check-length=true `
            --put-md5 `
            --disable-auto-decoding=false `
            --trusted-microsoft-suffixes=onelake.blob.fabric.microsoft.com `
            --log-level=INFO `
            --output-level=quiet `
            --cap-mbps=$cap_mbps
        if ($LASTEXITCODE -ne 0) {
            throw "AzCopy copy operation failed with exit code $LASTEXITCODE"
        }

        if (-not $silent) { Write-Output "AzCopy completed successfully." }
    }
    finally {
        # Delete the token cache directory
        try {
            if (Test-Path $token_cache_dir) {
                Remove-Item -Path $token_cache_dir -Recurse -Force -ErrorAction SilentlyContinue
                if (-not $silent) { Write-Output "Cleaned up token cache directory: $token_cache_dir" }
            }
        }
        catch {
            if (-not $silent) { Write-Output "Failed to clean up token cache directory $token_cache_dir" }
        }
    }
}

Assume that AzCopy is placed in the folder C:\Tools\azcopy\ and that these two subfolders exist (a one-time setup snippet follows the list below):

  • Logs for storing log files
  • TokenCache for temporary token storage
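
If they do not exist yet, something along these lines creates them (same base path as assumed above):

# One-time setup of the assumed folder layout
$baseDir = "C:\Tools\azcopy"
New-Item -ItemType Directory -Path (Join-Path $baseDir "Logs") -Force | Out-Null
New-Item -ItemType Directory -Path (Join-Path $baseDir "TokenCache") -Force | Out-Null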

To upload a file from C:\DataExports\ to a Fabric Lakehouse (Workspace: WS_Sales, Lakehouse: LH_Sales, Folder: Sales), you can use the following PowerShell script:

$azcopyExe = "C:\Tools\azcopy\azcopy.exe"
$logDirectory = "C:\Tools\azcopy\Logs\"
$tokenCacheDirectory = "C:\Tools\azcopy\TokenCache\"

$workspace = "WS_Sales"
$lakehouse = "LH_Sales"
$folder = "Sales"
$fileName = "sales_data_2025_06_23.csv"

$sourcePath = "C:\DataExports\$fileName"
$destination = "https://onelake.blob.fabric.microsoft.com/$workspace/$lakehouse.Lakehouse/Files/$folder/"

Transfer-File `
    -source_path $sourcePath `
    -destination $destination `
    -azcopy_path $azcopyExe `
    -azcopy_log_dir $logDirectory `
    -azcopy_token_dir $tokenCacheDirectory
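
The same call can also pass the tuning parameters explicitly, which is useful when several transfers may run at the same time; the values below are only examples, not recommendations:

# Optional: explicit tuning when multiple transfers run in parallel (example values)
Transfer-File `
    -source_path $sourcePath `
    -destination $destination `
    -azcopy_path $azcopyExe `
    -azcopy_log_dir $logDirectory `
    -azcopy_token_dir $tokenCacheDirectory `
    -buffer_gb 0.5 `
    -concurrency 4 `
    -cap_mbps 50 `
    -silent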

🔍 Key Highlights

  • Isolation per process: Each run uses its own token cache directory (PID + GUID).
  • ⚙️ Resource tuning: Adjustable buffer_gb, concurrency, and cap_mbps.
  • 🔐 Managed Identity Auth: Secure—no secrets required.
  • 🧹 Automatic cleanup: Temporary files removed after each run; logs cleaned via cron (see the snippet after this list).
  • 🔇 Silent mode: Reduces console noise when needed.
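
The log cleanup itself can be a scheduled task or cron job running a short command like the following; the 14-day retention is only an example:

# Remove AzCopy log files older than 14 days (retention value is an example)
Get-ChildItem -Path "C:\Tools\azcopy\Logs\" -File -Filter *.log |
    Where-Object { $_.LastWriteTime -lt (Get-Date).AddDays(-14) } |
    Remove-Item -Force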

🚦 Process-Level Concurrency Control is Crucial

As mentioned earlier, concurrency must be controlled at two levels:

  • Within each process: via AzCopy options.
  • Across processes: by limiting how many run simultaneously.

If there's a risk of transfer spikes, a queue can help control process concurrency:

  • Use a queue-based system (e.g., file-based or database-backed job queue).
  • Run a background worker that dequeues and processes files with limited parallelism (e.g., 1–3 threads); see the sketch after the benefits list.

Benefits:

  • Prevents resource exhaustion
  • Reduces API throttling risks
  • Improves performance predictability
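
To make this concrete, here is a minimal sketch of such a worker. It treats a folder as the queue, moves each file to a processing folder before handing it off, and caps the number of parallel transfers using background jobs. The folder paths, the script location C:\Tools\Scripts\Transfer-File.ps1, and the limit of two parallel jobs are assumptions for illustration, not part of my production setup:

# Minimal file-based queue worker (sketch). Assumptions: Transfer-File lives in
# C:\Tools\Scripts\Transfer-File.ps1, new files arrive in C:\DataExports\Queue,
# and at most two transfers should run at once.
$queueDir      = "C:\DataExports\Queue"
$processingDir = "C:\DataExports\Processing"
$maxParallel   = 2

# Make sure the processing folder exists
New-Item -ItemType Directory -Path $processingDir -Force | Out-Null

while ($true) {
    # Oldest files first
    $pending = Get-ChildItem -Path $queueDir -File | Sort-Object LastWriteTime

    foreach ($file in $pending) {
        # Throttle: wait until a slot is free
        while (@(Get-Job -State Running).Count -ge $maxParallel) {
            Start-Sleep -Seconds 5
        }

        # Move the file out of the queue first so it is not picked up twice
        $workPath = Join-Path $processingDir $file.Name
        Move-Item -Path $file.FullName -Destination $workPath -Force

        # Each job runs in its own process, so it loads the function itself
        Start-Job -ArgumentList $workPath -ScriptBlock {
            param($path)
            . "C:\Tools\Scripts\Transfer-File.ps1"   # assumed script location
            Transfer-File `
                -source_path $path `
                -destination "https://onelake.blob.fabric.microsoft.com/WS_Sales/LH_Sales.Lakehouse/Files/Sales/" `
                -azcopy_path "C:\Tools\azcopy\azcopy.exe" `
                -azcopy_log_dir "C:\Tools\azcopy\Logs\" `
                -azcopy_token_dir "C:\Tools\azcopy\TokenCache\" `
                -silent
        } | Out-Null
    }

    # Clean up finished jobs and poll again
    Get-Job -State Completed | Remove-Job
    Start-Sleep -Seconds 10
}

Failed transfers stay in the processing folder for inspection; a retry or dead-letter step could be added later if needed.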

🧾 Final Thoughts

This script is still in an early stage, but it looks promising.
By combining PowerShell and AzCopy, you can build a lightweight, scalable file ingestion system for Microsoft Fabric.

With solid concurrency management and error handling, this setup can run reliably and unattended—ideal for 24/7 operations.