Automating File Transfers to Microsoft Fabric with PowerShell and AzCopy
Last year, I shared my experience with managing files in OneLake using the Python SDK, where I explored how to programmatically interact with Microsoft Fabric’s data lake capabilities. Recently, I revisited a similar challenge — but this time, I approached it from a different angle. Instead of Python, I used PowerShell in combination with AzCopy to automate the process of uploading files from an Azure Virtual Machine directly into Microsoft Fabric.
- 🎯 Objective
- 🔄 End-to-End Flow
- 🧰 From Manual to Scripted Transfers
- ⚠️ Challenges at Scale
- 📘 Helpful Docs on Concurrency Control Within Each Process
- 📦 Why the Portable Version of AzCopy?
- 💻 PowerShell Function: Transfer-File
- 🔍 Key Highlights
- 🚦 Process-Level Concurrency Control is Crucial
- 🧾 Final Thoughts
🎯 Objective
The goal: automatically transfer files from an Azure virtual machine into Microsoft Fabric.
🔄 End-to-End Flow
Here's the high-level workflow:
- Files are collected on a virtual machine.
- A script triggers for each new file.
- The file is uploaded to Fabric storage.
- Once a complete dataset is in place, a Fabric pipeline kicks off.
In this post, we’ll zoom in on step 3: the file transfer.
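For context on step 2, a simple file watcher can trigger the upload script whenever a new file lands. The snippet below is only a rough sketch, assuming files arrive in C:\DataExports\ and that the handler calls the transfer logic described in the rest of this post:

```powershell
# Rough sketch of a file watcher for step 2 (folder and filter are assumptions).
$watcher = New-Object System.IO.FileSystemWatcher "C:\DataExports", "*.csv"
$watcher.EnableRaisingEvents = $true

Register-ObjectEvent -InputObject $watcher -EventName Created -Action {
    # Hand the new file off to the transfer logic covered below.
    Write-Output "New file detected: $($Event.SourceEventArgs.FullPath)"
} | Out-Null
```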
🧰 From Manual to Scripted Transfers
Initially, I explored Azure Storage Explorer, which allows manual uploads to the Files section of a Lakehouse.
Behind the scenes, it uses AzCopy, a command-line tool optimized for Azure storage transfers.
So, why not use this for automation?
This led me to test AzCopy directly in PowerShell to fully automate the transfers.
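In its simplest form, that is just a direct call to the portable binary. The snippet below is a minimal sketch of such an unmanaged call, assuming azcopy.exe sits under C:\Tools\azcopy\ and using the same OneLake destination format shown later in this post:

```powershell
# Minimal, unmanaged AzCopy call from PowerShell (paths and names are assumptions).
$azcopy = "C:\Tools\azcopy\azcopy.exe"

# Authenticate with the VM's managed identity, then upload a single file to OneLake.
& $azcopy login --identity
& $azcopy copy "C:\DataExports\sales_data_2025_06_23.csv" `
    "https://onelake.blob.fabric.microsoft.com/WS_Sales/LH_Sales.Lakehouse/Files/Sales/" `
    --trusted-microsoft-suffixes=onelake.blob.fabric.microsoft.com
```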
⚠️ Challenges at Scale
It worked well—until files started arriving frequently and simultaneously.
That’s when the real problems surfaced:
- ❌ Token file conflicts: Multiple AzCopy processes tried to use the same access token cache → file locks.
- ⚙️ Resource contention: Too many parallel AzCopy jobs competing for CPU and memory.
- 📉 Performance bottlenecks: High concurrency can cause I/O slowdowns and potential API throttling.
- ♻️ Cleanup required: Residual job files and logs needed regular pruning.
It’s important to understand the two levels of concurrency here:
- Multiple processes can run in parallel.
- Each process internally handles multiple requests concurrently.
Both layers need proper control.
I started by managing concurrency within each process using AzCopy environment variables and command-line parameters.
📘 Helpful Docs on Concurrency Control Within Each Process
Two resources proved essential:
📄 AzCopy Command Reference
| Parameter | Description |
|---|---|
| cap-mbps | Throttles bandwidth usage. Helpful when multiple jobs run in parallel. |
| output-level | Quiet mode reduces console output; details remain in log files. |
🧬 AzCopy Environment Variables
| Variable | Purpose |
|---|---|
| AZCOPY_LOG_LOCATION | Log file location. Centralized logs are cleaned after x days via cron. |
| AZCOPY_JOB_PLAN_LOCATION | Path for job plans and token files. Use separate directories per process. |
| AZCOPY_CONCURRENCY_VALUE | Limits concurrent operations within each process. |
| AZCOPY_BUFFER_GB | Allocates memory for upload buffers. Also affects max chunk size. |
| AZCOPY_DISABLE_SYSLOG | Disables system logging. Reduces unnecessary output on Windows/Linux. |
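In PowerShell, these are ordinary process-scoped environment variables that you set right before invoking AzCopy. The values below are only illustrative (the Transfer-File function later in this post turns them into parameters):

```powershell
# Illustrative values only; the Transfer-File function below makes these configurable.
$env:AZCOPY_LOG_LOCATION      = "C:\Tools\azcopy\Logs"        # central log folder, pruned via cron
$env:AZCOPY_JOB_PLAN_LOCATION = "C:\Tools\azcopy\TokenCache"  # job plans and token files, per process
$env:AZCOPY_CONCURRENCY_VALUE = "8"                           # concurrent operations within the process
$env:AZCOPY_BUFFER_GB         = "1.3"                         # memory reserved for upload buffers
$env:AZCOPY_DISABLE_SYSLOG    = "true"                        # no syslog / Windows Event Log entries
```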
📦 Why the Portable Version of AzCopy?
I used the portable binary of AzCopy for flexibility and ease of deployment:
👉 Download it here
💻 PowerShell Function: Transfer-File
Here’s the core function that handles a single transfer:
```powershell
function Transfer-File {
    param (
        [string]$source_path,
        [string]$destination,
        [string]$azcopy_path,
        [string]$azcopy_log_dir,
        [string]$azcopy_token_dir,
        [decimal]$buffer_gb = 1.3,
        [int]$concurrency = 8,
        [int]$cap_mbps = 100,
        [switch]$silent
    )

    # Create a unique token cache directory per process.
    # Note: ${PID} is wrapped in braces so the trailing underscore is not parsed as part of the variable name.
    $tokenloc_guid = [guid]::NewGuid().ToString()
    $token_cache_dir = Join-Path $azcopy_token_dir "AzCopyTokenCache_${PID}_$tokenloc_guid"
    if (-not (Test-Path $token_cache_dir)) {
        New-Item -ItemType Directory -Path $token_cache_dir -Force | Out-Null
    }
    if (-not $silent) { Write-Output "AzCopy token will be stored here: $token_cache_dir" }

    try {
        if (-not $silent) { Write-Output "`nStarting file transfer..." }

        # Set environment variables and authenticate using managed identity
        $env:AZCOPY_LOG_LOCATION      = $azcopy_log_dir
        $env:AZCOPY_JOB_PLAN_LOCATION = $token_cache_dir
        $env:AZCOPY_CONCURRENCY_VALUE = "$concurrency"
        $env:AZCOPY_BUFFER_GB         = "$buffer_gb"
        $env:AZCOPY_DISABLE_SYSLOG    = "true"

        & $azcopy_path login --identity
        if ($LASTEXITCODE -ne 0) {
            throw "AzCopy login failed with exit code $LASTEXITCODE"
        }

        # Run AzCopy to transfer the file
        if (-not $silent) { Write-Output "Uploading file to: $destination (concurrency: $concurrency, buffer: $buffer_gb GB, cap-mbps: $cap_mbps)" }
        & $azcopy_path copy `
            "$source_path" `
            "$destination" `
            --overwrite=true `
            --from-to=LocalBlob `
            --blob-type BlockBlob `
            --follow-symlinks `
            --check-length=true `
            --put-md5 `
            --disable-auto-decoding=false `
            --trusted-microsoft-suffixes=onelake.blob.fabric.microsoft.com `
            --log-level=INFO `
            --output-level=quiet `
            --cap-mbps=$cap_mbps
        if ($LASTEXITCODE -ne 0) {
            throw "AzCopy copy operation failed with exit code $LASTEXITCODE"
        }

        if (-not $silent) { Write-Output "AzCopy completed successfully." }
    }
    finally {
        # Delete the token cache directory
        try {
            if (Test-Path $token_cache_dir) {
                Remove-Item -Path $token_cache_dir -Recurse -Force -ErrorAction SilentlyContinue
                if (-not $silent) { Write-Output "Cleaned up token cache directory: $token_cache_dir" }
            }
        }
        catch {
            if (-not $silent) { Write-Output "Failed to clean up token cache directory $token_cache_dir" }
        }
    }
}
```
Assume that the portable AzCopy binary is placed in C:\Tools\azcopy\ and that these two subfolders exist:
- Logs for storing log files
- TokenCache for temporary token storage
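If those subfolders do not exist yet, a few lines of PowerShell can create them; a small sketch, assuming the layout above:

```powershell
# Create the Logs and TokenCache subfolders next to the portable AzCopy binary.
$azcopyRoot = "C:\Tools\azcopy"
foreach ($sub in "Logs", "TokenCache") {
    New-Item -ItemType Directory -Path (Join-Path $azcopyRoot $sub) -Force | Out-Null
}
```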
To upload a file from C:\DataExports\ to a Fabric Lakehouse (Workspace: WS_Sales, Lakehouse: LH_Sales, Folder: Sales), you can use the following PowerShell script:
```powershell
$azcopyExe = "C:\Tools\azcopy\azcopy.exe"
$logDirectory = "C:\Tools\azcopy\Logs\"
$tokenCacheDirectory = "C:\Tools\azcopy\TokenCache\"

$workspace = "WS_Sales"
$lakehouse = "LH_Sales"
$folder = "Sales"
$fileName = "sales_data_2025_06_23.csv"

$sourcePath = "C:\DataExports\$fileName"
$destination = "https://onelake.blob.fabric.microsoft.com/$workspace/$lakehouse.Lakehouse/Files/$folder/"

Transfer-File `
    -source_path $sourcePath `
    -destination $destination `
    -azcopy_path $azcopyExe `
    -azcopy_log_dir $logDirectory `
    -azcopy_token_dir $tokenCacheDirectory
```
🔍 Key Highlights
- ✅ Isolation per process: Each run uses its own token cache directory (PID + GUID).
- ⚙️ Resource tuning: Adjustable buffer_gb, concurrency, and cap_mbps.
- 🔐 Managed Identity Auth: Secure, no secrets required.
- 🧹 Automatic cleanup: Temporary files removed after each run; logs cleaned via cron.
- 🔇 Silent mode: Reduces console noise when needed.
🚦 Process-Level Concurrency Control is Crucial
As mentioned earlier, concurrency must be controlled at two levels:
- Within each process: via AzCopy options.
- Across processes: by limiting how many run simultaneously.
If there's a risk of transfer spikes, a queue can help control process concurrency:
- Use a queue-based system (e.g., file-based or database-backed job queue).
- Run a background worker that dequeues and processes files with limited parallelism (e.g., 1–3 threads); a minimal sketch follows after the list below.
Benefits:
- Prevents resource exhaustion
- Reduces API throttling risks
- Improves performance predictability
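As a starting point, such a queue can be as simple as a drop folder processed by a single worker loop. The sketch below is one possible, deliberately minimal implementation that reuses the Transfer-File function from above; the queue folder, destination, and retry behavior are assumptions, not part of the original setup:

```powershell
# Minimal file-based queue worker: one transfer at a time, oldest file first.
$queueDir     = "C:\DataExports\queue"       # assumed drop folder for incoming files
$processedDir = "C:\DataExports\processed"   # assumed archive folder for completed files

while ($true) {
    # Pick the oldest pending file; sleep if the queue is empty.
    $next = Get-ChildItem -Path $queueDir -File | Sort-Object LastWriteTime | Select-Object -First 1
    if ($null -eq $next) { Start-Sleep -Seconds 30; continue }

    try {
        Transfer-File `
            -source_path $next.FullName `
            -destination "https://onelake.blob.fabric.microsoft.com/WS_Sales/LH_Sales.Lakehouse/Files/Sales/" `
            -azcopy_path "C:\Tools\azcopy\azcopy.exe" `
            -azcopy_log_dir "C:\Tools\azcopy\Logs\" `
            -azcopy_token_dir "C:\Tools\azcopy\TokenCache\" `
            -silent

        # Archive the file only after a successful upload.
        Move-Item -Path $next.FullName -Destination $processedDir -Force
    }
    catch {
        # Leave the file in the queue so the next iteration can retry it.
        Write-Output "Transfer failed for $($next.Name): $_"
        Start-Sleep -Seconds 60
    }
}
```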
🧾 Final Thoughts
This script is still at an early stage, but it looks promising.
By combining PowerShell and AzCopy, you can build a lightweight, scalable file ingestion system for Microsoft Fabric.
With solid concurrency management and error handling, this setup can run reliably and unattended—ideal for 24/7 operations.