Preserving File Permissions and Ownership Across Platform Conversions

File conversion is usually discussed in terms of format fidelity—how well the visual or textual content survives a transformation. Yet, for many organisations the security envelope that surrounds a file—its permissions, ownership, and extended attributes—is equally vital. When a document moves from a Windows workstation to a Linux server, or when it passes through a cloud‑based converter, those access controls can be silently stripped, exposing sensitive data or breaking automated workflows. This guide walks through the underlying permission models, explains why they matter during conversion, and provides concrete, reproducible techniques for keeping them intact.


Understanding Permission Models on Different Platforms

POSIX permissions dominate Unix‑like systems. Every file has an owner user, an owner group, and three permission triples (read, write, execute) for user, group, and others. Modern Linux distributions also support POSIX ACLs, which allow fine‑grained entries beyond the classic three‑tuple.
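You can inspect the owner, the group and the three permission triples with Python's standard os.stat and stat.filemode. The sketch below assumes a Linux or macOS host, because it uses the Unix-only pwd and grp modules. describe is a hypothetical helper name. It does not show POSIX ACL entries; getfacl reports those.

```python
import grp
import os
import pwd
import stat

def describe(path: str) -> str:
    """Return an `ls -l`-style summary: mode string, owner, group."""
    st = os.stat(path)
    # stat.filemode renders the file-type character plus the user/group/other triples.
    mode = stat.filemode(st.st_mode)
    # Fall back to numeric IDs when the account has no name on this host.
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        owner = str(st.st_uid)
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = str(st.st_gid)
    return f"{mode} {owner} {group}"

if __name__ == "__main__":
    import sys
    for p in sys.argv[1:]:
        print(describe(p), p)
```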

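Windows ACLs are more expressive. An NTFS security descriptor holds an owner, a discretionary ACL (DACL) and an optional system ACL (SACL). The DACL is a sequence of Access Control Entries (ACEs). Each ACE allows or denies specific rights to a user, a group, or a built-in principal such as Authenticated Users. Every ACE carries inheritance flags that control whether child files and folders receive it, and object-specific ACE types can narrow a rule to particular object types. Auditing settings live in the SACL as separate audit ACEs.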
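On Windows, PowerShell's (Get-Acl <path>).Sddl returns a file's security descriptor as an SDDL string. In that string, every DACL entry is a parenthesised ACE. The sketch below splits the D: section into fields so you can see the allow/deny type, the inheritance flags, the rights and the account for each entry. parse_dacl is a hypothetical helper. It assumes a plain SDDL string and skips conditional ACEs and resource attributes.

```python
import re

# Each ACE in an SDDL string is parenthesised:
# (ace_type;ace_flags;rights;object_guid;inherit_object_guid;account_sid)
ACE_RE = re.compile(r"\(([^)]*)\)")

def parse_dacl(sddl: str) -> list[dict]:
    """Return the ACEs of the D: (DACL) section of an SDDL string as dicts."""
    start = sddl.find("D:")
    if start == -1:
        return []  # no DACL section present
    end = sddl.find("S:", start)  # the SACL section, if any, follows the DACL
    section = sddl[start + 2:] if end == -1 else sddl[start + 2:end]
    aces = []
    for body in ACE_RE.findall(section):
        fields = body.split(";")
        if len(fields) < 6:
            continue  # not a standard ACE string; skipped in this sketch
        aces.append({
            "type": fields[0],    # A = allow, D = deny
            "flags": fields[1],   # OI/CI = inherited by child files/folders, ID = inherited
            "rights": fields[2],  # e.g. FA = file all access, or a hex mask
            "sid": fields[5],     # account SID or alias such as BA, SY, WD
        })
    return aces
```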

Linux and macOS also support extended attributes (xattrs), and macOS stores resource forks in one of them (com.apple.ResourceFork). NTFS offers alternate data streams for a similar purpose. These attributes hold custom metadata, such as a tag marking a file “confidential” or a checksum used by an external system. Copy tools that are asked to preserve metadata (for example cp -a or rsync -aX on Linux) keep these attributes. Most naïve conversion tools, however, treat the file as an opaque byte stream and drop everything beyond the raw data.
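Python's shutil module shows the difference between a naive copy and a metadata-aware one. In this Unix-oriented sketch (copy_both_ways and mode_of are hypothetical names), shutil.copyfile copies only the bytes. shutil.copy2 also copies mode bits and timestamps, plus extended attributes on Linux where the filesystem supports them. Neither function copies owner, group or POSIX ACLs.

```python
import os
import shutil
import stat

def mode_of(path: str) -> int:
    """Return just the permission bits of a file (e.g. 0o600)."""
    return stat.S_IMODE(os.stat(path).st_mode)

def copy_both_ways(src: str, workdir: str) -> tuple[int, int]:
    """Copy src twice and return (naive_mode, metadata_aware_mode)."""
    naive = os.path.join(workdir, "naive_copy")
    careful = os.path.join(workdir, "careful_copy")
    # Bytes only, like a converter treating the file as a blob:
    # the new file's mode comes from the umask, not from src.
    shutil.copyfile(src, naive)
    # Bytes plus mode bits and timestamps (and, on Linux, xattrs where
    # supported). Owner, group and POSIX ACLs are still not copied.
    shutil.copy2(src, careful)
    return mode_of(naive), mode_of(careful)
```

With a common umask of 022, the naive copy of a 0o600 file typically comes out as 0o644, which every local account can read.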


Why Permissions Matter in Conversion Workflows

  1. Regulatory compliance – GDPR, HIPAA, and similar regimes require appropriate access controls for personal or health data throughout processing, not just in storage. A conversion step that widens access can therefore be a compliance issue in its own right.
  2. Operational continuity – Automated pipelines that rely on group-based execution (e.g., a nightly job that processes files owned by the data-ingest group) will fail if the ownership is lost.
  3. Risk mitigation – Stripped ACLs can turn a private document into a world‑readable file, creating a data‑leak surface.
  4. Auditing – For forensic or e‑discovery purposes the original permission state is part of the evidential chain; its alteration can invalidate the audit trail.

Consequently, any conversion pipeline that moves files across filesystems, containers, or cloud services should treat permissions as first‑class citizens.


Typical Scenarios Where Permissions Disappear

1. Windows → Linux via SMB or FTP

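When a file is uploaded from a Windows share to a Linux server over SMB, the Windows owner is usually mapped to a local account (often nobody, or the user the share is mounted as). The original NTFS ACL is not carried across. FTP transfers only the file contents. Apart from server-specific extensions such as SITE CHMOD, it has no way to convey ownership or ACLs.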

2. Cloud‑based conversion services

Most SaaS converters accept a multipart/form-data POST, read the file contents, perform the transformation, and return the result. The service treats the payload as raw bytes, so OS-level permission bits never leave the client machine. After download, the resulting file gets whatever defaults apply to any newly created file. On Linux, the mode bits come from the downloading process's umask, or from a default ACL on the target directory if one is set. On Windows, the file takes the ACEs inherited from its parent folder. For example, when using convertise.app the uploaded document is processed entirely in the cloud, and the returned file arrives with the default permissions of the local download folder.
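On Linux, a downloaded file gets its "default permissions" from the process umask, not from the source file. A short Python sketch makes this visible. It is POSIX only, and save_download is a hypothetical name standing in for whatever writes the converter's response to disk. In a directory without a default ACL, the new file's bits are simply 0o666 minus the umask.

```python
import os
import stat

def save_download(data: bytes, dest: str) -> int:
    """Write downloaded bytes to a new file and return the mode it ends up with."""
    # open() requests 0o666; the kernel removes any bits set in the process
    # umask (a default POSIX ACL on the parent directory, if present, is used
    # instead of the umask).
    with open(dest, "wb") as f:
        f.write(data)
    return stat.S_IMODE(os.stat(dest).st_mode)
```

Under a umask of 027 it returns 0o640, however restrictive the original document was.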

3. Archive extraction without metadata preservation

A common shortcut is to zip a directory, send the archive, convert the files inside, and unzip the results. The zip format can store Unix mode bits in each entry's external attributes, and Info-ZIP's unzip restores them on Unix. It restores UID/GID only with -X, though. Many other extractors, including Python's zipfile module and most Windows ZIP utilities, ignore the stored bits altogether.
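Python's zipfile module does not restore stored mode bits when it extracts. If the archive was created on Unix, you can re-apply them afterwards. This minimal sketch assumes Unix-created entries, which keep st_mode in the upper 16 bits of external_attr and have create_system set to 3. extract_with_modes is a hypothetical helper name. It masks off the setuid, setgid and sticky bits, because an archive returned by a conversion service should not be trusted to set them.

```python
import os
import zipfile

def extract_with_modes(archive: str, dest: str) -> list[str]:
    """Extract a zip archive and re-apply any Unix mode bits stored in it."""
    written = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            # extract() returns the sanitised path it actually wrote.
            path = zf.extract(info, dest)
            written.append(path)
            # Only trust external_attr for entries made on Unix (create_system 3).
            mode = (info.external_attr >> 16) & 0o777  # drop setuid/setgid/sticky
            if info.create_system == 3 and mode and not info.is_dir():
                os.chmod(path, mode)
    return written
```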


Strategies to Preserve Permissions During Conversion

a. Wrap Files in an Archive that Retains Metadata

The simplest approach is to place the source files in an archive that explicitly records permission data, then convert the archive itself if possible. Formats that support this include:

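  • tar. tar records mode bits and UID/GID, along with user and group names, whenever it creates an archive. The --preserve-permissions (-p) flag matters at extraction: it applies the stored mode exactly instead of masking it with the umask (this is the default when running as root). GNU tar also stores POSIX ACLs and extended attributes when --acls and --xattrs are supplied, and you must give the same options again when extracting.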
  • pax which is a POSIX‑standard archive capable of storing extended attributes.
  • 7-Zip, whose -sni switch stores NT security information (Windows ACLs). Its documentation limits this to the WIM format, so a plain .7z archive does not carry ACLs.

Because the archive itself carries the metadata, you avoid having to re-apply permissions after each individual file conversion.
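Before you rely on a tar archive, Python's tarfile module can confirm what it actually recorded. The sketch below (list_tar_metadata is a hypothetical name) lists each member's mode, owner and group. tarfile itself stores mode, UID/GID and user and group names, but it does not collect POSIX ACLs. For those you need GNU tar's --acls.

```python
import tarfile

def list_tar_metadata(archive: str) -> list[tuple[str, str, str, str]]:
    """Return (name, octal mode, owner, group) for every member of a tar archive."""
    rows = []
    with tarfile.open(archive) as tf:  # compression is detected automatically
        for member in tf.getmembers():
            rows.append((
                member.name,
                oct(member.mode & 0o7777),        # permission bits, e.g. '0o640'
                member.uname or str(member.uid),  # fall back to the numeric ID
                member.gname or str(member.gid),
            ))
    return rows
```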

b. Export and Re‑import Permission Metadata Separately

When the conversion target format cannot contain permission bits (e.g., converting a DOCX to PDF), export the security descriptors to a sidecar file before conversion:

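# Save POSIX ACLs as getfacl text (getfacl also records owner and group), using relative paths
cd /data/incoming && getfacl -R . > /data/perms.acl
# After conversion, from the root of the converted tree:
#   setfacl --restore=/data/perms.acl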

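After conversion, re-apply the saved ACLs to the new files, matching them by relative path. setfacl --restore only works if the converted files keep the original relative names. When the extension changes (report.docx becomes report.pdf), rename or remap the files first. On Windows, icacls with /save and /restore plays the same role for NTFS ACLs.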
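The getfacl route covers POSIX ACLs. The re-apply script in section d, by contrast, reads a perms.json sidecar. Below is a minimal Python sketch for producing that JSON. It assumes a Linux host and the schema the re-apply script expects: mode as an integer, owner and group names, and an xattrs map of text values. export_perms and _user_xattrs are hypothetical names. The sketch keeps only user.* xattrs with UTF-8 values. Owners or groups without a name on this host are written as numeric strings, which the re-apply step would need to handle.

```python
import grp
import json
import os
import pwd
import stat

def _user_xattrs(path: str) -> dict:
    """Collect user.* xattrs whose values are UTF-8 text (Linux only)."""
    values = {}
    if not hasattr(os, "listxattr"):
        return values
    try:
        names = os.listxattr(path)
    except OSError:
        return values  # filesystem without xattr support
    for attr in names:
        if not attr.startswith("user."):
            continue  # security.*/system.* need privileges; ACLs go via getfacl
        try:
            values[attr] = os.getxattr(path, attr).decode("utf-8")
        except (OSError, UnicodeDecodeError):
            pass  # binary values are skipped in this sketch
    return values

def export_perms(root: str, out_file: str) -> dict:
    """Write {relative_path: {mode, owner, group, xattrs}} for regular files under root."""
    perms = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets and other special files
            try:
                owner = pwd.getpwuid(st.st_uid).pw_name
            except KeyError:
                owner = str(st.st_uid)  # no name on this host
            try:
                group = grp.getgrgid(st.st_gid).gr_name
            except KeyError:
                group = str(st.st_gid)
            perms[os.path.relpath(full, root)] = {
                "mode": stat.S_IMODE(st.st_mode),  # integer, e.g. 416 == 0o640
                "owner": owner,
                "group": group,
                "xattrs": _user_xattrs(full),
            }
    with open(out_file, "w") as f:
        json.dump(perms, f, indent=2, sort_keys=True)
    return perms
```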

c. Use Conversion Tools that Honor Metadata

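Most command-line converters write a fresh output file and do not copy permissions, though some can carry embedded metadata across:

  • pandoc (for document formats) has no option for copying file mode bits. Its output is created with default permissions, like any new file.
  • ffmpeg does not propagate Unix permissions either, but -map_metadata 0 copies the embedded container tags (title, creation time and so on) from the first input to the output.
  • For image conversion, ImageMagick's convert keeps embedded metadata such as EXIF data and colour profiles unless you pass -strip, but the output file still receives default permissions. You can set -set filename:base "%t" and name the output "%[filename:base].png" so that each output keeps its source's base name. That makes it easier to match outputs against saved permission metadata later.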
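Whatever converter you use, you can wrap the call and copy the source's metadata onto the output once the tool finishes, instead of relying on the tool to do it. This sketch assumes a POSIX host and a converter that writes dst itself. convert_keeping_mode is a hypothetical name. It transfers only mode bits, plus ownership when running as root. ACLs and xattrs are not carried over.

```python
import os
import shutil
import subprocess

def convert_keeping_mode(cmd: list[str], src: str, dst: str) -> None:
    """Run a converter that writes dst, then copy src's ownership and mode onto dst."""
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    st = os.stat(src)
    if os.geteuid() == 0:
        # Changing the owner needs root; do it before the mode, because
        # chown can clear setuid/setgid bits.
        os.chown(dst, st.st_uid, st.st_gid)
    shutil.copymode(src, dst)  # permission bits only, no ACLs or xattrs
```

For example: convert_keeping_mode(["pandoc", "report.docx", "-o", "report.html"], "report.docx", "report.html").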

d. Programmatic Re‑application with Scripting Languages

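Python exposes os.chmod, os.chown and (on Linux) os.setxattr. Here is a minimal re-apply routine. It reads a perms.json sidecar that maps each relative path to its mode (an integer), owner name, group name and optional text xattrs:

import json, os, pwd, grp

# Linux-only sketch: pwd, grp and os.setxattr are unavailable on Windows.
with open('perms.json') as f:
    perms = json.load(f)

for rel_path, meta in perms.items():
    dst = os.path.join('converted', rel_path)
    if not os.path.exists(dst):
        # Name changed during conversion (e.g. .docx -> .pdf); remap first.
        continue
    # Set ownership before mode: chown can clear setuid/setgid bits.
    uid = pwd.getpwnam(meta['owner']).pw_uid
    gid = grp.getgrnam(meta['group']).gr_gid
    os.chown(dst, uid, gid)  # changing the owner requires root
    os.chmod(dst, meta['mode'])
    # xattr values are stored as text in the JSON and written back as bytes
    for attr, value in meta.get('xattrs', {}).items():
        os.setxattr(dst, attr, value.encode())

Storing the metadata in portable JSON means the same sidecar can drive platform-specific re-apply scripts. This script runs on Linux only. A Windows counterpart would apply the saved SDDL strings instead, for example through pywin32's win32security module.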


Example End‑to‑End Workflow

  1. Collect source files in /project/source.
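  2. Export permissions to perms.json using a small Go utility that walks the directory tree and writes owner and group names, mode bits, and, for files that came from Windows, ACL SDDL strings.
  3. Create a tarball with tar -cvf source.tar -C /project source. tar records mode bits and ownership on creation without extra flags (-p only affects extraction). With GNU tar, add --acls --xattrs if the files carry POSIX ACLs or extended attributes.
  4. Upload the tarball to the conversion service and save the response (e.g., curl -F file=@source.tar -o converted.zip "https://api.convertise.app/convert?to=zip"). The service returns a new archive, converted.zip, in which each document has been transformed but the files stay bundled.
  5. Extract the archive on the destination host with unzip -X converted.zip, or with 7z x converted.zip on Windows. Info-ZIP's unzip restores any stored mode bits, and -X also restores UID/GID when run with sufficient privileges. GNU tar cannot unpack zip files.
  6. Re-apply ownership, mode bits and xattrs by feeding perms.json into the Python script above. Then restore POSIX ACLs with setfacl --restore if you saved a getfacl dump, and restore Windows ACLs from the saved SDDL strings with a Windows-side tool.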

The result is a set of converted files that carry the same ownership, mode bits and recorded attributes as the originals.
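Step 6 only works if the perms.json keys match the converted names, and they will not match when report.docx comes back as report.pdf. Below is a minimal sketch that remaps sidecar entries by path without the extension. remap_by_stem is a hypothetical name. It assumes each source stem is unique within its directory, so report.docx and report.xlsx would collide.

```python
import os
from pathlib import PurePosixPath

def _stem_key(rel_path: str) -> str:
    # Use forward slashes and drop the final extension: 'reports/q1.docx' -> 'reports/q1'.
    return str(PurePosixPath(rel_path.replace("\\", "/")).with_suffix(""))

def remap_by_stem(perms: dict, converted_root: str) -> dict:
    """Re-key sidecar entries by the converted file that shares their path minus extension."""
    by_stem = {_stem_key(rel): meta for rel, meta in perms.items()}
    remapped = {}
    for dirpath, _dirs, files in os.walk(converted_root):
        for name in files:
            rel = os.path.relpath(os.path.join(dirpath, name), converted_root)
            meta = by_stem.get(_stem_key(rel))
            if meta is not None:
                remapped[rel] = meta
    return remapped
```

You can write the returned mapping back out with json.dump and feed it to the re-apply script in place of the original perms.json.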


Testing and Verification

After a conversion run, verify that permissions match expectations:

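  • Checksums and permission hashes – A SHA-256 taken before and after conversion only matches for files meant to pass through unchanged, so use it for those and to confirm that downloads are intact. For permissions, capture getfacl -c output (Linux) or icacls output (Windows) for each source file and its converted counterpart, hash those strings, and compare the hashes. The -c flag omits the file-name header, so entries for differently named files compare cleanly.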
  • Automated regression – Incorporate a step in a CI pipeline that runs a test suite: copy a fixture directory, run the conversion, and assert that stat -c "%a %U %G" matches the baseline.
  • Audit logs – If your organisation requires an audit trail, log the permission export and re‑apply timestamps alongside the conversion IDs. This satisfies many compliance frameworks that demand traceability of security metadata.
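For a CI assertion, you can reproduce the stat -c "%a %U %G" check in Python. This sketch assumes a POSIX host. stat_line and diff_permissions are hypothetical names. The baseline is a mapping from relative path to the expected line, for example captured from the source tree before conversion. As GNU stat does, it prints UNKNOWN for IDs without a name.

```python
import grp
import os
import pwd
import stat

def stat_line(path: str) -> str:
    """Mimic `stat -c "%a %U %G"`: octal mode, owner name, group name."""
    st = os.stat(path)
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        owner = "UNKNOWN"
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = "UNKNOWN"
    return f"{stat.S_IMODE(st.st_mode):o} {owner} {group}"

def diff_permissions(baseline: dict, root: str) -> dict:
    """Return {relative_path: (expected, actual)} for every mismatch under root."""
    problems = {}
    for rel, expected in baseline.items():
        path = os.path.join(root, rel)
        actual = stat_line(path) if os.path.exists(path) else "missing"
        if actual != expected:
            problems[rel] = (expected, actual)
    return problems
```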

Edge Cases and Special Considerations

Encrypted Files

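Full-disk and stacked filesystem encryption (e.g., Windows BitLocker, Linux eCryptfs) is transparent to applications. Once the volume is unlocked or mounted, a converter reading through it sees plaintext and the normal permissions. Copying the underlying encrypted files instead yields ciphertext that cannot be converted at all. The bigger risk is that the converted output, or a copy sent to a cloud service, ends up outside the encrypted storage. The recommended practice is to work in a secure staging area on encrypted storage with restrictive permissions and perform the conversion while preserving ACLs. Then move the results back onto encrypted storage or re-encrypt them.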

Streaming Conversions

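Some pipelines stream a file directly to a conversion binary (ffmpeg -i - -f mp4 -movflags frag_keyframe+empty_moov -; the mp4 muxer needs those flags to write to a non-seekable pipe). The converter then sees only standard input and output. It has no source file whose permissions it could copy, and the output file is created by whatever shell redirection or process receives the stream. The workaround is to record the metadata yourself. Open the source, fstat it to save its mode, and stream it through the converter. Once the stream closes, chmod the output file to the saved mode.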
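In Python, the fstat-then-chmod workaround might look like the sketch below. It assumes a POSIX host and a converter that reads standard input and writes standard output. stream_convert is a hypothetical name.

```python
import os
import stat
import subprocess

def stream_convert(cmd: list[str], src: str, dst: str) -> None:
    """Stream src through cmd into dst, then give dst the source's mode bits."""
    with open(src, "rb") as fin:
        # Record the mode from the open descriptor before streaming starts.
        saved_mode = stat.S_IMODE(os.fstat(fin.fileno()).st_mode)
        with open(dst, "wb") as fout:
            # The converter only sees stdin/stdout, never the source path.
            subprocess.run(cmd, stdin=fin, stdout=fout, check=True)
    # dst was created with default permissions; apply the saved mode.
    os.chmod(dst, saved_mode)
```

With ffmpeg, cmd would be something like ["ffmpeg", "-i", "-", "-f", "mp4", "-movflags", "frag_keyframe+empty_moov", "-"].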

Cross‑Platform Path Normalization

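Windows uses backslash separators, and its file systems are usually case-insensitive but case-preserving. Unix file systems are case-sensitive. When matching sidecar metadata to converted files, normalize both keys to one form before lookup: forward slashes, with folded case only when the source was case-insensitive. Neither os.path function does this portably. os.path.normcase lowercases and converts to backslashes only when the script runs on Windows, and on POSIX it returns the path unchanged. os.path.realpath resolves symlinks rather than normalizing separators or case.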
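One portable option in Python is pathlib.PureWindowsPath, which accepts both separator styles regardless of the host OS. sidecar_key is a hypothetical name. The sketch assumes relative paths, and it would mis-split a POSIX file name that legitimately contains a backslash.

```python
from pathlib import PureWindowsPath

def sidecar_key(path: str, case_insensitive: bool = False) -> str:
    """Normalize a relative path so Windows- and POSIX-style keys compare equal."""
    # PureWindowsPath treats both "\\" and "/" as separators; as_posix() emits "/".
    key = PureWindowsPath(path).as_posix()
    # Fold case only when the source filesystem was case-insensitive (e.g. NTFS).
    return key.casefold() if case_insensitive else key
```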


Checklist for Permission‑Safe Conversion

  • Identify the source permission model (POSIX, Windows ACL, macOS xattr).
  • Export permission metadata to a portable representation before conversion.
  • Choose an archive format that stores these bits if you must bundle files.
  • Prefer conversion tools that preserve file mode unless you deliberately strip metadata.
  • Re‑apply permissions after conversion using scripted automation.
  • Verify with automated tests that permissions and ACLs match the baseline, and use checksums only for content meant to pass through unchanged.
  • Document the process in an internal run‑book for auditors.

Conclusion

File conversion is often reduced to a question of "does the new file look the same?" For secure and compliant environments the answer must also include "does the new file keep the same access controls?" By treating permissions as explicit data—exporting them, transporting them alongside the payload, and re‑instating them after conversion—you can build pipelines that respect both content fidelity and security posture. Whether you are moving PDFs from a Windows desktop to a Linux‑based archival system, or you are leveraging a cloud‑first converter such as convertise.app, these practices give you predictable, auditable results without sacrificing the convenience of modern file‑conversion services.