PIGSTY

Babelfish

MS SQL Server Wire compatibility on PostgreSQL

Pigsty allows users to create a Microsoft SQL Server compatible PostgreSQL cluster using Babelfish and WiltonDB!

  • Babelfish: An open-source MSSQL (Microsoft SQL Server) compatibility extension Open Sourced by AWS
  • WiltonDB: A PostgreSQL kernel distribution focusing on integrating Babelfish

Babelfish is a PostgreSQL extension, but it works on a slightly modified PostgreSQL kernel Fork, WiltonDB provides compiled kernel binaries and extension binary packages on EL/Ubuntu systems.

Pigsty can replace the native PostgreSQL kernel with WiltonDB, providing an out-of-the-box MSSQL compatible cluster along with all the supported by common PostgreSQL clusters, such as HA, PITR, IaC, monitoring, etc.

WiltonDB is very similar to PostgreSQL 15, but it can not use vanilla PostgreSQL extensions directly. WiltonDB has several re-compiled extensions such as system_stats, pg_hint_plan and tds_fdw.

The cluster will listen on the default PostgreSQL port and the default MSSQL 1433 port, providing MSSQL services via the TDS WireProtocol on this port. You can connect to the MSSQL service provided by Pigsty using any MSSQL client, such as SQL Server Management Studio, or using the sqlcmd command-line tool.


Get Started

install Pigsty’s with the mssql config template.

./configure -c mssql     # Use mssql (babelfish) template
./install.yml            # install everything with pigsty

For production deployments, make sure to modify the password parameters in the pigsty.yml config before running the install playbook.


Notes

When installing and deploying the MSSQL module, please pay special attention to the following points:

  • WiltonDB is available on EL (7/8/9) and Ubuntu (20.04/22.04) but not available on Debian systems.
  • WiltonDB is currently compiled based on PostgreSQL 15, so you need to specify pg_version: 15.
  • On EL systems, the wiltondb binary is installed by default in the /usr/bin/ directory, while on Ubuntu systems, it is installed in the /usr/lib/postgresql/15/bin/ directory, which is different from the official PostgreSQL binary location.
  • In WiltonDB compatibility mode, the HBA password authentication rule needs to use md5 instead of scram-sha-256. Therefore, you need to override Pigsty’s default HBA rule set and insert the md5 authentication rule required by SQL Server before the dbrole_readonly wildcard authentication rule.
  • WiltonDB can only be enabled for a primary database, and you should designate a user as the Babelfish superuser, allowing Babelfish to create databases and users. The default is mssql and dbuser_myssql. If you change this, you should also modify the user in files/mssql.sql.
  • The WiltonDB TDS cable protocol compatibility plugin babelfishpg_tds needs to be enabled in shared_preload_libraries.
  • After enabling the WiltonDB extension, it listens on the default MSSQL port 1433. You can override Pigsty’s default service definitions to redirect the primary and replica services to port 1433 instead of the 5432 / 6432ports.

The following parameters need to be configured for the MSSQL database cluster:

pg-meta:
  hosts:
    10.10.10.10: { pg_seq: 1, pg_role: primary }
  vars:
    pg_cluster: pg-meta
    pg_users:
      - {name: dbuser_mssql ,password: DBUser.MSSQL ,superuser: true, pgbouncer: true ,roles: [dbrole_admin], comment: superuser & owner for babelfish  }
    pg_databases:
      - name: mssql
        baseline: mssql.sql
        extensions: [uuid-ossp, babelfishpg_common, babelfishpg_tsql, babelfishpg_tds, babelfishpg_money, pg_hint_plan, system_stats, tds_fdw]
        owner: dbuser_mssql
        parameters: { 'babelfishpg_tsql.migration_mode' : 'multi-db' }
        comment: babelfish cluster, a MSSQL compatible pg cluster
    node_crontab: [ '00 01 * * * postgres /pg/bin/pg-backup full' ] # make a full backup every 1am

    # Babelfish / WiltonDB Ad Hoc Settings
    pg_mode: mssql                     # Microsoft SQL Server Compatible Mode
    pg_version: 15
    pg_packages: [ wiltondb, pgsql-common, sqlcmd ]
    pg_libs: 'babelfishpg_tds, pg_stat_statements, auto_explain' # add timescaledb to shared_preload_libraries
    pg_default_hba_rules: # overwrite default HBA rules for babelfish cluster
      - { user: '${dbsu}'    ,db: all         ,addr: local     ,auth: ident ,title: 'dbsu access via local os user ident' }
      - { user: '${dbsu}'    ,db: replication ,addr: local     ,auth: ident ,title: 'dbsu replication from local os ident' }
      - { user: '${repl}'    ,db: replication ,addr: localhost ,auth: pwd   ,title: 'replicator replication from localhost' }
      - { user: '${repl}'    ,db: replication ,addr: intra     ,auth: pwd   ,title: 'replicator replication from intranet' }
      - { user: '${repl}'    ,db: postgres    ,addr: intra     ,auth: pwd   ,title: 'replicator postgres db from intranet' }
      - { user: '${monitor}' ,db: all         ,addr: localhost ,auth: pwd   ,title: 'monitor from localhost with password' }
      - { user: '${monitor}' ,db: all         ,addr: infra     ,auth: pwd   ,title: 'monitor from infra host with password' }
      - { user: '${admin}'   ,db: all         ,addr: infra     ,auth: ssl   ,title: 'admin @ infra nodes with pwd & ssl' }
      - { user: '${admin}'   ,db: all         ,addr: world     ,auth: ssl   ,title: 'admin @ everywhere with ssl & pwd' }
      - { user: dbuser_mssql ,db: mssql       ,addr: intra     ,auth: md5   ,title: 'allow mssql dbsu intranet access' } # <--- use md5 auth method for mssql user
      - { user: '+dbrole_readonly',db: all    ,addr: localhost ,auth: pwd   ,title: 'pgbouncer read/write via local socket' }
      - { user: '+dbrole_readonly',db: all    ,addr: intra     ,auth: pwd   ,title: 'read/write biz user via password' }
      - { user: '+dbrole_offline' ,db: all    ,addr: intra     ,auth: pwd   ,title: 'allow etl offline tasks from intranet' }
    pg_default_services: # route primary & replica service to mssql port 1433
      - { name: primary ,port: 5433 ,dest: 1433  ,check: /primary   ,selector: "[]" }
      - { name: replica ,port: 5434 ,dest: 1433  ,check: /read-only ,selector: "[]" , backup: "[? pg_role == `primary` || pg_role == `offline` ]" }
      - { name: default ,port: 5436 ,dest: postgres ,check: /primary   ,selector: "[]" }
      - { name: offline ,port: 5438 ,dest: postgres ,check: /replica   ,selector: "[? pg_role == `offline` || pg_offline_query ]" , backup: "[? pg_role == `replica` && !pg_offline_query]" }

You can define business databases & users in the pg_databases and pg_users section:

#----------------------------------#
# pgsql (singleton on current node)
#----------------------------------#
# this is an example single-node postgres cluster with postgis & timescaledb installed, with one biz database & two biz users
pg-meta:
  hosts:
    10.10.10.10: { pg_seq: 1, pg_role: primary } # <---- primary instance with read-write capability
  vars:
    pg_cluster: pg-test
    pg_users:                           # create MSSQL superuser
      - {name: dbuser_mssql ,password: DBUser.MSSQL ,superuser: true, pgbouncer: true ,roles: [dbrole_admin], comment: superuser & owner for babelfish  }
    pg_primary_db: mssql                # use `mssql` as the primary sql server database
    pg_databases:
      - name: mssql
        baseline: mssql.sql             # init babelfish database & user
        extensions:
          - { name: uuid-ossp          }
          - { name: babelfishpg_common }
          - { name: babelfishpg_tsql   }
          - { name: babelfishpg_tds    }
          - { name: babelfishpg_money  }
          - { name: pg_hint_plan       }
          - { name: system_stats       }
          - { name: tds_fdw            }
        owner: dbuser_mssql
        parameters: { 'babelfishpg_tsql.migration_mode' : 'multi-db' }
        comment: babelfish cluster, a MSSQL compatible pg cluster

Client Access

You can use any SQL Server compatible client tool to access this database cluster.

Microsoft provides sqlcmd as the official command-line tool.

Besides, they have a go version cli tool: go-sqlcmd

Install go-sqlcmd:

curl -LO https://github.com/microsoft/go-sqlcmd/releases/download/v1.4.0/sqlcmd-v1.4.0-linux-amd64.tar.bz2
tar xjvf sqlcmd-v1.4.0-linux-amd64.tar.bz2
sudo mv sqlcmd* /usr/bin/

Get started with go-sqlcmd

$ sqlcmd -S 10.10.10.10,1433 -U dbuser_mssql -P DBUser.MSSQL
1> select @@version
2> go
version
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Babelfish for PostgreSQL with SQL Server Compatibility - 12.0.2000.8
Oct 22 2023 17:48:32
Copyright (c) Amazon Web Services
PostgreSQL 15.4 (EL 1:15.4.wiltondb3.3_2-2.el8) on x86_64-redhat-linux-gnu (Babelfish 3.3.0)

(1 row affected)

You can route service traffic to MSSQL 1433 port instead of 5433/5434:

# route 5433 on all members to 1433 on primary
sqlcmd -S 10.10.10.11,5433 -U dbuser_mssql -P DBUser.MSSQL

# route 5434 on all members to 1433 on replicas
sqlcmd -S 10.10.10.11,5434 -U dbuser_mssql -P DBUser.MSSQL

Install

If you have the Internet access, you can add the WiltonDB repository to the node and install it as a node package directly:

node_repo_modules: local,node,pgsql,mssql
node_packages: [ wiltondb ]

Install wiltondb with the following command:

./node.yml -t node_repo,node_pkg

It’s OK to install vanilla PostgreSQL and WiltonDB on the same node, but you can only run one of them at a time, and this is not recommended for production environments.


Extensions

Most of the PGSQL module’s extensions (non-SQL class) cannot be used directly on the WiltonDB core of the MSSQL module and need to be recompiled.

WiltonDB currently comes with the following extension plugins:

NameVersionComment
dblink1.2connect to other PostgreSQL databases from within a database
adminpack2.1administrative functions for PostgreSQL
dict_int1.0text search dictionary template for integers
intagg1.1integer aggregator and enumerator (obsolete)
dict_xsyn1.0text search dictionary template for extended synonym processing
amcheck1.3functions for verifying relation integrity
autoinc1.0functions for autoincrementing fields
bloom1.0bloom access method - signature file based index
fuzzystrmatch1.1determine similarities and distance between strings
intarray1.5functions, operators, and index support for 1-D arrays of integers
btree_gin1.3support for indexing common datatypes in GIN
btree_gist1.7support for indexing common datatypes in GiST
hstore1.8data type for storing sets of (key, value) pairs
hstore_plperl1.0transform between hstore and plperl
isn1.2data types for international product numbering standards
hstore_plperlu1.0transform between hstore and plperlu
jsonb_plperl1.0transform between jsonb and plperl
citext1.6data type for case-insensitive character strings
jsonb_plperlu1.0transform between jsonb and plperlu
jsonb_plpython3u1.0transform between jsonb and plpython3u
cube1.5data type for multidimensional cubes
hstore_plpython3u1.0transform between hstore and plpython3u
earthdistance1.1calculate great-circle distances on the surface of the Earth
lo1.1Large Object maintenance
file_fdw1.0foreign-data wrapper for flat file access
insert_username1.0functions for tracking who changed a table
ltree1.2data type for hierarchical tree-like structures
ltree_plpython3u1.0transform between ltree and plpython3u
pg_walinspect1.0functions to inspect contents of PostgreSQL Write-Ahead Log
moddatetime1.0functions for tracking last modification time
old_snapshot1.0utilities in support of old_snapshot_threshold
pgcrypto1.3cryptographic functions
pgrowlocks1.2show row-level locking information
pageinspect1.11inspect the contents of database pages at a low level
pg_surgery1.0extension to perform surgery on a damaged relation
seg1.4data type for representing line segments or floating-point intervals
pgstattuple1.5show tuple-level statistics
pg_buffercache1.3examine the shared buffer cache
pg_freespacemap1.2examine the free space map (FSM)
postgres_fdw1.1foreign-data wrapper for remote PostgreSQL servers
pg_prewarm1.2prewarm relation data
tcn1.0Triggered change notifications
pg_trgm1.6text similarity measurement and index searching based on trigrams
xml21.1XPath querying and XSLT
refint1.0functions for implementing referential integrity (obsolete)
pg_visibility1.2examine the visibility map (VM) and page-level visibility info
pg_stat_statements1.10track planning and execution statistics of all SQL statements executed
sslinfo1.2information about SSL certificates
tablefunc1.0functions that manipulate whole tables, including crosstab
tsm_system_rows1.0TABLESAMPLE method which accepts number of rows as a limit
tsm_system_time1.0TABLESAMPLE method which accepts time in milliseconds as a limit
unaccent1.1text search dictionary that removes accents
uuid-ossp1.1generate universally unique identifiers (UUIDs)
plpgsql1.0PL/pgSQL procedural language
babelfishpg_money1.1.0babelfishpg_money
system_stats2.0EnterpriseDB system statistics for PostgreSQL
tds_fdw2.0.3Foreign data wrapper for querying a TDS database (Sybase or Microsoft SQL Server)
babelfishpg_common3.3.3Transact SQL Datatype Support
babelfishpg_tds1.0.0TDS protocol extension
pg_hint_plan1.5.1
babelfishpg_tsql3.3.1Transact SQL compatibility