PostgreSQL PostgreSQL as seen by Rubyists Rubyists Andrey Novikov, Evil Martians Kaigi on Rails 2022 22 October 2022
evilmartians.com 邪悪な火星人? イービルマーシャンズ!
Martian Open Source Yabeda:Ruby application instrumentation framework Lefthook:git hooks manager AnyCable:Polyglot replacement for ActionCable server PostCSS:A tool for transforming CSS with JavaScript Imgproxy:Fast and secure standalone server for resizing and converting remote images Logux:Client-server communication framework based on Optimistic UI, CRDT, and log Overmind:Process manager for Procfile-based applications and tmux Even more at evilmartians.com/oss
This talk is about… Subtle differences
Example of subtle yet important difference Are they same? … Left: road signconforming Vienna ConventiononRoad Signs and Signals Right: Japanese road signper “Order onRoad Sign, Road Line, and Road Surface Marking”
Example of subtle yet important difference Are they same? NO! Both require drivers to give a way… European sign doesn’t have a requirement to stop or even slow down!
Example of subtle yet important difference Are they same? YES! Stop signs around the world are mostly octagonal. Japanese signs have “STOP” word in English after 2017, but are still pretty rare.
And how it relates to Ruby and PostgreSQL?
And how it relates to Ruby and PostgreSQL? Application data Data in the database
Ruby vs PostgreSQL data types Are they same?
Integers Integer Variable length integer 2, 4, 8 bytes signed In ActiveModel there is validation for databases: Name Size Range smallint 2 -32768 to +32767 integer 4 -2147483648 to +2147483647 bigint 8 -9223372036854775808 to +9223372036854775807 See bignum.c in Ruby sources. See Numeric types docs 1.size # => 8 (bytes) (256**8 - 1).size # => 8 (bytes) (256**8).size # => 9 (bytes) (256**40 - 1).size # => 40 (bytes) Test.create(value: 2147483648) # ActiveModel::RangeError: 2147483648 is out of rang # for ActiveModel::Type::Integer with limit 4 bytes ` ` ` ` ` ` INSERT INTO "tests" ("value") VALUES (2147483648); -- ERROR: integer out of range
Oh no, I have integer primary keys! What to do? 0. Don’t panic! 1. Use pghero or Heroku pg:diagnose to detect problematic primary keys. 2. Migrate to bigint or uuid if needed (use triggers, Luke!) 3. In case of emergency, remember that all integers are signed! You always have 2 more billions of values on the dark negative side! Example of pkey migration from int to bigint: engineering.silverfin.com/pg-zero-downtime-bigint-migration ` ` ` ` ALTER SEQUENCE tablename_id_seq MINVALUE -2147483647 RESTART WITH -2147483647; pghero
Floating point numbers Float 8 bytes (double-precision) real — 4 bytes double — 8 bytes See Ruby docs for Float More fun at 0.30000000000000004.com! IEEE 754 0.1 + 0.2 # => 0.30000000000000004 Float::MAX # => 1.7976931348623157e+308 Float::MAX + '1e+308'.to_f # => Infinity # BUT! Float::MAX + '0.1'.to_f # => 1.7976931348623157e+308 🤔 Float::MAX == (Float::MAX + '0.1'.to_f) # => true 🤯 Float::NAN == Float::NAN # => false SELECT 0.1::float + 0.2::float; -- 0.300000000000000 SELECT 0.1 + 0.2; -- 0.3 (but it is NOT float!) SELECT '1.7976931348623157e+308'::float + '1e+308':: --- ERROR: value out of range: overflow SELECT '+inf'::double precision; -- Infinity 🤷 SELECT '1.7976931348623157e+308'::float = ('1.7976931348623157e+308'::float + '0.1'::float); -- true ¯_(ツ)_/¯ SELECT 'NaN'::float = 'NaN'::float; -- true 🤯 0.30000000000000004.com
Don’t use floats for calculating money! 🤑💥🤕 Never ever!
Arbitrary precision numbers BigDecimal Variable length numeric Variable length See Ruby docs for BigDecimal Use numeric to store money! BigDecimal("0.1") + BigDecimal("0.2") # => 0.3e0 BigDecimal("NaN") == BigDecimal("NaN") # => false BigDecimal("1.0") / BigDecimal("0.0") #=> Infinity # To match PostgreSQL behavior: BigDecimal.mode(BigDecimal::EXCEPTION_OVERFLOW, tru BigDecimal("1.0") / BigDecimal("0.0") # Computation results in 'Infinity' (FloatDomainEr BigDecimal("0.1") + 0.2.to_d == 0.30000000000000004 # true 🤔 SELECT 0.1 + 0.2; -- 0.3 which is decimal SELECT 'NaN'::decimal = 'NaN'::decimal; -- true SELECT '1.0'::decimal / '0.0'::decimal; -- ERROR: division by zero SELECT (0.1 + 0.2) = (0.1::float + 0.2::float); -- false ` `
But there is money type, isn’t it? BigDecimal Variable size money 8 byte fixed-precision number. ActiveRecord has to parse textual representation, see connection_adapters/postgresql/oid/money.rb Also see issue № 31457 for lots of pain. Both output and acceptable input format depends on session-level lc_monetary setting! Precision is defined by lc_monetary at database creation time and can’t be changed! # If the database locale setting isn't `en_US`: # Creation may fail: Product.create!(price: 100500.42) # ERROR: invalid input syntax for type money: "1005 # Or it can succeed, but won't be able to be parsed Product.last.price # => 0.0 -- on production: SELECT 100500.42::money; -- $100,500.42 -- on dev machine: SELECT 100500.42::money; -- ¥ 100,500 -- 🤯 But it should be dollars, and where are my cen ` ` ` `
Strings and texts, lyrics and prose String Variable size varchar,text variable size, max 1 GB Read the docs: String Read the docs: 8.3. Character Types "こんにちは地球人!".encoding # => #<Encoding:UTF-8> "xe3x2ex2e".encoding # => #<Encoding:UTF-8> "xe3x2ex2e".valid_encoding? # => false "これx00やばい!".valid_encoding? # => true SELECT 'こんにちは地球人!'; -- こんにちは地球人! SELECT E'xe3x2ex2e'); -- ERROR: invalid byte sequence for encoding "UTF8 SELECT E'これx00やばい!'; -- ERROR: invalid byte sequence for encoding "UTF8
So many string types! 1. Don’t use char(n) It is always size of n and stores unnecessary spaces at right. Mostly for compatibility with older applications. 2. varchar and text are effectively the same, choose whatever you like. string in migrations is varchar in PostgreSQL. 💡Did you know? SimpleForm gem will render multi-line HTML <textarea> tag for text type and single-line <input type="text"> for character varying. Convention over configuration! See SimpleForm gem README: github.com/heartcombo/simple_form ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` `
Please, use utf8everywhere.org! utf8everywhere.org
Binary data String Variable size bytea Variable size, max 1 GB Memory and network traffic consumption: 📈 See Binary Data Types page in the docs. data = File.binread(“meme.png”) # => "x89PNGrnx1A…" data.encoding # => #<Encoding:ASCII-8BIT> data.bytesize # => 46534 Test.last.blob # => "x89PNGrnx1A…" Test.last.blob_before_type_cast.bytesize # => 46534 Test.last.blob_before_type_cast # => "x89504e470d0a1a0a" Test.last.blob_before_type_cast.bytesize # => 93070 SELECT 'x89504e470d0a1a0a…'::bytea; # Note hexadecimal format ↑
What if 1 GB isn’t enough? You can’t store more in a table column (hard limit) But you can store up 4 TB in large objects table! And there is a gem for that: active_storage-postgresql Beware performance implications of TOAST → More about it in PG docs: 70.2. TOAST Andrey Novikov @Envek · Follow — PostgreSQL, did you update all these megabyte- long JSON I gave you? — Yes — What did it cost? — Everything… all the IO on the server. Let’s talk about how PostgreSQL stores JSONB (yep, it will be a thread about TOAST) 10:47 PM · May 18, 2022 Read the full conversation on Twitter 359 Reply Copy link Read 4 replies
Dates Date date 4 bytes Internally stores number of days since year 4713 BC up to infinity. The Julianday number is inelapsed days since noon (GreenwichMeanTime) onJanuary 1, 4713 BCE(in the Juliancalendar). The day count is virtually the astronomical Julian day number. Internally stores number of days since year 4713 BC up to year 5874897 AD. Inthe JulianDate system, eachday has a sequential number, starting from JD0 (whichis sometimes called the JulianDate). JD0 corresponds to 1 January 4713 BC inthe Juliancalendar. See the docs for Date class. See B.7. Julian Dates in PostgreSQL docs.
Time and timezones Time AS::TimeWithZone Two UNIX timestamps inside and tzdata also timestamp timestamptz 8 bytes, microsecond precision Ruby on Rails uses UTC timezone internally. Use timestamp with time zone whenever possible! Time.now # => 2022-10-22 13:42:42 +0900 Time.current # => Sat, 22 Oct 2022 04:42:42 UTC +00: Time.current.time_zone # => #<ActiveSupport::TimeZone name="UTC", @tzinfo= Time.use_zone("Asia/Tokyo") { Time.current } # => Sat, 22 Oct 2020 13:42:42 JST +09:00 CREATE TABLE tests (t1 timestamp, t2 timestamptz); SET SESSION timezone TO 'Etc/UTC'; INSERT INTO tests (t1, t2) VALUES (now(), now()); SET SESSION timezone TO 'Asia/Tokyo'; INSERT INTO tests (t1, t2) VALUES (now(), now()); SET SESSION timezone TO 'Europe/Lisbon'; INSERT INTO tests (t1, t2) VALUES (now(), now()); SET SESSION timezone TO 'Asia/Tokyo'; SELECT * FROM tests; t1 | t2 ---------------------+------------------------- 2022-10-22 04:42:42 │ 2022-10-02 13:42:42+09 │ 2022-10-22 13:42:42 │ 2022-10-02 13:42:42+09 │ 2022-10-22 05:42:42 │ 2022-10-02 13:42:42+09 │ ` `
How to not mess up with timezones 1. Use timezone-aware methods Use Time.current and Date.current instead of Time.now and Date.today 2. Convert timestamps to user time zone 3. Don’t use dates in SQL, use timestamps More tips here: thoughtbot.com/blog/its-about-time-zones ` ` ` ` ` ` ` ` Time.use_zone(user.timezone) do # Do SQL queries, render views, … end # or Time.current.in_time_zone(user.timezone) - Posts.where(published_at: Date.today...Date.tomorrow) + Posts.where(published_at: Time.current.beginning_of_day..Time.current.end_of_day)
Time zones are hard 見る The Problem with Time & Timezones - Computerphile The Problem with Time & Timezones - Computerphile 共有 共有
Void and uncertainity NilClass NULL nil == nil # => true ¯_(ツ)_/¯ SELECT NULL = NULL; -- NULL 🚨 SELECT NULL IS NULL; -- true SELECT NULL IS DISTINCT FROM NULL; -- false SELECT 'Ruby' = NULL; -- NULL 🚨 SELECT 'Ruby' IS NULL; -- false SELECT 'Ruby' IS DISTINCT FROM NULL; -- true
PostgreSQL-specific datatypes
JSON Hash,Array json,jsonb Variable length, up to 1GB Be careful with symbols as keys Define as_json method on your classes to serialize them to JSON automatically. Behavior of JSON.dump and to_json in Rails is different! JSON saves value as is (it is just a string) JSONB is effective but strict: no duplicate keys, doesn’t preserve whitespaces, etc… Inside: string (no null-bytes!), numeric, … { "foo" => "bar", foo: "baz" }.to_json # {"foo":"baz"} ` ` ` ` ` ` SELECT '{"foo": "bar","foo":"baz"}'::json; -- {"foo": "bar","foo":"baz"} SELECT '{"foo": "bar","foo":"baz"}'::jsonb; -- {"foo": "baz"}
JSON on steroids Use store_model gem to make powerful value objects from JSON fields. But don’t overuse! There is performance penalty for serialization and deserialization. store_model gem repo
Ranges Range intrange,tsrange, … intmultirange, … Additional methods in the facets gem. https://www.postgresql.org/docs/14/rangetypes.html 5..7 or 5...8 Time.current..1.day.from_now # endless or beginless ranges Time.current.. ..Date.yesterday nil.. or Range.new(nil, nil) # Beginning is always included if possible :-( Test.pluck("intrange(1, 5, '()')").first # 2...5 Test.pluck("tstzrange(now(), now() + '1 hour', '() # ArgumentError: The Ruby Range object does not # support excluding the beginning of a Range. SELECT int8range(5, 7, '[]'); -- [5,8] SELECT int8range(5, 8); -- [5,8) SELECT tstzrange(now(), now() + '1 day', '()'); -- ["2022-10-22 14:42:42+09","2022-10-23 14:42:42+09 -- endless or beginless ranges SELECT tstzrange(now(), NULL); SELECT tstzrange(NULL, NULL); -- PG 14: Multiranges and operators SELECT nummultirange(numrange(1, 20)) - nummultirange(numrange(4, 6)); -- {[1,4),[6,20)} -- and many more… (exclusion constraints!)
UUID String 36 bytes uuid 16 bytes See Rails docs for Digest::UUID See docs for pgcrypto and uuid-ossp extensions. Also take a look at upcoming UUIDv6, v7, and v8! datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format-04 # All-random UUIDv4 SecureRandom.uuid # => “40f15398-4b38-4e16-8b3c-ff16fc960d38” # Determined UUIDv5 (hash-based) Digest::UUID.uuid_v5(Digest::UUID::DNS_NAMESPACE, "n # => "9b8edca0-90f2-5031-8e5d-3f708834696c" CREATE EXTENSION "pgcrypto"; SELECT gen_random_uuid(); -- 2cfff962-4a24-4ef3-b2f8-35351b18bf63 CREATE EXTENSION "uuid-ossp"; SELECT uuid_generate_v5(uuid_ns_dns(), 'name'); -- 9b8edca0-90f2-5031-8e5d-3f708834696c
IP addresses IPAddr inet,cidr 7 or 19 bytes both inet works with both host and network addresses. cidr works with network addresses only. See IPAddr docs. See Network address types and functions and operators. ip6 = IPAddr.new "3ffe:505:2::1" ip4 = IPAddr.new "192.168.2.0/24" IPAddr.new("192.168.2.0/24").mask(16) #<IPAddr: IPv4:192.168.0.0/255.255.0.0> SELECT '::1'::inet; SELECT '127.0.0.1/32'::inet; SELECT set_masklen(cidr '192.168.1.0/24', 16); -- 192.168.0.0/16 SELECT inet_merge(inet '192.168.1.5/24', inet '192. -- 192.168.0.0/22 ` ` ` `
Durations ActiveSupport::Duration interval 16 bytes Disclaimer: I added it to Rails in pull request № 16919. Supported out-of-the-box in Ruby on Rails 6.1+ Time.current + 1.year # => Thu, 18 Jun 2021 21:00:00 MSK +03:00 100500.weeks.iso8601 # => "P100500W" 1.month.to_i # => 2629746 (30.436875 days in seconds) SELECT now() + ‘1 year’; -- 2021-06-18 21:00:00+03 SELECT '100500 weeks'::interval; -- 703500 days SELECT EXTRACT(epoch FROM '1 month'::interval); -- 2592000 (30.0 days in seconds)
Enums String Custom enum types 4 bytes Values are human-readable in SQL On Rails < 7 you can use activerecord-postgres_enum gem See 8.7 Enumerated Types page in PostgreSQL docs # In migration (Rails 7+): create_enum :status, ["draft", "published", "archive change_table :posts do |t| t.enum :status, enum_type: "status", default: "dra end # In the application code: class Article < ApplicationRecord enum :status, { draft: "draft", published: "publis end Article.last.status #=> "draft" Article.last.draft? #=> true Article.last.published! # UPDATE articles SET status = 'published' WHERE id CREATE TYPE status AS ENUM ('draft', 'published', ' ALTER TABLE posts ADD COLUMN "status" status NOT NU INSERT INTO posts (status) VALUES ('published'); INSERT INTO posts (status) VALUES ('draft'); SELECT id, status FROM posts; id | status ----+------------ 2 | draft 1 | published
Inside ActiveRecord How datatypes are working under the hood
Example of composite type use case What if we want to: 1. Store products with prices in different currencies in one table 2. Work with price and currency as a whole There is a great money gem in Ruby, but how to do it in SQL? 3. And do some calculations without having to write complex SQL queries with joins. Before: After: User (string currency) has many Product (numeric price) User (string currency) # only as a setting! has many Product (true_money price (string+numeric))
Create custom datatype Declare composite datatype in the database: CREATE TYPE _true_money AS ( currency varchar, amount numeric ); -- type with constraints to allow: -- - either NULL value (no price, can be forbidden by NOT NULL) -- - or value with both currency and amount specified CREATE DOMAIN true_money AS _true_money CHECK ( value IS NULL AND value IS DISTINCT FROM (null, null)::_true_money OR ((value).currency IS NOT NULL AND (value).amount IS NOT NULL) );
Fun fact about composite datatypes Every table defines own datatype which can be used elsewhere But don’t use it in reality, please! (There are limitations) CREATE TABLE "inner" ( v1 integer, v2 text ); CREATE TABLE "outer" (v inner); INSERT INTO "outer" (v) VALUES ((42,'Hello world!')); SELECT * FROM "outer"; v ------------------- (42,"Hello world!")
Use composite datatype See 8.16. Composite Types in PostgreSQL docs for more advices and caveats. ALTER TABLE tests ADD COLUMN price true_money; INSERT INTO tests (price) VALUES (('JPY',10000.0)); INSERT INTO tests (price) VALUES ('("JPY",100.0)'); SELECT price FROM tests; -- (JPY,10000.0),(JPY,100.0) SELECT (price).currency, (price).amount FROM tests; currency | amount ----------+--------- JPY | 10000.0 JPY | 100.0
Declare it in ActiveRecord module ActiveRecord module ConnectionAdapters module PostgreSQL module OID class TrueMoney < Type::Value def type :true_money end # Here will be (de)serialization code end end end end end
Deserialization And "(USD,4.2)" becomes #<Money fractional:420 currency:USD> in Ruby ✨ def deserialize(value) return nil if value.nil? currency, amount = value.match(/A("?(w+)"?,(d+(?:.d+)?))z/).captures ::Money.from_amount(BigDecimal(amount), currency) end ` ` ` `
Casting user input Add ability to assign ready object to attribute: def cast(value) return nil if value.nil? case value when ::Money then value when String then deserialize(value) else raise NotImplementedError, "Don't know how to cast #{value.class} #{value.inspect} into Money" end end
Deserialization and input casting at once Replaces both deserialize and cast , also handles nil s. def cast_value(value) case value when ::Money then value when String currency, amount = value.match(/A("?(w+)"?,(d+(?:.d+)?))z/).captures ::Money.from_amount(BigDecimal(amount), currency) else raise NotImplementedError, "Don't know how to cast #{value.class} #{value.inspect} into Money" end end end ` ` ` ` ` `
Serialization for the database Reuse available serialization methods for subtypes. def serialize(value) return nil if value.nil? # ActiveRecord will handle NULL for us amount_t = ::ActiveRecord::ConnectionAdapters::PostgreSQLAdapter::OID::Decimal.new currency_t = ::ActiveModel::Type::String.new "(#{currency_t.serialize(value.currency.iso_code).inspect},#{amount_t.serialize(value.amount)})" end
Register datatype in ActiveRecord PostgreSQLAdapterWithTrueMoney = Module.new do def initialize_type_map(m = type_map) m.register_type "true_money" do |*_args, _sql_type| ::ActiveRecord::ConnectionAdapters::PostgreSQLAdapter::OID::TrueMoney.new end m.alias_type "_true_money", "true_money" super end end ActiveRecord::ConnectionAdapters::PostgreSQLAdapter.prepend(PostgreSQLAdapterWithTrueMoney) ActiveRecord::Type.register( :true_money, ::ActiveRecord::ConnectionAdapters::PostgreSQLAdapter::OID::TrueMoney, adapter: :postgresql, )
Also add it for migrations… module SchemaStatementsWithTrueMoney def type_to_sql(type, limit: nil, precision: nil, scale: nil, array: nil, **) case type.to_s when 'true_money' then 'true_money' else super end end end ActiveRecord::ConnectionAdapters::PostgreSQL::SchemaStatements.prepend(SchemaStatementsWithTrueMoney) module ActiveRecord::ConnectionAdapters::PostgreSQL::ColumnMethods def true_money(name, options = {}) column(name, :true_money, options) end end
Ready to use! rails g model Product title price:true_money rails db:migrate rails console Product.create!(title: "Something", price: Money.from_amount(100000, “USD”)) Product.last # => #<Product id: 1, title: "Something", price: 100000.00 USD>
But it is not done yet! A lot of stuff has to be done to make a full-featured datatype in SQL… But then you can do a lot in SQL: After all, one might re-invent abandoned pg-currency CREATE FUNCTION true_money_add(a true_money, b true_ BEGIN IF (a).currency != (b).currency THEN RAISE EXCEPTION '% can not be added to % - cur END IF; RETURN ((a).currency, (a).amount + (b).amount); END; $$ IMMUTABLE RETURNS NULL ON NULL INPUT LANGUAGE plp CREATE OPERATOR +(leftarg=true_money, rightarg=true CREATE FUNCTION true_money_sum(state true_money, va BEGIN IF value IS NULL AND state IS NULL THEN RETURN NULL; END IF; IF state IS NULL THEN RETURN value; END IF; RETURN state + value; END; $$ IMMUTABLE LANGUAGE plpgsql; CREATE AGGREGATE sum (true_money) (sfunc = true_mon SELECT (price).currency AS currency, sum(price) AS total FROM products GROUP BY currency;
Play with it yourself! https://gist.github.com/Envek/780b917e72a86c123776ee763b8dd986
Gems, gems, gems! Everything That Can Be Invented Has Been Invented
Gems for datatypes torque-postgresql — standard datatypes not (yet) supported by Rails. activerecord-postgis-adapter — all the power of PostGIS extension in Ruby. activerecord-postgres_enum — support enum in migrations and schema (before Rails 7)
Gems for other PostgreSQL features Because PostgreSQL is much more than datatypes. ActiveRecordExtended — functions for datatypes and DSL for queries fx — make schema.rb great again with triggers scenic — add support for views order_query — keyset-pagination for your models postgresql_cursor — get more data from the database efficiently And also martian pg_trunk gem to rule them all get fx , scenic , object dependency management and more within a single gem! ` ` ` ` structure.sql no more! Make schema.rb great again! ` ` ` ` pg_trunk gem
That’s it! Questions?
Thank you! @Envek @Envek @Envek @Envek github.com/Envek @evilmartians @evilmartians @evil-martians @evil.martians evilmartians.com Our awesome blog: evilmartians.com/chronicles! (Special thanks to @hachi8833 for translating ≈20 posts to Japanese!) See these slides at envek.github.io/kaigionrails-postgresql-as-seen-by-rubyists Theseslides

PostgreSQL as seen by Rubyists (Kaigi on Rails 2022)

  • 1.
    PostgreSQL PostgreSQL as seen by Rubyists Rubyists AndreyNovikov, Evil Martians Kaigi on Rails 2022 22 October 2022
  • 3.
  • 4.
    Martian Open Source Yabeda:Rubyapplication instrumentation framework Lefthook:git hooks manager AnyCable:Polyglot replacement for ActionCable server PostCSS:A tool for transforming CSS with JavaScript Imgproxy:Fast and secure standalone server for resizing and converting remote images Logux:Client-server communication framework based on Optimistic UI, CRDT, and log Overmind:Process manager for Procfile-based applications and tmux Even more at evilmartians.com/oss
  • 5.
    This talk isabout… Subtle differences
  • 7.
    Example of subtleyet important difference Are they same? … Left: road signconforming Vienna ConventiononRoad Signs and Signals Right: Japanese road signper “Order onRoad Sign, Road Line, and Road Surface Marking”
  • 8.
    Example of subtleyet important difference Are they same? NO! Both require drivers to give a way… European sign doesn’t have a requirement to stop or even slow down!
  • 9.
    Example of subtleyet important difference Are they same? YES! Stop signs around the world are mostly octagonal. Japanese signs have “STOP” word in English after 2017, but are still pretty rare.
  • 10.
    And how itrelates to Ruby and PostgreSQL?
  • 11.
    And how itrelates to Ruby and PostgreSQL? Application data Data in the database
  • 12.
    Ruby vs PostgreSQLdata types Are they same?
  • 13.
    Integers Integer Variable length integer 2, 4,8 bytes signed In ActiveModel there is validation for databases: Name Size Range smallint 2 -32768 to +32767 integer 4 -2147483648 to +2147483647 bigint 8 -9223372036854775808 to +9223372036854775807 See bignum.c in Ruby sources. See Numeric types docs 1.size # => 8 (bytes) (256**8 - 1).size # => 8 (bytes) (256**8).size # => 9 (bytes) (256**40 - 1).size # => 40 (bytes) Test.create(value: 2147483648) # ActiveModel::RangeError: 2147483648 is out of rang # for ActiveModel::Type::Integer with limit 4 bytes ` ` ` ` ` ` INSERT INTO "tests" ("value") VALUES (2147483648); -- ERROR: integer out of range
  • 15.
    Oh no, Ihave integer primary keys! What to do? 0. Don’t panic! 1. Use pghero or Heroku pg:diagnose to detect problematic primary keys. 2. Migrate to bigint or uuid if needed (use triggers, Luke!) 3. In case of emergency, remember that all integers are signed! You always have 2 more billions of values on the dark negative side! Example of pkey migration from int to bigint: engineering.silverfin.com/pg-zero-downtime-bigint-migration ` ` ` ` ALTER SEQUENCE tablename_id_seq MINVALUE -2147483647 RESTART WITH -2147483647; pghero
  • 16.
    Floating point numbers Float 8bytes (double-precision) real — 4 bytes double — 8 bytes See Ruby docs for Float More fun at 0.30000000000000004.com! IEEE 754 0.1 + 0.2 # => 0.30000000000000004 Float::MAX # => 1.7976931348623157e+308 Float::MAX + '1e+308'.to_f # => Infinity # BUT! Float::MAX + '0.1'.to_f # => 1.7976931348623157e+308 🤔 Float::MAX == (Float::MAX + '0.1'.to_f) # => true 🤯 Float::NAN == Float::NAN # => false SELECT 0.1::float + 0.2::float; -- 0.300000000000000 SELECT 0.1 + 0.2; -- 0.3 (but it is NOT float!) SELECT '1.7976931348623157e+308'::float + '1e+308':: --- ERROR: value out of range: overflow SELECT '+inf'::double precision; -- Infinity 🤷 SELECT '1.7976931348623157e+308'::float = ('1.7976931348623157e+308'::float + '0.1'::float); -- true ¯_(ツ)_/¯ SELECT 'NaN'::float = 'NaN'::float; -- true 🤯 0.30000000000000004.com
  • 17.
    Don’t use floatsfor calculating money! 🤑💥🤕 Never ever!
  • 18.
    Arbitrary precision numbers BigDecimal Variablelength numeric Variable length See Ruby docs for BigDecimal Use numeric to store money! BigDecimal("0.1") + BigDecimal("0.2") # => 0.3e0 BigDecimal("NaN") == BigDecimal("NaN") # => false BigDecimal("1.0") / BigDecimal("0.0") #=> Infinity # To match PostgreSQL behavior: BigDecimal.mode(BigDecimal::EXCEPTION_OVERFLOW, tru BigDecimal("1.0") / BigDecimal("0.0") # Computation results in 'Infinity' (FloatDomainEr BigDecimal("0.1") + 0.2.to_d == 0.30000000000000004 # true 🤔 SELECT 0.1 + 0.2; -- 0.3 which is decimal SELECT 'NaN'::decimal = 'NaN'::decimal; -- true SELECT '1.0'::decimal / '0.0'::decimal; -- ERROR: division by zero SELECT (0.1 + 0.2) = (0.1::float + 0.2::float); -- false ` `
  • 19.
    But there ismoney type, isn’t it? BigDecimal Variable size money 8 byte fixed-precision number. ActiveRecord has to parse textual representation, see connection_adapters/postgresql/oid/money.rb Also see issue № 31457 for lots of pain. Both output and acceptable input format depends on session-level lc_monetary setting! Precision is defined by lc_monetary at database creation time and can’t be changed! # If the database locale setting isn't `en_US`: # Creation may fail: Product.create!(price: 100500.42) # ERROR: invalid input syntax for type money: "1005 # Or it can succeed, but won't be able to be parsed Product.last.price # => 0.0 -- on production: SELECT 100500.42::money; -- $100,500.42 -- on dev machine: SELECT 100500.42::money; -- ¥ 100,500 -- 🤯 But it should be dollars, and where are my cen ` ` ` `
  • 20.
    Strings and texts,lyrics and prose String Variable size varchar,text variable size, max 1 GB Read the docs: String Read the docs: 8.3. Character Types "こんにちは地球人!".encoding # => #<Encoding:UTF-8> "xe3x2ex2e".encoding # => #<Encoding:UTF-8> "xe3x2ex2e".valid_encoding? # => false "これx00やばい!".valid_encoding? # => true SELECT 'こんにちは地球人!'; -- こんにちは地球人! SELECT E'xe3x2ex2e'); -- ERROR: invalid byte sequence for encoding "UTF8 SELECT E'これx00やばい!'; -- ERROR: invalid byte sequence for encoding "UTF8
  • 21.
    So many stringtypes! 1. Don’t use char(n) It is always size of n and stores unnecessary spaces at right. Mostly for compatibility with older applications. 2. varchar and text are effectively the same, choose whatever you like. string in migrations is varchar in PostgreSQL. 💡Did you know? SimpleForm gem will render multi-line HTML <textarea> tag for text type and single-line <input type="text"> for character varying. Convention over configuration! See SimpleForm gem README: github.com/heartcombo/simple_form ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` `
  • 22.
  • 23.
    Binary data String Variable size bytea Variablesize, max 1 GB Memory and network traffic consumption: 📈 See Binary Data Types page in the docs. data = File.binread(“meme.png”) # => "x89PNGrnx1A…" data.encoding # => #<Encoding:ASCII-8BIT> data.bytesize # => 46534 Test.last.blob # => "x89PNGrnx1A…" Test.last.blob_before_type_cast.bytesize # => 46534 Test.last.blob_before_type_cast # => "x89504e470d0a1a0a" Test.last.blob_before_type_cast.bytesize # => 93070 SELECT 'x89504e470d0a1a0a…'::bytea; # Note hexadecimal format ↑
  • 24.
    What if 1GB isn’t enough? You can’t store more in a table column (hard limit) But you can store up 4 TB in large objects table! And there is a gem for that: active_storage-postgresql Beware performance implications of TOAST → More about it in PG docs: 70.2. TOAST Andrey Novikov @Envek · Follow — PostgreSQL, did you update all these megabyte- long JSON I gave you? — Yes — What did it cost? — Everything… all the IO on the server. Let’s talk about how PostgreSQL stores JSONB (yep, it will be a thread about TOAST) 10:47 PM · May 18, 2022 Read the full conversation on Twitter 359 Reply Copy link Read 4 replies
  • 25.
    Dates Date date 4 bytes Internallystores number of days since year 4713 BC up to infinity. The Julianday number is inelapsed days since noon (GreenwichMeanTime) onJanuary 1, 4713 BCE(in the Juliancalendar). The day count is virtually the astronomical Julian day number. Internally stores number of days since year 4713 BC up to year 5874897 AD. Inthe JulianDate system, eachday has a sequential number, starting from JD0 (whichis sometimes called the JulianDate). JD0 corresponds to 1 January 4713 BC inthe Juliancalendar. See the docs for Date class. See B.7. Julian Dates in PostgreSQL docs.
  • 26.
    Time and timezones Time AS::TimeWithZone TwoUNIX timestamps inside and tzdata also timestamp timestamptz 8 bytes, microsecond precision Ruby on Rails uses UTC timezone internally. Use timestamp with time zone whenever possible! Time.now # => 2022-10-22 13:42:42 +0900 Time.current # => Sat, 22 Oct 2022 04:42:42 UTC +00: Time.current.time_zone # => #<ActiveSupport::TimeZone name="UTC", @tzinfo= Time.use_zone("Asia/Tokyo") { Time.current } # => Sat, 22 Oct 2020 13:42:42 JST +09:00 CREATE TABLE tests (t1 timestamp, t2 timestamptz); SET SESSION timezone TO 'Etc/UTC'; INSERT INTO tests (t1, t2) VALUES (now(), now()); SET SESSION timezone TO 'Asia/Tokyo'; INSERT INTO tests (t1, t2) VALUES (now(), now()); SET SESSION timezone TO 'Europe/Lisbon'; INSERT INTO tests (t1, t2) VALUES (now(), now()); SET SESSION timezone TO 'Asia/Tokyo'; SELECT * FROM tests; t1 | t2 ---------------------+------------------------- 2022-10-22 04:42:42 │ 2022-10-02 13:42:42+09 │ 2022-10-22 13:42:42 │ 2022-10-02 13:42:42+09 │ 2022-10-22 05:42:42 │ 2022-10-02 13:42:42+09 │ ` `
  • 27.
    How to notmess up with timezones 1. Use timezone-aware methods Use Time.current and Date.current instead of Time.now and Date.today 2. Convert timestamps to user time zone 3. Don’t use dates in SQL, use timestamps More tips here: thoughtbot.com/blog/its-about-time-zones ` ` ` ` ` ` ` ` Time.use_zone(user.timezone) do # Do SQL queries, render views, … end # or Time.current.in_time_zone(user.timezone) - Posts.where(published_at: Date.today...Date.tomorrow) + Posts.where(published_at: Time.current.beginning_of_day..Time.current.end_of_day)
  • 28.
    Time zones arehard 見る The Problem with Time & Timezones - Computerphile The Problem with Time & Timezones - Computerphile 共有 共有
  • 29.
    Void and uncertainity NilClassNULL nil == nil # => true ¯_(ツ)_/¯ SELECT NULL = NULL; -- NULL 🚨 SELECT NULL IS NULL; -- true SELECT NULL IS DISTINCT FROM NULL; -- false SELECT 'Ruby' = NULL; -- NULL 🚨 SELECT 'Ruby' IS NULL; -- false SELECT 'Ruby' IS DISTINCT FROM NULL; -- true
  • 30.
  • 31.
    JSON Hash,Array json,jsonb Variable length,up to 1GB Be careful with symbols as keys Define as_json method on your classes to serialize them to JSON automatically. Behavior of JSON.dump and to_json in Rails is different! JSON saves value as is (it is just a string) JSONB is effective but strict: no duplicate keys, doesn’t preserve whitespaces, etc… Inside: string (no null-bytes!), numeric, … { "foo" => "bar", foo: "baz" }.to_json # {"foo":"baz"} ` ` ` ` ` ` SELECT '{"foo": "bar","foo":"baz"}'::json; -- {"foo": "bar","foo":"baz"} SELECT '{"foo": "bar","foo":"baz"}'::jsonb; -- {"foo": "baz"}
  • 32.
    JSON on steroids Usestore_model gem to make powerful value objects from JSON fields. But don’t overuse! There is performance penalty for serialization and deserialization. store_model gem repo
  • 33.
    Ranges Range intrange,tsrange, … intmultirange,… Additional methods in the facets gem. https://www.postgresql.org/docs/14/rangetypes.html 5..7 or 5...8 Time.current..1.day.from_now # endless or beginless ranges Time.current.. ..Date.yesterday nil.. or Range.new(nil, nil) # Beginning is always included if possible :-( Test.pluck("intrange(1, 5, '()')").first # 2...5 Test.pluck("tstzrange(now(), now() + '1 hour', '() # ArgumentError: The Ruby Range object does not # support excluding the beginning of a Range. SELECT int8range(5, 7, '[]'); -- [5,8] SELECT int8range(5, 8); -- [5,8) SELECT tstzrange(now(), now() + '1 day', '()'); -- ["2022-10-22 14:42:42+09","2022-10-23 14:42:42+09 -- endless or beginless ranges SELECT tstzrange(now(), NULL); SELECT tstzrange(NULL, NULL); -- PG 14: Multiranges and operators SELECT nummultirange(numrange(1, 20)) - nummultirange(numrange(4, 6)); -- {[1,4),[6,20)} -- and many more… (exclusion constraints!)
  • 34.
    UUID String 36 bytes uuid 16 bytes SeeRails docs for Digest::UUID See docs for pgcrypto and uuid-ossp extensions. Also take a look at upcoming UUIDv6, v7, and v8! datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format-04 # All-random UUIDv4 SecureRandom.uuid # => “40f15398-4b38-4e16-8b3c-ff16fc960d38” # Determined UUIDv5 (hash-based) Digest::UUID.uuid_v5(Digest::UUID::DNS_NAMESPACE, "n # => "9b8edca0-90f2-5031-8e5d-3f708834696c" CREATE EXTENSION "pgcrypto"; SELECT gen_random_uuid(); -- 2cfff962-4a24-4ef3-b2f8-35351b18bf63 CREATE EXTENSION "uuid-ossp"; SELECT uuid_generate_v5(uuid_ns_dns(), 'name'); -- 9b8edca0-90f2-5031-8e5d-3f708834696c
  • 35.
    IP addresses IPAddr inet,cidr 7or 19 bytes both inet works with both host and network addresses. cidr works with network addresses only. See IPAddr docs. See Network address types and functions and operators. ip6 = IPAddr.new "3ffe:505:2::1" ip4 = IPAddr.new "192.168.2.0/24" IPAddr.new("192.168.2.0/24").mask(16) #<IPAddr: IPv4:192.168.0.0/255.255.0.0> SELECT '::1'::inet; SELECT '127.0.0.1/32'::inet; SELECT set_masklen(cidr '192.168.1.0/24', 16); -- 192.168.0.0/16 SELECT inet_merge(inet '192.168.1.5/24', inet '192. -- 192.168.0.0/22 ` ` ` `
  • 36.
    Durations ActiveSupport::Duration interval 16 bytes Disclaimer:I added it to Rails in pull request № 16919. Supported out-of-the-box in Ruby on Rails 6.1+ Time.current + 1.year # => Thu, 18 Jun 2021 21:00:00 MSK +03:00 100500.weeks.iso8601 # => "P100500W" 1.month.to_i # => 2629746 (30.436875 days in seconds) SELECT now() + ‘1 year’; -- 2021-06-18 21:00:00+03 SELECT '100500 weeks'::interval; -- 703500 days SELECT EXTRACT(epoch FROM '1 month'::interval); -- 2592000 (30.0 days in seconds)
  • 37.
    Enums String Custom enumtypes 4 bytes Values are human-readable in SQL On Rails < 7 you can use activerecord-postgres_enum gem See 8.7 Enumerated Types page in PostgreSQL docs # In migration (Rails 7+): create_enum :status, ["draft", "published", "archive change_table :posts do |t| t.enum :status, enum_type: "status", default: "dra end # In the application code: class Article < ApplicationRecord enum :status, { draft: "draft", published: "publis end Article.last.status #=> "draft" Article.last.draft? #=> true Article.last.published! # UPDATE articles SET status = 'published' WHERE id CREATE TYPE status AS ENUM ('draft', 'published', ' ALTER TABLE posts ADD COLUMN "status" status NOT NU INSERT INTO posts (status) VALUES ('published'); INSERT INTO posts (status) VALUES ('draft'); SELECT id, status FROM posts; id | status ----+------------ 2 | draft 1 | published
  • 38.
    Inside ActiveRecord How datatypesare working under the hood
  • 39.
    Example of compositetype use case What if we want to: 1. Store products with prices in different currencies in one table 2. Work with price and currency as a whole There is a great money gem in Ruby, but how to do it in SQL? 3. And do some calculations without having to write complex SQL queries with joins. Before: After: User (string currency) has many Product (numeric price) User (string currency) # only as a setting! has many Product (true_money price (string+numeric))
  • 40.
    Create custom datatype Declarecomposite datatype in the database: CREATE TYPE _true_money AS ( currency varchar, amount numeric ); -- type with constraints to allow: -- - either NULL value (no price, can be forbidden by NOT NULL) -- - or value with both currency and amount specified CREATE DOMAIN true_money AS _true_money CHECK ( value IS NULL AND value IS DISTINCT FROM (null, null)::_true_money OR ((value).currency IS NOT NULL AND (value).amount IS NOT NULL) );
  • 41.
    Fun fact aboutcomposite datatypes Every table defines own datatype which can be used elsewhere But don’t use it in reality, please! (There are limitations) CREATE TABLE "inner" ( v1 integer, v2 text ); CREATE TABLE "outer" (v inner); INSERT INTO "outer" (v) VALUES ((42,'Hello world!')); SELECT * FROM "outer"; v ------------------- (42,"Hello world!")
  • 42.
    Use composite datatype See8.16. Composite Types in PostgreSQL docs for more advices and caveats. ALTER TABLE tests ADD COLUMN price true_money; INSERT INTO tests (price) VALUES (('JPY',10000.0)); INSERT INTO tests (price) VALUES ('("JPY",100.0)'); SELECT price FROM tests; -- (JPY,10000.0),(JPY,100.0) SELECT (price).currency, (price).amount FROM tests; currency | amount ----------+--------- JPY | 10000.0 JPY | 100.0
  • 43.
    Declare it inActiveRecord module ActiveRecord module ConnectionAdapters module PostgreSQL module OID class TrueMoney < Type::Value def type :true_money end # Here will be (de)serialization code end end end end end
  • 44.
    Deserialization And "(USD,4.2)" becomes#<Money fractional:420 currency:USD> in Ruby ✨ def deserialize(value) return nil if value.nil? currency, amount = value.match(/A("?(w+)"?,(d+(?:.d+)?))z/).captures ::Money.from_amount(BigDecimal(amount), currency) end ` ` ` `
  • 45.
    Casting user input Addability to assign ready object to attribute: def cast(value) return nil if value.nil? case value when ::Money then value when String then deserialize(value) else raise NotImplementedError, "Don't know how to cast #{value.class} #{value.inspect} into Money" end end
  • 46.
    Deserialization and inputcasting at once Replaces both deserialize and cast , also handles nil s. def cast_value(value) case value when ::Money then value when String currency, amount = value.match(/A("?(w+)"?,(d+(?:.d+)?))z/).captures ::Money.from_amount(BigDecimal(amount), currency) else raise NotImplementedError, "Don't know how to cast #{value.class} #{value.inspect} into Money" end end end ` ` ` ` ` `
  • 47.
    Serialization for thedatabase Reuse available serialization methods for subtypes. def serialize(value) return nil if value.nil? # ActiveRecord will handle NULL for us amount_t = ::ActiveRecord::ConnectionAdapters::PostgreSQLAdapter::OID::Decimal.new currency_t = ::ActiveModel::Type::String.new "(#{currency_t.serialize(value.currency.iso_code).inspect},#{amount_t.serialize(value.amount)})" end
  • 48.
    Register datatype inActiveRecord PostgreSQLAdapterWithTrueMoney = Module.new do def initialize_type_map(m = type_map) m.register_type "true_money" do |*_args, _sql_type| ::ActiveRecord::ConnectionAdapters::PostgreSQLAdapter::OID::TrueMoney.new end m.alias_type "_true_money", "true_money" super end end ActiveRecord::ConnectionAdapters::PostgreSQLAdapter.prepend(PostgreSQLAdapterWithTrueMoney) ActiveRecord::Type.register( :true_money, ::ActiveRecord::ConnectionAdapters::PostgreSQLAdapter::OID::TrueMoney, adapter: :postgresql, )
  • 49.
    Also add itfor migrations… module SchemaStatementsWithTrueMoney def type_to_sql(type, limit: nil, precision: nil, scale: nil, array: nil, **) case type.to_s when 'true_money' then 'true_money' else super end end end ActiveRecord::ConnectionAdapters::PostgreSQL::SchemaStatements.prepend(SchemaStatementsWithTrueMoney) module ActiveRecord::ConnectionAdapters::PostgreSQL::ColumnMethods def true_money(name, options = {}) column(name, :true_money, options) end end
  • 50.
    Ready to use! railsg model Product title price:true_money rails db:migrate rails console Product.create!(title: "Something", price: Money.from_amount(100000, “USD”)) Product.last # => #<Product id: 1, title: "Something", price: 100000.00 USD>
  • 51.
    But it isnot done yet! A lot of stuff has to be done to make a full-featured datatype in SQL… But then you can do a lot in SQL: After all, one might re-invent abandoned pg-currency CREATE FUNCTION true_money_add(a true_money, b true_ BEGIN IF (a).currency != (b).currency THEN RAISE EXCEPTION '% can not be added to % - cur END IF; RETURN ((a).currency, (a).amount + (b).amount); END; $$ IMMUTABLE RETURNS NULL ON NULL INPUT LANGUAGE plp CREATE OPERATOR +(leftarg=true_money, rightarg=true CREATE FUNCTION true_money_sum(state true_money, va BEGIN IF value IS NULL AND state IS NULL THEN RETURN NULL; END IF; IF state IS NULL THEN RETURN value; END IF; RETURN state + value; END; $$ IMMUTABLE LANGUAGE plpgsql; CREATE AGGREGATE sum (true_money) (sfunc = true_mon SELECT (price).currency AS currency, sum(price) AS total FROM products GROUP BY currency;
  • 52.
    Play with ityourself! https://gist.github.com/Envek/780b917e72a86c123776ee763b8dd986
  • 53.
    Gems, gems, gems! EverythingThat Can Be Invented Has Been Invented
  • 54.
    Gems for datatypes torque-postgresql— standard datatypes not (yet) supported by Rails. activerecord-postgis-adapter — all the power of PostGIS extension in Ruby. activerecord-postgres_enum — support enum in migrations and schema (before Rails 7)
  • 55.
    Gems for otherPostgreSQL features Because PostgreSQL is much more than datatypes. ActiveRecordExtended — functions for datatypes and DSL for queries fx — make schema.rb great again with triggers scenic — add support for views order_query — keyset-pagination for your models postgresql_cursor — get more data from the database efficiently And also martian pg_trunk gem to rule them all get fx , scenic , object dependency management and more within a single gem! ` ` ` ` structure.sql no more! Make schema.rb great again! ` ` ` ` pg_trunk gem
  • 56.
  • 57.
    Thank you! @Envek @Envek @Envek @Envek github.com/Envek @evilmartians @evilmartians @evil-martians @evil.martians evilmartians.com Our awesomeblog: evilmartians.com/chronicles! (Special thanks to @hachi8833 for translating ≈20 posts to Japanese!) See these slides at envek.github.io/kaigionrails-postgresql-as-seen-by-rubyists Theseslides