Schema cache in YAML #27042

kirs · 2016-11-14T15:09:07Z

WTF is schema cache: Rails supports storing the schema information in db/schema_cache.dump to avoid hitting the database with SHOW FULL FIELDS. This feature was introduced in #5162.

The problem now is that we dump many things with Marshal, including internal column classes.
When you try to read a schema cache generated by Rails 4.2 in Rails 5.0, you get:

uninitialized constant ActiveRecord::ConnectionAdapters::AbstractMysqlAdapter::Column (NameError)

(because obviously, internal AR classes have been refactored).

Solution proposal

By serializing basic schema information into YAML instead of a Marshal dump, we could make the schema cache compatible between Rails versions and avoid exceptions like the one described above.
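A minimal sketch of the failure mode and of the proposed fix, in plain Ruby without Rails. The OldAdapter::Column struct is a hypothetical stand-in for an internal column class, and removing the constant simulates a refactoring between versions; none of this is code from the PR.

```ruby
require "yaml"

# Hypothetical stand-in for an internal column class that a later
# framework version renames or removes.
module OldAdapter
  Column = Struct.new(:name, :sql_type)
end

column = OldAdapter::Column.new("id", "bigint")

# Marshal records the class name of the object it serializes.
marshal_dump = Marshal.dump(column)

# The YAML variant stores only basic data: strings in a hash.
yaml_dump = YAML.dump({ "name" => column.name, "sql_type" => column.sql_type })

# Simulate the refactoring: the class no longer exists.
Object.send(:remove_const, :OldAdapter)

# Loading the Marshal dump now fails because the class cannot be resolved.
# Plain Ruby typically reports this as ArgumentError; the thread shows a
# NameError under Rails.
marshal_error =
  begin
    Marshal.load(marshal_dump)
    nil
  rescue ArgumentError, NameError => e
    e
  end

# The YAML dump still loads, because it never referenced the class.
restored = YAML.safe_load(yaml_dump)
```

The point of the sketch is only that a dump of plain hashes and strings does not depend on any class surviving a refactoring.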

Concerns

I had to make the connection.column_definitions method public.
Instead of messing with the SchemaCache class and its dynamic cache methods, I introduced PersistedSchemaCache, which is immutable.

review @rafaelfranca @sgrif

rails-bot · 2016-11-14T15:09:09Z

r? @eileencodes

(@rails-bot has picked a reviewer for you, use r? to override)

kirs · 2016-11-14T15:54:12Z

It would be great to get some feedback about the code while I'm fixing the CI failures.

rafaelfranca

I don't know if it is a good thing to have this PersistedSchemaCache class. It duplicates a lot of code that we already have in SchemaCache, and there is no real gain for users, nor in terms of code maintenance in the Rails codebase. Now, instead of one place to change the schema cache, you have to change two.

rafaelfranca · 2016-11-14T19:46:12Z

activerecord/lib/active_record/connection_adapters/persisted_schema_cache.rb

I think you don't need those aliases here.

rafaelfranca · 2016-11-14T19:46:37Z

activerecord/lib/active_record/connection_adapters/persisted_schema_cache.rb

This is not being called so why do we need this?

Because it's a public method that someone may call, and since this class is immutable, we'd like to avoid that call.

rafaelfranca · 2016-11-14T20:03:07Z

Another thing: instead of custom methods, we should implement init_with and encode_with so that we can call YAML.load/dump.
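For readers unfamiliar with these hooks, here is a minimal sketch of how encode_with and init_with plug into Ruby's YAML (Psych) library. TinyCache and its fields are hypothetical and much simpler than SchemaCache; the respond_to? guard is only there because newer Psych versions make YAML.load safe by default.

```ruby
require "yaml"

# Hypothetical, simplified cache class used only to show the two hooks.
class TinyCache
  attr_reader :columns, :version

  def initialize(columns = {}, version = nil)
    @columns = columns
    @version = version
  end

  # Called by YAML.dump: choose exactly which data gets written.
  def encode_with(coder)
    coder["columns"] = @columns
    coder["version"] = @version
  end

  # Called by the YAML loader on a freshly allocated object.
  def init_with(coder)
    @columns = coder["columns"]
    @version = coder["version"]
  end
end

cache = TinyCache.new({ "users" => ["id", "email"] }, 20161114)
yaml = YAML.dump(cache)

# Loading custom classes needs the unsafe loader on newer Psych versions.
loaded = YAML.respond_to?(:unsafe_load) ? YAML.unsafe_load(yaml) : YAML.load(yaml)
```

Because encode_with decides what is written, the dumped file contains only the hash of column names and the version, not the object's internal state.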

kirs · 2016-11-14T21:24:17Z

I don't know if it is a good thing to have this PersistedSchemaCache class. It duplicates a lot of code that we already have in SchemaCache, and there is no real gain for users, nor in terms of code maintenance in the Rails codebase. Now, instead of one place to change the schema cache, you have to change two.

That's a good point that I was going to discuss here.
SchemaCache has a very dynamic nature: it only knows about the tables that have been loaded, and when you request a table that wasn't loaded yet, it will query that table. This means that SchemaCache's knowledge about tables is always incomplete.

That said, to make SchemaCache know about all tables we currently have to do con.data_sources.each { |table| con.schema_cache.add(table) } and that looks a bit like a hack.

In contrast, PersistedSchemaCache provides a cache that always has complete (finite) knowledge about the tables. It will never run any additional queries.
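The difference between the two behaviours can be sketched in plain Ruby. FakeConnection and LazyCache below are hypothetical stand-ins, not the Rails classes; the fake connection just counts how often it is asked for columns.

```ruby
# Hypothetical connection that counts column lookups instead of running SQL.
class FakeConnection
  attr_reader :queries

  def initialize(schema)
    @schema = schema
    @queries = 0
  end

  def data_sources
    @schema.keys
  end

  def columns(table)
    @queries += 1
    @schema.fetch(table)
  end
end

# Lazy cache: it only knows tables that were requested at least once.
class LazyCache
  def initialize(connection)
    @connection = connection
    @columns = {}
  end

  def columns(table)
    @columns[table] ||= @connection.columns(table)
  end

  def add(table)
    columns(table)
  end

  def size
    @columns.size
  end
end

con = FakeConnection.new({ "users" => %w[id email], "posts" => %w[id title] })
cache = LazyCache.new(con)

cache.columns("users")                  # first lookup hits the connection
known_after_first_lookup = cache.size   # only one table is known so far

# Warming every table up front, as in the snippet quoted above.
con.data_sources.each { |table| cache.add(table) }
queries_after_warmup = con.queries

cache.columns("posts")                  # already cached, no further lookup
queries_at_end = con.queries
```

After the warm-up loop the lazy cache holds every table, which is the state a fully persisted cache would start from.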

Another reason we can't simply implement SchemaCache#encode_with is that SchemaCache stores columns that are already initialized. In a YAML cache that works across Rails versions, we'd like to store the raw result of SHOW FULL FIELDS, without internal Column classes.

rafaelfranca · 2016-11-14T22:18:56Z

SchemaCache has a very dynamic nature: it only knows about the tables that have been loaded, and when you request a table that wasn't loaded yet, it will query that table. This means that SchemaCache's knowledge about tables is always incomplete.

That is exactly the behavior we want. If the SchemaCache is loaded from an old dump, we don't want the application to raise an exception, so it is better that it queries that table. The schema cache should behave like a cache, not as the ultimate source of truth. The database should be the ultimate source of truth.

That said, to make SchemaCache know about all tables we currently have to do con.data_sources.each { |table| con.schema_cache.add(table) } and that looks a bit like a hack.

I'd expect the SchemaCache to already have all the information about the tables of that connection when dumping, and that is exactly what was being done in https://github.com/rails/rails/pull/27042/files#diff-28a5ae383b291583c513ad8eeed99a3aL274; you just moved this "hack" inside the new class.

Another reason we can't simply implement SchemaCache#encode_with is that SchemaCache stores columns that are already initialized. In a YAML cache that works across Rails versions, we'd like to store the raw result of SHOW FULL FIELDS, without internal Column classes.

I don't get it. We can write any format we want with encode_with, so we don't need to write the internal Column classes.

kirs · 2016-11-15T01:47:39Z

Imagine we get rid of PersistedSchemaCache and move the logic into SchemaCache#init_with and SchemaCache#encode_with.

    def encode_with(coder)
      data_sources = @data_sources.keys
      coder['columns'] = data_sources.each_with_object({}) do |table_name, acc|
        # we still have to call `column_definitions` from `encode_with` to get raw column information
        # because we don't want to encode built Column objects
        acc[table_name] = connection.column_definitions(table_name).to_a
      end
      coder['primary_keys'] = data_sources.each_with_object({}) do |table_name, acc|
        acc[table_name] = primary_keys(table_name)
      end
      coder['version'] = ActiveRecord::Migrator.current_version
    end

This one would work, besides the fact that we'd still need to query SHOW FULL FIELDS to get the raw data about columns.

But I don't see a way to use init_with here:

    def init_with(coder)
      columns_by_table = {}
      columns_hash = {}
      coder['columns'].each do |table_name, columns|
        columns_by_table[table_name] = columns.map do |column_meta|
          # here, we need to call `new_column_from_field` but we don't have access to connection
          # from init_with
          connection.new_column_from_field(table_name, column_meta)
        end
        columns_hash[table_name] = Hash[columns_by_table[table_name].map { |col| [col.name, col] }]
      end
    end

Because connection is not available there.
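The underlying reason can be shown in isolation: when YAML loads an object, it allocates it and calls init_with, but it does not run initialize, so anything the constructor would have received is missing. CacheWithConnection below is a hypothetical class written for this sketch, and :fake_connection is a placeholder value.

```ruby
require "yaml"

# Hypothetical class: the connection is passed to the constructor only.
class CacheWithConnection
  attr_reader :connection, :tables

  def initialize(connection)
    @connection = connection
    @tables = ["users"]
  end

  def encode_with(coder)
    # The connection is deliberately not serialized.
    coder["tables"] = @tables
  end

  def init_with(coder)
    # `initialize` is not run on load, so @connection is never assigned here.
    @tables = coder["tables"]
  end
end

yaml = YAML.dump(CacheWithConnection.new(:fake_connection))

# Loading custom classes needs the unsafe loader on newer Psych versions.
loaded = YAML.respond_to?(:unsafe_load) ? YAML.unsafe_load(yaml) : YAML.load(yaml)

connection_after_load = loaded.connection
```

Any design built on init_with therefore has to obtain the connection some other way after loading, or avoid needing it while deserializing.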

kaspth · 2016-11-18T20:07:10Z

activerecord/lib/active_record/railtie.rb

Don't think we need this line anymore

kaspth · 2016-11-18T20:07:57Z

activerecord/lib/active_record/railtie.rb

Think this is just cache.version again

kaspth

Code wise LGTM

Are there any backward compatibility concerns with replacing the dump file?

eugeneius · 2016-11-19T08:29:36Z

activerecord/lib/active_record/railties/databases.rake

schema_cache.dump -> schema_cache.yml

eugeneius · 2016-11-19T08:29:38Z

activerecord/lib/active_record/railties/databases.rake

schema_cache.dump -> schema_cache.yml

kirs · 2016-11-21T02:34:19Z

@eugeneius thanks!

rafaelfranca

A lot of rubocop rules are broken in this PR. autocorrect should fix them.

Are there any backward compatibility concerns with replacing the dump file?

No. This actually makes it backwards compatible. The only backward-incompatible problem we have now is that Rails 5.1 will not be able to load the dump file generated with Marshal, so the first deploy will have to generate the YAML version.
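As an illustration of that first-deploy situation, here is a hedged sketch of a loader that treats a missing or outdated YAML file as "no cache". The load_schema_cache helper, the payload keys, and the version numbers are assumptions made for this example and are not taken from the PR.

```ruby
require "yaml"
require "tmpdir"

# Hypothetical helper: returns the cached payload, or nil when the file is
# missing or was generated for a different schema version.
def load_schema_cache(path, current_version)
  return nil unless File.file?(path)

  payload = YAML.safe_load(File.read(path))
  return nil unless payload.is_a?(Hash) && payload["version"] == current_version

  payload
end

results = Dir.mktmpdir do |dir|
  path = File.join(dir, "schema_cache.yml")

  # First deploy: only an old Marshal dump would exist, so no YAML file yet.
  missing = load_schema_cache(path, 20161114)

  File.write(path, YAML.dump({ "version" => 20161114, "columns" => { "users" => %w[id email] } }))

  fresh = load_schema_cache(path, 20161114)   # versions match, cache is used
  stale = load_schema_cache(path, 20161213)   # schema moved on, cache is ignored

  [missing, fresh, stale]
end
```

In both nil cases the application would simply fall back to querying the database until a new cache file is generated.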

rafaelfranca · 2016-11-25T10:40:27Z

activerecord/test/cases/connection_adapters/schema_cache_test.rb

The name of this file should be 'test/assets/schema_dump_5_1.yml'. I don't think it is going to be safe to backport this.

kaspth · 2016-11-27T16:42:58Z

@rafaelfranca got it 👍

kirs · 2016-11-28T03:20:56Z

@rafaelfranca thanks for review! I updated the PR.

kirs · 2016-12-13T14:47:48Z

ping @rafaelfranca

[ci skip]

kaspth · 2016-12-13T18:36:26Z

BAM 🖐

rails-bot assigned eileencodes Nov 14, 2016

rafaelfranca assigned rafaelfranca and unassigned eileencodes Nov 14, 2016

rafaelfranca requested changes Nov 14, 2016

maclover7 added activerecord needs feedback labels Nov 14, 2016

kaspth approved these changes Nov 18, 2016

rafaelfranca requested changes Nov 25, 2016

Use YAML to serialize schema cache

4c00c6e

rafaelfranca merged commit ddf81c5 into rails:master Dec 13, 2016

rafaelfranca added a commit that referenced this pull request Dec 13, 2016

Add CHANGELOG entry to #27042

34a1c7e

[ci skip]

maclover7 removed the needs feedback label Dec 13, 2016

metaskills mentioned this pull request Dec 18, 2016

Support For Schema Cache File customink/secondbase#37

Closed

kirs mentioned this pull request Dec 31, 2016

Update schema cache doc in guides/command_line #27525

Merged

metaskills mentioned this pull request Jan 24, 2017

Schema Cache Support customink/secondbase#40

Merged

matthewd mentioned this pull request Mar 29, 2019

Make SchemaCache loading faster #35785

Closed
