Skip to content

Conversation

@kirs
Copy link
Member

@kirs kirs commented Nov 14, 2016

WTF is schema cache: Rails has support for storing the schema information in db/schema_cache.dump to avoid hitting database with SHOW FULL FIELDS. This feature was introduced in #5162.

The problem now is that we dump all many things into Marshal, including internal column classes.
When you try to read schema cache generated by Rails 4.2 in Rails 5.0:

uninitialized constant ActiveRecord::ConnectionAdapters::AbstractMysqlAdapter::Column (NameError) 

(because obviously, internal AR classes have been refactored).

Solution proposal

By serializing basic schema information into YAML instead of Marshal dump, we could make schema cache compatible between Rails versions and avoid exceptions like a described above.

Concerns

  1. I had to make connection.column_definitions method public
  2. Instead of messing with SchemaCache classes and its dynamic cache methods, I introduced PersistedSchemaCache that would be immutable.

review @rafaelfranca @sgrif

@rails-bot
Copy link

r? @eileencodes

(@rails-bot has picked a reviewer for you, use r? to override)

@kirs
Copy link
Member Author

kirs commented Nov 14, 2016

It would be great to get some feedback about the code while I'm fixing the CI failures.

Copy link
Member

@rafaelfranca rafaelfranca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if it is a good thing to have this PersistedSchemaCache class. It is duplicating a lot of code that we already have in SchemaCache and there is no real gain for the users and neither in terms of code maintenance in the Rails codebase. Now instead of one place to change the schema cache you have to change two places.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you don't need those aliases here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not being called so why do we need this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it's a public method that someone may call, and since this class is immutable, we'd like to avoid that call.

@rafaelfranca
Copy link
Member

Another thing. Instead of custom methods we should implement init_with and encode_with to be able to call Yaml.load/dump

@kirs
Copy link
Member Author

kirs commented Nov 14, 2016

I don't know if it is a good thing to have this PersistedSchemaCache class. It is duplicating a lot of code that we already have in SchemaCache and there is no real gain for the users and neither in terms of code maintenance in the Rails codebase. Now instead of one place to change the schema cache you have to change two places.

That's a good point that I was going to discuss here.
SchemaCache has a very dynamic nature: it only knows about some tables that has been loaded, and when you request a table that wasn't loaded yet it will query that table. It means that the knowledge about tables that SchemaCache has is always incomplete.

That said, to make SchemaCache know about all tables we currently have to do con.data_sources.each { |table| con.schema_cache.add(table) } and that looks a bit like a hack.

In a contrast, PersistedSchemaCache provides the cache that always has the complete (finite) knowledge about tables. It will never do any additional queries.

Another reason that we can't simply implement SchemaCache#encode_with is because SchemaCache stores the columns that are already initialized. In the YAML cache that would work between Rails versions we'd like to store the result of SHOW FULL TABLE, without internal Column classes.

@rafaelfranca
Copy link
Member

rafaelfranca commented Nov 14, 2016

SchemaCache has a very dynamic nature: it only knows about some tables that has been loaded, and when you request a table that wasn't loaded yet it will query that table. It means that the knowledge about tables that SchemaCache has is always incomplete.

That is exactly the behavior we want. If the SchemaCache is loaded by a old dump we don't want the application to raise an exception, so it is better that it query that table. The schema cache should behave like a cache, not as the ultimate source of truth. The database should be the ultimate source of truth.

That said, to make SchemaCache know about all tables we currently have to do con.data_sources.each { |table| con.schema_cache.add(table) } and that looks a bit like a hack.

I'd expect that the SchemaCache already have all information about the tables of that connection when duping and what is exactly what was being done https://github.com/rails/rails/pull/27042/files#diff-28a5ae383b291583c513ad8eeed99a3aL274, you just moved this "hack" to inside of the new class.

Another reason that we can't simply implement SchemaCache#encode_with is because SchemaCache stores the columns that are already initialized. In the YAML cache that would work between Rails versions we'd like to store the result of SHOW FULL TABLE, without internal Column classes.

I don't get it. We can write any format we want with encond_with, so we don't need to write the internal Column classes.

@kirs
Copy link
Member Author

kirs commented Nov 15, 2016

Imagine we get rid of PersistedSchemaCache and move into SchemaCache#init_with and SchemaCache#encode_with.

 def encode_with(coder) data_sources = @data_sources.keys coder['columns'] = data_sources.each_with_object({}) do |table_name, acc| # we still have to call `column_definitions` from `encode_with` to get raw column information # because we don't want to encode built Column objects acc[table_name] = connection.column_definitions(table_name).to_a end coder['primary_keys'] = data_sources.each_with_object({}) do |table_name, acc| acc[table_name] = primary_keys(table_name) end coder['version'] = ActiveRecord::Migrator.current_version end 

This one would work, beside the fact that we'll still need to query SHOW FULL FIELDS to get the raw data about columns.

But I don't see a way to use init_with here:

 def init_with(coder) columns_by_table = {} columns_hash = {} payload.fetch(:columns).each do |table_name, columns| columns_by_table[table_name] = columns.map do |column_meta| # here, we need to call `new_column_from_field` but we don't have access to connection # from init_with connection.new_column_from_field(table_name, column_meta) end columns_hash[table_name] = Hash[columns_by_table[table_name].map { |col| [col.name, col] }] end end

Because connection is not available there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think we need this line anymore

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think this is just cache.version again

Copy link
Contributor

@kaspth kaspth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code wise LGTM

Is there any backward compatibility concerns by replacing the dump file?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

schema_cache.dump -> schema_cache.yml

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

schema_cache.dump -> schema_cache.yml

@kirs
Copy link
Member Author

kirs commented Nov 21, 2016

@eugeneius thanks!

Copy link
Member

@rafaelfranca rafaelfranca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of rubocop rules are broken in this PR. autocorrect should fix them.

Is there any backward compatibility concerns by replacing the dump file?

No. This actually make it backwards compatible. The only non-backward compatible problem we have now is Rails 5.1 not being able to load the dump file using marshal and the first deploy will have to generate the yaml version.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of this file should be 'test/assets/schema_dump_5_1.yml'. I don't think it is going to be safe to backport this.

@kaspth
Copy link
Contributor

kaspth commented Nov 27, 2016

@rafaelfranca got it 👍

@kirs
Copy link
Member Author

kirs commented Nov 28, 2016

@rafaelfranca thanks for review! I updated the PR.

@kirs
Copy link
Member Author

kirs commented Dec 13, 2016

@rafaelfranca rafaelfranca merged commit ddf81c5 into rails:master Dec 13, 2016
rafaelfranca added a commit that referenced this pull request Dec 13, 2016
@kaspth
Copy link
Contributor

kaspth commented Dec 13, 2016

BAM 🖐

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

7 participants