How PostgreSQL Calls the mergeruns Function

Published: 2021-11-09 11:51:21 | Source: Yisu Cloud (亿速云) | Author: iii | Category: Relational Databases

This article looks at how PostgreSQL calls the mergeruns function. It first reviews the data structures involved, then reads through the source of mergeruns, and finally follows a gdb session for a sort that reaches it.

1. Data Structures

TupleTableSlot
The executor stores tuples in a "tuple table", which is a List of independent TupleTableSlots.

/*----------
 * The executor stores tuples in a "tuple table" which is a List of
 * independent TupleTableSlots.  There are several cases we need to handle:
 *      1. physical tuple in a disk buffer page
 *      2. physical tuple constructed in palloc'ed memory
 *      3. "minimal" physical tuple constructed in palloc'ed memory
 *      4. "virtual" tuple consisting of Datum/isnull arrays
 *
 * The first two cases are similar in that they both deal with "materialized"
 * tuples, but resource management is different.  For a tuple in a disk page
 * we need to hold a pin on the buffer until the TupleTableSlot's reference
 * to the tuple is dropped; while for a palloc'd tuple we usually want the
 * tuple pfree'd when the TupleTableSlot's reference is dropped.
 *
 * A "minimal" tuple is handled similarly to a palloc'd regular tuple.
 * At present, minimal tuples never are stored in buffers, so there is no
 * parallel to case 1.  Note that a minimal tuple has no "system columns".
 * (Actually, it could have an OID, but we have no need to access the OID.)
 *
 * A "virtual" tuple is an optimization used to minimize physical data
 * copying in a nest of plan nodes.  Any pass-by-reference Datums in the
 * tuple point to storage that is not directly associated with the
 * TupleTableSlot; generally they will point to part of a tuple stored in
 * a lower plan node's output TupleTableSlot, or to a function result
 * constructed in a plan node's per-tuple econtext.  It is the responsibility
 * of the generating plan node to be sure these resources are not released
 * for as long as the virtual tuple needs to be valid.  We only use virtual
 * tuples in the result slots of plan nodes --- tuples to be copied anywhere
 * else need to be "materialized" into physical tuples.  Note also that a
 * virtual tuple does not have any "system columns".
 *
 * It is also possible for a TupleTableSlot to hold both physical and minimal
 * copies of a tuple.  This is done when the slot is requested to provide
 * the format other than the one it currently holds.  (Originally we attempted
 * to handle such requests by replacing one format with the other, but that
 * had the fatal defect of invalidating any pass-by-reference Datums pointing
 * into the existing slot contents.)  Both copies must contain identical data
 * payloads when this is the case.
 *
 * The Datum/isnull arrays of a TupleTableSlot serve double duty.  When the
 * slot contains a virtual tuple, they are the authoritative data.  When the
 * slot contains a physical tuple, the arrays contain data extracted from
 * the tuple.  (In this state, any pass-by-reference Datums point into
 * the physical tuple.)  The extracted information is built "lazily",
 * ie, only as needed.  This serves to avoid repeated extraction of data
 * from the physical tuple.
 *
 * A TupleTableSlot can also be "empty", holding no valid data.  This is
 * the only valid state for a freshly-created slot that has not yet had a
 * tuple descriptor assigned to it.  In this state, tts_isempty must be
 * true, tts_shouldFree false, tts_tuple NULL, tts_buffer InvalidBuffer,
 * and tts_nvalid zero.
 *
 * The tupleDescriptor is simply referenced, not copied, by the TupleTableSlot
 * code.  The caller of ExecSetSlotDescriptor() is responsible for providing
 * a descriptor that will live as long as the slot does.  (Typically, both
 * slots and descriptors are in per-query memory and are freed by memory
 * context deallocation at query end; so it's not worth providing any extra
 * mechanism to do more.  However, the slot will increment the tupdesc
 * reference count if a reference-counted tupdesc is supplied.)
 *
 * When tts_shouldFree is true, the physical tuple is "owned" by the slot
 * and should be freed when the slot's reference to the tuple is dropped.
 *
 * If tts_buffer is not InvalidBuffer, then the slot is holding a pin
 * on the indicated buffer page; drop the pin when we release the
 * slot's reference to that buffer.  (tts_shouldFree should always be
 * false in such a case, since presumably tts_tuple is pointing at the
 * buffer page.)
 *
 * tts_nvalid indicates the number of valid columns in the tts_values/isnull
 * arrays.  When the slot is holding a "virtual" tuple this must be equal
 * to the descriptor's natts.  When the slot is holding a physical tuple
 * this is equal to the number of columns we have extracted (we always
 * extract columns from left to right, so there are no holes).
 *
 * tts_values/tts_isnull are allocated when a descriptor is assigned to the
 * slot; they are of length equal to the descriptor's natts.
 *
 * tts_mintuple must always be NULL if the slot does not hold a "minimal"
 * tuple.  When it does, tts_mintuple points to the actual MinimalTupleData
 * object (the thing to be pfree'd if tts_shouldFreeMin is true).  If the slot
 * has only a minimal and not also a regular physical tuple, then tts_tuple
 * points at tts_minhdr and the fields of that struct are set correctly
 * for access to the minimal tuple; in particular, tts_minhdr.t_data points
 * MINIMAL_TUPLE_OFFSET bytes before tts_mintuple.  This allows column
 * extraction to treat the case identically to regular physical tuples.
 *
 * tts_slow/tts_off are saved state for slot_deform_tuple, and should not
 * be touched by any other code.
 *----------
 */
typedef struct TupleTableSlot
{
    NodeTag     type;           /* node tag */
    bool        tts_isempty;    /* true = slot is empty */
    bool        tts_shouldFree; /* should pfree tts_tuple? */
    bool        tts_shouldFreeMin;  /* should pfree tts_mintuple? */
#define FIELDNO_TUPLETABLESLOT_SLOW 4
    bool        tts_slow;       /* saved state for slot_deform_tuple */
#define FIELDNO_TUPLETABLESLOT_TUPLE 5
    HeapTuple   tts_tuple;      /* physical tuple, or NULL if virtual */
#define FIELDNO_TUPLETABLESLOT_TUPLEDESCRIPTOR 6
    TupleDesc   tts_tupleDescriptor;    /* slot's tuple descriptor */
    MemoryContext tts_mcxt;     /* slot itself is in this context */
    Buffer      tts_buffer;     /* tuple's buffer, or InvalidBuffer */
#define FIELDNO_TUPLETABLESLOT_NVALID 9
    int         tts_nvalid;     /* # of valid values in tts_values */
#define FIELDNO_TUPLETABLESLOT_VALUES 10
    Datum      *tts_values;     /* current per-attribute values */
#define FIELDNO_TUPLETABLESLOT_ISNULL 11
    bool       *tts_isnull;     /* current per-attribute isnull flags */
    MinimalTuple tts_mintuple;  /* minimal tuple, or NULL if none */
    HeapTupleData tts_minhdr;   /* workspace for minimal-tuple-only case */
#define FIELDNO_TUPLETABLESLOT_OFF 14
    uint32      tts_off;        /* saved state for slot_deform_tuple */
    bool        tts_fixedTupleDescriptor;   /* descriptor can't be changed */
} TupleTableSlot;

/* base tuple table slot type (the layout that dispatches through tts_ops) */
typedef struct TupleTableSlot
{
    NodeTag     type;           /* node tag */
#define FIELDNO_TUPLETABLESLOT_FLAGS 1
    uint16      tts_flags;      /* Boolean states */
#define FIELDNO_TUPLETABLESLOT_NVALID 2
    AttrNumber  tts_nvalid;     /* # of valid values in tts_values */
    const TupleTableSlotOps *const tts_ops; /* implementation of slot */
#define FIELDNO_TUPLETABLESLOT_TUPLEDESCRIPTOR 4
    TupleDesc   tts_tupleDescriptor;    /* slot's tuple descriptor */
#define FIELDNO_TUPLETABLESLOT_VALUES 5
    Datum      *tts_values;     /* current per-attribute values */
#define FIELDNO_TUPLETABLESLOT_ISNULL 6
    bool       *tts_isnull;     /* current per-attribute isnull flags */
    MemoryContext tts_mcxt;     /* slot itself is in this context */
} TupleTableSlot;

/* routines for a TupleTableSlot implementation */
struct TupleTableSlotOps
{
    /* Minimum size of the slot */
    size_t          base_slot_size;

    /* Initialization. */
    void (*init)(TupleTableSlot *slot);

    /* Destruction. */
    void (*release)(TupleTableSlot *slot);

    /*
     * Clear the contents of the slot. Only the contents are expected to be
     * cleared and not the tuple descriptor. Typically an implementation of
     * this callback should free the memory allocated for the tuple contained
     * in the slot.
     */
    void (*clear)(TupleTableSlot *slot);

    /*
     * Fill up first natts entries of tts_values and tts_isnull arrays with
     * values from the tuple contained in the slot. The function may be called
     * with natts more than the number of attributes available in the tuple,
     * in which case it should set tts_nvalid to the number of returned
     * columns.
     */
    void (*getsomeattrs)(TupleTableSlot *slot, int natts);

    /*
     * Returns value of the given system attribute as a datum and sets isnull
     * to false, if it's not NULL. Throws an error if the slot type does not
     * support system attributes.
     */
    Datum (*getsysattr)(TupleTableSlot *slot, int attnum, bool *isnull);

    /*
     * Make the contents of the slot solely depend on the slot, and not on
     * underlying resources (like another memory context, buffers, etc).
     */
    void (*materialize)(TupleTableSlot *slot);

    /*
     * Copy the contents of the source slot into the destination slot's own
     * context. Invoked using callback of the destination slot.
     */
    void (*copyslot) (TupleTableSlot *dstslot, TupleTableSlot *srcslot);

    /*
     * Return a heap tuple "owned" by the slot. It is slot's responsibility to
     * free the memory consumed by the heap tuple. If the slot can not "own" a
     * heap tuple, it should not implement this callback and should set it as
     * NULL.
     */
    HeapTuple (*get_heap_tuple)(TupleTableSlot *slot);

    /*
     * Return a minimal tuple "owned" by the slot. It is slot's responsibility
     * to free the memory consumed by the minimal tuple. If the slot can not
     * "own" a minimal tuple, it should not implement this callback and should
     * set it as NULL.
     */
    MinimalTuple (*get_minimal_tuple)(TupleTableSlot *slot);

    /*
     * Return a copy of heap tuple representing the contents of the slot. The
     * copy needs to be palloc'd in the current memory context. The slot
     * itself is expected to remain unaffected. It is *not* expected to have
     * meaningful "system columns" in the copy. The copy is not "owned" by
     * the slot i.e. the caller has to take responsibility to free memory
     * consumed by the slot.
     */
    HeapTuple (*copy_heap_tuple)(TupleTableSlot *slot);

    /*
     * Return a copy of minimal tuple representing the contents of the slot. The
     * copy needs to be palloc'd in the current memory context. The slot
     * itself is expected to remain unaffected. It is *not* expected to have
     * meaningful "system columns" in the copy. The copy is not "owned" by
     * the slot i.e. the caller has to take responsibility to free memory
     * consumed by the slot.
     */
    MinimalTuple (*copy_minimal_tuple)(TupleTableSlot *slot);
};

typedef struct tupleDesc
{
    int         natts;          /* number of attributes in the tuple */
    Oid         tdtypeid;       /* composite type ID for tuple type */
    int32       tdtypmod;       /* typmod for tuple type */
    int         tdrefcount;     /* reference count, or -1 if not counting */
    TupleConstr *constr;        /* constraints, or NULL if none */
    /* attrs[N] is the description of Attribute Number N+1 */
    FormData_pg_attribute attrs[FLEXIBLE_ARRAY_MEMBER];
}  *TupleDesc;
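The tts_nvalid bookkeeping described in the comment above (columns are extracted from left to right, lazily, and never twice) can be illustrated with a small stand-alone sketch. This is not PostgreSQL code: ToySlot, toy_store_tuple, toy_getsomeattrs and toy_getattr are hypothetical names, and a plain array of ints stands in for a physical tuple.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NATTS 4                 /* hypothetical fixed column count */

/* Toy stand-in for a slot holding a "physical" tuple (not PostgreSQL's struct). */
typedef struct ToySlot
{
    const int  *tuple;              /* the "physical" tuple: TOY_NATTS ints */
    int         nvalid;             /* number of columns extracted so far */
    int         values[TOY_NATTS];  /* extracted per-column values */
    int         extract_calls;      /* per-column extractions, for the demo */
} ToySlot;

/* Store a new tuple; nothing is extracted yet. */
static void
toy_store_tuple(ToySlot *slot, const int *tuple)
{
    slot->tuple = tuple;
    slot->nvalid = 0;
    slot->extract_calls = 0;
}

/* Make sure the first natts columns are extracted, left to right. */
static void
toy_getsomeattrs(ToySlot *slot, int natts)
{
    assert(natts >= 0);
    if (natts > TOY_NATTS)
        natts = TOY_NATTS;
    while (slot->nvalid < natts)
    {
        slot->values[slot->nvalid] = slot->tuple[slot->nvalid];
        slot->nvalid++;
        slot->extract_calls++;
    }
}

/* Fetch column attnum (1-based), extracting only what is still missing. */
static int
toy_getattr(ToySlot *slot, int attnum)
{
    assert(attnum >= 1 && attnum <= TOY_NATTS);
    toy_getsomeattrs(slot, attnum);
    return slot->values[attnum - 1];
}
```

Asking for column 2 extracts columns 1 and 2; a later request for column 1 touches nothing, and a request for column 4 extracts only columns 3 and 4.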

SortState
Run-time state information for the Sort node.

/* ----------------
 *   SortState information
 * ----------------
 */
typedef struct SortState
{
    ScanState   ss;             /* its first field is NodeTag */
    bool        randomAccess;   /* need random access to sort output? */
    bool        bounded;        /* is the result set bounded? */
    int64       bound;          /* if bounded, how many tuples are needed */
    bool        sort_Done;      /* sort completed yet? */
    bool        bounded_Done;   /* value of bounded we did the sort with */
    int64       bound_Done;     /* value of bound we did the sort with */
    void       *tuplesortstate; /* private state of tuplesort.c */
    bool        am_worker;      /* are we a worker? */
    SharedSortInfo *shared_info;    /* one entry per worker */
} SortState;

/* ----------------
 *   Shared memory container for per-worker sort information
 * ----------------
 */
typedef struct SharedSortInfo
{
    int         num_workers;    /* number of workers */
    TuplesortInstrumentation sinstrument[FLEXIBLE_ARRAY_MEMBER];    /* per-worker sort statistics */
} SharedSortInfo;

TuplesortInstrumentation
Data structures for reporting sort statistics.

/*
 * Data structures for reporting sort statistics.  Note that
 * TuplesortInstrumentation can't contain any pointers because we
 * sometimes put it in shared memory.
 */
typedef enum
{
    SORT_TYPE_STILL_IN_PROGRESS = 0,    /* sort still in progress */
    SORT_TYPE_TOP_N_HEAPSORT,           /* top-N heapsort */
    SORT_TYPE_QUICKSORT,                /* quicksort */
    SORT_TYPE_EXTERNAL_SORT,            /* external sort */
    SORT_TYPE_EXTERNAL_MERGE            /* external merge */
} TuplesortMethod;

typedef enum
{
    SORT_SPACE_TYPE_DISK,               /* spaceUsed is disk space */
    SORT_SPACE_TYPE_MEMORY              /* spaceUsed is memory */
} TuplesortSpaceType;

typedef struct TuplesortInstrumentation
{
    TuplesortMethod sortMethod; /* sort algorithm used */
    TuplesortSpaceType spaceType;   /* type of space spaceUsed represents */
    long        spaceUsed;      /* space consumption, in kB */
} TuplesortInstrumentation;
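Because the struct holds only enums and a long, it can be copied by value into shared memory and turned into text later. The sketch below redeclares the three types so that it compiles on its own and formats one instrumentation record as a single line. The helper names toy_method_name, toy_space_type_name and toy_format_sort_info are hypothetical, and the label strings and output layout are assumptions of this sketch, loosely modelled on what EXPLAIN ANALYZE prints for a Sort node.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Redeclared from the listing above so the sketch is self-contained. */
typedef enum
{
    SORT_TYPE_STILL_IN_PROGRESS = 0,
    SORT_TYPE_TOP_N_HEAPSORT,
    SORT_TYPE_QUICKSORT,
    SORT_TYPE_EXTERNAL_SORT,
    SORT_TYPE_EXTERNAL_MERGE
} TuplesortMethod;

typedef enum
{
    SORT_SPACE_TYPE_DISK,
    SORT_SPACE_TYPE_MEMORY
} TuplesortSpaceType;

typedef struct TuplesortInstrumentation
{
    TuplesortMethod sortMethod;     /* sort algorithm used */
    TuplesortSpaceType spaceType;   /* type of space spaceUsed represents */
    long        spaceUsed;          /* space consumption, in kB */
} TuplesortInstrumentation;

/* Hypothetical helper: human-readable name of a sort method. */
static const char *
toy_method_name(TuplesortMethod m)
{
    switch (m)
    {
        case SORT_TYPE_STILL_IN_PROGRESS: return "still in progress";
        case SORT_TYPE_TOP_N_HEAPSORT:    return "top-N heapsort";
        case SORT_TYPE_QUICKSORT:         return "quicksort";
        case SORT_TYPE_EXTERNAL_SORT:     return "external sort";
        case SORT_TYPE_EXTERNAL_MERGE:    return "external merge";
    }
    return "unknown";
}

/* Hypothetical helper: human-readable name of a space type. */
static const char *
toy_space_type_name(TuplesortSpaceType t)
{
    return (t == SORT_SPACE_TYPE_DISK) ? "Disk" : "Memory";
}

/* Format one record into buf; returns what snprintf returns. */
static int
toy_format_sort_info(const TuplesortInstrumentation *si, char *buf, size_t buflen)
{
    assert(si != NULL && buf != NULL);
    return snprintf(buf, buflen, "Sort Method: %s  %s: %ldkB",
                    toy_method_name(si->sortMethod),
                    toy_space_type_name(si->spaceType),
                    si->spaceUsed);
}
```

A sort that ends up in mergeruns has spilled to tape, so in this model it would be reported with one of the two external methods and the Disk space type.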

2. Source Code Walkthrough

mergeruns merges all the completed initial runs.

/*
 * mergeruns -- merge all the completed initial runs.
 *
 * This implements steps D5, D6 of Algorithm D.  All input data has
 * already been written to initial runs on tape (see dumptuples).
 */
static void
mergeruns(Tuplesortstate *state)
{
    int         tapenum,
                svTape,
                svRuns,
                svDummy;
    int         numTapes;
    int         numInputTapes;

    Assert(state->status == TSS_BUILDRUNS);
    Assert(state->memtupcount == 0);

    if (state->sortKeys != NULL && state->sortKeys->abbrev_converter != NULL)
    {
        /*
         * If there are multiple runs to be merged, when we go to read back
         * tuples from disk, abbreviated keys will not have been stored, and
         * we don't care to regenerate them.  Disable abbreviation from this
         * point on.
         */
        state->sortKeys->abbrev_converter = NULL;
        state->sortKeys->comparator = state->sortKeys->abbrev_full_comparator;

        /* Not strictly necessary, but be tidy */
        state->sortKeys->abbrev_abort = NULL;
        state->sortKeys->abbrev_full_comparator = NULL;
    }

    /*
     * Reset tuple memory.  We've freed all the tuples that we previously
     * allocated.  We will use the slab allocator from now on.
     */
    MemoryContextDelete(state->tuplecontext);
    state->tuplecontext = NULL;

    /*
     * We no longer need a large memtuples array.  (We will allocate a smaller
     * one for the heap later.)
     */
    FREEMEM(state, GetMemoryChunkSpace(state->memtuples));
    pfree(state->memtuples);
    state->memtuples = NULL;

    /*
     * If we had fewer runs than tapes, refund the memory that we imagined we
     * would need for the tape buffers of the unused tapes.
     *
     * numTapes and numInputTapes reflect the actual number of tapes we will
     * use.  Note that the output tape's tape number is maxTapes - 1, so the
     * tape numbers of the used tapes are not consecutive, and you cannot just
     * loop from 0 to numTapes to visit all used tapes!
     */
    if (state->Level == 1)
    {
        numInputTapes = state->currentRun;
        numTapes = numInputTapes + 1;
        FREEMEM(state, (state->maxTapes - numTapes) * TAPE_BUFFER_OVERHEAD);
    }
    else
    {
        numInputTapes = state->tapeRange;
        numTapes = state->maxTapes;
    }

    /*
     * Initialize the slab allocator.  We need one slab slot per input tape,
     * for the tuples in the heap, plus one to hold the tuple last returned
     * from tuplesort_gettuple.  (If we're sorting pass-by-val Datums,
     * however, we don't need to do allocate anything.)
     *
     * From this point on, we no longer use the USEMEM()/LACKMEM() mechanism
     * to track memory usage of individual tuples.
     */
    if (state->tuples)
        init_slab_allocator(state, numInputTapes + 1);
    else
        init_slab_allocator(state, 0);

    /*
     * Allocate a new 'memtuples' array, for the heap.  It will hold one tuple
     * from each input tape.
     */
    state->memtupsize = numInputTapes;
    state->memtuples = (SortTuple *) palloc(numInputTapes * sizeof(SortTuple));
    USEMEM(state, GetMemoryChunkSpace(state->memtuples));

    /*
     * Use all the remaining memory we have available for read buffers among
     * the input tapes.
     *
     * We don't try to "rebalance" the memory among tapes, when we start a new
     * merge phase, even if some tapes are inactive in the new phase.  That
     * would be hard, because logtape.c doesn't know where one run ends and
     * another begins.  When a new merge phase begins, and a tape doesn't
     * participate in it, its buffer nevertheless already contains tuples from
     * the next run on same tape, so we cannot release the buffer.  That's OK
     * in practice, merge performance isn't that sensitive to the amount of
     * buffers used, and most merge phases use all or almost all tapes,
     * anyway.
     */
#ifdef TRACE_SORT
    if (trace_sort)
        elog(LOG, "worker %d using " INT64_FORMAT " KB of memory for read buffers among %d input tapes",
             state->worker, state->availMem / 1024, numInputTapes);
#endif

    state->read_buffer_size = Max(state->availMem / numInputTapes, 0);
    USEMEM(state, state->read_buffer_size * numInputTapes);

    /* End of step D2: rewind all output tapes to prepare for merging */
    for (tapenum = 0; tapenum < state->tapeRange; tapenum++)
        LogicalTapeRewindForRead(state->tapeset, tapenum, state->read_buffer_size);

    /* main merge loop */
    for (;;)
    {
        /*
         * At this point we know that tape[T] is empty.  If there's just one
         * (real or dummy) run left on each input tape, then only one merge
         * pass remains.  If we don't have to produce a materialized sorted
         * tape, we can stop at this point and do the final merge on-the-fly.
         */
        if (!state->randomAccess && !WORKER(state))
        {
            bool        allOneRun = true;

            Assert(state->tp_runs[state->tapeRange] == 0);
            for (tapenum = 0; tapenum < state->tapeRange; tapenum++)
            {
                if (state->tp_runs[tapenum] + state->tp_dummy[tapenum] != 1)
                {
                    allOneRun = false;
                    break;
                }
            }
            if (allOneRun)
            {
                /* Tell logtape.c we won't be writing anymore */
                LogicalTapeSetForgetFreeSpace(state->tapeset);
                /* Initialize for the final merge pass */
                beginmerge(state);
                state->status = TSS_FINALMERGE;
                return;
            }
        }

        /* Step D5: merge runs onto tape[T] until tape[P] is empty */
        while (state->tp_runs[state->tapeRange - 1] ||
               state->tp_dummy[state->tapeRange - 1])
        {
            bool        allDummy = true;

            for (tapenum = 0; tapenum < state->tapeRange; tapenum++)
            {
                if (state->tp_dummy[tapenum] == 0)
                {
                    allDummy = false;
                    break;
                }
            }

            if (allDummy)
            {
                state->tp_dummy[state->tapeRange]++;
                for (tapenum = 0; tapenum < state->tapeRange; tapenum++)
                    state->tp_dummy[tapenum]--;
            }
            else
                mergeonerun(state);
        }

        /* Step D6: decrease level */
        if (--state->Level == 0)
            break;

        /* rewind output tape T to use as new input */
        LogicalTapeRewindForRead(state->tapeset, state->tp_tapenum[state->tapeRange],
                                 state->read_buffer_size);

        /* rewind used-up input tape P, and prepare it for write pass */
        LogicalTapeRewindForWrite(state->tapeset, state->tp_tapenum[state->tapeRange - 1]);
        state->tp_runs[state->tapeRange - 1] = 0;

        /*
         * reassign tape units per step D6; note we no longer care about A[]
         */
        svTape = state->tp_tapenum[state->tapeRange];
        svDummy = state->tp_dummy[state->tapeRange];
        svRuns = state->tp_runs[state->tapeRange];
        for (tapenum = state->tapeRange; tapenum > 0; tapenum--)
        {
            state->tp_tapenum[tapenum] = state->tp_tapenum[tapenum - 1];
            state->tp_dummy[tapenum] = state->tp_dummy[tapenum - 1];
            state->tp_runs[tapenum] = state->tp_runs[tapenum - 1];
        }
        state->tp_tapenum[0] = svTape;
        state->tp_dummy[0] = svDummy;
        state->tp_runs[0] = svRuns;
    }

    /*
     * Done.  Knuth says that the result is on TAPE[1], but since we exited
     * the loop without performing the last iteration of step D6, we have not
     * rearranged the tape unit assignment, and therefore the result is on
     * TAPE[T].  We need to do it this way so that we can freeze the final
     * output tape while rewinding it.  The last iteration of step D6 would be
     * a waste of cycles anyway...
     */
    state->result_tape = state->tp_tapenum[state->tapeRange];
    if (!WORKER(state))
        LogicalTapeFreeze(state->tapeset, state->result_tape, NULL);
    else
        worker_freeze_result_tape(state);
    state->status = TSS_SORTEDONTAPE;

    /* Release the read buffers of all the other tapes, by rewinding them. */
    for (tapenum = 0; tapenum < state->maxTapes; tapenum++)
    {
        if (tapenum != state->result_tape)
            LogicalTapeRewindForWrite(state->tapeset, tapenum);
    }
}

3. Trace Analysis

Test script

select * from t_sort order by c1,c2;

Tracing with gdb

(gdb) b mergeruns
Breakpoint 1 at 0xa73508: file tuplesort.c, line 2570.
(gdb) 
Note: breakpoint 1 also set at pc 0xa73508.
Breakpoint 2 at 0xa73508: file tuplesort.c, line 2570.

Input parameters

(gdb) c
Continuing.

Breakpoint 1, mergeruns (state=0x2b808a8) at tuplesort.c:2570
2570        Assert(state->status == TSS_BUILDRUNS);
(gdb) p *state
$1 = {status = TSS_BUILDRUNS, nKeys = 2, randomAccess = false, bounded = false, boundUsed = false, bound = 0,
  tuples = true, availMem = 3164456, allowedMem = 4194304, maxTapes = 16, tapeRange = 15, sortcontext = 0x2b80790,
  tuplecontext = 0x2b827a0, tapeset = 0x2b81480, comparetup = 0xa7525b <comparetup_heap>,
  copytup = 0xa76247 <copytup_heap>, writetup = 0xa76de1 <writetup_heap>, readtup = 0xa76ec6 <readtup_heap>,
  memtuples = 0x7f0cfeb14050, memtupcount = 0, memtupsize = 37448, growmemtuples = false, slabAllocatorUsed = false,
  slabMemoryBegin = 0x0, slabMemoryEnd = 0x0, slabFreeHead = 0x0, read_buffer_size = 0, lastReturnedTuple = 0x0,
  currentRun = 3, mergeactive = 0x2b81350, Level = 1, destTape = 2, tp_fib = 0x2b80d58, tp_runs = 0x2b81378,
  tp_dummy = 0x2b813d0, tp_tapenum = 0x2b81428, activeTapes = 0, result_tape = -1, current = 0, eof_reached = false,
  markpos_block = 0, markpos_offset = 0, markpos_eof = false, worker = -1, shared = 0x0, nParticipants = -1,
  tupDesc = 0x2b67ae0, sortKeys = 0x2b80cc0, onlyKey = 0x0, abbrevNext = 10, indexInfo = 0x0, estate = 0x0, heapRel = 0x0,
  indexRel = 0x0, enforceUnique = false, high_mask = 0, low_mask = 0, max_buckets = 0, datumType = 0, datumTypeLen = 0,
  ru_start = {tv = {tv_sec = 0, tv_usec = 0}, ru = {ru_utime = {tv_sec = 0, tv_usec = 0}, ru_stime = {tv_sec = 0,
        tv_usec = 0}, {ru_maxrss = 0, __ru_maxrss_word = 0}, {ru_ixrss = 0, __ru_ixrss_word = 0}, {ru_idrss = 0,
        __ru_idrss_word = 0}, {ru_isrss = 0, __ru_isrss_word = 0}, {ru_minflt = 0, __ru_minflt_word = 0}, {ru_majflt = 0,
        __ru_majflt_word = 0}, {ru_nswap = 0, __ru_nswap_word = 0}, {ru_inblock = 0, __ru_inblock_word = 0}, {
        ru_oublock = 0, __ru_oublock_word = 0}, {ru_msgsnd = 0, __ru_msgsnd_word = 0}, {ru_msgrcv = 0,
        __ru_msgrcv_word = 0}, {ru_nsignals = 0, __ru_nsignals_word = 0}, {ru_nvcsw = 0, __ru_nvcsw_word = 0}, {
        ru_nivcsw = 0, __ru_nivcsw_word = 0}}}}
(gdb)

Sort keys and related information

(gdb) n
2571        Assert(state->memtupcount == 0);
(gdb) 
2573        if (state->sortKeys != NULL && state->sortKeys->abbrev_converter != NULL)
(gdb) p *state->sortKeys
$2 = {ssup_cxt = 0x2b80790, ssup_collation = 0, ssup_reverse = false, ssup_nulls_first = false, ssup_attno = 2,
  ssup_extra = 0x0, comparator = 0x4fd4af <btint4fastcmp>, abbreviate = true, abbrev_converter = 0x0, abbrev_abort = 0x0,
  abbrev_full_comparator = 0x0}
(gdb) p *state->sortKeys->abbrev_converter
Cannot access memory at address 0x0
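In this session abbrev_converter is 0x0, so the condition on line 2573 is false and the comparator (btint4fastcmp) is left untouched. The sketch below models what the branch would do if an abbreviation converter were installed. ToySortKey and the toy_* functions are hypothetical stand-ins, with plain ints in place of Datums, and only the three fields needed for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef int (*toy_cmp_fn) (int a, int b);
typedef int (*toy_conv_fn) (int original);

/* Hypothetical, trimmed stand-in for the sort-key fields involved. */
typedef struct ToySortKey
{
    toy_cmp_fn  comparator;             /* comparator currently in use */
    toy_conv_fn abbrev_converter;       /* NULL when abbreviation is off */
    toy_cmp_fn  abbrev_full_comparator; /* comparator for full values */
} ToySortKey;

/* Full comparison of the original values. */
static int
toy_cmp_full(int a, int b)
{
    return (a > b) - (a < b);
}

/* Comparison of abbreviated keys (same logic, separate identity). */
static int
toy_cmp_abbrev(int a, int b)
{
    return (a > b) - (a < b);
}

/* Lossy conversion of a value to its abbreviated key. */
static int
toy_abbrev_convert(int original)
{
    return original / 100;
}

/*
 * Switch back to the full comparator, as mergeruns does before merging.
 * Returns true if abbreviation was active and has now been disabled.
 */
static bool
toy_disable_abbreviation(ToySortKey *key)
{
    if (key == NULL || key->abbrev_converter == NULL)
        return false;           /* the case seen in the gdb session */
    key->abbrev_converter = NULL;
    key->comparator = key->abbrev_full_comparator;
    key->abbrev_full_comparator = NULL;
    return true;
}
```

With the toy converter, 150 and 199 both abbreviate to 1 and compare as equal, while the full comparator tells them apart. Tuples read back from tape carry no abbreviated keys, which is why the full comparator is the one that has to be in place during the merge.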

Reset the tuple memory; the large memtuples array is no longer needed.

(gdb) n
2593        MemoryContextDelete(state->tuplecontext);
(gdb) 
2594        state->tuplecontext = NULL;
(gdb) 
(gdb) n
2600        FREEMEM(state, GetMemoryChunkSpace(state->memtuples));
(gdb) 
2601        pfree(state->memtuples);
(gdb) 
2602        state->memtuples = NULL;
(gdb) 
2613        if (state->Level == 1)
(gdb)

Compute the number of tapes

(gdb) n
2615            numInputTapes = state->currentRun;
(gdb) p state->currentRun
$3 = 3
(gdb) p state->Level
$4 = 1
(gdb) p state->tapeRange
$5 = 15
(gdb) p state->maxTapes
$6 = 16
(gdb) n
2616            numTapes = numInputTapes + 1;
(gdb) 
2617            FREEMEM(state, (state->maxTapes - numTapes) * TAPE_BUFFER_OVERHEAD);
(gdb) 
2634        if (state->tuples)
(gdb) p numInputTapes
$7 = 3
(gdb) p numTapes
$8 = 4
(gdb)
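The values printed above follow directly from the Level == 1 branch: three runs were written, so three input tapes plus one output tape are in use and the buffer budget of the remaining twelve tapes is refunded. The sketch below reproduces that arithmetic. ToyTapeCounts and toy_count_tapes are hypothetical names, and the 8192-byte TOY_TAPE_BUFFER_OVERHEAD is an assumption of this sketch (one default-sized block per tape), not a value shown in the session.

```c
#include <assert.h>

/* Assumed per-tape buffer overhead in bytes (one 8 kB block). */
#define TOY_TAPE_BUFFER_OVERHEAD 8192L

/* Result of the tape-count computation. */
typedef struct ToyTapeCounts
{
    int         numInputTapes;  /* tapes that actually hold runs */
    int         numTapes;       /* input tapes plus the output tape */
    long        refund;         /* bytes handed back for unused tapes */
} ToyTapeCounts;

/* Mirror of the Level == 1 / else branches in mergeruns. */
static ToyTapeCounts
toy_count_tapes(int level, int currentRun, int tapeRange, int maxTapes)
{
    ToyTapeCounts c;

    assert(level >= 1 && tapeRange == maxTapes - 1);
    if (level == 1)
    {
        /* fewer runs than tapes: only currentRun input tapes are used */
        c.numInputTapes = currentRun;
        c.numTapes = c.numInputTapes + 1;
        c.refund = (long) (maxTapes - c.numTapes) * TOY_TAPE_BUFFER_OVERHEAD;
    }
    else
    {
        /* all tapes participate, nothing to refund */
        c.numInputTapes = tapeRange;
        c.numTapes = maxTapes;
        c.refund = 0;
    }
    return c;
}
```

Called with the traced values (Level 1, currentRun 3, tapeRange 15, maxTapes 16), the function yields 3 input tapes and 4 tapes in total, matching $7 and $8.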

Initialize the slab allocator, allocate a new memtuples array for the heap, and rewind all output tapes to prepare for merging

(gdb) n
2635            init_slab_allocator(state, numInputTapes + 1);
(gdb) n
2643        state->memtupsize = numInputTapes;
(gdb) 
2644        state->memtuples = (SortTuple *) palloc(numInputTapes * sizeof(SortTuple));
(gdb) 
2645        USEMEM(state, GetMemoryChunkSpace(state->memtuples));
(gdb) p state->memtupsize
$9 = 3
(gdb) n
2662        if (trace_sort)
(gdb) 
2667        state->read_buffer_size = Max(state->availMem / numInputTapes, 0);
(gdb) 
2668        USEMEM(state, state->read_buffer_size * numInputTapes);
(gdb) p state->read_buffer_size
$10 = 1385762
(gdb) n
2671        for (tapenum = 0; tapenum < state->tapeRange; tapenum++)
(gdb) 
2672            LogicalTapeRewindForRead(state->tapeset, tapenum, state->read_buffer_size);
(gdb) p state->tapeRange
$11 = 15
(gdb) p state->status
$12 = TSS_BUILDRUNS
(gdb)
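The read buffer size is simply the remaining budget split evenly across the input tapes, clamped at zero. The session does not print availMem at line 2667, so the sketch below uses a hypothetical availMem of 4157286 bytes, chosen only because dividing it by 3 gives the traced read_buffer_size of 1385762. The function names are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Per-tape read buffer: an even share of what is left, never negative. */
static int64_t
toy_read_buffer_size(int64_t availMem, int numInputTapes)
{
    int64_t     per_tape;

    assert(numInputTapes > 0);
    per_tape = availMem / numInputTapes;
    return (per_tape > 0) ? per_tape : 0;
}

/* Budget left after charging all read buffers against availMem. */
static int64_t
toy_avail_after_buffers(int64_t availMem, int numInputTapes)
{
    return availMem - toy_read_buffer_size(availMem, numInputTapes) * numInputTapes;
}
```

After the charge at most numInputTapes - 1 bytes of budget remain, so from this point on effectively all of the sort's memory is committed to read buffers.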

Enter the loop

2671        for (tapenum = 0; tapenum < state->tapeRange; tapenum++)
(gdb) 
2682            if (!state->randomAccess && !WORKER(state))
(gdb) 
2684                bool        allOneRun = true;
(gdb) p state->randomAccess
$15 = false
(gdb) p WORKER(state)
$16 = 0
(gdb)

Loop over the input tapes to check whether allOneRun has to be set to false

2687                for (tapenum = 0; tapenum < state->tapeRange; tapenum++)
(gdb) 
2695                if (allOneRun)
(gdb) p allOneRun
$19 = true
(gdb)
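allOneRun stays true only if every input tape holds exactly one run, counting real and dummy runs together. The per-tape arrays are not printed in the session, so the distribution used to exercise the sketch below is an assumption: one real run on each of the first three tapes and one dummy run on each of the other twelve, which is consistent with currentRun = 3, tapeRange = 15 and Level = 1. The function name toy_all_one_run is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * True if each of the tapeRange input tapes has exactly one run left,
 * real or dummy, so that a single on-the-fly merge pass finishes the sort.
 */
static bool
toy_all_one_run(const int *tp_runs, const int *tp_dummy, int tapeRange)
{
    int         tapenum;

    assert(tp_runs != NULL && tp_dummy != NULL && tapeRange > 0);
    for (tapenum = 0; tapenum < tapeRange; tapenum++)
    {
        if (tp_runs[tapenum] + tp_dummy[tapenum] != 1)
            return false;
    }
    return true;
}
```

When the check succeeds and no materialized result tape is required, the function can hand over to the final merge instead of writing yet another tape, which is the path this session takes next.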

Begin the merge, set the status, and return

(gdb) n
2698                    LogicalTapeSetForgetFreeSpace(state->tapeset);
(gdb) 
2700                    beginmerge(state);
(gdb) 
2701                    state->status = TSS_FINALMERGE;
(gdb) 
2702                    return;
(gdb) 
2779    }
(gdb) 
tuplesort_performsort (state=0x2b808a8) at tuplesort.c:1866
1866                state->eof_reached = false;
(gdb)

The sort completes

(gdb) n
1867                state->markpos_block = 0L;
(gdb) 
1868                state->markpos_offset = 0;
(gdb) 
1869                state->markpos_eof = false;
(gdb) 
1870                break;
(gdb) 
1878        if (trace_sort)
(gdb) 
1890        MemoryContextSwitchTo(oldcontext);
(gdb) 
1891    }
(gdb) 
ExecSort (pstate=0x2b67640) at nodeSort.c:123
123         estate->es_direction = dir;
(gdb) c
Continuing.

That concludes this look at how PostgreSQL calls the mergeruns function. Stepping through the code in gdb yourself, as in the session above, is a good way to reinforce the material.
