ENH: GH17054: read_html() handles rowspan/colspan and infers headers #17089
Closed
Changes from 1 commit
145 commits
4bf2f2e ENH: GH17054: read_html() handles rowspan/colspan and infers headers
jowens 80d9c2b in python 3, lambdas no longer take tuples as args. thanks pep 3113.
jowens 26d1f6a fixing lint error
jowens 37af4ea in python3, zip does not return a list, so list(zip(...))
jowens 86dee93 Merge branch 'master' into read_html_with_colspan_rowspan
jowens d3eca72 Merge branch 'master' into read_html_with_colspan_rowspan
jowens f064562 documentation changes only
jowens 67c8a59 Merge branch 'read_html_with_colspan_rowspan' of github.com:jowens/pa…
jowens 5a38278 documentation changes only
jowens 39f7814 documentation changes only, limited to 80 cols
jowens 531863f more documentation edits
jowens 818d394 minor documentation edits
jowens f3a6aa3 better return type explanation in code, added issue number to tests
jowens 2f904b2 cleaning up legacy documentation issues
jowens f4e7592 remove 'if'
jowens 293d9e4 newlines for clarity
jowens efabae4 DOC: whatsnew typos
jreback 552677f ENH: GH17054: read_html() handles rowspan/colspan and infers headers
jowens 1aacf17 TST: Check more error messages in tests (#17075)
gfyoung 359890f BUG: Respect dtype when calling pivot_table with margins=True
toobaz 3fd2612 MAINT: Add missing space in parsers.pyx
gfyoung 76249bf MAINT: Add missing paren around print statement
gfyoung 77d16d4 DOC: fix typos in missing.rst
jreback bd50a4f in python 3, lambdas no longer take tuples as args. thanks pep 3113.
jowens 452e08d fixing lint error
jowens ecfaa4c in python3, zip does not return a list, so list(zip(...))
jowens 69cd83c DOC: further clean-up null/na changes (#17113)
jorisvandenbossche 1e5cfa1 BUG: Allow pd.unique to accept tuple of strings (#17108)
mroeschke c502dba BUG: Allow Series with same name with crosstab (#16028)
mroeschke 2155c3e COMPAT: make sure use_inf_as_null is deprecated (#17126)
jreback 3ed9f53 CI: bump version of xlsxwriter to 0.5.2 (#17142)
jreback 9a50c21 DOC: Clean up instructions in ISSUE_TEMPLATE (#17146)
gfyoung 5759eff Add missing space to the NotImplementedError's message for compound d…
FKint 3855039 DOC: (de)type the return value of concat (#17079) (#17119)
jebob d7cb627 BUG: Thoroughly dedup column names in read_csv (#17095)
gfyoung 9d32df6 DOC: Additions/updates to documentation (#17150)
alanyee 5ce00e1 ENH: add to/from_parquet with pyarrow & fastparquet (#15838)
jreback 9aadb64 DOC: doc typos, xref #15838
jreback 89fa421 TST: test for categorical index monotonicity (#17152)
jreback ccdae36 MAINT: Remove non-standard and inconsistently-used imports (#17085)
jbrockmendel 5b42bdf DOC: typos in whatsnew
56957cf DOC: whatsnew 0.21.0 fixes
jreback d2e21c3 BUG: Fix CSV parsing of singleton list header (#17090)
20487bf ENH: Support strings containing '%' in add_prefix/add_suffix (#17151)…
jschendel b4b4c77 REF: repr - allow block to override values that get formatted (#17143)
jorisvandenbossche b720f0d MAINT: Drop unnecessary newlines in issue template
gfyoung 43dab45 remove direct import of nan
jbrockmendel 94a734a use == to test String equality (#17171)
jhelie e143ee1 ENH: Add warning when setting into nonexistent attribute (#16951)
deniederhut 5a523bb DOC: added string processing comparison with SAS (#16497)
natethedrummer 0bfad7c CLN: remove unused get methods in internals (#17169)
jbrockmendel a4e4909 TST: Partial Boolean DataFrame Indexing (#17186)
mroeschke e8fab8a CLN: Reformat docstring for IPython fixture
gfyoung d089d44 Define Series.plot and Series.hist in class definition (#17199)
jbrockmendel b09b274 BUG: support pandas objects in iloc with old numpy versions (#17194)
toobaz cc8c5d7 Implement _make_accessor classmethod for PandasDelegate (#17166)
jbrockmendel df9710b Create ABCDateOffset (#17165)
jbrockmendel e71e6d7 BUG: resample and apply modify the index type for empty Series (#17149)
discort e9c7f29 DOC: Updated NDFrame.astype docs (#17203)
topper-123 38293d3 MAINT: Minor touch-ups to GitHub PULL_REQUEST_TEMPLATE (#17207)
dhimmel 7280e6c CLN: replace %s syntax with .format in core.computation (#17209)
jschendel 421dcf4 Bugfix for multilevel columns with empty strings in Python 2 (#17099)
chrisjbillington d5733ee CLN/ASV clean-up frame stat ops benchmarks (#17205)
jorisvandenbossche 9f69583 BUG: Rolling apply on DataFrame with Datetime index returns NaN (#17156)
FXocena 1e1ce40 CLN: Remove import exception handling (#17218)
dhimmel a1509dc MAINT: Remove extra the's in deprecation messages (#17222)
gfyoung 6788533 DOC: Patch docs in _decorators.py
gfyoung 619e031 CLN: replace %s syntax with .format in pandas.util (#17224)
jschendel 9e26997 Add 'See also' sections (#17223)
topper-123 a7311d2 move pivot_table doc-string to DataFrame (#17174)
jbrockmendel 1ac9ede Remove import of pandas as pd in core.window (#17233)
jbrockmendel a2d8d23 TST: Move more frame tests to SharedWithSparse (#17227)
kernc 013b983 REF: _get_objs_combined_axis (#17217)
toobaz fddb66d ENH/PERF: Remove frequency inference from .dt accessor (#17210)
cpcloud 2e55156 Fix apparent typo in tests (#17247)
jbrockmendel b49446e COMPAT: avoid calling getsizeof() on PyPy
mattip 536b761 CLN: replace %s syntax with .format in pandas.core.reshape (#17252)
jschendel a1ff671 ENH: Infer compression from non-string paths (#17206)
dhimmel df1b0dc Fix bugs in IntervalIndex.is_non_overlapping_monotonic (#17238)
jschendel 8fe1cc3 BUG: Fix behavior of argmax and argmin with inf (#16449) (#16449)
DGrady 357e7ae CLN: Remove have_pytz (#17266)
jbrockmendel aa97aa6 CLN: replace %s syntax with .format in core.dtypes and core.sparse (#…
jschendel a618bec Replace imports of * with explicit imports (#17269)
jbrockmendel db3ea2f TST: pytest deprecation warnings GH17197 (#17253)
swyoon de60666 Handle more date/datetime/time formats (#15871)
Winand 0bbda54 DOC: add example on json_normalize (#16438)
zzgao c148dd2 BUG: Have object dtype for empty Categorical.categories (#17249)
TomAugspurger 155c11a CLN: replace %s syntax with .format in pandas.tseries (#17290)
jschendel e4aeed2 TST: parameterize consistency tests for rolling/expanding windows (#1…
jreback db11418 FIX: define `DataFrame.items` for all versions of python (#17214)
tacaswell a256e26 PERF: Update ASV publish config (#17293)
TomAugspurger 75d46a6 DOC: Expand docstrings for head / tail methods (#16941)
yosukeBaya4 172abfb MAINT: Use set literal for unsupported + depr args
gfyoung 1982aca DOC: Add proper docstring to maybe_convert_indices
gfyoung 393bb19 DOC: Improving docstring of take method (#16948)
matagus 595e0a4 BUG: Fixed regex in asv.conf.json (#17300)
TomAugspurger 6a45d36 Remove unnecessary usage of _TSObject (#17297)
jbrockmendel 5f077f3 BUG: clip should handle null values
mgasvoda a10fa92 BUG: fillna returns frame when inplace=True if value is a dict (#1615…
8dfb95b CLN: Index.append() refactoring (#16236)
toobaz 8326c83 DEPS: set min versions (#17002)
jreback 8fbd8f8 CLN: replace %s syntax with .format in core.tools, algorithms.py, bas…
jschendel 3625190 BUG: Fix strange behaviour of Series.iloc on MultiIndex Series (#1714…
7364711 DOC: Add module doc-string to tseries/api.py
gfyoung e5797fa MAINT: Clean up docs in pandas/errors/__init__.py
gfyoung 9be531a CLN: replace %s syntax with .format in missing.py, nanops.py, ops.py …
jschendel a9574b0 Make pd.Period immutable (#17239)
jbrockmendel 3e31383 Bug: groupby multiindex levels equals rows (#16859)
e5030b3 BUG: Cannot use tz-aware origin in to_datetime (#16842)
ivybae 7be53ed Replace usage of total_seconds compat func with timedelta method (#17…
jbrockmendel f4adbb9 CLN: replace %s syntax with .format in core/indexing.py (#17357)
cbertinato b1b3325 DOC: Point to dev-docs in issue template (#17353)
gfyoung 76cc924 CLN: remove total_seconds compat from json (#17341)
chris-b1 0309dae CLN: Move test_intersect_str_dates (#17366)
jschendel c523bfc BUG: Respect dups in reindexing CategoricalIndex (#17355)
gfyoung 5a6f2ac Unify Index._dir_* with Series implementation (#17117)
jbrockmendel ce8ccba BUG: make order of index from pd.concat deterministic (#17364)
toobaz a585e09 Fix typo that causes several NaT methods to have incorrect docstrings…
jbrockmendel 8199559 CLN: replace %s syntax with .format in io/formats/format.py (#17358)
cbertinato 6ec1044 PKG: Added pyproject.toml for PEP 518 (#16745)
TomAugspurger c33af56 DOC: Update Overview page in documentation (#17368)
iuliakhomenko 0f8205c API: Have MultiIndex consturctors always return a MI (#17236)
TomAugspurger 54f68b4 CLN: replace %s syntax with .format in io/formats/css.py, excel.py, p…
cbertinato b717ebc BUG: not correctly using OrderedDict in test_series_apply (#17384)
sylviawhoa b61af0e Remove boxplot from _dataframe_apply_whitelist (#17381)
jbrockmendel c80e8d0 API: Localize Series when calling to_datetime with utc=True (#6415) (…
mroeschke 3a0dc92 TST: Enable tests in test_tools.py (#17405)
jschendel 365f2fe TST: remove tests and docs for legacy (pre 0.12) hdf5 support (#17404)
topper-123 d994323 Tslib unused (#17402)
jbrockmendel e94e572 DOC: Cleaned references to pandas <v0.12 in docs (#17375)
topper-123 6a02ffa Remove unused _day and _month attrs (#17431)
jbrockmendel 519c57f DOC: Clean-up references to v12 to v14 (both included) (#17420)
topper-123 f22b895 BUG: Plotting Timedelta on y-axis #16953 (#17430)
s-weigand 8edd85a COMPAT: handle pyarrow deprecation of timestamps_to_ms in .from_panda…
jreback 047727a DOC/TST: Add examples to MultiIndex.get_level_values + related change…
topper-123 91a2300 documentation changes only
jowens 41058ab documentation changes only
jowens 4926913 documentation changes only, limited to 80 cols
jowens 14235ec more documentation edits
jowens 196c835 minor documentation edits
jowens fed4b03 better return type explanation in code, added issue number to tests
jowens c2d9cc6 cleaning up legacy documentation issues
jowens d4b213b remove 'if'
jowens b16f6d5 newlines for clarity
jowens 092889a Merge branch 'read_html_with_colspan_rowspan' of github.com:jowens/pa…
jowens
documentation changes only, limited to 80 cols
commit 4926913281a32d9b7225d93c60a9eebb967e57bf
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -296,7 +296,8 @@ def _extract_tr(self, table):` |
| | | |
| | | `Returns` |
| | | `-------` |
| | | `- rows : a list of row elements of a table, usually <tr> or <th> elements.` |
| | | `+ rows : a list of row elements of a table, usually <tr> or <th>` |
| | | `+ elements.` |
| | | `"""` |
| | | `raise AbstractMethodError(self)` |
| | | |
| | | `@@ -376,9 +377,11 @@ def _parse_raw_thead_tbody_tfoot(self, table_html):` |
| | | `Returns` |
| | | `-------` |
| | | `tuple of (header, body, footer)` |
| | | `- header : list of rows, each of which is a list of parsed header elements` |
| | | `+ header : list of rows, each of which is a list of parsed header` |
| | | `+ elements` |
| | | `body : list of rows, each of which is a list of parsed body elements` |
| | | `- footer : list of rows, each of which is a list of parsed footer elements` |
| | | `+ footer : list of rows, each of which is a list of parsed footer` |
| | | `+ elements` |
| | | |
| | | `"""` |
| | | `header_rows = []` |
| | | Member (review comment): Nit: Add newline above this one. |
| | | `body_rows = []` |
| | | |
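For context on the feature this PR targets, here is a rough sketch of how `rowspan`/`colspan` cells could be expanded into the list-of-rows structure that the docstrings above describe. This is not the PR's implementation: `expand_spans` and the `(text, rowspan, colspan)` cell tuples are assumptions made for this example, and it does not try to handle malformed tables.

```python
def expand_spans(rows):
    """Expand (text, rowspan, colspan) cells into a plain grid of strings.

    Hypothetical helper for illustration only; not part of pandas.
    """
    grid = []
    # Maps column index -> (text, rows still to fill) for cells that
    # span down from an earlier row.
    pending = {}
    for row in rows:
        out = []
        next_pending = {}
        cells = iter(row)
        col = 0
        while True:
            if col in pending:
                # This column is covered by a rowspan cell from above.
                text, remaining = pending[col]
                out.append(text)
                if remaining > 1:
                    next_pending[col] = (text, remaining - 1)
                col += 1
                continue
            cell = next(cells, None)
            if cell is None:
                break
            text, rowspan, colspan = cell
            # Repeat the cell text once per spanned column.
            for _ in range(colspan):
                out.append(text)
                if rowspan > 1:
                    next_pending[col] = (text, rowspan - 1)
                col += 1
        grid.append(out)
        pending = next_pending
    return grid


table = [
    [("A", 2, 1), ("B", 1, 2)],
    [("C", 1, 1), ("D", 1, 1)],
]
grid = expand_spans(table)
```

Here `"A"` spans two rows and `"B"` spans two columns, so both output rows end up with three entries.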
you can remove this line (multiple returns are always tuples)
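To illustrate the reviewer's point, here is a minimal sketch (not the PR's code; `split_sections` and its tagged-row input are hypothetical names chosen for this example). A Python function that returns several comma-separated values always returns a single tuple, so the docstring can list the three values without a separate "tuple of ..." line.

```python
def split_sections(rows):
    """Group (section, cells) pairs into header, body and footer rows.

    Hypothetical helper for illustration only; not part of pandas.

    Returns
    -------
    header : list of rows, each of which is a list of header cells
    body : list of rows, each of which is a list of body cells
    footer : list of rows, each of which is a list of footer cells
    """
    header = [cells for section, cells in rows if section == "thead"]
    body = [cells for section, cells in rows if section == "tbody"]
    footer = [cells for section, cells in rows if section == "tfoot"]
    # Comma-separated return values are packed into one tuple.
    return header, body, footer


rows = [
    ("thead", ["name", "value"]),
    ("tbody", ["a", "1"]),
    ("tbody", ["b", "2"]),
    ("tfoot", ["total", "3"]),
]
result = split_sections(rows)
header, body, footer = result  # the tuple unpacks into three lists
```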