Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(http): fix issues with importing CSV over HTTP #3790

Draft
wants to merge 33 commits into
base: master
Choose a base branch
from

Conversation

bluestreak01
Copy link
Member

@bluestreak01 bluestreak01 commented Oct 2, 2023

  • fixed double-disconnect, which lead to segmentation fault
  • remove superfluous receive buffer check and subsequent error
  • refactored HTTP import tests to use our httpclient
  • when CSV is imported into new table and both timestamp and partition method are specified the table will be created as WAL. Bug was in the fact that timestamp was optional somehow.
  • introduced wal URL param to force non-WAL table to be created even if both timestamp and partition method are provided.
  • made values for all URL parameters to be case-insensitive, e.g. true, TRUE, tRue etc all valid boolean values.
  • extended timestamp format inventory with 51 new format patterns
  • fixed issue with date parsing where single quote mark is used as a delimiter

New changes:

  • added truncate parameter to /imp endpoint to allow truncating target table prior to import (if it exists)
  • added skipLineExtraValues parameter to /imp as alias to skipLev
  • added skipLines parameter to /imp endpoint to allow skipping first N lines of csv file (e.g. if these lines are mangled)
  • added suggested mapping and errors to json and text responses.
  • fixed issues with import re-try if target table is locked
  • added list of non-default type adapters (byte, short, float, geohash, date, ...) that are used when target type is known (from target table metadata or user-supplied mapping), so that they get validated before actual import to point out potential type mismatches . These adapters aren't on the default list because they'd clash and shouldn't be chosen automatically.
  • added epoch millis, micros and nanos adapters/formats
  • added file_column_ignore field to schemaV2 column mapping to allow user to ignore unnecessary columns. Such columns are excluded from type detection. Note: table_column_insert_null wasn't added and validation doesn't fail if some column is not mapped because it'd be cumbersome and annoying to users. Any missing target table columns will be set to null/type's default value.

Sample text output (for a messed-up mapping):

+-----------------------------------------------------------------------------------------------------------------+
|      Location:  |                                              test  |        Pattern  | Locale  |      Errors  |
|   Partition by  |                                              NONE  |                 |         |              |
|      Timestamp  |                                              NONE  |                 |         |              |
+-----------------------------------------------------------------------------------------------------------------+
|   Rows handled  |                                                 0  |                 |         |              |
|  Rows imported  |                                                 0  |                 |         |              |
+-----------------------------------------------------------------------------------------------------------------+
|  Idx  |            Csv name  |     Csv type  |        Table column  |   Table type  |                   Status  |
+-----------------------------------------------------------------------------------------------------------------+
|    0  |               col_a  |          INT  |               col_a  |          INT  |                       OK  |
|    1  |               short  |        SHORT  |               short  |      UNKNOWN  |            BAD TABLE REF  |
|    2  |           bad_short  |       STRING  |           bad_short  |      UNKNOWN  |            BAD TABLE REF  |
|    3  |                byte  |         BYTE  |                byte  |      UNKNOWN  |            BAD TABLE REF  |
|    4  |            bad_byte  |       STRING  |            bad_byte  |      UNKNOWN  |            BAD TABLE REF  |
|    5  |                 int  |          INT  |         wrong_table  |      UNKNOWN  |            BAD TABLE REF  |
|    6  |               other  |         CHAR  |             UNKNOWN  |      UNKNOWN  |                 UNMAPPED  |
|    7  |                   d  |       STRING  |                dup1  |       STRING  |            DUP TABLE REF  |
|    8  |                dup1  |       STRING  |                dup1  |       STRING  |            DUP TABLE REF  |
|       |           wrong_csv  |      UNKNOWN  |               col_c  |      UNKNOWN  |              BAD CSV REF  |
|       |               bogus  |      UNKNOWN  |               bogus  |      UNKNOWN  |BAD CSV REF,BAD TABLE REF  |
+-----------------------------------------------------------------------------------------------------------------+
|  Errors                                                                                                         |
+-----------------------------------------------------------------------------------------------------------------+
|  column mapping is invalid [unmapped_schema_column_indexes=[0,1,2,3,5,6],unmapped_csv_column_indexes=[6],ambig  |
|  us_table_column_names=[dup1]]                                                                                  |
+-----------------------------------------------------------------------------------------------------------------+

More at https://github.com/questdb/rfc/discussions/110

@bluestreak01 bluestreak01 changed the title fix(code): fix issues with importing CSV over HTTP fix(http): fix issues with importing CSV over HTTP Oct 6, 2023
ideoma
ideoma previously approved these changes Oct 6, 2023
core/src/main/java/io/questdb/cairo/CairoEngine.java Outdated Show resolved Hide resolved
}

private void doSend(int timeout) {
private void doSend() {
// System.out.println("send()");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: debris

// .query("query", "cpu%20limit%20400000")
.query("query", "cpu limit 2")
// .query("query", "cpu")
// query execution example
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: debris

.query("timestamp", timestampColumnName)
.query("partitionBy", partitionBy)
.query("overwrite", "false")
// .query("skipLev", "false")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: debris

@@ -401,29 +401,6 @@ public void testImportsCreateAsSelectAndDrop() throws Exception {
});
}

@Test
public void testImportsHeaderIsNotFullyReceivedIntoReceiveBuffer() throws Exception {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to be a valid test case, why is it removed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not a valid test, we should (and are able) to parse csv input with buffer size set to 1 byte. The exception was artificial. The header parser uses its own buffer, which is large enough for the test.

After I removed artificial check this test stopped making sense

bluestreak01 and others added 15 commits October 10, 2023 10:44
# Conflicts:
#	core/src/main/java/io/questdb/PropServerConfiguration.java
#	core/src/main/java/io/questdb/cairo/CairoEngine.java
#	core/src/main/java/io/questdb/cutlass/http/HttpConnectionContext.java
#	core/src/main/java/io/questdb/cutlass/http/HttpHeaderParser.java
#	core/src/main/java/io/questdb/cutlass/http/client/HttpClient.java
#	core/src/main/java/io/questdb/cutlass/http/client/HttpClientMain.java
#	core/src/main/java/io/questdb/cutlass/http/processors/JsonQueryProcessor.java
#	core/src/main/java/io/questdb/cutlass/http/processors/PrometheusMetricsProcessor.java
#	core/src/main/java/io/questdb/cutlass/http/processors/TextImportProcessor.java
#	core/src/main/java/io/questdb/cutlass/line/tcp/LineTcpMeasurementScheduler.java
#	core/src/main/java/io/questdb/cutlass/text/CairoTextWriter.java
#	core/src/main/java/io/questdb/cutlass/text/SerialCsvFileImporter.java
#	core/src/main/java/io/questdb/cutlass/text/TextLoader.java
#	core/src/main/java/io/questdb/cutlass/text/types/DateAdapter.java
#	core/src/main/java/io/questdb/cutlass/text/types/DateUtf8Adapter.java
#	core/src/main/java/io/questdb/cutlass/text/types/TimestampAdapter.java
#	core/src/main/java/io/questdb/cutlass/text/types/TimestampUtf8Adapter.java
#	core/src/main/java/io/questdb/cutlass/text/types/TypeAdapter.java
#	core/src/main/java/io/questdb/cutlass/text/types/TypeManager.java
#	core/src/main/java/io/questdb/std/datetime/microtime/TimestampFormatCompiler.java
#	core/src/main/java/io/questdb/std/str/CharSink.java
#	core/src/main/java/module-info.java
#	core/src/test/java/io/questdb/test/PropServerConfigurationTest.java
#	core/src/test/java/io/questdb/test/cutlass/http/IODispatcherTest.java
#	core/src/test/java/io/questdb/test/cutlass/http/TestHttpClient.java
#	core/src/test/java/io/questdb/test/cutlass/text/TextLoaderTest.java
- removed schema vs target table validation
- added target table column conflict check
- added tests
- removed 'insert_null' json field
- added truncate import option
# Conflicts:
#	core/src/main/java/io/questdb/cutlass/http/HttpConstants.java
#	core/src/main/java/io/questdb/cutlass/http/client/HttpClient.java
#	core/src/main/java/io/questdb/cutlass/http/processors/JsonQueryProcessor.java
#	core/src/main/java/io/questdb/cutlass/http/processors/TextImportProcessor.java
#	core/src/main/java/io/questdb/cutlass/text/CairoTextWriter.java
#	core/src/main/java/io/questdb/cutlass/text/TextLoader.java
#	core/src/main/java/io/questdb/cutlass/text/TextStructureAnalyser.java
#	core/src/main/java/io/questdb/cutlass/text/types/DateUtf8Adapter.java
#	core/src/main/java/io/questdb/cutlass/text/types/TimestampUtf8Adapter.java
#	core/src/main/java/io/questdb/cutlass/text/types/TypeAdapter.java
#	core/src/main/java/io/questdb/cutlass/text/types/TypeManager.java
#	core/src/main/java/io/questdb/std/Chars.java
#	core/src/test/java/io/questdb/test/cairo/Overrides.java
#	core/src/test/java/io/questdb/test/cutlass/http/IODispatcherTest.java
#	core/src/test/java/io/questdb/test/cutlass/http/TestHttpClient.java
@ideoma
Copy link
Collaborator

ideoma commented Jan 19, 2024

[PR Coverage check]

😍 pass : 1657 / 1899 (87.26%)

file detail

path covered line new line coverage
🔵 io/questdb/cutlass/http/client/HttpClientMain.java 0 15 00.00%
🔵 io/questdb/std/datetime/millitime/GenericDateFormat.java 0 6 00.00%
🔵 io/questdb/cutlass/http/WaitProcessor.java 0 1 00.00%
🔵 io/questdb/std/datetime/microtime/GenericTimestampFormat.java 0 21 00.00%
🔵 io/questdb/std/BytecodeAssembler.java 0 2 00.00%
🔵 io/questdb/cutlass/text/types/DateAdapter.java 1 8 12.50%
🔵 io/questdb/std/datetime/millitime/DateFormatCompiler.java 5 8 62.50%
🔵 io/questdb/std/datetime/microtime/TimestampFormatCompiler.java 36 51 70.59%
🔵 io/questdb/cutlass/http/client/HttpClient.java 67 93 72.04%
🔵 io/questdb/cutlass/text/types/NoopTypeAdapter.java 3 4 75.00%
🔵 io/questdb/PropServerConfiguration.java 14 18 77.78%
🔵 io/questdb/cutlass/text/types/TypeManager.java 67 84 79.76%
🔵 io/questdb/network/IODispatcherConfiguration.java 4 5 80.00%
🔵 io/questdb/cutlass/text/types/FloatAdapter.java 5 6 83.33%
🔵 io/questdb/cutlass/text/schema2/SchemaV2.java 90 104 86.54%
🔵 io/questdb/std/str/Utf8s.java 8 9 88.89%
🔵 io/questdb/cutlass/text/types/TimestampToDateAdapter.java 8 9 88.89%
🔵 io/questdb/std/Chars.java 20 22 90.91%
🔵 io/questdb/std/LowerCaseCharSequenceHashSet.java 9 10 90.00%
🔵 io/questdb/cutlass/text/TextLoader.java 632 700 90.29%
🔵 io/questdb/cutlass/http/processors/TextImportProcessor.java 154 168 91.67%
🔵 io/questdb/cutlass/text/schema2/SchemaV2Parser.java 208 224 92.86%
🔵 io/questdb/cutlass/text/types/GeoHashAdapter.java 13 14 92.86%
🔵 io/questdb/cutlass/http/HttpConnectionContext.java 55 58 94.83%
🔵 io/questdb/cutlass/text/TextStructureAnalyser.java 61 62 98.39%
🔵 io/questdb/cutlass/http/HttpConstants.java 5 5 100.00%
🔵 io/questdb/network/IOContext.java 5 5 100.00%
🔵 io/questdb/std/IntList.java 1 1 100.00%
🔵 io/questdb/cutlass/text/types/TimestampUtf8Adapter.java 6 6 100.00%
🔵 io/questdb/cutlass/http/HttpHeaderParser.java 1 1 100.00%
🔵 io/questdb/cutlass/text/DefaultTextConfiguration.java 1 1 100.00%
🔵 io/questdb/network/AbstractIODispatcher.java 3 3 100.00%
🔵 io/questdb/cutlass/text/SerialCsvFileImporter.java 1 1 100.00%
🔵 io/questdb/std/datetime/DateLocale.java 3 3 100.00%
🔵 io/questdb/std/AbstractLowerCaseCharSequenceHashSet.java 2 2 100.00%
🔵 io/questdb/cutlass/json/JsonLexer.java 1 1 100.00%
🔵 io/questdb/cutlass/http/processors/JsonQueryProcessor.java 1 1 100.00%
🔵 io/questdb/cairo/ColumnType.java 1 1 100.00%
🔵 io/questdb/cutlass/text/types/DateUtf8Adapter.java 9 9 100.00%
🔵 io/questdb/cutlass/text/schema2/Column.java 43 43 100.00%
🔵 io/questdb/cutlass/text/types/TypeAdapter.java 2 2 100.00%
🔵 io/questdb/cutlass/text/types/InputFormatConfiguration.java 14 14 100.00%
🔵 io/questdb/PropertyKey.java 1 1 100.00%
🔵 io/questdb/cutlass/text/ParallelCsvFileImporter.java 20 20 100.00%
🔵 io/questdb/cutlass/http/HttpMultipartContentParser.java 1 1 100.00%
🔵 io/questdb/cutlass/text/SchemaV1Parser.java 8 8 100.00%
🔵 io/questdb/cutlass/text/TextDelimiterScanner.java 2 2 100.00%
🔵 io/questdb/cutlass/pgwire/CircuitBreakerRegistry.java 1 1 100.00%
🔵 io/questdb/std/datetime/DateLocaleFactory.java 4 4 100.00%
🔵 io/questdb/cutlass/text/types/ShortAdapter.java 4 4 100.00%
🔵 io/questdb/cutlass/text/types/TimestampAdapter.java 7 7 100.00%
🔵 io/questdb/cutlass/text/AbstractTextLexer.java 18 18 100.00%
🔵 io/questdb/std/Numbers.java 23 23 100.00%
🔵 io/questdb/cutlass/http/MultipartParserState.java 3 3 100.00%
🔵 io/questdb/cutlass/text/types/ByteAdapter.java 4 4 100.00%
🔵 io/questdb/cutlass/text/CopyTask.java 2 2 100.00%

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants