
🚀 Feature: write a Numba parser for HITEMP Files #510

Open

erwanp opened this issue Aug 17, 2022 · 8 comments

Comments

erwanp (Member) commented Aug 17, 2022

🔖 Feature description

Parsing HITRAN/HITEMP files is currently very slow.

#505 improved it a bit for HITRAN files, using the HAPI implementation, which is faster than RADIS's initial implementation.
However, HITEMP parsing is unchanged and remains a large bottleneck.

We use np.fromfile, which is very slow.
A detailed study was done a few months ago: https://stackoverflow.com/questions/71411907/dramatic-drop-in-numpy-fromfile-performance-when-switching-from-python-2-to-pyth/71505529#71505529

It turns out that a pure-Python implementation of the parser, accelerated by Numba, could be up to 1,000× faster than the np.fromfile approach.
Note: the np.fromfile inefficiency is worst when reading small chunks of data; with the large HITEMP database we won't get a 1,000× speed-up, but it may still be significantly faster!
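For reference, here is a simplified sketch (not the actual RADIS code) of the kind of fixed-width np.fromfile read being discussed; the field widths follow the 160-character HITRAN/HITEMP .par record plus trailing newline, and read_par is a hypothetical helper:

import numpy as np

# one .par record: 160 fixed-width characters + newline
par_dtype = np.dtype([
    ("id", "S2"), ("iso", "S1"), ("wav", "S12"), ("int", "S10"), ("A", "S10"),
    ("rest", "S125"), ("newline", "S1"),
])
assert par_dtype.itemsize == 161

def read_par(path):
    records = np.fromfile(path, dtype=par_dtype)
    # converting each byte field to floats afterwards is part of what makes
    # this approach slow when there are many small fields
    return records["wav"].astype(np.float64), records["int"].astype(np.float64)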


Implementation

Re-write the _read_hitran_file() function in pure Python, then JIT-compile it with Numba.
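For illustration, a minimal sketch of what such a Numba-jitted parser could look like (hypothetical code, not the actual _read_hitran_file; only the 12-character wavenumber column is parsed here, the other columns would follow the same pattern, with exponent handling for the intensity):

import numpy as np
from numba import njit

RECORD_LEN = 161  # one 160-character .par record + newline

@njit(cache=True)
def parse_wavenumbers(raw, n_lines):
    # raw: the whole file as a 1-D uint8 array
    wav = np.empty(n_lines, dtype=np.float64)
    for i in range(n_lines):
        base = i * RECORD_LEN
        value = 0.0
        frac_scale = 0.0
        for j in range(3, 15):               # wavenumber occupies columns 3..14
            c = raw[base + j]
            if c == 46:                      # '.'
                frac_scale = 1.0
            elif c >= 48 and c <= 57:        # ASCII digit
                value = value * 10.0 + (c - 48)
                frac_scale *= 10.0
        wav[i] = value / frac_scale if frac_scale > 0.0 else value
    return wav

# usage (path is a placeholder):
# raw = np.fromfile("CO2_1.par", dtype=np.uint8)
# wav = parse_wavenumbers(raw, raw.size // RECORD_LEN)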

👉 Why you want this feature!!

We currently spend ~3 hrs parsing the HITEMP CO2 & H2O files (!)
Speeding this up would really help.

@erwanp erwanp added enhancement New feature or request performance databases related to line databases labels Aug 17, 2022
@erwanp erwanp added this to the 0.14 milestone Aug 17, 2022
erwanp (Member, Author) commented Aug 17, 2022

Actually, we already have an unexpected performance improvement with NumPy 1.23, according to numpy/numpy#13319 (comment).

Unfortunately, RADIS requires numpy<1.22.3 for the moment: #490

@anandxkumar anandxkumar modified the milestones: 0.14, 0.15 Oct 28, 2022
@erwanp erwanp added the good first issue Good for newcomers label Mar 15, 2023
jonotassia commented

Hi @erwanp,

I'm happy to give this one a go. Before I get too deep into the weeds of rewriting the _read_hitran_file() function: I see that there is a separate issue, #548, requesting removal of the upper bound on NumPy, which you'd implied may be okay to do now with further testing.

Per your last post in this thread, before I attempt to implement the pure Python solution, should I wait for confirmation that the Vaex dependency is sorted out, so we can test the native NumPy solution before reworking the function?

@erwanp erwanp modified the milestones: 0.15, 0.16 Jul 30, 2023
erwanp (Member, Author) commented Aug 14, 2023

@dcmvdbekerom

@dcmvdbekerom dcmvdbekerom self-assigned this Aug 15, 2023
dcmvdbekerom (Member) commented

Parsing with custom C code (including SIMD intrinsics) should be extremely fast.

Also: we could process one dataset while downloading the next (maybe we already do this?)
https://docs.python.org/3/library/asyncio.html
https://pypi.org/project/aiohttp/
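To illustrate the overlap idea, here is a minimal sketch (not RADIS code) of downloading the next file with aiohttp while parsing the current one in a worker thread; the URLs and parse_file are placeholders:

import asyncio
import aiohttp

async def fetch(session, url):
    async with session.get(url) as resp:
        resp.raise_for_status()
        return await resp.read()

def parse_file(raw):
    # placeholder for the CPU-bound parsing step
    return len(raw)

async def pipeline(urls):
    async with aiohttp.ClientSession() as session:
        pending = asyncio.create_task(fetch(session, urls[0]))
        for i in range(len(urls)):
            raw = await pending
            if i + 1 < len(urls):
                # start the next download before parsing the current file
                pending = asyncio.create_task(fetch(session, urls[i + 1]))
            # run the parser in a thread (Python >= 3.9) so the download keeps progressing
            await asyncio.to_thread(parse_file, raw)

# asyncio.run(pipeline(["https://...", "https://..."]))  # placeholder URLs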

code29563 commented Mar 18, 2024

To what extent is this still an issue, and have there been any updates that change the right way to approach it?

For HITRAN, has _read_hitran_file been correctly identified as the bottleneck? A cProfile of some test code isn't showing it to be significant:

from radis import SpectrumFactory
from radis.api.hitranapi import hit2df

sf = SpectrumFactory(
    molecule='CO2',
    wavenum_min=2380,
    wavenum_max=2400,
    mole_fraction=400e-6,
    path_length=100,  # cm
    isotope=[1],
)

sf.fetch_databank('hitran')
hit2df('.radisdb/hitran/downloads__can_be_deleted/CO2/CO2_1.data', cache='regen')

The vast majority of the time spent in parse_hitran_file turns out to be in _ndarray2df, not in the two calls to _read_hitran_file. The same holds when run for H2O rather than CO2.

For HITEMP, after testing radis.io.hitemp.fetch_hitemp (again for both CO2 and H2O), most of the time spent in parse_to_local_file seems to go to the 'Post-processing' calls to parse_local_quanta and parse_global_quanta; is that what needs to be rewritten?
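For anyone who wants to reproduce these timings, a small sketch of the profiling workflow (the fetch_hitemp arguments and the filter patterns below are assumptions; adjust them to the molecule under test):

import cProfile
import pstats

from radis.io.hitemp import fetch_hitemp

profiler = cProfile.Profile()
profiler.enable()
fetch_hitemp("OH")  # "OH" keeps the download small; may need cache="regen" to force a re-parse
profiler.disable()

stats = pstats.Stats(profiler).sort_stats("cumulative")
# restrict the report to the functions discussed in this thread
stats.print_stats("parse_local_quanta|parse_global_quanta|_ndarray2df|parse_to_local_file")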

dcmvdbekerom (Member) commented

I have worked on a C++/SIMD implementation which was very fast; I will post more details over the weekend.

minouHub (Collaborator) commented

@dcmvdbekerom any news on that? Is the issue still open?

dcmvdbekerom (Member) commented Apr 4, 2024

Here's some SIMD C++ code that reads super fast; after that, downloading became the bottleneck:

// loadHitranSIMD.cpp : This file contains the 'main' function. Program execution begins and ends there.
//


/*
0x              1x              2x              3x              4x              5x              6x              7x
0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789
MMIvvvvv.vvvvvv S.SSE~SSS  A.AAE~AA.gggg.ggggEEEEE.EEEEnnnnddddddddnnnnn11122LL33RR11111122ll33rrPPPCCNNNNpppccnnnnbbbjjjw
 21 1774.000000 1.36E-089  4.63E+00.0660.064433580.55680.710.000000 0.56  0 0 0 0 0     0 0 0 0 0 50 11065 47 1 901  P 79e
 21 1774.000000 3.82E-086  5.13E+00.0625.062931999.61920.680.000000 0.58  0 0 0 0 0     0 0 0 0 0 47 1 617 44 1 511  P 92e
 21 1774.000000 1.67E-093  3.45E+00.0564.064435442.11380.630.000000 0.64  0 0 0 0 0     0 0 0 0 0 50 1 821 47 1 674  P111e
 21 1774.000000 5.82E-084  2.25E+00.0686.082930660.11990.700.000000 0.64  0 0 0 0 0     0 0 0 0 0 48 2 797 45 2 715  R 46f
 21 1774.000000 1.82E-092  6.15E+00.0685.072134952.41320.720.000000 0.58  0 0 0 0 0     0 0 0 0 0 52 11795 49 11604  R 61e
 21 1774.000000 1.47E-076  1.22E+00.0666.065227127.35770.710.000000 0.56  0 0 0 0 0     0 0 0 0 0 40 1 539 37 1 497  R 75e
 21 1774.000000 4.98E-084  8.65E+00.0653.063931084.69870.700.000000 0.56  0 0 0 0 0     0 0 0 0 0 47 2 455 44 2 411  R 81f
 21 1774.000000 4.26E-090  1.98E+00.0613.062933687.54260.670.000000 0.59  0 0 0 0 0     0 0 0 0 0 47 21217 44 21096  R 95f
 21 1774.000000 1.90E-091  3.06E+00.0483.067834497.51750.560.000000 0.70  0 0 0 0 0     0 0 0 0 0 43 2 604 40 2 566  R141f
 21 1774.000000 3.86E-098  1.06E+01.0625.062937831.03190.680.000000 0.58  0 0 0 0 0     0121213 1 54 21843 51 21597  P 92f
*/


#include <iostream>
#include <fstream>
#include <string.h>
#include <cmath>     // for pow()
#include <iomanip>
#include <chrono>
#include <intrin.h>

#include <string>
#include <filesystem>


namespace fs = std::filesystem;


int main()
{
    std::ios::sync_with_stdio(false);

    std::cout << "Hello World!\n";
    double total_time = 0.0;


    std::string path = "C:/CDSD4000/par";

    //std::ifstream is("cdsd_hitemp_08", std::ifstream::binary);
    std::ofstream os_v("v_arr.dat", std::ofstream::binary);
    std::ofstream os_A("A_arr.dat", std::ofstream::binary);
    std::ofstream os_ga("ga_arr.dat", std::ofstream::binary);
    std::ofstream os_gs("gs_arr.dat", std::ofstream::binary);
    std::ofstream os_E0("E0_arr.dat", std::ofstream::binary);
    std::ofstream os_na("na_arr.dat", std::ofstream::binary);
    std::ofstream os_da("da_arr.dat", std::ofstream::binary);
    std::ofstream os_ns("ns_arr.dat", std::ofstream::binary);
    std::chrono::steady_clock sc;

    const int block_size = 124;


    // lookup table of powers of ten, biased by 127
    float exp_table[256];
    for (int i = 0; i < 0x100; i++) {
        exp_table[i] = (float)pow(10.0, i - 127);
    }

    char a = -3;
    std::cout << exp_table[a + 127] << "\n";  // sanity check: prints 0.001

    
    for (const auto& entry : fs::directory_iterator(path)) {
        std::cout << entry.path().filename() << " "; //<< std::endl;




        //std::ifstream is("cdsd_01774_01776", std::ifstream::binary);
        std::ifstream is(entry.path(), std::ifstream::binary);
        //FILE* infile = fopen((const char*)&entry.path(), "r");

        //https://stackoverflow.com/questions/42593655/why-is-my-c-disk-write-test-much-slower-than-a-simply-file-copy-using-bash

        char Buffer[1024 * 1024];

        //is.rdbuf()->pubsetbuf(Buffer, 1024*1024);


        is.seekg(0, is.end);
        int length = is.tellg();
        is.seekg(0, is.beg);

        int lines = length / block_size;
        int chunk = 8 * block_size;

        char* buffer = new char[chunk];
        char* buffer2;
        char* v_substr = new char[20];
        //float v_vec[8];
        
        //int test[2] = { 0xABC, -0xABC };
        //os_A.write(reinterpret_cast<char*>(&test), 2 * sizeof(int));

        __m256i offset_vec = _mm256_mullo_epi32(_mm256_set_epi32(7, 6, 5, 4, 3, 2, 1, 0), _mm256_set1_epi32(block_size));
        __m256i sub_s0_vec = _mm256_set1_epi8('0');

        __m256i mul_1_10_1_10_vec = _mm256_set1_epi32(0x010A010A);
        __m256i mul_1_10_0_1_vec = _mm256_set1_epi32(0x010A0001);
        __m256i mul_1_10_0_0_vec = _mm256_set1_epi32(0x010A0000);

        __m256i mul_10_1000_vec = _mm256_set1_epi32(0x000A03E8);
        __m256i mul_1_100_vec = _mm256_set1_epi32(0x00010064);

        
        __m256i minus_mask0_vec = _mm256_set_epi32(0x0C0C0C0C, 0x08080808, 0x04040404, 0x00000000, 0x0C0C0C0C, 0x08080808, 0x04040404, 0x00000000);
        __m256i minus_test0_vec = _mm256_set1_epi32(0x0000002D);

        __m256i minus_mask1_vec = _mm256_set_epi32(0x0D0D0D0D, 0x09090909, 0x05050505, 0x01010101, 0x0D0D0D0D, 0x09090909, 0x05050505, 0x01010101);
        __m256i minus_test1_vec = _mm256_set1_epi32(0x00002D00);

        __m256i epi32_125_vec = _mm256_set1_epi32(125);
        __m256 mul_em2_vec = _mm256_set1_ps(1e-2);
        __m256 mul_em4_vec = _mm256_set1_ps(1e-4);

        char* base_addr;

        auto start_file = sc.now();

        // read one chunk of 8 records at a time;
        // a final partial chunk (< 8 records) is skipped rather than parsed with stale buffer data
        while (is.read(buffer, chunk)) {

            //std::cout << "v = ";
            //https://stackoverflow.com/questions/18688763/why-is-istream-ostream-slow

            auto start = sc.now();

            __m256i minus_vec;
            __m256i temp1_vec;
            __m256i temp2_vec;
            __m256i temp3_vec;

            __m256 tempf1_vec;
            __m256 tempf2_vec;
            __m256 tempf3_vec;


            /////////////////////////////////////////////////////////////////////////////

            // v
            temp1_vec = _mm256_i32gather_epi32(reinterpret_cast<int*>(buffer + 3), offset_vec, 1);
            temp1_vec = _mm256_sub_epi8(temp1_vec, sub_s0_vec);
            temp1_vec = _mm256_max_epi8(temp1_vec, _mm256_setzero_si256());
            temp1_vec = _mm256_maddubs_epi16(temp1_vec, mul_1_10_1_10_vec);
            temp1_vec = _mm256_madd_epi16(temp1_vec, mul_10_1000_vec);
            tempf1_vec = _mm256_cvtepi32_ps(temp1_vec);

            temp1_vec = _mm256_i32gather_epi32(reinterpret_cast<int*>(buffer + 7), offset_vec, 1);
            temp1_vec = _mm256_sub_epi8(temp1_vec, sub_s0_vec);
            temp1_vec = _mm256_maddubs_epi16(temp1_vec, mul_1_10_0_1_vec);
            temp1_vec = _mm256_madd_epi16(temp1_vec, mul_1_100_vec);
            tempf2_vec = _mm256_cvtepi32_ps(temp1_vec);

            temp1_vec = _mm256_i32gather_epi32(reinterpret_cast<int*>(buffer + 11), offset_vec, 1);
            temp1_vec = _mm256_sub_epi8(temp1_vec, sub_s0_vec);
            temp1_vec = _mm256_maddubs_epi16(temp1_vec, mul_1_10_1_10_vec);
            temp1_vec = _mm256_madd_epi16(temp1_vec, mul_1_100_vec);
            tempf3_vec = _mm256_cvtepi32_ps(temp1_vec);

            tempf2_vec = _mm256_fmadd_ps(tempf3_vec, mul_em4_vec, tempf2_vec);
            __m256 v_vec = _mm256_fmadd_ps(tempf2_vec, mul_em2_vec, tempf1_vec);

            /////////////////////////////////////////////////////////////////////////////

            //A
            temp1_vec = _mm256_i32gather_epi32(reinterpret_cast<int*>(buffer + 27), offset_vec, 1);
            temp1_vec = _mm256_sub_epi8(temp1_vec, sub_s0_vec);
            temp1_vec = _mm256_maddubs_epi16(temp1_vec, mul_1_10_0_1_vec);
            temp1_vec = _mm256_madd_epi16(temp1_vec, mul_1_100_vec); //possibly a faster intrinsic is available
            tempf1_vec = _mm256_cvtepi32_ps(temp1_vec);

            //get sign:
            temp1_vec = _mm256_i32gather_epi32(reinterpret_cast<int*>(buffer + 31), offset_vec, 1);
            minus_vec = _mm256_cmpeq_epi8(temp1_vec, minus_test1_vec);
            minus_vec = _mm256_shuffle_epi8(minus_vec, minus_mask1_vec);

            //get exponent:
            temp1_vec = _mm256_sub_epi8(temp1_vec, sub_s0_vec);
            temp1_vec = _mm256_maddubs_epi16(temp1_vec, mul_1_10_0_0_vec);
            temp1_vec = _mm256_madd_epi16(temp1_vec, mul_1_100_vec); //possibly a faster intrinsic is available
            
            temp1_vec = _mm256_xor_si256(temp1_vec, minus_vec);  //apply sign
            temp1_vec = _mm256_add_epi32(temp1_vec, _mm256_srli_epi32(minus_vec, 31)); //apply sign
            
            temp1_vec = _mm256_add_epi32(temp1_vec, epi32_125_vec); //add 127 (table bias) minus 2 (mantissa /100) = 125
            tempf2_vec = _mm256_i32gather_ps(exp_table, temp1_vec, 4);
            
            __m256 A_vec = _mm256_mul_ps(tempf1_vec, tempf2_vec);

            ///////////////////////////////////////////////////////

            //g_air
            temp1_vec = _mm256_i32gather_epi32(reinterpret_cast<int*>(buffer + 36), offset_vec, 1);
            temp1_vec = _mm256_sub_epi8(temp1_vec, sub_s0_vec);
            temp1_vec = _mm256_maddubs_epi16(temp1_vec, mul_1_10_1_10_vec);
            temp1_vec = _mm256_madd_epi16(temp1_vec, mul_1_100_vec);
            tempf1_vec = _mm256_cvtepi32_ps(temp1_vec);
            __m256 ga_vec = _mm256_mul_ps(tempf1_vec, mul_em4_vec);

            //g_self
            temp1_vec = _mm256_i32gather_epi32(reinterpret_cast<int*>(buffer + 41), offset_vec, 1);
            temp1_vec = _mm256_sub_epi8(temp1_vec, sub_s0_vec);
            temp1_vec = _mm256_maddubs_epi16(temp1_vec, mul_1_10_1_10_vec);
            temp1_vec = _mm256_madd_epi16(temp1_vec, mul_1_100_vec);
            tempf1_vec = _mm256_cvtepi32_ps(temp1_vec);
            __m256 gs_vec = _mm256_mul_ps(tempf1_vec, mul_em4_vec);

            ///////////////////////////////////////////////////////

            //E0
            temp1_vec = _mm256_i32gather_epi32(reinterpret_cast<int*>(buffer + 45), offset_vec, 1);
            temp1_vec = _mm256_sub_epi8(temp1_vec, sub_s0_vec);
            //temp1_vec = _mm256_max_epi8(temp1_vec, _mm256_setzero_si256());
            temp1_vec = _mm256_maddubs_epi16(temp1_vec, mul_1_10_1_10_vec);
            temp1_vec = _mm256_madd_epi16(temp1_vec, mul_10_1000_vec);
            tempf1_vec = _mm256_cvtepi32_ps(temp1_vec);

            temp1_vec = _mm256_i32gather_epi32(reinterpret_cast<int*>(buffer + 49), offset_vec, 1);
            temp1_vec = _mm256_sub_epi8(temp1_vec, sub_s0_vec);
            temp1_vec = _mm256_maddubs_epi16(temp1_vec, mul_1_10_0_1_vec);
            temp1_vec = _mm256_madd_epi16(temp1_vec, mul_1_100_vec);
            tempf2_vec = _mm256_cvtepi32_ps(temp1_vec);

            temp1_vec = _mm256_i32gather_epi32(reinterpret_cast<int*>(buffer + 51), offset_vec, 1);
            temp1_vec = _mm256_sub_epi8(temp1_vec, sub_s0_vec);
            temp1_vec = _mm256_maddubs_epi16(temp1_vec, mul_1_10_0_0_vec);
            temp1_vec = _mm256_madd_epi16(temp1_vec, mul_1_100_vec);
            tempf3_vec = _mm256_cvtepi32_ps(temp1_vec);

            tempf2_vec = _mm256_fmadd_ps(tempf3_vec, mul_em2_vec, tempf2_vec);
            __m256 E0_vec = _mm256_fmadd_ps(tempf2_vec, mul_em2_vec, tempf1_vec);

            ///////////////////////////////////////////////////////

            //na
            temp1_vec = _mm256_i32gather_epi32(reinterpret_cast<int*>(buffer + 55), offset_vec, 1);
            temp1_vec = _mm256_sub_epi8(temp1_vec, sub_s0_vec);
            temp1_vec = _mm256_maddubs_epi16(temp1_vec, mul_1_10_0_1_vec);
            temp1_vec = _mm256_madd_epi16(temp1_vec, mul_1_100_vec);
            tempf1_vec = _mm256_cvtepi32_ps(temp1_vec);
            __m256 na_vec = _mm256_mul_ps(tempf1_vec, mul_em2_vec);

            //da
            //First digit can be a minus sign, in which case we need to catch it
            temp1_vec = _mm256_i32gather_epi32(reinterpret_cast<int*>(buffer + 59), offset_vec, 1);
            minus_vec = _mm256_cmpeq_epi8(temp1_vec, minus_test0_vec);
            minus_vec = _mm256_shuffle_epi8(minus_vec, minus_mask0_vec);

            temp1_vec = _mm256_sub_epi8(temp1_vec, sub_s0_vec);
            temp1_vec = _mm256_max_epi8(temp1_vec, _mm256_setzero_si256());
            temp1_vec = _mm256_maddubs_epi16(temp1_vec, mul_1_10_0_1_vec);
            temp1_vec = _mm256_madd_epi16(temp1_vec, mul_1_100_vec);
            temp1_vec = _mm256_xor_si256(temp1_vec, minus_vec);
            temp1_vec = _mm256_add_epi32(temp1_vec, _mm256_srli_epi32(minus_vec, 31));
            tempf1_vec = _mm256_cvtepi32_ps(temp1_vec);

            temp1_vec = _mm256_i32gather_epi32(reinterpret_cast<int*>(buffer + 63), offset_vec, 1);
            temp1_vec = _mm256_sub_epi8(temp1_vec, sub_s0_vec);
            temp1_vec = _mm256_maddubs_epi16(temp1_vec, mul_1_10_1_10_vec);
            temp1_vec = _mm256_madd_epi16(temp1_vec, mul_1_100_vec);
            temp1_vec = _mm256_xor_si256(temp1_vec, minus_vec);
            temp1_vec = _mm256_add_epi32(temp1_vec, _mm256_srli_epi32(minus_vec, 31));
            tempf2_vec = _mm256_cvtepi32_ps(temp1_vec);

            //powers of 10:  2. 1 0 _  3  2  1  0
            //target:        0.-1-2 _ -3 -4 -5 -6
            tempf1_vec = _mm256_fmadd_ps(tempf2_vec, mul_em4_vec, tempf1_vec);
            __m256 da_vec = _mm256_mul_ps(tempf1_vec, mul_em2_vec);

            //ns
            temp1_vec = _mm256_i32gather_epi32(reinterpret_cast<int*>(buffer + 68), offset_vec, 1);
            temp1_vec = _mm256_sub_epi8(temp1_vec, sub_s0_vec);
            temp1_vec = _mm256_maddubs_epi16(temp1_vec, mul_1_10_0_1_vec);
            temp1_vec = _mm256_madd_epi16(temp1_vec, mul_1_100_vec);
            tempf1_vec = _mm256_cvtepi32_ps(temp1_vec);
            __m256 ns_vec = _mm256_mul_ps(tempf1_vec, mul_em2_vec);

            ///////////////////////////////////////////////////////
            // 9 ~ 0xF
            // 99 ~ 0xFF
            // 999 ~ 0xFFF
            // 9999 ~ 0xFFFF
            // 99999 ~ 0xFFFFF
            // 999999 ~ 0xFFFFF

            // FFFF_FFFF  FFFF_FFFF   FFFF_FFFF  FFFF        
            //    111111  22LL_33RR   22ll_33rr   iii


            



            auto end = sc.now();
            auto time_span = static_cast<std::chrono::duration<double>>(end - start);
            total_time += time_span.count();


            os_v.write(reinterpret_cast<char*>(&v_vec), 8 * sizeof(float));
            os_A.write(reinterpret_cast<char*>(&A_vec), 8 * sizeof(float));
            os_ga.write(reinterpret_cast<char*>(&ga_vec), 8 * sizeof(float));
            os_gs.write(reinterpret_cast<char*>(&gs_vec), 8 * sizeof(float));
            os_E0.write(reinterpret_cast<char*>(&E0_vec), 8 * sizeof(float));
            os_na.write(reinterpret_cast<char*>(&na_vec), 8 * sizeof(float));
            os_da.write(reinterpret_cast<char*>(&da_vec), 8 * sizeof(float));
            os_ns.write(reinterpret_cast<char*>(&ns_vec), 8 * sizeof(float));
            //os_v.write((const char*)v_vec, 8 * sizeof(float));
            //std::cout << "\n";


        }


        auto end_file = sc.now();
        auto time_span = static_cast<std::chrono::duration<double>>(end_file - start_file);
        std::cout << length*1e-6/time_span.count() <<" MB/s \n";



        //if (is)
        //    std::cout << "all characters read successfully.";
        //else
        //    std::cout << "error: only " << is.gcount() << " could be read";
        //is.close();

       
        delete[] buffer;
        delete[] v_substr;
    }
    std::cout << "\n\ntime spent conversions: " << total_time;
    
}
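For reference, the column files written above are raw float32 arrays, so (assuming both sides are little-endian) they can be read back into NumPy with something like:

import numpy as np

# names follow the C++ output files: v = wavenumber, A = Einstein coefficient, etc.
v  = np.fromfile("v_arr.dat", dtype=np.float32)
A  = np.fromfile("A_arr.dat", dtype=np.float32)
ga = np.fromfile("ga_arr.dat", dtype=np.float32)
gs = np.fromfile("gs_arr.dat", dtype=np.float32)
E0 = np.fromfile("E0_arr.dat", dtype=np.float32)
na = np.fromfile("na_arr.dat", dtype=np.float32)
da = np.fromfile("da_arr.dat", dtype=np.float32)
ns = np.fromfile("ns_arr.dat", dtype=np.float32)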

Here are some download scripts:
get_hitemp_files.zip

I will try to tidy up a little in a bit
