Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add mappings and bulk to quickstart page #2417

Merged

Conversation

iuliaferoli
Copy link
Contributor

Added a few more examples including dealing with mappings and indexing multiple documents at once (in two alternative ways).
These changes have also been added to the 00 notebook in search-labs, that is also linked to under interactive examples.

Response to issue #2139 requesting changes to documentation

Copy link

A documentation preview will be available soon.

Help us out by validating the Buildkite preview and reporting issues here.
Please also be sure to double check all images to ensure they are correct in the preview.

Request a new doc build by commenting
  • Rebuild this PR: run docs-build
  • Rebuild this PR and all Elastic docs: run docs-build rebuild

run docs-build is much faster than run docs-build rebuild. A rebuild should only be needed in rare situations.

If your PR continues to fail for an unknown reason, the doc build pipeline may be broken. Elastic employees can check the pipeline status here.

Copy link
Member

@pquentin pquentin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

docs/sphinx/quickstart.rst Outdated Show resolved Hide resolved
Copy link
Member

@pquentin pquentin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, thanks! Only a few more details to iron out.

Did you know that you could see the result of your changes here? https://elasticsearch-py--2417.org.readthedocs.build/en/2417/quickstart.html

Comment on lines 64 to 79
mappings = {
"properties": {
"foo": {
"type" : "text"
},
"bar" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please format the code using https://github.com/psf/black to fix the indentation? You may need a comma after 256 to get the results you want.

@@ -57,7 +85,7 @@ This is how you create the `my_index` index:

.. code-block:: python

client.indices.create(index="my_index")
client.indices.create(index="my_index", mappings = mappings)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally all code snippets should be directly runnable, what do you think about defining mappings in the same code block?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a quickstart, I think there should be only one way to create an index, what do you think?

I would then remove "Creating an index" and rename "Create a mapping for your index" to "Create an index with mappings"


def generate_docs(documents, index_name):
for i, document in enumerate(documents):
yield dict(_index=index_name, _id=f"{i}", _source=document)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is equivalent to yield {"_index": index_name, "_id": f"{i}", "_source": document}. Is there a specific reason to prefer using dict?

for i, document in enumerate(documents):
yield dict(_index=index_name, _id=f"{i}", _source=document)

helpers.bulk(client, generate_docs(books, index_name))
Copy link
Member

@pquentin pquentin Feb 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As with the mapping, this code is not runnable. Maybe generate_docs should be defined like that?

def generate_docs():
    for i in range(10):
        yield {
            "_index": "my_index",
            "foo": f"foo {i}",
            "bar": "bar",
        }

helpers.bulk(client, generate_docs())

The advantages of this version:

  • It can be copy/pasted directly
  • It reuses the index created above
  • It's easy to adapt to generate much more documents
  • It does not specify a doc id, which is better for performance


helpers.bulk(client, generate_docs(books, index_name))

These helpers are the recommended simple and streamlined way to abstract otherwise complicated and verbose functions such as `client.bulk`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlike Markdown, you need two backticks in reStructuredText to format as code: single backticks only add emphasis. What do you think of this wording?

Suggested change
These helpers are the recommended simple and streamlined way to abstract otherwise complicated and verbose functions such as `client.bulk`.
These helpers are the recommended way to perform bulk ingestion. While it is also possible to perform bulk ingestion using ``client.bulk`` directly, the helpers handle retries, ingesting chunk by chunk and more. See the :ref:`helpers` page for more details.

I've considered linking to the client.bulk() docs using :meth:~elasticsearch.Elasticsearch.bulk` but I did not find it clearer as it's only rendered as "bulk()" with a link to the docs.

Copy link
Member

@pquentin pquentin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Only two things left.

@@ -57,7 +85,7 @@ This is how you create the `my_index` index:

.. code-block:: python

client.indices.create(index="my_index")
client.indices.create(index="my_index", mappings = mappings)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a quickstart, I think there should be only one way to create an index, what do you think?

I would then remove "Creating an index" and rename "Create a mapping for your index" to "Create an index with mappings"

}
}

client.indices.create(index="my_index", mappings = mappings)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
client.indices.create(index="my_index", mappings = mappings)
client.indices.create(index="my_index", mappings=mappings)

 removing the double index instructoins
Copy link
Member

@pquentin pquentin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for iterating! LGTM.

@pquentin pquentin changed the title Update quickstart.rst Add mappings and bulk to quickstart page Feb 9, 2024
@pquentin pquentin merged commit a844537 into elastic:main Feb 9, 2024
12 checks passed
github-actions bot pushed a commit that referenced this pull request Feb 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants