Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add range submodule #180

Closed
wants to merge 2 commits into from
Closed

Add range submodule #180

wants to merge 2 commits into from

Conversation

ririsoft
Copy link
Contributor

@ririsoft ririsoft commented Jun 9, 2020

This introduce a new 'range' module
to support encoding, decoding and validation of HTTP range requests.

This could be a helper to implement http-rs/tide#63 and enhance tide's serve_dir with range requests.

This is also an experiment toward typed headers discussed in #99.

I am using a simple design where the typed header implements TryFrom and ToHeaderValues. This has the advantage to be fully compatible with the current header APIs. However I believe we could push Rust type system a little further.

For instance we could enhance the Headers type with a new method and trait:

trait TypedHeader: ToHeaderValues {
    const HeaderName: &'static str;
}

impl Headers {
    // TODO: find a better name
    pub fn get_typed<T>(&self) -> Option<T>
    where
            T: TypedHeader,
    {
        // TODO: implementation
    }
}

One could then retrieve a typed value directly from the headers :

let range: Option<Range> = headers.get_typed();

I am not sure if it works, but I could add a few commits on this pull request if you want a try.

Thanks in advance for your comments !
Cheers.

@Fishrock123
Copy link
Member

We should probably do it in a separate PR, and I'm not sure how idiomatic this would be, but Headers::get::<TypedHeader>() would seem reasonable to me.

@ririsoft
Copy link
Contributor Author

We should probably do it in a separate PR, and I'm not sure how idiomatic this would be, but Headers::get::<TypedHeader>() would seem reasonable to me.

Indeed this is orthogonal to this PR and deserve a separate discussion.

@ririsoft ririsoft mentioned this pull request Jun 10, 2020
72 tasks
@jbr jbr added the enhancement New feature or request label Jun 19, 2020
@goto-bus-stop goto-bus-stop changed the base branch from master to main August 9, 2020 12:58
@ririsoft
Copy link
Contributor Author

Hello @yoshuawuyts,

In the scope of #99 I see you are progressing well on #222, #224 , #225, #220.
What do you expect from me at this point: You would be happy that I rework this pull request once the above PRs are merged and reuse the same conditional APIs ? Or you plan to implement this yourself ?

Some part of this PR need to be reworked, I identified a few bugs (regarding RFCs), I prefer to wait for your comments before going forward.

Cheers,

@yoshuawuyts
Copy link
Member

@ririsoft it's been a few weeks since we've actually released the patches you linked above, and we're definitely on track in that direction. Apologies that it took a while to figure out how we wanted the headers structured. If you're still interested in adding typed headers for Range that'd be a very welcome contribution! It'd be so good to have http-rs/tide#687 too!

@ririsoft
Copy link
Contributor Author

@ririsoft it's been a few weeks since we've actually released the patches you linked above, and we're definitely on track in that direction. Apologies that it took a while to figure out how we wanted the headers structured. If you're still interested in adding typed headers for Range that'd be a very welcome contribution! It'd be so good to have http-rs/tide#687 too!

Sure, I followed the discussions and will rewrite this PR with latest design. I am just a bit busy at work those days, I plan to come back on this next week.
Cheers.

@ririsoft
Copy link
Contributor Author

I have resumed the work here: https://github.com/ririsoft/http-types/tree/headers-range-wip
and will squash/rebase/merge back to this PR once ready for review (a couple of days).

@ririsoft
Copy link
Contributor Author

Sorry guys, I worked less on this than initially expected but I am progressing and almost there.

@ririsoft
Copy link
Contributor Author

Hi @yoshuawuyts,

The work is ready for review on my side.
You might want to flag this as unstable so that we can experiment the API with Tide before freezing it.

I am very open to any suggestions of course. Also the RFC specification is not straightforward, you might want to have a 4 eyes check on my implementation.

Cheers.

const NONE: &'static str = "none";

/// Create a new AcceptRange which does not accept range requests.
pub fn new() -> Self {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new setting the value to "none" might be confusing here since the widely used default is "bytes". Maybe we should name it new_none or a better idea ?

}

/// Returns the supported unit if any.
pub fn other(&self) -> &Option<String> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not a fan of returning &Option<String>>. Maybe Option<&str> is better ?

Comment on lines 158 to 115
let s = match self.other {
Some(ref other) => other.as_str(),
None if self.bytes => Self::BYTES,
None => Self::NONE,
};
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be factorized with the Display implementation. Let me known if you want that.


/// Create a ByteRanges from a string.
pub(crate) fn from_str(s: &str) -> crate::Result<Self> {
let fn_err = || {
Copy link
Contributor Author

@ririsoft ririsoft Nov 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no error code specified for invalid Content-Range header in a response. I chose to use the same error code as the one used for bad range requests. Maybe I should add this as a comment ?

self.ranges.get(0).copied()
}

/// Validates that the ranges are withing the expected document size.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: within

@yoshuawuyts yoshuawuyts changed the title New range request module Add range submodule Nov 25, 2020
Copy link
Member

@yoshuawuyts yoshuawuyts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey ririsoft, thanks for updating this PR. I spent some time reviewing it today, and I have some thoughts.

Bytes Prefix

When reviewing this PR I was looking for the Range header, but instead I found ByteRange and ByteRanges. It wasn't immediately clear to me which is which, and for consistency with the other typed headers I would've expected the header name to exist without the Byte prefix.

This has some implications for the API, but I'd like to find a way for us to drop the Byte prefix from the header names.

Assume the RangeUnit is always bytes?

The RFC states that RangeUnits can be expected to be expanded in the future through RFC. Though in the past 6 years this has not happened, which makes it questionable to which degree we need to consider this. We could make the assumption that the only two types that may ever be sent are "none" and bytes.

What if we assume the units may be extensible?

If we don't create specific structs for BytesRange, etc. then this route means points us more directly towards enums. We could define a RangeDirective enum which is #[non_exhaustive] and contains None and Bytes members. That is the type that's communicated in the Range headers.

Additionally we can add a From<(u64, u64)> impl to construct a RangeDirective instance, and make the bound in Range::push take impl Into<RangeDirective>.

Extensibility of range units

However this leads to an issue: currently the range set of headers is hard-coded to only operate on bytes. Hence the prefix. I guess the intended path to add support for more range units is to add additional FooRange and BarRange headers.

Instead I would like to suggest switching it around: create a RangeUnits enum which is tagged #[non_exhaustive] and None and Bytes members, and then have Range , ContentRange, and AcceptRanges headers. This will allow us to add more range types in the future without breaking anything.

AcceptRanges constructor

It seems AcceptRanges is working around not having a RangeUnits enum. I think AcceptRanges::new should take a RangeUnits enum, and we can keep a AcceptRanges::new_bytes as a shorthand.

suffix-length

I was reading about ranges on mdn and I'm confused about how the suffixes are exposed in our API? Do you have a good sense of how they work, and how we could implement them?

Conclusion

Despite having feedback on some of the design, I think we're very well on our way to getting where we need to be! — I'm incredibly excited for this work, and how it'll enable us to implement http-rs/tide#687. Thanks so much for your work on this!

@ririsoft
Copy link
Contributor Author

Hi @yoshuawuyts thank you very much for your review. I am pretty much in line globally.

Assume the RangeUnit is always bytes?

The RFC states that RangeUnits can be expected to be expanded in the future through RFC. Though in the past 6 years this has not happened, which makes it questionable to which degree we need to consider this. We could make the assumption that the only two types that may ever be sent are "none" and bytes.

I initially worked on a version with a RangeUnit enum, like the one you mentioned, opening the door to other range units if the RFC evolves in the future. But I realized that this is a lot of boilerplate for every day use since only bytes unite are used today in practice. So I went for a compromise with this Byte prefix thing which is not very much satisfying at the end, I agree.

Instead I would like to suggest switching it around: create a RangeUnits enum which is tagged #[non_exhaustive] and None and Bytes members, and then have Range , ContentRange, and AcceptRanges headers. This will allow us to add more range types in the future without breaking anything.

I would go for that design too, but I am not sure about having None as an enum variant: None is not a range unit type. It is a possible value for the AcceptRanges header to notify that range requests are not accepted, but it is not used otherwise for Range and ContentRange headers. We could have AcceptRanges::from_headers return Ok(None) if the Accept-Range header has the none value, without any impact client side.

However, by removing None from the enum we would have only one variant which looks strange too.

A third option would be something like:

enum RangeUnit {
    Bytes
    Other(String)
}

This allows for usage of non standard units, but not great either.

If we go for the RangeUnit design we will need to handle it with the Range and ContentRange headers too. Those type do not only returns a range unit but also a range set (ByteRanges in my current implementation) and a single Range. At the end I do not see how we can avoid several if let and match code for the end user using only bytes unit in practice. Do you have something in mind that could help here ?

suffix-length

I was reading about ranges on mdn and I'm confused about how the suffixes are exposed in our API? Do you have a good sense of how they work, and how we could implement them?

You can see them as relative seek from either start or end. So far I do not provide further helper over it, I am just exposing them with an optional value. In Tide we will have to match over them to either SeekFrom::Start or SeekFrom::End. We can extend the type with further helper functions if need be without a breaking change I believe.

Conclusion

Despite having feedback on some of the design, I think we're very well on our way to getting where we need to be! — I'm incredibly excited for this work, and how it'll enable us to implement http-rs/tide#687. Thanks so much for your work on this!

Thank you for your great feedback. This leaves us with the following choices:

  1. We provide implementation for bytes unit only, and we fully assume a breaking change will probably need to be introduced if a new unit is added to the RFC. This is a limited risk at the benefit of simple code covering 100% of usage for the last 6 years.
  2. We go for a RangeUnit enum #[non_exhaustive], at the benefit of allowing a source compatible introduction of a new unit, but at the risk to "complexify" the API for every day use.

Not an easy decision. A little daemon on my left shoulder says "come on, take the challenge and go with those damn enums, you are a rustacean or not ?", a little angel on my right shoulder says "come on, you know this is over-engineering, right ?"

Do you want me to have a try with the RangeUnit enum design to see how it goes, or you want to listen to the little angel ?
In the latter case I will rename ByteRanges to Range and ByteRange to RangeItem (or any better name you might have), and simplify the API to handle bytes only.

What do you think ?

After this we need to talk about this comment which could impact the design of the If-Range header. But one discussion at a time :)

@yoshuawuyts
Copy link
Member

Assume the RangeUnit is always bytes?

However, by removing None from the enum we would have only one variant which looks strange too.

I think that's fine though, especially if we mark the enum as #[non_exhaustive] and maybe add a comment this may be extended in the future.

At the end I do not see how we can avoid several if let and match code for the end user using only bytes unit in practice. Do you have something in mind that could help here ?

Not necessarily; I think at this layer it's more important that the implementation is correct and understandable, than necessarily providing the best end-user ergonomics. Then in Tide we can use the implementation we've defined here to truly make it ergonomic.

suffix-length

We can extend the type with further helper functions if need be without a breaking change I believe.

Fantastic! Happy to punt this topic for now then (:

Conclusion

Do you want me to have a try with the RangeUnit enum design to see how it goes?

I think this is indeed the route I'd like to see explored. Thanks so much for taking the time to work through this; I feel I say this everytime it comes up, but I'm incredibly excited for this!

This introduce a new 'range' module
to support encoding, decoding and validation of HTTP range requests.
@ririsoft
Copy link
Contributor Author

ririsoft commented Dec 7, 2020

I think this is indeed the route I'd like to see explored.

Here we are ! The new design is enum based but different than what we discussed above. I would like to apologies for that and hope you won't be too disappointed.

I came to this design because each header exposes different types that I could not unify across one single Unit enum.
In this design each header is an enum, all with a Bytes variant but they all differ in the inner types exposed. All bytes related implementation belongs to a dedicated bytes module.

There is a few points that I was not sure about, such having new constructors taking a impl Into<...> for each header.

Do not hesitate to challenge my proposal I am fully open for suggestions.

Comment on lines +69 to +72
let fn_err = || {
Error::from_str(
StatusCode::BadRequest,
"Invalid Range header for byte ranges",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my previous commits I add several error cases to handle, let closure here is no more justified I should move this block to the match block below

Comment on lines +69 to +72
}
}

impl FromStr for BytesRange {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I should just provide a pub(crate) fn from_str implementation and not do it as impl FromStr to remain consistant with the rest of the code.

Comment on lines +188 to +208
/// # fn main() -> http_types::Result<()> {
/// #
/// use http_types::range::{BytesRange, BytesRangeSet, Range};
/// use http_types::{Method, Request, StatusCode, Url};
/// use std::convert::TryInto;
///
/// let mut range_set = BytesRangeSet::new();
/// range_set.push(0, 500);
/// let range = Range::Bytes(range_set);
///
/// let mut req = Request::new(Method::Get, Url::parse("http://example.com").unwrap());
/// range.apply(&mut req);
///
/// let range = Range::from_headers(req)?.unwrap();
///
/// if let Range::Bytes(range_set) = range {
/// let err = range_set.match_size(350).unwrap_err();
/// assert_eq!(err.status(), StatusCode::RequestedRangeNotSatisfiable);
/// }
/// #
/// # Ok(()) }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could make this example shorter without using a Request

@yoshuawuyts
Copy link
Member

@ririsoft In #180 (review) I proposed to go for RangeUnits + RangeDirective types so that one can hold the unit definition, and the other can work as the direct inputs to the headers.

In #180 (comment) it sounds like you attempted to unify these into a single type, which then didn't work out -- am I reading that correctly?

@ririsoft
Copy link
Contributor Author

@ririsoft In #180 (review) I proposed to go for RangeUnits + RangeDirective types so that one can hold the unit definition, and the other can work as the direct inputs to the headers.

In #180 (comment) it sounds like you attempted to unify these into a single type, which then didn't work out -- am I reading that correctly?

Hello @yoshuawuyts ,
I do not understand. Were you able to review my latest updates with enum ?
Does it matches your expectations ? Otherwise would you be able to provide some code snippets for one of the header type please. I really don't see where you want to go, I am really sorry about that.

@yoshuawuyts
Copy link
Member

@ririsoft that's all good! -- I think what I may do is fork your PR, and then file a patch against your PR. I think perhaps at this point that may be the easiest way to communicate what I mean 😅

@ririsoft
Copy link
Contributor Author

@ririsoft that's all good! -- I think what I may do is fork your PR, and then file a patch against your PR. I think perhaps at this point that may be the easiest way to communicate what I mean 😅

Yeah we are reaching the limit of asynchronous talk here I am afraid :). A small patch says a lot more sometimes 👍

Also I had a look at what they did on hyperium: https://docs.rs/headers/0.3.3/headers/struct.Range.html.
Interestingly they are using the RangeBounds type from std. It may look like a good idea but I am not sure how you can represent a "negative bound" such as "the last five bytes" that you can have with http range requests such -5. To be further discussed with your patch ...

@ririsoft
Copy link
Contributor Author

I am not using http-rs anymore and I am thus not interested in working on this PR further.

@ririsoft ririsoft closed this May 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants