
我们的客户代表的应用程序堆栈位于Microsoft(Azure)的云中,该客户的代表面临一个问题:最近,来自欧洲的一些客户的部分请求开始以错误400(
错误请求 )结束。 所有应用程序均以.NET编写,并部署在Kubernetes中。
API就是一个应用程序,最终所有流量都通过该API到达。 由.NET客户端配置并托管在pod中的
Kestrel HTTP服务器侦听此流量。 通过调试,我们感到幸运的是,有一个特定的用户存在稳定的重现问题。 但是,交通链使一切变得复杂:

Ingress中的错误如下所示:
{ "number_fields":{ "status":400, "request_time":0.001, "bytes_sent":465, "upstream_response_time":0, "upstream_retries":0, "bytes_received":2328 }, "stream":"stdout", "string_fields":{ "ingress":"app", "protocol":"HTTP/1.1", "request_id":"f9ab8540407208a119463975afda90bc", "path":"/api/sign-in", "nginx_upstream_status":"400", "service":"app", "namespace":"production", "location":"/front", "scheme":"https", "method":"POST", "nginx_upstream_response_time":"0.000", "nginx_upstream_bytes_received":"120", "vhost":"api.app.example.com", "host":"api.app.example.com", "user":"", "address":"83.41.81.250", "nginx_upstream_addr":"10.240.0.110:80", "referrer":"https://api.app.example.com/auth/login?long_encrypted_header", "service_port":"http", "user_agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36", "time":"2019-03-06T18:29:16+00:00", "content_kind":"cache-headers-not-present", "request_query":"" }, "timestamp":"2019-03-06 18:29:16", "labels":{ "app":"nginx", "pod-template-generation":"6", "controller-revision-hash":"1682636041" }, "namespace":"kube-nginx-ingress", "nsec":6726612, "source":"kubernetes", "host":"k8s-node-55555-0", "pod_name":"nginx-v2hcb", "container_name":"nginx", "boolean_fields":{} }
同时,茶est给:
HTTP/1.1 400 Bad Request Connection: close Date: Wed, 06 Mar 2019 12:34:20 GMT Server: Kestrel Content-Length: 0
即使是最大程度的冗长,Kestrel错误也包含了
很少的有用信息 :
{ "number_fields":{"ThreadId":76}, "stream":"stdout", "string_fields":{ "EventId":"{\"Id\"=>17, \"Name\"=>\"ConnectionBadRequest\"}", "SourceContext":"Microsoft.AspNetCore.Server.Kestrel", "ConnectionId":"0HLL2VJSST5KV", "@mt":"Connection id \"{ConnectionId}\" bad request data: \"{message}\"", "@t":"2019-03-07T13:06:48.1449083Z", "@x":"Microsoft.AspNetCore.Server.Kestrel.Core.BadHttpRequestException: Malformed request: invalid headers.\n at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.Http1Connection.TryParseRequest(ReadResult result, Boolean& endConnection)\n at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.<ProcessRequestsAsync>d__185`1.MoveNext()", "message":"Malformed request: invalid headers." }, "timestamp":"2019-03-07 13:06:48", "labels":{ "pod-template-hash":"2368795483", "service":"app" }, "namespace":"production", "nsec":145341848, "source":"kubernetes", "host":"k8s-node-55555-1", "pod_name":"app-67bdcf98d7-mhktx", "container_name":"app", "boolean_fields":{} }
似乎只有tcpdump可以帮助解决此问题...但是我将在流量链上重复一遍:

调查中
显然,最好在Kubernetes部署了Pod的
那个特定节点上侦听流量:转储量将使得可以很快找到至少一些东西。 确实,在考虑它时,注意到了这样一个框架:
GET /back/user HTTP/1.1 Host: api.app.example.com X-Request-ID: 27ceb14972da8c21a8f92904b3eff1e5 X-Real-IP: 83.41.81.250 X-Forwarded-For: 83.41.81.250 X-Forwarded-Host: api.app.example.com X-Forwarded-Port: 443 X-Forwarded-Proto: https X-Original-URI: /front/back/user X-Scheme: https X-Original-Forwarded-For: 83.41.81.250 X-Nginx-Geo-Client-Country: Spain X-Nginx-Geo-Client-City: M.laga Accept-Encoding: gzip CF-IPCountry: ES CF-RAY: 4b345cfd1c4ac691-MAD CF-Visitor: {"scheme":"https"} pragma: no-cache cache-control: no-cache accept: application/json, text/plain, */* origin: https://app.example.com user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36 referer: https://app.example.com/auth/login accept-language: en-US,en;q=0.9,en-GB;q=0.8,pl;q=0.7 cookie: many_encrypted_cookies; .AspNetCore.Identity.Application=something_encrypted; CF-Connecting-IP: 83.41.81.250 True-Client-IP: 83.41.81.250 CDN-Loop: cloudflare HTTP/1.1 400 Bad Request Connection: close Date: Wed, 06 Mar 2019 12:34:20 GMT Server: Kestrel Content-Length: 0
在仔细检查转储后,发现了
M.laga
一词。 很容易猜到在西班牙没有城市M.laga(但是有
Málaga )。 掌握了这个想法之后,我们查看了Ingress配置,其中我们看到一个月前(应客户要求)插入的
“无害”代码段 :
ingress.kubernetes.io/configuration-snippet: | proxy_set_header X-Nginx-Geo-Client-Country $geoip_country_name; proxy_set_header X-Nginx-Geo-Client-City $geoip_city;
当您禁用这些标头的转发时,一切就变好了! (很快就知道,应用程序本身不再需要这些标头。)
现在,让我们
以更一般的方式来看问题。 如果您向
localhost:80
发出telnet请求,则可以在应用程序内部轻松复制它
localhost:80
:
GET /back/user HTTP/1.1 Host: api.app.example.com cache-control: no-cache accept: application/json, text/plain, */* origin: https://app.example.com Cookie: test=Desiree
...
401 Unauthorized
退货,与预期的一样。 如果我们这样做会发生什么:
GET /back/user HTTP/1.1 Host: api.app.example.com cache-control: no-cache accept: application/json, text/plain, */* origin: https://app.example.com Cookie: test=Désirée
?
400 Bad request
返回-在应用程序日志中,我们将得到我们已经知道的错误:
{ "@t":"2019-03-31T12:59:54.3746446Z", "@mt":"Connection id \"{ConnectionId}\" bad request data: \"{message}\"", "@x":"Microsoft.AspNetCore.Server.Kestrel.Core.BadHttpRequestException: Malformed request: invalid headers.\n at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.Http1Connection.TryParseRequest(ReadResult result, Boolean& endConnection)\n at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.<ProcessRequestsAsync>d__185`1.MoveNext()", "ConnectionId":"0HLLLR1J974L9", "message":"Malformed request: invalid headers.", "EventId":{ "Id":17, "Name":"ConnectionBadRequest" }, "SourceContext":"Microsoft.AspNetCore.Server.Kestrel", "ThreadId":71 }
总结
具体来说,Kestrel
无法正确处理UTF-8中具有正确字符的HTTP标头,这些标头包含在相当多的城市的名称中。
在我们的案例中,另一个因素是客户端当前不打算更改应用程序中Kestrel的实现。 但是,AspNetCore本身(
编号4318 ,
编号7707 )中的问题表明,这无济于事...
总结:注释不再是关于Kestrel或UTF-8的特定问题(在2019年,?!),而是在寻找问题的过程中
对每个步骤的
关注和一致研究迟早会见效。 祝你好运!
聚苯乙烯
另请参阅我们的博客: